The parsefile project is a hobby project to create a powerful and flexible tool for parsing all kinds of files, e.g. HTML files, for a variety of purposes. The project is still under development; however, an Alpha version including the source code is available to give a first impression of how it works. Although the online functionality is still missing, you can already write your own algorithms, add them to the program, and run them offline.
There are eleven simple parsing algorithms included in the newest version:
html_remove_comments - removes comments from an HTML file
html_body - reduces the content of an HTML file to the content of its <body> element
HTMLSpider - searches for link (<a>) tags inside an HTML file and adds these links to the list of files to be parsed
html_remove_tags - removes tags from an HTML file
count_strings - counts the occurrences of a list of strings in the parsed files
show_content - shows the current content
print_content - prints the current content into the output file
print_binary - prints the binary code of the current content into the output file
wait - waits for a specified number of seconds during the parsing of each file
command_line - gives access to the command line during the parsing of each file
csv - parses CSV files
The program is written in C++ and compiled with g++ on Ubuntu Linux. The program and its source code, as well as the source code of the included parsing algorithms, are licensed under the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
The following packages are available for download:
It is recommended to download the complete package, because at this stage of development the source code (especially that of the parsing algorithms) is essential for understanding how to add your own functionality to the program.
Further information can be found in the readme file included in the packages (README.txt) and in its online HTML version.
The current version of the program is v0.8a (ALPHA), last changed on