parsefile project
The parsefile project is a hobby project to create a powerful and flexible tool for parsing all kinds of files, e.g. HTML files, for a variety of purposes. The project is still under development, but an alpha version including source code is available to give a first impression of how it works. Although the online functionality is not yet implemented, you can already write your own algorithms, add them to the program, and run them offline.
There are ten simple parsing algorithms included in the newest version:
html_remove_comments - removes comments from an HTML file
html_body - reduces the content of an HTML file to the content of its <body> tag
HTMLSpider - searches for link (<a>) tags inside an HTML file and adds these links to the list of files to be parsed
html_remove_tags - removes tags from an HTML file
count_strings - counts the occurrences of a list of strings inside the parsed files
show_content - shows the current content
print_content - prints the current content into the output file
print_binary - prints the binary code of the current content into the output file
wait - waits for a specified number of seconds during the parsing of each file
command_line - gives access to the command line during the parsing process of each file
The program is written in C++ and built with g++ on Ubuntu Linux. The program and its source code, as well as the source code of the included parsing algorithms, are licensed under the GNU General Public License as published by the Free Software Foundation; either version 3 of the License, or (at your option) any later version.
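To give a rough idea of what algorithms such as html_remove_comments and count_strings do, here is a minimal C++ sketch. It is not taken from the parsefile source code: the function names remove_html_comments and count_occurrences, their signatures, and the handling of unterminated comments are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not the project's actual code): returns a copy of
// `html` with every <!-- ... --> comment removed. An unterminated comment
// is dropped through to the end of the input (an assumption of this sketch).
std::string remove_html_comments(const std::string& html) {
    const std::string open = "<!--";
    const std::string close = "-->";
    std::string result;
    std::size_t pos = 0;
    while (true) {
        const std::size_t start = html.find(open, pos);
        if (start == std::string::npos) {
            // No further comment: keep the rest of the input.
            result.append(html, pos, std::string::npos);
            break;
        }
        // Keep the text in front of the comment.
        result.append(html, pos, start - pos);
        const std::size_t end = html.find(close, start + open.size());
        if (end == std::string::npos) {
            break;  // Unterminated comment: discard the remainder.
        }
        pos = end + close.size();
    }
    return result;
}

// Hypothetical helper: counts non-overlapping occurrences of `needle`
// in `text`. The needle must not be empty.
std::size_t count_occurrences(const std::string& text,
                              const std::string& needle) {
    assert(!needle.empty());
    std::size_t count = 0;
    std::size_t pos = text.find(needle);
    while (pos != std::string::npos) {
        ++count;
        pos = text.find(needle, pos + needle.size());
    }
    return count;
}
```

With these definitions, remove_html_comments("a<!-- x -->b") yields "ab", and count_occurrences("abcabcab", "abc") yields 2. How the real algorithms receive the file content and their parameters is defined by the program's own source code, which this sketch does not try to reproduce.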
The following packages are available for download:
It is recommended to download the complete package, because at this stage of development the source code (especially that of the parsing algorithms) is essential for understanding how to add your own functionality to the program.
Further information can be found in the readme file included in the packages (README.txt) and its online HTML version.
The current version of the program is v0.7a (ALPHA), last changed on 23/10/2011.