NAME

ContentExtractorDriver.pl - Driver for HTML Content Extractor

SYNOPSIS

perl ContentExtractorDriver.pl <input file> <output file> <Ratio type>

DESCRIPTION

ContentExtractorDriver.pl attempts to extract the main content from an HTML document. It removes tags, scripts and boilerplate text by locating the region of the document that has the maximum ratio of words to tags.
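
The words-to-tags idea can be illustrated with a minimal sketch. This is not the driver's actual implementation: the helper names count_words_and_tags and best_region are hypothetical, a "region" is assumed to be a single line, and the ratio is assumed to be words / (tags + 1) so that tag-free lines do not divide by zero. The ratio types accepted on the command line are not modelled here.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (word count, tag count) for one chunk of HTML.
sub count_words_and_tags {
    my ($chunk) = @_;
    my $tags = () = $chunk =~ /<[^>]+>/g;    # count tag-like substrings
    (my $text = $chunk) =~ s/<[^>]+>/ /g;    # replace tags with spaces
    my $words = () = $text =~ /\S+/g;        # count whitespace-separated words
    return ($words, $tags);
}

# Hypothetical helper: returns the text of the line whose
# words / (tags + 1) ratio is highest (assumed region = one line).
sub best_region {
    my ($html) = @_;
    $html =~ s{<(script|style)\b.*?</\1\s*>}{}gis;   # drop scripts and styles
    my ($best_text, $best_ratio) = ('', -1);
    for my $line (split /\n/, $html) {
        my ($words, $tags) = count_words_and_tags($line);
        next unless $words;                          # skip lines with no text
        my $ratio = $words / ($tags + 1);
        if ($ratio > $best_ratio) {
            (my $text = $line) =~ s/<[^>]+>/ /g;     # strip the tags
            $text =~ s/^\s+|\s+$//g;                 # trim both ends
            $text =~ s/\s+/ /g;                      # collapse whitespace
            ($best_text, $best_ratio) = ($text, $ratio);
        }
    }
    return $best_text;
}

my $html = join "\n",
    '<div><a href="/">Home</a> <a href="/about">About</a></div>',
    '<p>This paragraph holds the actual article text of the page.</p>',
    '<script>var x = 1;</script>',
    '<div><a href="/legal">Legal</a></div>';

print best_region($html), "\n";
```

On this sample input the navigation lines score 2/7 and 1/5, the paragraph scores 10/3, and the script line is removed before scoring, so the sketch prints the paragraph text. A real extractor would likely consider multi-line regions rather than single lines.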

AUTHOR

Jean Tavernier (jj.tavernier@gmail.com)

COPYRIGHT

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
