NAME

Text::Identify::BoilerPlate - Remove repeated text

VERSION

Version 0.3.1

SYNOPSIS

Finds boilerplate text (lines that are repeated across documents) in a list of plain text files.

use Text::Identify::BoilerPlate;

my @files = ('file1', 'file2', 'file3');
rem_boilerplate(\@files, { min_dupl => 4, ignore_digits => 0 });

New files are written, containing everything but the boilerplate text.

FUNCTIONS

rem_boilerplate()

rem_boilerplate() takes two arguments: a reference to a list of files to be processed, and a reference to a hash of options.

The options are:

min_dupl: The minimum number of times a line has to occur to be considered boilerplate (default: 3). Can be either an integer or a percentage ('50%') of the number of files processed. Minimum value: 2.
ignore_digits: Lines that differ only in their digits are considered duplicates (default: yes).
suffix: Suffix added to the names of the new files (default: 'content').
only_headers_and_footers: Only sets of consecutive duplicate lines at the start and end of documents are considered boilerplate (default: yes).
digest: Lines are replaced by an MD5 digest while duplicates are compiled, saving memory (default: no).
log: Name of the log file where deleted lines are recorded; if set to false, no log is created (default: './text-identify-boilerplate.log').
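
To make the interaction of min_dupl, ignore_digits and digest more concrete, here is a minimal sketch of the general idea. It is not the module's actual implementation: the helpers resolve_min_dupl, line_key and boilerplate_keys are hypothetical names, the sample documents are made up, and the choices to round percentages up, to count each line once per document, and to replace digit runs with '0' are assumptions of this sketch. The only_headers_and_footers option is not modelled.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use POSIX qw(ceil);

# Hypothetical helper: turn a min_dupl value (integer or 'NN%') into an
# absolute count. Rounding up is an assumption; the floor of 2 follows
# the documented minimum value.
sub resolve_min_dupl {
    my ($min_dupl, $n_files) = @_;
    my $n = $min_dupl =~ /^(\d+)%$/
        ? ceil($1 * $n_files / 100)
        : int $min_dupl;
    return $n < 2 ? 2 : $n;
}

# Hypothetical helper: the key under which a line is counted.
# Assumes lines are byte strings (md5_hex rejects wide characters).
sub line_key {
    my ($line, $opt) = @_;
    my $key = $line;
    $key =~ s/\d+/0/g    if $opt->{ignore_digits};  # 'Page 3' and 'Page 12' collide
    $key = md5_hex($key) if $opt->{digest};         # fixed-size key, not the full text
    return $key;
}

# Count in how many documents each key occurs, then return a hash ref
# of the keys that reach the threshold.
sub boilerplate_keys {
    my ($docs, $opt) = @_;    # $docs: array ref of array refs of lines
    my %seen_in;
    for my $doc (@$docs) {
        my %in_doc;
        $in_doc{ line_key($_, $opt) } = 1 for @$doc;
        $seen_in{$_}++ for keys %in_doc;
    }
    my $min = resolve_min_dupl($opt->{min_dupl}, scalar @$docs);
    my %boiler;
    for my $key (keys %seen_in) {
        $boiler{$key} = 1 if $seen_in{$key} >= $min;
    }
    return \%boiler;
}

# Made-up sample data: four tiny "documents" as lists of lines.
my @docs = (
    [ 'ACME Corp newsletter', 'Page 1 of 9', 'Spring sale starts Monday.' ],
    [ 'ACME Corp newsletter', 'Page 2 of 9', 'New store opens in Oslo.'   ],
    [ 'ACME Corp newsletter', 'Page 3 of 9', 'Annual report published.'   ],
    [ 'Unrelated memo',       'Page 4 of 9', 'Lunch menu for Friday.'     ],
);
my %opt    = (min_dupl => '75%', ignore_digits => 1, digest => 1);
my $boiler = boilerplate_keys(\@docs, \%opt);

# Print each document without the lines identified as boilerplate.
for my $doc (@docs) {
    my @kept = grep { !$boiler->{ line_key($_, \%opt) } } @$doc;
    print join(' | ', @kept), "\n";
}
```

With four documents, '75%' resolves to a threshold of 3, so the newsletter header (present in three documents) and the page-number line (identical in all four once digits are ignored) are dropped, while 'Unrelated memo' and the body lines are kept. Storing MD5 digests keeps every key at 32 hexadecimal characters regardless of line length, which is where the memory saving would come from.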

AUTHOR

Lars Nygaard, <lars.nygaard@inl.uio.no>

BUGS

The program needs extensive testing and tweaking before the simple algorithm can give consistently high-quality results.

Please report any bugs or feature requests to bug-text-identify-boilerplate@rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Text-Identify-BoilerPlate. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

COPYRIGHT & LICENSE

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
