NAME

HTML::Index - Perl modules for creating and searching an index of HTML files

SYNOPSIS

  use HTML::Index::Store;
  use HTML::Index::Document;

  my $indexer = HTML::Index::Store->new(
      STOP_WORD_FILE      => '/path/to/stopword/file',
      DB                  => '/path/to/db/directory',
      COMPRESS            => 1,
      REFRESH             => 0,
      PARSER              => 'HTML',
  );

  for ( ... )
  {
      my $doc = HTML::Index::Document->new( 
          name        => $name,
          contents    => $contents,
          mod_time    => $mod_time,
      );
      $indexer->index_document( $doc );
  }

  for ( ... )
  {
      my $doc = HTML::Index::Document->new( path => $path );
      # name, contents and mod_time are taken from the path, contents and
      # modification time of $path
      $indexer->index_document( $doc );
  }

  my $search = HTML::Index::Store->new( DB => $db_dir );
  my @results = $search->search( $q );

DESCRIPTION

HTML::Index is a set of modules for creating an index of HTML documents so that they can subsequently be searched by keywords, or by Boolean combinations of keywords. It was originally inspired by the indexer.pl script in the O'Reilly book "CGI Programming with Perl, 2nd Edition" (http://www.oreilly.com/catalog/cgi2/author.html).
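Keyword and Boolean search over an index can be illustrated with a minimal, self-contained sketch: an in-memory inverted index (each word mapped to the set of documents containing it) queried by a small recursive-descent parser. The stopword list, the helper names (index_doc, search, parse_or, parse_and, parse_not) and the precedence NOT > AND > OR are assumptions made for illustration; this is not the HTML::Index query parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stopword list; HTML::Index reads its own from STOP_WORD_FILE.
my %STOP = map { $_ => 1 } qw(a an the of to);

my %index;       # inverted index: word => { doc_name => 1 }
my %all_docs;    # universe of documents, needed to evaluate NOT

sub index_doc {
    my ( $name, $text ) = @_;
    $all_docs{$name} = 1;
    for my $word ( map { lc($_) } $text =~ /(\w+)/g ) {
        $index{$word}{$name} = 1 unless $STOP{$word};    # drop stopwords
    }
}

# Assumed grammar (lowest to highest precedence):
#   or_expr  := and_expr ( OR and_expr )*
#   and_expr := not_expr ( AND not_expr )*
#   not_expr := NOT not_expr | '(' or_expr ')' | word
sub search {
    my ($query) = @_;
    my @tokens = $query =~ /(\(|\)|[^\s()]+)/g;
    my $result = parse_or( \@tokens );
    die "unexpected '$tokens[0]'\n" if @tokens;
    return sort keys %$result;
}

sub parse_or {
    my ($t) = @_;
    my $left = parse_and($t);
    while ( @$t && uc( $t->[0] ) eq 'OR' ) {
        shift @$t;
        my $right = parse_and($t);
        my %union = ( %$left, %$right );    # set union
        $left = \%union;
    }
    return $left;
}

sub parse_and {
    my ($t) = @_;
    my $left = parse_not($t);
    while ( @$t && uc( $t->[0] ) eq 'AND' ) {
        shift @$t;
        my $right = parse_not($t);
        my %both = map { $_ => 1 } grep { $right->{$_} } keys %$left;    # intersection
        $left = \%both;
    }
    return $left;
}

sub parse_not {
    my ($t) = @_;
    die "unexpected end of query\n" unless @$t;
    my $tok = shift @$t;
    if ( uc($tok) eq 'NOT' ) {
        my $operand = parse_not($t);
        my %rest = map { $_ => 1 } grep { !$operand->{$_} } keys %all_docs;    # complement
        return \%rest;
    }
    if ( $tok eq '(' ) {
        my $inner = parse_or($t);
        die "missing ')'\n" unless @$t && shift(@$t) eq ')';
        return $inner;
    }
    die "unexpected ')'\n" if $tok eq ')';
    # Sets are never modified in place, so the stored hash can be returned.
    return $index{ lc $tok } || {};
}

index_doc( 'a.html', 'Perl and HTML indexing' );
index_doc( 'b.html', 'Searching HTML with Python' );
index_doc( 'c.html', 'Perl one-liners' );

print join( ',', search('perl AND NOT html') ), "\n";            # prints "c.html"
print join( ',', search('(perl OR python) AND html') ), "\n";    # prints "a.html,b.html"
```

With this precedence, `perl AND NOT html` means `perl AND (NOT html)`. NOT is evaluated as a complement against the set of all indexed documents, which is why the sketch keeps %all_docs alongside the inverted index.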

All storage operations are contained in the HTML::Index::Store module, which can be subclassed to support other storage options (such as BerkeleyDB files or SQL databases). Two such subclasses (HTML::Index::Store::BerkeleyDB and HTML::Index::DataDumper) are included in the distribution.
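The subclassing idea can be shown with a self-contained sketch. The class names My::Store and My::Store::Memory and the methods put, get and add_posting are hypothetical and are not the actual HTML::Index::Store interface; the point is only that indexing logic written against a small set of storage methods is inherited unchanged by every backend that implements them.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical base class: NOT the real HTML::Index::Store interface.
package My::Store;

sub new {
    my ( $class, %opts ) = @_;
    my $self = bless {%opts}, $class;
    $self->init;
    return $self;
}
sub init { }                                             # subclasses may override
sub put  { die ref(shift), " must implement put\n" }
sub get  { die ref(shift), " must implement get\n" }

# Shared logic written only in terms of put/get, so every backend inherits it.
sub add_posting {
    my ( $self, $word, $doc_id ) = @_;
    my %ids = map { $_ => 1 } split /,/, ( $self->get($word) // '' );
    $ids{$doc_id} = 1;
    $self->put( $word, join ',', sort { $a <=> $b } keys %ids );
}

# Hypothetical in-memory backend; a BerkeleyDB or SQL backend would
# override the same two storage methods.
package My::Store::Memory;
our @ISA = ('My::Store');

sub init { $_[0]{data} = {} }
sub put  { my ( $self, $k, $v ) = @_; $self->{data}{$k} = $v }
sub get  { my ( $self, $k ) = @_; return $self->{data}{$k} }

package main;

my $store = My::Store::Memory->new;
$store->add_posting( perl => 3 );
$store->add_posting( perl => 1 );
print $store->get('perl'), "\n";    # prints "1,3"
```

Keeping the storage surface this narrow is what makes alternative backends cheap to write: only the methods that touch the underlying medium need to change.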

The modules can be used to index any HTML documents, whether they are stored as files or in a database. They support stopword lists, soundex searches, compression of the inverted indexes and re-indexing of documents. Search queries can be expressed as compound Boolean expressions composed of keywords, parentheses and the logical operators OR, AND and NOT.
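One common way to compress an inverted index, shown here as a sketch and not necessarily the scheme HTML::Index uses for its COMPRESS option, is to store each posting list as the gaps between sorted document ids, pack the gaps as BER variable-length integers with Perl's built-in pack 'w', and then deflate the result with the core Compress::Zlib module. The numeric document ids and the function names are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib;    # core module; exports compress() and uncompress()

# Encode a posting list: sort the ids, store the gaps between them as
# BER variable-length integers, then deflate the packed string.
sub compress_postings {
    my @ids  = sort { $a <=> $b } @_;
    my $prev = 0;
    my @gaps;
    for my $id (@ids) {
        push @gaps, $id - $prev;
        $prev = $id;
    }
    my $packed = pack 'w*', @gaps;
    return compress($packed);
}

# Reverse the steps: inflate, unpack the gaps, and rebuild the ids with
# a running sum.
sub decompress_postings {
    my ($blob) = @_;
    my $packed = uncompress($blob);
    die "corrupt posting list\n" unless defined $packed;
    my $sum = 0;
    return map { $sum += $_ } unpack 'w*', $packed;
}

my @ids  = ( 3, 17, 18, 250, 1024 );
my $blob = compress_postings(@ids);
my @back = decompress_postings($blob);
print "@back\n";    # prints "3 17 18 250 1024"
```

Gaps below 128 occupy a single byte each under pack 'w', so dense posting lists shrink the most; for very short lists, zlib's fixed overhead can outweigh what deflate saves.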

AUTHOR

Ave Wrigley <ave.wrigley@gmail.com>

COPYRIGHT
