NAME
HTML::Index::Store - subclassable module for storing inverted index files for the HTML::Index modules.
SYNOPSIS
    my $store = HTML::Index::Store->new(
        VERBOSE        => 0,
        MODE           => 'r',
        COMPRESS       => 1,
        DB             => $db,
        STOP_WORD_FILE => $swf,
    );
    $store->init(
        TABLES  => \%HTML::Index::TABLES,
        REFRESH => 1,
    );
DESCRIPTION
The HTML::Index::Store module is a generic interface that provides storage for the inverted indexes used by the HTML::Index modules. The reference implementation uses in-memory storage, so it is not suitable for persistent applications (where the search and index functionality are separated). Subclasses of this module should override the methods described below, and then be passed as a constructor argument to the HTML::Index::Create and HTML::Index::Search modules.
There is one subclass of this module provided with this distribution: HTML::Index::Store::BerkeleyDB.
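As a rough illustration of what such a subclass has to provide, here is a minimal in-memory sketch. The package name My::MemoryStore is hypothetical, and the class is deliberately stand-alone so that it runs without the distribution installed; a real subclass would inherit from HTML::Index::Store. How the base class handles its constructor arguments internally is not covered by this document, so the constructor shown is an assumption.

```perl
package My::MemoryStore;    # hypothetical name, for illustration only

use strict;
use warnings;

# A real subclass would inherit from HTML::Index::Store here; this sketch is
# stand-alone so that it runs without the distribution installed.

sub new {
    my ( $class, %opts ) = @_;
    # Keep the constructor options alongside an empty set of tables.
    return bless { %opts, tables => {} }, $class;
}

sub init {
    my ( $self, %options ) = @_;
    my $tables = $options{TABLES} || {};
    for my $name ( keys %$tables ) {
        # Without REFRESH, tables that already exist keep their data.
        next if exists $self->{tables}{$name} && !$options{REFRESH};
        # 'RECNO' tables are keyed on integers, 'HASH' tables on strings.
        $self->{tables}{$name} = $tables->{$name} eq 'RECNO' ? [] : {};
    }
    return $self;
}

sub get {
    my ( $self, $table, $key ) = @_;
    my $t = $self->{tables}{$table};
    return ref $t eq 'ARRAY' ? $t->[$key] : $t->{$key};
}

sub put {
    my ( $self, $table, $key, $val ) = @_;
    my $t = $self->{tables}{$table};
    if ( ref $t eq 'ARRAY' ) { $t->[$key] = $val }
    else                     { $t->{$key} = $val }
    return;
}

sub nkeys {
    my ( $self, $table ) = @_;
    my $t = $self->{tables}{$table};
    # For a RECNO table this is the highest record number plus one.
    return ref $t eq 'ARRAY' ? scalar @$t : scalar keys %$t;
}

package main;

my $store = My::MemoryStore->new( MODE => 'rw' );
$store->init( TABLES => { file2fileid => 'HASH', fileid2file => 'RECNO' } );
$store->put( 'file2fileid', 'index.html', 0 );
$store->put( 'fileid2file', 0, 'index.html' );
print $store->get( 'fileid2file', 0 ), "\n";    # prints "index.html"
```

Backing 'RECNO' tables with an array and 'HASH' tables with a hash is only one possible choice; a persistent subclass would map the same two table types onto whatever its database offers.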
CONSTRUCTOR OPTIONS
Constructor options allow a token to be supplied to HTML::Index::Store that identifies the database being used (this might be a directory path for a Berkeley DB implementation, or a database descriptor for a DBI implementation). They also allow options (STOP_WORD_FILE and COMPRESS) to be set. These options are then stored in an options table in the database, and are therefore "sticky", so that the search interface can automatically use the same option settings that were used at creation time.
- DB
  Database identifier. Available to subclassed modules using the DB method call.
- MODE
  Either 'r' or 'rw', depending on whether the HTML::Index::Store module is created in read-only or read/write mode.
- VERBOSE
  If true, print diagnostic messages to STDERR.
- STOP_WORD_FILE
  The path to a stopword file (see HTML::Index::Stopwords). If set, the same stopword file is available for both creation and searching of the index.
- COMPRESS
  If true, use HTML::Index::Compress compression on the inverted index file. This option is also "sticky" for searching (obviously!).
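The "sticky" behaviour of STOP_WORD_FILE and COMPRESS can be pictured with the following sketch. It uses a plain hash as a stand-in for the options table, and the helper names save_sticky_options and load_sticky_options, the file name stopwords.txt, and the rule that stored values override those passed at search time are all assumptions made for illustration, not the module's documented internals.

```perl
use strict;
use warnings;

# Stand-in for the store's 'options' table; a plain hash is used here
# purely for illustration.
my %options_table;

# Index creation time: record the options that affect how the index is read.
# (Hypothetical helper, not part of HTML::Index::Store.)
sub save_sticky_options {
    my (%opts) = @_;
    for my $name (qw( STOP_WORD_FILE COMPRESS )) {
        $options_table{$name} = $opts{$name} if defined $opts{$name};
    }
    return;
}

# Search time: in this sketch the stored values take precedence over whatever
# the caller passes, so the index is read with the settings it was built with.
# (Hypothetical helper; the precedence rule is an assumption.)
sub load_sticky_options {
    my (%opts) = @_;
    return ( %opts, %options_table );
}

save_sticky_options( COMPRESS => 1, STOP_WORD_FILE => 'stopwords.txt' );
my %effective = load_sticky_options( MODE => 'r', COMPRESS => 0 );
print "COMPRESS = $effective{COMPRESS}\n";    # prints "COMPRESS = 1"
```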
METHODS
- init( %options )
  The %options hash contains two keys:
  - TABLES
    The TABLES option is a hashref of table names that the HTML::Index::Store module is required to create and maintain. The keys of this hash are the names of the tables, and the values are one of 'HASH' or 'RECNO', which give a clue as to what type of data is required to be stored in that table (basically, keyed on a string or on an integer, respectively). The current list of tables used is:

        %HTML::Index::TABLES = (
            options        => 'HASH',
            file2fileid    => 'HASH',
            fileid2file    => 'RECNO',
            word2fileid    => 'HASH',
            wordid2word    => 'RECNO',
            soundex2wordid => 'HASH',
            fileid2modtime => 'RECNO',
        );

    but a subclass of HTML::Index::Store should be prepared to store any list of tables provided to its init method using this option.
  - REFRESH
    If the value of this option is true, the module should flush the data from its tables at initialization.
- get( $table, $key )
  Get the $key entry in the $table table.
- put( $table, $key, $val )
  Set the $key entry in the $table table to the value $val.
- each( $table )
  The first call to each sets a cursor for the table $table, and returns a ( $key, $value ) pair. Subsequent calls advance the cursor and return subsequent ( $key, $value ) pairs. Returns ( undef, undef ) after the last entry has been returned.
- cput( $key, $val )
  Inserts the key and value in the current table at the current cursor position (as determined by the most recent call to each).
- nkeys( $table )
  Returns the number of keys in the $table table.
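The interplay between each and cput is easiest to see in a loop that walks a table and rewrites its values. The following is a stand-alone sketch under stated assumptions: the package name My::CursorStore is hypothetical, the table is a plain hash iterated in sorted key order, and cput simply writes into the table most recently passed to each, since a plain hash has no positional order to insert at.

```perl
package My::CursorStore;    # hypothetical name, for illustration only

use strict;
use warnings;

sub new {
    my ($class) = @_;
    return bless { tables => {}, cursor => undef }, $class;
}

sub put {
    my ( $self, $table, $key, $val ) = @_;
    $self->{tables}{$table}{$key} = $val;
    return;
}

sub get {
    my ( $self, $table, $key ) = @_;
    return $self->{tables}{$table}{$key};
}

sub each {
    my ( $self, $table ) = @_;
    my $c = $self->{cursor};
    # First call for this table: snapshot the keys and start a new cursor.
    if ( !$c || $c->{table} ne $table ) {
        $c = $self->{cursor} = {
            table => $table,
            order => [ sort keys %{ $self->{tables}{$table} || {} } ],
            idx   => -1,
        };
    }
    $c->{idx}++;
    if ( $c->{idx} > $#{ $c->{order} } ) {
        $self->{cursor} = undef;    # past the last entry: reset the cursor
        return ( undef, undef );
    }
    my $key = $c->{order}[ $c->{idx} ];
    return ( $key, $self->{tables}{$table}{$key} );
}

sub cput {
    my ( $self, $key, $val ) = @_;
    my $c = $self->{cursor} or die "cput called without an active cursor\n";
    # Write into the table that the most recent call to each() iterated.
    $self->{tables}{ $c->{table} }{$key} = $val;
    return;
}

package main;

my $store = My::CursorStore->new;
$store->put( 'word2fileid', $_, length $_ ) for qw( apple pear );

# Walk the table and rewrite each value through the cursor.
while ( my ( $key, $val ) = $store->each('word2fileid') ) {
    last unless defined $key;    # ( undef, undef ) marks the end of the table
    $store->cput( $key, $val * 2 );
}
print $store->get( 'word2fileid', 'apple' ), "\n";    # prints "10"
```

Note the explicit `last unless defined $key`: because each returns the two-element list ( undef, undef ) at the end of the table, the list assignment in the while condition stays true and cannot be used on its own to terminate the loop.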
SEE ALSO
AUTHOR
Ave Wrigley <Ave.Wrigley@itn.co.uk>
COPYRIGHT
Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.