
NAME

HTML::Index::Store - subclassable module for storing inverted index files for the HTML::Index modules.

SYNOPSIS

my $store = HTML::Index::Store->new( 
    VERBOSE => 0,
    MODE => 'r',
    COMPRESS => 1,
    DB => $db,
    STOP_WORD_FILE => $swf,
);
$store->init(
    TABLES => \%HTML::Index::TABLES,
    REFRESH => 1,
);

DESCRIPTION

The HTML::Index::Store module is a generic interface that provides storage for the inverted indexes used by the HTML::Index modules. The reference implementation uses in-memory storage, so it is not suitable for persistent applications (where the search and index functionality are separated). Subclasses of this module should override the methods described below, and then be passed as a constructor argument to the HTML::Index::Create and HTML::Index::Search modules.

One subclass of this module is provided with this distribution: HTML::Index::Store::BerkeleyDB.

CONSTRUCTOR OPTIONS

Constructor options give the HTML::Index::Store a token that identifies the database being used (for example, a directory path for a Berkeley DB implementation, or a database descriptor for a DBI implementation). They also allow the STOP_WORD_FILE and COMPRESS options to be set. These options are stored in an options table in the database and are therefore "sticky": the search interface can automatically use the same settings that were used at creation time.
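The "sticky" behaviour can be pictured with a small sketch. It uses a plain hash as a stand-in for the options table, and the helpers save_options and load_options are hypothetical names invented for this illustration; they are not part of HTML::Index::Store. Whether stored values should yield to options passed explicitly at search time is also an assumption made here.

```perl
use strict;
use warnings;

# Stand-in for the 'options' table; a real store would keep this
# in the database identified by the DB option.
my %options_table;

# Hypothetical helper: record the sticky options at index creation time.
sub save_options {
    my ( $table, %opts ) = @_;
    for my $name ( qw( STOP_WORD_FILE COMPRESS ) ) {
        $table->{$name} = $opts{$name} if defined $opts{$name};
    }
    return;
}

# Hypothetical helper: at search time, stored values fill in anything
# the caller did not pass explicitly (an assumption of this sketch).
sub load_options {
    my ( $table, %opts ) = @_;
    for my $name ( keys %$table ) {
        $opts{$name} = $table->{$name} unless defined $opts{$name};
    }
    return %opts;
}

# Index creation: the options are written to the table.
save_options( \%options_table, COMPRESS => 1, STOP_WORD_FILE => 'stopwords.txt' );

# Searching: the caller passes neither option, yet both are recovered.
my %search_opts = load_options( \%options_table, VERBOSE => 0 );
```

The point of the design is that the search side never has to be told how the index was built; it only needs the same DB identifier.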

DB

Database identifier. Available to subclassed modules using the DB method call.

MODE

Either 'r' or 'rw', depending on whether the HTML::Index::Store module is created in read-only or read/write mode.

VERBOSE

If true, print diagnostic messages to STDERR.

STOP_WORD_FILE

This option is the path to a stopword file (see HTML::Index::Stopwords). If set, the same stopword file is available for both creation and searching of the index.

COMPRESS

If true, use HTML::Index::Compress compression on the inverted index file. This option is also "sticky" for searching, since a compressed index has to be decompressed when it is read.

METHODS

init( %options )

The %options hash contains two keys:

TABLES

The TABLES option is a hashref of table names that the HTML::Index::Store module is required to create and maintain. The keys of this hash are the names of the tables, and the values are either 'HASH' or 'RECNO', which indicate what type of data is to be stored in that table ('HASH' tables are keyed on a string, 'RECNO' tables on an integer). The current list of tables used is:

%HTML::Index::TABLES = (
    options => 'HASH',
    file2fileid => 'HASH',
    fileid2file => 'RECNO',
    word2fileid => 'HASH',
    wordid2word => 'RECNO',
    soundex2wordid => 'HASH',
    fileid2modtime => 'RECNO',
);

but a subclass of HTML::Index::Store should be prepared to store any list of tables provided to its init method using this option.

REFRESH

If the value of this option is true, the module should flush the data from its tables at initialization.
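As a rough illustration of how a subclass might honour these two options, the sketch below creates one in-memory container per table and empties it when REFRESH is true. The class name My::Store and its internal layout are hypothetical; a real subclass would inherit from HTML::Index::Store and would normally create its tables in the database named by the DB option.

```perl
use strict;
use warnings;

package My::Store;    # hypothetical class name, for illustration only

sub new {
    my $class = shift;
    return bless { TABLES => {} }, $class;
}

sub init {
    my ( $self, %options ) = @_;
    my $tables = $options{TABLES} || {};
    for my $name ( keys %$tables ) {
        # Keep existing data unless REFRESH asks for a clean start.
        next if exists $self->{TABLES}{$name} && !$options{REFRESH};
        # RECNO tables are keyed on integers, HASH tables on strings.
        $self->{TABLES}{$name} = $tables->{$name} eq 'RECNO' ? [] : {};
    }
    return $self;
}

package main;

my %tables = ( options => 'HASH', fileid2file => 'RECNO' );

my $store = My::Store->new;
$store->init( TABLES => \%tables );
push @{ $store->{TABLES}{fileid2file} }, 'index.html';

# A second init with REFRESH set flushes the data stored above.
$store->init( TABLES => \%tables, REFRESH => 1 );
```

Because init loops over whatever keys it is given, the sketch does not depend on the particular table list shown above.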

get( $table, $key )

Get the $key entry in the $table table.

put( $table, $key, $val )

Set the $key entry in the $table table to the value $val.

each( $table )

The first call to each sets a cursor for the table $table, and returns a ( $key, $value ) pair. Subsequent calls advance the cursor and return subsequent ( $key, $value ) pairs. Returns ( undef, undef ) after the last entry has been returned.

cput( $key, $val )

Inserts the key and value in the current table at the current cursor position (as determined by the most recent call to each).
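The interplay between each and cput can be sketched with a small in-memory class. My::CursorStore is a hypothetical name, and a Perl hash has no true cursor position, so cput here simply writes to the table that the most recent each call was walking; a database-backed subclass would use its native cursor instead.

```perl
use strict;
use warnings;

package My::CursorStore;    # hypothetical class name, for illustration only

sub new {
    my ( $class, %tables ) = @_;
    return bless { TABLES => { %tables }, CURSOR => undef }, $class;
}

sub each {
    my ( $self, $table ) = @_;
    my $cur = $self->{CURSOR};
    # Open a new cursor on the first call, or when the table changes.
    if ( !$cur || $cur->{table} ne $table ) {
        $cur = $self->{CURSOR} = {
            table => $table,
            keys  => [ sort keys %{ $self->{TABLES}{$table} } ],
            pos   => -1,
        };
    }
    $cur->{pos}++;
    if ( $cur->{pos} >= @{ $cur->{keys} } ) {
        $self->{CURSOR} = undef;    # exhausted; next call starts afresh
        return ( undef, undef );
    }
    my $key = $cur->{keys}[ $cur->{pos} ];
    return ( $key, $self->{TABLES}{$table}{$key} );
}

sub cput {
    my ( $self, $key, $val ) = @_;
    my $cur = $self->{CURSOR} or die "cput called without an active cursor";
    $self->{TABLES}{ $cur->{table} }{$key} = $val;
    return;
}

package main;

my $store = My::CursorStore->new( word2fileid => { apple => '1', pear => '2' } );

# each returns ( undef, undef ) at the end, which is still a two-element
# list, so the loop has to test the key explicitly.
while ( my ( $key, $val ) = $store->each('word2fileid') ) {
    last unless defined $key;
    $store->cput( $key, "$val,3" );    # rewrite the entry under the cursor
}
```

Note the explicit defined test in the loop: a bare while ( my ( $k, $v ) = ... ) would never terminate, because a list assignment of two undef values is still true in scalar context.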

nkeys( $table )

Returns the number of keys in the $table table.
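The get, put and nkeys methods are simple enough to sketch against in-memory tables. The class name My::SimpleStore is hypothetical, the two tables are created in the constructor only to keep the example short, and the use of 0-based integer keys for the 'RECNO' table is an assumption of this sketch rather than something the interface specifies.

```perl
use strict;
use warnings;

package My::SimpleStore;    # hypothetical class name, for illustration only

sub new {
    my $class = shift;
    # One HASH-style and one RECNO-style table, pre-created for brevity.
    return bless { file2fileid => {}, fileid2file => [] }, $class;
}

sub get {
    my ( $self, $table, $key ) = @_;
    my $t = $self->{$table};
    return ref($t) eq 'ARRAY' ? $t->[$key] : $t->{$key};
}

sub put {
    my ( $self, $table, $key, $val ) = @_;
    my $t = $self->{$table};
    if ( ref($t) eq 'ARRAY' ) { $t->[$key] = $val }
    else                      { $t->{$key} = $val }
    return;
}

sub nkeys {
    my ( $self, $table ) = @_;
    my $t = $self->{$table};
    return ref($t) eq 'ARRAY' ? scalar @$t : scalar keys %$t;
}

package main;

my $store = My::SimpleStore->new;

# Use the current size of the RECNO table as the next integer key
# (assumes 0-based keys).
my $id = $store->nkeys('fileid2file');
$store->put( 'fileid2file', $id,          'index.html' );
$store->put( 'file2fileid', 'index.html', $id );
```

Keeping a forward table and a reverse table in step like this mirrors the file2fileid and fileid2file pair in the table list above.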

SEE ALSO

HTML::Index

AUTHOR

Ave Wrigley <Ave.Wrigley@itn.co.uk>

COPYRIGHT

Copyright (c) 2001 Ave Wrigley. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.