NAME
Search::Indexer::Incremental::MD5 - Incrementally index your files
SYNOPSIS
use Carp ;
use File::Find::Rule ;
use Readonly ;
use Search::Indexer::Incremental::MD5::Indexer ;
use Search::Indexer::Incremental::MD5::Searcher ;
Readonly my $DEFAULT_MAX_FILE_SIZE_INDEXING_THRESHOLD => 300 << 10 ; # 300KB
my $indexer
= Search::Indexer::Incremental::MD5::Indexer->new
(
USE_POSITIONS => 1,
INDEX_DIRECTORY => 'text_index',
get_perl_word_regex_and_stopwords(),
) ;
my @files = File::Find::Rule
->file()
->name( '*.pm', '*.pod' )
->size( "<=$DEFAULT_MAX_FILE_SIZE_INDEXING_THRESHOLD" )
->not_name(qr[auto | unicore | DateTime/TimeZone | DateTime/Locale]x)
->in('.') ;
$indexer->add_files(@files) ;
$indexer->add_files(@more_files) ;
$indexer = undef ;
my $search_string = 'find_me' ;
my $searcher =
eval
{
Search::Indexer::Incremental::MD5::Searcher->new
(
USE_POSITIONS => 1,
INDEX_DIRECTORY => 'text_index',
get_perl_word_regex_and_stopwords(),
)
} or croak "No full text index found! $@\n" ;
my $results = $searcher->search($search_string) ;
# sort in decreasing score order
my @indexes = map { $_->[0] }
reverse
sort { $a->[1] <=> $b->[1] }
map { [$_, $results->[$_]{SCORE}] }
0 .. $#$results ;
for (@indexes)
{
print "$results->[$_]{PATH} [$results->[$_]{SCORE}].\n" ;
}
$searcher = undef ;
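The index-based sort above can also be written as a direct sort on the result records. The following is a minimal sketch only: the `$results` data is hypothetical and merely mimics the array of hashes with `PATH` and `SCORE` keys used in the synopsis, and `@sorted` is a name introduced for illustration.

```perl
use strict ;
use warnings ;

# Hypothetical results, shaped like the array of hashes used in the synopsis.
my $results =
    [
    {PATH => 'lib/A.pm', SCORE => 2},
    {PATH => 'lib/B.pm', SCORE => 7},
    {PATH => 'lib/C.pod', SCORE => 5},
    ] ;

# Comparing $b to $a sorts in decreasing score order.
my @sorted = sort { $b->{SCORE} <=> $a->{SCORE} } @{$results} ;

for my $result (@sorted)
    {
    print "$result->{PATH} [$result->{SCORE}].\n" ;
    }
```

Sorting the records themselves avoids the intermediate list of indexes; the index-based form is still useful when the position in `$results` matters.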
DESCRIPTION
This module implements an incremental text indexer and searcher based on Search::Indexer.
DOCUMENTATION
Given a list of files, this module lets you create an indexed text database that you can later query for matches. You can also use the siim command-line application installed with this module.
SUBROUTINES/METHODS
show_database_information($index_directory)
Arguments
$index_directory - location of the index databases
Returns - A hash reference. Each key names an information field.
Exceptions - Error opening the indexing database
delete_indexing_databases($index_directory)
Removes all the index databases in the passed directory
Arguments
$index_directory - location of the index databases
Returns - Nothing
Exceptions - Can't remove index databases.
search_string(\%arguments)
Displays all the files matching the search query.
Arguments
- \%arguments - A hash reference with the following entries:
- $arguments->{perl_mode} - Boolean - Use Perl-specific word regex and stopwords
- $arguments->{stopwords_file} - Optional - Name of the file containing the stopwords to use (overridden by the perl_mode option)
- $arguments->{index_directory} - The location of the index database
- $arguments->{use_position} - See Search::Indexer for complete documentation
- $arguments->{search} - String - The search query
- $arguments->{verbose} - Boolean - Display the document id and score if set
- $search_string -
Returns - Nothing
Exceptions - None
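As an illustration of the argument hash described above, here is a hedged sketch of a call. The key names come from the list above; the values are made up. It assumes the function can be called under its fully qualified name, which this document does not confirm, and it guards the call so that the script still runs when the module or the index is missing.

```perl
use strict ;
use warnings ;

# Key names are taken from the argument list above; the values are illustrative.
my %arguments =
    (
    perl_mode       => 1,
    index_directory => 'text_index',
    use_position    => 1,
    search          => 'find_me',
    verbose         => 1,
    ) ;

# Assumption: the function is reachable under its fully qualified name.
my $module_loaded = eval { require Search::Indexer::Incremental::MD5 ; 1 } ;

if ($module_loaded)
    {
    # The eval guards against a missing or unreadable index.
    eval { Search::Indexer::Incremental::MD5::search_string(\%arguments) ; 1 }
        or warn "Search failed: $@" ;
    }
else
    {
    warn "Search::Indexer::Incremental::MD5 is not installed.\n" ;
    }
```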
add_files(\%arguments, \@files)
Adds files to the index if they have been modified, and displays their names.
Arguments
- \%arguments - A hash reference with the following entries:
- $arguments->{perl_mode} - Boolean - Use Perl-specific word regex and stopwords
- $arguments->{stopwords_file} - Optional - Name of the file containing the stopwords to use (overridden by the perl_mode option)
- $arguments->{index_directory} - The location of the index database
- $arguments->{use_position} - See Search::Indexer for complete documentation
- $arguments->{maximum_document_size} - Integer - Only files smaller than this limit are added
- $arguments->{verbose} - Boolean - Display the document id and score if set
- \@files - Files to be added to the index
Returns - Nothing
Exceptions - None
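The idea of adding only modified files can be sketched with MD5 digests. This is not the module's implementation: `select_modified` and `%recorded_md5` are hypothetical names, and the sketch works on in-memory strings rather than files to stay short. A document is selected when no digest is recorded for it or when its digest has changed.

```perl
use strict ;
use warnings ;
use Digest::MD5 qw(md5_hex) ;

# Hypothetical helper: given the digests recorded so far and a hash of
# name => content, returns the names whose content is new or changed
# (in sorted order) and records their digests.
sub select_modified
    {
    my ($recorded_md5, $documents) = @_ ;
    my @modified ;

    for my $name (sort keys %{$documents})
        {
        my $md5 = md5_hex($documents->{$name}) ;

        # Skip documents whose digest matches the recorded one.
        next if defined $recorded_md5->{$name} && $recorded_md5->{$name} eq $md5 ;

        $recorded_md5->{$name} = $md5 ;
        push @modified, $name ;
        }

    return @modified ;
    }

my %recorded_md5 ;
my @first  = select_modified(\%recorded_md5, {'a.pm' => 'one', 'b.pm' => 'two'}) ;
my @second = select_modified(\%recorded_md5, {'a.pm' => 'one', 'b.pm' => 'changed'}) ;

print "first pass: @first\n" ;   # both documents are new
print "second pass: @second\n" ; # only b.pm changed
```

Comparing digests rather than modification times means a file that was touched but not changed is not indexed again.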
remove_files(\%arguments, \@files)
Removes the passed files from the index.
Arguments
- \%arguments - A hash reference with the following entries:
- $arguments->{perl_mode} - Boolean - Use Perl-specific word regex and stopwords
- $arguments->{stopwords_file} - Optional - Name of the file containing the stopwords to use (overridden by the perl_mode option)
- $arguments->{index_directory} - The location of the index database
- $arguments->{use_position} - See Search::Indexer for complete documentation
- $arguments->{verbose} - Boolean - Display the document id and score if set
- \@files - Files to be removed
Returns - Nothing
Exceptions - None
check_index(\%arguments)
Checks the files in the index.
Arguments
- \%arguments - A hash reference with the following entries:
- $arguments->{perl_mode} - Boolean - Use Perl-specific word regex and stopwords
- $arguments->{stopwords_file} - Optional - Name of the file containing the stopwords to use (overridden by the perl_mode option)
- $arguments->{index_directory} - The location of the index database
- $arguments->{use_position} - See Search::Indexer for complete documentation
- $arguments->{verbose} - Boolean - Display the document id and score if set
Returns - Nothing
Exceptions - None
get_file_MD5($file)
Returns the MD5 of the $file argument.
Arguments
$file - The file whose MD5 is computed
Returns - A string containing the file's MD5
Exceptions - Fails if the file can't be opened
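A file digest of this kind can be computed with the core Digest::MD5 module. The sketch below is a stand-in, not the module's own code: `file_md5_hex` is a hypothetical name, and returning the digest as a hexadecimal string is an assumption. It also shows the documented failure mode, an exception when the file can't be opened.

```perl
use strict ;
use warnings ;
use Digest::MD5 ;
use File::Temp qw(tempfile) ;

# Hypothetical stand-in for get_file_MD5; the hexadecimal format is an assumption.
sub file_md5_hex
    {
    my ($file) = @_ ;

    open my $fh, '<', $file or die "Can't open '$file': $!\n" ;
    binmode $fh ;

    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest ;
    close $fh ;

    return $md5 ;
    }

# Create a small temporary file to checksum.
my ($out, $file) = tempfile(UNLINK => 1) ;
print {$out} "hello\n" ;
close $out or die "Can't close '$file': $!\n" ;

my $md5 = file_md5_hex($file) ;
print "$file: $md5\n" ;

# A missing file raises an exception; eval turns it into an undefined result.
my $missing_md5 = eval { file_md5_hex("$file.missing") } ;
print "missing file: $@" unless defined $missing_md5 ;
```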
BUGS AND LIMITATIONS
None so far.
AUTHOR
Nadim ibn hamouda el Khemir
CPAN ID: NKH
mailto: nadim@cpan.org
LICENSE AND COPYRIGHT
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Search::Indexer::Incremental::MD5
You can also look for information at:
AnnoCPAN: Annotated CPAN documentation
RT: CPAN's request tracker
Please report any bugs or feature requests to <bug-search-indexer-incremental-md5@rt.cpan.org>.
We will be notified, and then you'll automatically be notified of progress on your bug as we make changes.
Search CPAN
SEE ALSO
Search::Indexer::Incremental::MD5::Indexer and Search::Indexer::Incremental::MD5::Searcher