NAME

Lucy::Simple - Basic search engine.

SYNOPSIS

First, build an index of your documents.

use Lucy::Simple;

my $index = Lucy::Simple->new(
    path     => '/path/to/index/',
    language => 'en',
);

while ( my ( $title, $content ) = each %source_docs ) {
    $index->add_doc({
        title    => $title,
        content  => $content,
    });
}

Later, search the index.

my $total_hits = $index->search(
    query      => $query_string,
    offset     => 0,
    num_wanted => 10,
);

print "Total hits: $total_hits\n";
while ( my $hit = $index->next ) {
    print "$hit->{title}\n";
}

DESCRIPTION

Lucy::Simple is a stripped-down interface for the Apache Lucy search engine library.

CONSTRUCTORS

new

my $lucy = Lucy::Simple->new(
    path     => '/path/to/index/',
    language => 'en',
);

Create a Lucy::Simple object, which can be used for both indexing and searching. Both parameters path and language are required.

  • path - Where the index directory should be located. If no index is found at the specified location, one will be created.

  • language - The language of the documents in your collection, indicated by a two-letter ISO 639-1 code. 12 languages are supported:

    |-----------------------|
    | Language   | ISO code |
    |-----------------------|
    | Danish     | da       |
    | Dutch      | nl       |
    | English    | en       |
    | Finnish    | fi       |
    | French     | fr       |
    | German     | de       |
    | Italian    | it       |
    | Norwegian  | no       |
    | Portuguese | pt       |
    | Spanish    | es       |
    | Swedish    | sv       |
    | Russian    | ru       |
    |-----------------------|
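If the language code comes from user input or a configuration file, a caller may want to check it against the table above before constructing the object. The following is a minimal plain-Perl sketch; the %SUPPORTED_LANGUAGES table and the is_supported_language helper are hypothetical, are not part of Lucy, and simply mirror the documented table.

```perl
use strict;
use warnings;

# Hypothetical lookup table mirroring the supported-language table above.
# It is not provided by Lucy; keep it in sync with the documentation by hand.
my %SUPPORTED_LANGUAGES = (
    da => 'Danish',  nl => 'Dutch',     en => 'English',
    fi => 'Finnish', fr => 'French',    de => 'German',
    it => 'Italian', no => 'Norwegian', pt => 'Portuguese',
    es => 'Spanish', sv => 'Swedish',   ru => 'Russian',
);

# Return 1 if $code is exactly one of the two-letter codes listed above,
# 0 otherwise (including when $code is undef). No case folding is done.
sub is_supported_language {
    my ($code) = @_;
    return defined $code && exists $SUPPORTED_LANGUAGES{$code} ? 1 : 0;
}
```

A caller could then guard construction with something like die "Unsupported language '$lang'\n" unless is_supported_language($lang);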

METHODS

add_doc

$lucy->add_doc({
    location => $url,
    title    => $title,
    content  => $content,
});

Add a document to the index. The document must be supplied as a hashref, with field names as keys and content as values.
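Because add_doc expects a hashref of field names to content, it can help to check a document's shape before handing it over. The sketch below is a hypothetical, deliberately conservative pre-check written in plain Perl; check_doc is not part of Lucy, and its rule that every value should be a defined, non-reference string is an assumption made for illustration rather than a documented Lucy requirement.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical pre-check (not part of Lucy): confirm that $doc is a hashref
# whose keys are field names and whose values are plain, defined scalars.
# Returns a list of problem descriptions; an empty list means the document
# looks well-formed under these assumptions.
sub check_doc {
    my ($doc) = @_;
    my $type = reftype($doc);
    return ('document is not a hashref')
        unless defined $type && $type eq 'HASH';

    my @problems;
    for my $field ( sort keys %$doc ) {
        my $value = $doc->{$field};
        if ( !defined $value ) {
            push @problems, "field '$field' is undefined";
        }
        elsif ( ref $value ) {
            push @problems, "field '$field' holds a reference, not a string";
        }
    }
    return @problems;
}
```

For example, check_doc({ title => 'T', tags => ['a'] }) returns a single message naming the tags field, while check_doc({ title => 'T', content => 'C' }) returns an empty list.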

search

my $total_hits = $lucy->search(
    query      => $query,       # required
    offset     => $offset,      # default: 0
    num_wanted => $num_wanted,  # default: 10
    sort_spec  => $sort_spec,   # default: undef
);

Search the index. Returns the total number of documents that match the query. (Because this total counts every match, it will usually differ from num_wanted.)

  • query - A search query string.

  • offset - The number of most-relevant hits to discard, typically used when “paging” through hits N at a time. Setting offset to 20 and num_wanted to 10 retrieves hits 21-30, assuming that 30 hits can be found.

  • num_wanted - The number of hits you would like to see after offset is taken into account.

  • sort_spec - A Lucy::Search::SortSpec object, which determines how results are ranked and returned.
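The interplay between offset, num_wanted and the total returned by search can be made concrete with a little arithmetic. The plain-Perl sketch below uses two hypothetical helpers, hit_range and page_to_offset, which are not part of Lucy; it reproduces the example above, where an offset of 20 and a num_wanted of 10 cover hits 21-30, and it assumes that fewer hits come back when the total runs out first.

```perl
use strict;
use warnings;
use List::Util qw(min);

# Hypothetical helper (not part of Lucy): given the offset and num_wanted
# passed to search() and the total it returned, compute the 1-based
# positions of the first and last hits that should be retrievable.
# Returns an empty list when nothing falls inside the requested window.
sub hit_range {
    my ( $offset, $num_wanted, $total_hits ) = @_;
    return () if $offset >= $total_hits || $num_wanted <= 0;
    my $first = $offset + 1;
    my $last  = min( $offset + $num_wanted, $total_hits );
    return ( $first, $last );
}

# Hypothetical helper: convert a 1-based page number into an offset
# when paging through hits $per_page at a time.
sub page_to_offset {
    my ( $page, $per_page ) = @_;
    return ( $page - 1 ) * $per_page;
}
```

For instance, page 3 at 10 hits per page gives an offset of 20, and with only 25 matching documents hit_range(20, 10, 25) yields hits 21 through 25.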

next

my $hit_doc = $lucy->next();

Return the next hit, or undef when the iterator is exhausted.
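Because next returns undef once the hits are exhausted, the usual pattern is a while loop, as in the SYNOPSIS. The sketch below wraps that loop in a hypothetical collect_field helper that gathers one field from every remaining hit. To keep it runnable without an index, it is exercised against a stand-in FakeHits class (also hypothetical, not part of Lucy) whose next method shifts plain hashrefs off a list; real hits are read with the same hash syntax, as $hit->{title} in the SYNOPSIS shows.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Lucy::Simple object after search():
# next() returns one hit per call, then undef once the list is exhausted.
package FakeHits;
sub new  { my ( $class, @hits ) = @_; return bless { hits => [@hits] }, $class }
sub next { my ($self) = @_; return shift @{ $self->{hits} } }

package main;

# Hypothetical helper: call next() until it returns undef and collect the
# named field from each hit. Works with any object that provides next().
sub collect_field {
    my ( $searcher, $field ) = @_;
    my @values;
    while ( my $hit = $searcher->next ) {
        push @values, $hit->{$field};
    }
    return @values;
}
```

Note that collecting consumes the iterator: after collect_field returns, a further call to next yields undef.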

INHERITANCE

Lucy::Simple isa Clownfish::Obj.