NAME

Search::Kinosearch::KSearch - Perform searches

DEPRECATED

Search::Kinosearch has been superseded by KinoSearch. Please use the new version.

SYNOPSIS

my $ksearch = Search::Kinosearch::KSearch->new(
    -mainpath => '/foo/bar/kindex',
    );

my $query = Search::Kinosearch::Query->new(
    -string     => 'this AND NOT (that OR "the other thing")',
    -lowercase  => 1,
    -tokenize   => 1,
    -stem       => 1,
    );
$ksearch->add_query( $query );
$ksearch->process;

while (my $result = $ksearch->fetch_hit_hashref) {
    print "$result->{title}\n";
}

DESCRIPTION

KSearch objects perform queries against the kindex files created by Kindexer objects.

Queries are fed into KSearch using Search::Kinosearch::Query objects. You can feed multiple Query objects to a KSearch object in order to fine-tune your result set, but KSearch objects themselves are single-shot -- if you need to perform multiple searches, you need to create multiple objects.

Multiple calls to add_query()

It is possible to perform a search which is the result of multiple queries -- in fact, that is the only way to implement an "advanced search" interface:

my $find_the_word_people = Search::Kinosearch::Query->new(
    -string     => 'people',
    -required   => 1,
    -fields     => {
                        title    => 3,
                        bodytext => 1,
                   },
    -tokenize   => 1,
    -stem       => 1,
    -lowercase  => 1,
    );
my $in_article_ii_only = Search::Kinosearch::Query->new(
    -string => 'Article II',
    -required   => 1,
    -fields => {
                   section    => 1,
               },
    );
$ksearch->add_query( $find_the_word_people );
$ksearch->add_query( $in_article_ii_only   );
my $status = $ksearch->process; 
...

Since both queries are marked as '-required => 1', all documents returned must 1) match 'people' in one or both of the 'title' and 'bodytext' fields, and 2) match 'Article II' in the 'section' field.

Excerpts

Kinosearch attempts to find the section of the text with the greatest density of search terms in a field that you specify (typically the bodytext). Any search terms encountered within the text are highlighted with HTML tags. In addition to the field from which the excerpt is taken, Kinosearch gives you control over the length of the excerpt and the text of the highlight tags.
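The idea of picking the densest stretch of text can be sketched in plain Perl. This is only an illustration under simple assumptions, not the module's actual algorithm: best_excerpt() is a hypothetical helper, it counts whole-word, case-insensitive matches, and it only considers windows that begin at a matched term.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Kinosearch: return the window of
# at most $length characters containing the most search-term occurrences.
sub best_excerpt {
    my ( $text, $terms, $length ) = @_;
    return $text if length($text) <= $length;
    return substr( $text, 0, $length ) unless @$terms;

    # Record the starting position of every term occurrence.
    my $alternation = join '|', map { quotemeta } @$terms;
    my @positions;
    while ( $text =~ /\b(?:$alternation)\b/gi ) {
        push @positions, $-[0];
    }
    return substr( $text, 0, $length ) unless @positions;

    # Try a window starting at each occurrence and keep the densest one.
    # An occurrence counts if it starts inside the window.
    my ( $best_start, $best_count ) = ( $positions[0], 0 );
    for my $start (@positions) {
        my $count = grep { $_ >= $start && $_ < $start + $length } @positions;
        if ( $count > $best_count ) {
            ( $best_start, $best_count ) = ( $start, $count );
        }
    }
    return substr( $text, $best_start, $length );
}

my $text = 'We the people. ' . ( 'filler ' x 10 ) . 'people, people, people!';
print best_excerpt( $text, ['people'], 30 ), "\n";
```

A real implementation would also want to avoid cutting words in half at the window edges; that is left out here to keep the sketch short.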

CONSTRUCTOR

new()

my $ksearch = Search::Kinosearch::KSearch->new(
    -mainpath          => '/foo/kindex', # default: 'kindex'
    -freqpath          => '/ramd/fdata', # default: 'kindex/freqdata'
    -kindex            => $kindex,      # default: created using -mainpath
    -any_or_all        => 'any',        # default: 'any'
    -sort_by           => 'score',      # default: 'string'
    -allow_boolean     => 0,            # default: 1
    -allow_phrases     => 0,            # default: 1
    -num_results       => 20,           # default: 10
    -offset            => 40,           # default: 0
   #-language          => 'Es',         # default: 'En'
    -stoplist          => \%big_list,   # default: see below
    -excerpt_field     => 'bodytext',   # default: undef
    -excerpt_length    => 200,          # default: 150
    -hl_tag_open       => '<b>',        # default: '<strong>'
    -hl_tag_close      => '</b>',       # default: '</strong>'
    );

Construct a KSearch object.

-mainpath

The path to your kindex.

-freqpath

Specify an alternative location for the frequency data -- most likely, a ram disk.

-kindex

A Search::Kinosearch::Kindex object. If you provide such an object, you don't need to specify -mainpath or -freqpath.

-any_or_all

Whether a document must contain 'any' or 'all' of the search terms in order to match.
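The difference between the two settings can be illustrated with a small stand-alone sketch. doc_matches() is a hypothetical helper, and representing a document as a hash of the words it contains is an assumption made for the example; it says nothing about how the kindex is actually stored.

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a document matches under 'any' or 'all'.
# $doc_words is a hashref whose keys are the words present in the document.
sub doc_matches {
    my ( $mode, $terms, $doc_words ) = @_;
    my $found = grep { exists $doc_words->{$_} } @$terms;
    return $mode eq 'all' ? ( $found == @$terms ) : ( $found > 0 );
}

my %doc = map { $_ => 1 } qw( we the people );
print "any: ", ( doc_matches( 'any', [qw( people union )], \%doc ) ? 'match' : 'no match' ), "\n";
print "all: ", ( doc_matches( 'all', [qw( people union )], \%doc ) ? 'match' : 'no match' ), "\n";
```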

-sort_by

Sort results by 'score' or by 'datetime'.

-allow_boolean

If set to 0, disables parenthetical groupings; boolean terms "AND", "OR" and "AND NOT"; and prepended +plus and -minus.

-allow_phrases

Enable/disable phrase-matching.

-num_results

Maximum number of documents returned.

-offset

Number of documents to skip when returning ranked results. Example: if -offset is set to 10, the first document returned will be the 11th most highly ranked.
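For paged result listings, -offset and -num_results are usually derived from a page number. A minimal sketch of that arithmetic follows; offset_for_page() is a hypothetical helper, not something the module provides, and pages are assumed to be numbered from 1.

```perl
use strict;
use warnings;

# Hypothetical helper: convert a 1-based page number into an -offset value.
sub offset_for_page {
    my ( $page, $per_page ) = @_;
    die "page must be 1 or greater\n" if $page < 1;
    return ( $page - 1 ) * $per_page;
}

# Page 3 at 20 results per page skips the first 40 documents.
my $per_page = 20;
my $offset   = offset_for_page( 3, $per_page );
print "-num_results => $per_page, -offset => $offset\n";
```

With these values the first document returned would be the 41st most highly ranked.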

-language

The language of the query. At present only 'En' works. See Search::Kinosearch::Lingua.

-stoplist

A hashref of words to exclude from the query. If no list is specified, a default list is loaded based on the -language parameter; for instance, if -language is set to 'Es', then $Search::Kinosearch::Lingua::Es::stoplist is used. Stopwords encountered in the query are reported in the search status hash returned by process().
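A custom stoplist can be built from a plain word list. The sketch below assumes that only the hash keys matter and uses undef values, mirroring the shape of the stopwords entry in the status hash; split_stopwords() is a hypothetical helper that shows the filtering idea on a whitespace-tokenized, lowercased query, not the module's own tokenizer.

```perl
use strict;
use warnings;

# Build a stoplist hash; the values are unused, only the keys matter here.
my %big_list = map { $_ => undef } qw( a an and in of the to we );

# Hypothetical helper: separate query words into kept terms and stopwords.
sub split_stopwords {
    my ( $query, $stoplist ) = @_;
    my @tokens = split ' ', lc $query;
    my @kept   = grep { !exists $stoplist->{$_} } @tokens;
    my %seen   = map { $_ => undef } grep { exists $stoplist->{$_} } @tokens;
    return ( \@kept, \%seen );
}

my ( $kept, $seen ) = split_stopwords(
    'we the people in order to form a more perfect union', \%big_list );
print "Searching for: @$kept\n";
print "Ignored: ", join( ', ', sort keys %$seen ), "\n";
```

The hash would then be passed to the constructor by reference, as in -stoplist => \%big_list.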

-excerpt_field

Field to be used when generating excerpts.

-excerpt_length

Maximum length of excerpt, in characters.

-hl_tag_open

Override the default opening tag used to highlight search terms which appear in the excerpt.

-hl_tag_close

Override the default closing tag used to highlight search terms which appear in the excerpt.
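To show what the two tag parameters control, here is a stand-alone sketch of term highlighting. highlight() is a hypothetical helper, not the module's code; it assumes whole-word, case-insensitive matching and does no HTML escaping of the excerpt text.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap each whole-word occurrence of a search term
# in the given opening and closing tags, ignoring case.
sub highlight {
    my ( $excerpt, $terms, $open, $close ) = @_;
    return $excerpt unless @$terms;
    my $alternation = join '|', map { quotemeta } @$terms;
    $excerpt =~ s/\b($alternation)\b/$open$1$close/gi;
    return $excerpt;
}

# Default-style tags versus overridden tags.
my $text = 'We the People of the United States';
print highlight( $text, ['people'], '<strong>', '</strong>' ), "\n";
print highlight( $text, ['people'], '<b>',      '</b>' ),      "\n";
```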

METHODS

add_query()

$ksearch->add_query( $query )

Add a query, in the form of a Search::Kinosearch::Query object, to the KSearch object.

process()

my $searchstatus = $ksearch->process;
print "Documents matched: $searchstatus->{num_hits}\n";

Execute the search, generate a result list, and return a hashref pointing to information about the search.

Here's how the status hash might look if you were to search for 'we the people in order to form a more perfect union'.

$searchstatus = {
    num_docs            => 52,
    num_hits            => 18,
    stopwords           => {
        we  => undef,
        the => undef,
        in  => undef,
        to  => undef,
        a   => undef,
        },
    };

num_docs

The number of documents searched.

num_hits

The approximate number of documents matched. (The number is only approximate because it may include documents which have been marked as deleted, but not yet purged from the kindex.)

stopwords

A hash where the keys are stopwords encountered.
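A search front end will typically turn the status hash into a message for the user. The sketch below works on a literal copy of the example hash shown above, so it runs without a kindex; status_summary() is a hypothetical helper, and the wording of the message is only a suggestion.

```perl
use strict;
use warnings;

# Hypothetical helper: format a status hash like the one returned by process().
sub status_summary {
    my ($status) = @_;
    my @ignored = sort keys %{ $status->{stopwords} || {} };
    my $summary = "About $status->{num_hits} of $status->{num_docs} documents matched.";
    $summary .= ' Ignored common words: ' . join( ', ', @ignored ) . '.' if @ignored;
    return $summary;
}

# A literal stand-in for the hashref that process() would return.
my $searchstatus = {
    num_docs  => 52,
    num_hits  => 18,
    stopwords => { map { $_ => undef } qw( we the in to a ) },
};
print status_summary($searchstatus), "\n";
```

Saying "about" in the message reflects the fact that num_hits is approximate.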

fetch_hit_hashref()

Shift the next ranked result off the result array and return it. Each result is a hashref with all stored fields represented, plus two special fields:

excerpt

A relevant excerpt taken from the field specified by the -excerpt_field parameter.

score

The document's numerical score.
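Because results are shifted off an array, a while loop over fetch_hit_hashref() ends naturally once the list is exhausted, and the results cannot be read a second time. The following sketch imitates that behavior with a closure; make_fetcher() and the sample hits are hypothetical and stand in for a real KSearch object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a processed KSearch object: returns a closure
# that shifts one hit at a time off a private array, then returns undef.
sub make_fetcher {
    my @hits = @_;
    return sub { return shift @hits };
}

# Illustrative data only; real hits carry whatever fields were stored.
my $fetch = make_fetcher(
    { title => 'Preamble',   score => 2.5, excerpt => 'We the <strong>People</strong>' },
    { title => 'Article II', score => 1.0, excerpt => 'elected by the <strong>People</strong>' },
);

while ( my $hit = $fetch->() ) {
    printf "%-12s %.1f  %s\n", $hit->{title}, $hit->{score}, $hit->{excerpt};
}
```

If the hits are needed more than once, for example to render two views of the same page, collect them into an array during the first pass.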

TO DO

  • Think hard about the interface, specifically about all the parameters supplied to the constructor. If KSearch gets broken into smaller pieces, those parameters should go away. Better to do that soon, while the user base is small.

  • Break out excerpting/highlighting code into a separate module.

  • Sanity check: process can only be called once.

SEE ALSO

AUTHOR

Marvin Humphrey <marvin at rectangular dot com> http://www.rectangular.com

COPYRIGHT

Copyright (c) 2005 Marvin Humphrey. All rights reserved. This module is free software. It may be used, redistributed and/or modified under the same terms as Perl itself.