NAME
KinoSearch::Docs::Tutorial::BeyondSimple - A more flexible app structure.
DESCRIPTION
Goal
In this tutorial chapter, we'll rewrite our app from KinoSearch::Docs::Tutorial::Simple so that it behaves exactly the same, but offers greater possibilities for expansion.
To achieve this, we'll ditch KinoSearch::Simple and replace it with the six classes it uses internally:
KinoSearch::Schema - Plan out your index.
KinoSearch::Schema::FieldSpec - Define the properties of an index field.
KinoSearch::Analysis::PolyAnalyzer - A one-size-fits-all parser/tokenizer.
KinoSearch::InvIndexer - Manipulate index content.
KinoSearch::Searcher - Search an index.
KinoSearch::Search::Hits - Iterate over hits returned by a Searcher.
Schema
The first item we're going to need is a custom subclass of KinoSearch::Schema.
# USConSchema.pm
package USConSchema;
use base 'KinoSearch::Schema';
A Schema subclass is analogous to an SQL table definition. It instructs other entities on how they should interpret the raw data in an inverted index and interact with it. First and foremost, it tells them what fields are available and how they're defined.
Since there's not much you can do with an SQL database before you define any tables, you might wonder how KinoSearch::Simple can add documents to an index without first declaring a Schema. The answer is: Simple modifies the Schema with each call to add_doc. Expanding on our SQL metaphor, it's as if each INSERT were preceded by either CREATE TABLE or ALTER TABLE as needed. (The techniques used by Simple are described in KinoSearch::Docs::Cookbook::DynamicFields.)
Since we know in advance that we're only going to be using three fields, we don't need to resort to such tricks; we can just declare all of them up front.
our %fields = (
    title   => 'KinoSearch::Schema::FieldSpec',
    content => 'KinoSearch::Schema::FieldSpec',
    url     => 'KinoSearch::Schema::FieldSpec',
);
Declaring a %fields hash with our is the first of two requirements for subclassing KinoSearch::Schema. The other is declaring an analyzer() subroutine, which must return an object which isa KinoSearch::Analysis::Analyzer:
use KinoSearch::Analysis::PolyAnalyzer;
sub analyzer {
    return KinoSearch::Analysis::PolyAnalyzer->new( language => 'en' );
}
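A note on why %fields is declared with our rather than my: a package variable lives in the symbol table, where code outside the file can look it up by name, while a lexical does not. The sketch below illustrates that Perl behavior with hypothetical stand-in classes (Demo::SchemaBase, Demo::WithOur, Demo::WithMy). It is not how KinoSearch::Schema is implemented, only a plausible reason for the requirement.

```perl
use strict;
use warnings;

{
    # Hypothetical base class; NOT KinoSearch::Schema.
    package Demo::SchemaBase;

    # Look up %fields in the calling subclass's symbol table.
    sub field_names {
        my ($class) = @_;
        no strict 'refs';    # symbolic reference to %{"Subclass::fields"}
        return sort keys %{"${class}::fields"};
    }
}

{
    package Demo::WithOur;
    our @ISA    = ('Demo::SchemaBase');
    our %fields = ( title => 'Spec', content => 'Spec', url => 'Spec' );
}

{
    package Demo::WithMy;
    our @ISA = ('Demo::SchemaBase');
    my %fields = ( title => 'Spec' );    # lexical: invisible to the base class
}

my @visible = Demo::WithOur->field_names;
my @hidden  = Demo::WithMy->field_names;

print "declared with our: @visible\n";
print "declared with my:  ", scalar(@hidden), " fields visible\n";
```

The subclass that uses my appears to have no fields at all, which is the kind of silent failure the our requirement avoids.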
As the same Schema subclass must (repeat, must) be used at both index time and search time, it should be implemented as a free-standing Perl module that both invindexer.plx and search.cgi can use. Finish USConSchema.pm off with the obligatory true value, put it into a place where both your scripts will be able to find it, and adjust file system permissions as needed. This tutorial will assume that you have chosen to locate it in the cgi-bin directory.
Adaptations to invindexer.plx
In the indexing script, we'll replace our Simple object with a KinoSearch::InvIndexer. For the most part, it's a straight-up swap:
use USConSchema;
use KinoSearch::InvIndexer;
...
my $invindexer = KinoSearch::InvIndexer->new(
    invindex => USConSchema->read($index_loc),
);
...
foreach my $filename (@filenames) {
    my $doc = slurp_and_parse_file($filename);
    $invindexer->add_doc($doc);
}
There's only one extra step required: at the end of the script, you must call finish() explicitly. (KinoSearch::Simple calls finish() implicitly upon object destruction.)
$invindexer->finish;
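The difference between an implicit and an explicit finish() can be sketched with a hypothetical stand-in class, Demo::Writer, whose auto_finish flag is invented for this illustration. With the flag set, DESTROY calls finish() the way Simple is described as doing; without it, the caller has to. What a real InvIndexer does with pending documents when finish() is never called is not stated in this chapter; the stand-in simply drops them so the difference is visible.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in; NOT KinoSearch::InvIndexer or KinoSearch::Simple.
    package Demo::Writer;

    our @committed;    # documents that made it into the pretend "index"

    sub new {
        my ( $class, %args ) = @_;
        return bless {
            auto     => $args{auto_finish},
            pending  => [],
            finished => 0,
        }, $class;
    }

    sub add_doc {
        my ( $self, $doc ) = @_;
        push @{ $self->{pending} }, $doc;
    }

    # Commit pending documents; calling it twice is harmless.
    sub finish {
        my ($self) = @_;
        return if $self->{finished};
        $self->{finished} = 1;
        push @committed, @{ $self->{pending} };
    }

    # Only the "Simple-like" variant finishes on destruction.
    sub DESTROY {
        my ($self) = @_;
        $self->finish if $self->{auto};
    }
}

{
    my $simple_like = Demo::Writer->new( auto_finish => 1 );
    $simple_like->add_doc('doc a');
}    # object destroyed here; finish() runs implicitly
my $after_implicit = scalar @Demo::Writer::committed;

{
    my $forgetful = Demo::Writer->new( auto_finish => 0 );
    $forgetful->add_doc('doc b');
}    # destroyed without finish(); 'doc b' is never committed
my $after_forgotten = scalar @Demo::Writer::committed;

my $explicit = Demo::Writer->new( auto_finish => 0 );
$explicit->add_doc('doc c');
$explicit->finish;    # the extra step the indexing script now needs
my $after_explicit = scalar @Demo::Writer::committed;

print "$after_implicit $after_forgotten $after_explicit\n";
```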
Adaptations to search.cgi
In our search script, KinoSearch::Simple has served as a thin wrapper around Searcher and Hits. Swapping out Simple for these two classes is straightforward, except for the return value of the search() function.
use USConSchema;
use KinoSearch::Searcher;
...
my $searcher = KinoSearch::Searcher->new(
    invindex => USConSchema->read($index_loc),
);
my $hits = $searcher->search(    # returns a Hits object, not a hit count
    query      => $q,
    offset     => $offset,
    num_wanted => $hits_per_page,
);
my $hit_count = $hits->total_hits;    # get the hit count here
...
while ( my $hit = $hits->fetch_hit_hashref ) {
    ...
}
$simple->search returns a hit count; in contrast, $searcher->search returns a Hits object, from which you may obtain a hit count via the total_hits() method.
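The calling pattern can be tried without an index by using a mock. The sketch below uses a hypothetical Demo::Hits class that imitates only the two methods used above, total_hits() and fetch_hit_hashref(). Its constructor arguments and sample data are invented, and it assumes that total_hits() reports all matches rather than just the page of results fetched.

```perl
use strict;
use warnings;

{
    # Hypothetical stand-in; NOT KinoSearch::Search::Hits.
    package Demo::Hits;

    sub new {
        my ( $class, %args ) = @_;
        return bless {
            total => $args{total},
            rows  => [ @{ $args{rows} } ],    # copy so the caller's list is untouched
        }, $class;
    }

    # Total number of matches, independent of how many rows are on this page.
    sub total_hits { return $_[0]{total} }

    # Return the next hit as a hashref, or undef once the page is exhausted.
    sub fetch_hit_hashref { return shift @{ $_[0]{rows} } }
}

# Pretend the query matched 12 documents and this page holds two of them.
my $hits = Demo::Hits->new(
    total => 12,
    rows  => [
        { title => 'Article I',   url => '/art1.html' },
        { title => 'Amendment I', url => '/amend1.html' },
    ],
);

my $hit_count = $hits->total_hits;    # the count comes from the object

my @titles;
while ( my $hit = $hits->fetch_hit_hashref ) {    # loop ends on undef
    push @titles, $hit->{title};
}

print "Showing ", scalar(@titles), " of $hit_count hits\n";
```

Because the iterator returns undef when it runs out, the while loop needs no separate counter.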
Hooray!
Congratulations! Your app does the same thing as before... but now it's a lot easier to customize.
COPYRIGHT
Copyright 2005-2007 Marvin Humphrey
LICENSE, DISCLAIMER, BUGS, etc.
See KinoSearch version 0.20.