NAME
Search::Xapian - Perl XS frontend to the Xapian C++ search library.
SYNOPSIS
use Search::Xapian;
my $db = Search::Xapian::Database->new( '[DATABASE DIR]' );
my $enq = $db->enquire( '[QUERY TERM]' );
printf "Running query '%s'\n", $enq->get_query()->get_description();
my @matches = $enq->matches(0, 10);
print scalar(@matches) . " results found\n";
foreach my $match ( @matches ) {
    my $doc = $match->get_document();
    printf "ID %d %d%% [ %s ]\n", $match->get_docid(), $match->get_percent(), $doc->get_data();
}
DESCRIPTION
This module wraps most methods of most Xapian classes. The missing classes and methods should be added in the future. It also provides a simplified, more 'perlish' interface to some common operations, as demonstrated above.
There are some gaps in the POD documentation for wrapped classes, but you can read the Xapian C++ API documentation at http://www.xapian.org/docs/apidoc/html/annotated.html for details of these. Alternatively, take a look at the code in the examples and tests.
If you want to use Search::Xapian and the threads module together, make sure you're using Search::Xapian >= 1.0.4.0 and Perl >= 5.8.7. As of 1.0.4.0, Search::Xapian uses CLONE_SKIP to make sure that the Perl wrapper objects aren't copied to new threads. Without this, the underlying C++ objects can get destroyed more than once.
If you encounter problems, or have any comments, suggestions, patches, etc., please email the Xapian-discuss mailing list (details of which can be found at http://www.xapian.org/lists.php).
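To illustrate what CLONE_SKIP does in general, here is a minimal pure-Perl sketch. The My::Wrapper class is hypothetical and merely stands in for an XS-backed wrapper; it is not how Search::Xapian itself is implemented. The thread part only runs if your perl was built with ithreads.

```perl
use strict;
use warnings;
use Config;
use Scalar::Util qw(blessed);

# Hypothetical class standing in for a wrapper around a C++ object.
package My::Wrapper;

sub new {
    my ($class) = @_;
    return bless { handle => 42 }, $class;
}

# A true return value asks perl not to clone objects of this class
# when a new thread is created.
sub CLONE_SKIP { 1 }

package main;

my $obj = My::Wrapper->new();

if ( $Config{useithreads} ) {
    require threads;
    # Inside the new thread, check whether $obj is still a blessed object.
    my $still_blessed = threads->create(
        sub { defined blessed($obj) ? 1 : 0 }
    )->join();
    print "wrapper usable in child thread: ",
        ( $still_blessed ? "yes" : "no" ), "\n";
}
else {
    print "this perl has no ithreads support; nothing to demonstrate\n";
}
```

In practice this means each thread should create its own Search::Xapian objects rather than rely on ones inherited from the parent thread.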
EXPORT
None by default.
:db
- DB_OPEN
-
Open a database, fail if database doesn't exist.
- DB_CREATE
-
Create a new database, fail if database exists.
- DB_CREATE_OR_OPEN
-
Open an existing database, without destroying data, or create a new database if one doesn't already exist.
- DB_CREATE_OR_OVERWRITE
-
Create a new database, overwriting any existing database at that path.
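As a rough sketch of how these constants are used, the following opens a writable database with DB_CREATE_OR_OPEN and adds one document. The temporary directory and the sample text are assumptions made for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Search::Xapian qw(:db);

# Assumption: a throwaway directory stands in for a real database path.
my $dir = tempdir( CLEANUP => 1 );

# Create the database if it is missing, otherwise open it and keep its data.
my $db = Search::Xapian::WritableDatabase->new( "$dir/index", DB_CREATE_OR_OPEN );

my $doc = Search::Xapian::Document->new();
$doc->set_data('hello world');
$doc->add_term('hello');
$db->add_document($doc);
$db->flush();    # write pending changes to disk

printf "documents in database: %d\n", $db->get_doccount();
```

Swapping in DB_CREATE here would make the constructor fail on a second run against the same path, and DB_OPEN would fail on the first.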
:ops
- OP_AND
-
Match if both subqueries are satisfied.
- OP_OR
-
Match if either subquery is satisfied.
- OP_AND_NOT
-
Match if left but not right subquery is satisfied.
- OP_XOR
-
Match if the left or the right subquery is satisfied, but not both.
- OP_AND_MAYBE
-
Match if left is satisfied, but use weights from both.
- OP_FILTER
-
Like OP_AND, but only weight using the left query.
- OP_NEAR
-
Match if the words are near each other. The window should be specified as a parameter to Search::Xapian::Query::Query, but it defaults to the number of terms in the list.
- OP_PHRASE
-
Match as a phrase (all words in order).
- OP_ELITE_SET
-
Select an elite set from the subqueries, and perform a query with these combined as an OR query.
- OP_VALUE_RANGE
-
Filter by a range test on a document value.
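The operators above can be combined by passing them to the Search::Xapian::Query constructor, either with a list of terms or with a list of subqueries. A minimal sketch, with arbitrary terms chosen for illustration:

```perl
use strict;
use warnings;
use Search::Xapian qw(:ops);

# Both terms must be present.
my $both = Search::Xapian::Query->new( OP_AND, 'xapian', 'perl' );

# Either term may be present.
my $either = Search::Xapian::Query->new( OP_OR, 'python', 'ruby' );

# Documents matching $both, excluding any that also match $either.
my $combined = Search::Xapian::Query->new( OP_AND_NOT, $both, $either );

print $combined->get_description(), "\n";
```

Printing get_description() is a convenient way to check how a composed query was actually built.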
:qpflags
- FLAG_BOOLEAN
-
Support AND, OR, etc. and bracketed subexpressions.
- FLAG_LOVEHATE
-
Support + and -.
- FLAG_PHRASE
-
Support quoted phrases.
- FLAG_BOOLEAN_ANY_CASE
-
Support AND, OR, etc even if they aren't in ALLCAPS.
- FLAG_WILDCARD
-
Support right truncation (e.g. Xap*).
- FLAG_PURE_NOT
-
Allow queries such as 'NOT apples'.
These require the use of a list of all documents in the database, which is potentially expensive, so this feature isn't enabled by default.
- FLAG_PARTIAL
-
Enable partial matching.
Partial matching causes the parser to treat the query as a "partially entered" search. This will automatically treat the final word as a wildcarded match, unless it is followed by whitespace, to produce more stable results from interactive searches.
- FLAG_SPELLING_CORRECTION
- FLAG_SYNONYM
- FLAG_AUTO_SYNONYMS
- FLAG_AUTO_MULTIWORD_SYNONYMS
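These flags are combined with bitwise OR and passed as the second argument to the query parser's parse_query method. A minimal sketch, where the query string is an arbitrary example:

```perl
use strict;
use warnings;
use Search::Xapian qw(:qpflags);

my $qp = Search::Xapian::QueryParser->new();

# Enable boolean operators with brackets, plus quoted phrases.
my $query = $qp->parse_query(
    '"search library" AND (perl OR python)',
    FLAG_BOOLEAN | FLAG_PHRASE
);

print $query->get_description(), "\n";
```

Without FLAG_BOOLEAN, words such as AND and OR in the input would be treated as ordinary search terms.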
:qpstem
- STEM_ALL
-
Stem all terms.
- STEM_NONE
-
Don't stem any terms.
- STEM_SOME
-
Stem some terms, in a manner compatible with Omega (capitalised words and those in phrases aren't stemmed).
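A stemming strategy only takes effect once the query parser has been given a stemmer. The sketch below assumes the English stemmer and an arbitrary query string:

```perl
use strict;
use warnings;
use Search::Xapian qw(:qpstem);

my $qp = Search::Xapian::QueryParser->new();

# Assumption: English text; choose the stemmer that matches your data.
$qp->set_stemmer( Search::Xapian::Stem->new('english') );

# Stem ordinary words, but leave capitalised words and phrases alone.
$qp->set_stemming_strategy(STEM_SOME);

my $query = $qp->parse_query('searching Libraries');
print $query->get_description(), "\n";
```

The description printed shows which terms were stemmed and which were left as entered.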
:enq_order
- ENQ_ASCENDING
-
docids sort in ascending order (default)
- ENQ_DESCENDING
-
docids sort in descending order
- ENQ_DONT_CARE
-
docids sort in whatever order is most efficient for the backend
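The docid order matters when matches have equal weight. The following sketch builds a small throwaway database, uses BoolWeight so that every match weighs the same, and asks for descending docid order. The directory, terms, and document text are assumptions for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use Search::Xapian qw(:db :enq_order);

# Assumption: a temporary directory stands in for a real database path.
my $dir = tempdir( CLEANUP => 1 );
my $db  = Search::Xapian::WritableDatabase->new( "$dir/index", DB_CREATE_OR_OPEN );

# Index three documents that all contain the same term.
for my $text ( 'apple pie', 'apple tart', 'apple cake' ) {
    my $doc = Search::Xapian::Document->new();
    $doc->set_data($text);
    $doc->add_term('apple');
    $db->add_document($doc);
}
$db->flush();

my $enq = $db->enquire('apple');

# Give every match the same weight so that only docid order decides ranking.
$enq->set_weighting_scheme( Search::Xapian::BoolWeight->new() );
$enq->set_docid_order(ENQ_DESCENDING);

my @docids = map { $_->get_docid() } $enq->matches( 0, 10 );
print "docids: @docids\n";
```

ENQ_DONT_CARE is worth considering when the order of equally weighted results is irrelevant to the application.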
:standard
:standard combines the :db, :ops, :qpflags and :qpstem tags.
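Since nothing is exported by default, the tags have to be requested explicitly. Note that :enq_order is not listed as part of :standard, so this sketch imports it separately:

```perl
use strict;
use warnings;

# :standard covers :db, :ops, :qpflags and :qpstem; add :enq_order by hand.
use Search::Xapian qw(:standard :enq_order);

# The imported constants are plain integers.
printf "DB_CREATE_OR_OPEN=%d OP_AND=%d FLAG_PHRASE=%d STEM_SOME=%d ENQ_ASCENDING=%d\n",
    DB_CREATE_OR_OPEN, OP_AND, FLAG_PHRASE, STEM_SOME, ENQ_ASCENDING;
```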
TODO
- Error Handling
-
Error handling for all methods liable to generate them.
- Documentation
-
Add POD documentation for all classes, where possible just adapted from Xapian docs.
- Unwrapped classes
-
The following Xapian classes are not yet wrapped: Error (and subclasses), ErrorHandler, ExpandDecider (and subclasses), user-defined weight classes.
We don't yet wrap Xapian::Query::MatchAll, Xapian::Query::MatchNothing, or Xapian::BAD_VALUENO.
- Unwrapped methods
-
The following methods are not yet wrapped: Enquire::get_eset(...) with more than two arguments, Query ctor optional "parameter" parameter, Remote::open(...), static Stem::get_available_languages().
We wrap MSet::swap() and MSet::operator[](), but not ESet::swap() or ESet::operator[](). Is swap actually useful? Should we instead tie MSet and ESet to allow them to just be used as lists?
CREDITS
Thanks to Tye McQueen <tye@metronet.com> for explaining the finer points of how best to write XS frontends to C++ libraries, James Aylett <james@tartarus.org> for clarifying the less obvious aspects of the Xapian API, Tim Brody for patches wrapping ::QueryParser and ::Stopper, and especially Olly Betts <olly@survex.com> for contributing advice, bugfixes, and wrapper code for the more obscure classes.
AUTHOR
Alex Bowley <kilinrax@cpan.org>
Please report any bugs/suggestions to <xapian-discuss@lists.xapian.org> or use the Xapian bug tracker http://www.xapian.org/bugs/. Please do NOT use the CPAN bug tracker or mail any of the authors individually.
SEE ALSO
Search::Xapian::BM25Weight, Search::Xapian::BoolWeight, Search::Xapian::Database, Search::Xapian::Document, Search::Xapian::Enquire, Search::Xapian::MultiValueSorter, Search::Xapian::PositionIterator, Search::Xapian::PostingIterator, Search::Xapian::QueryParser, Search::Xapian::Stem, Search::Xapian::TermGenerator, Search::Xapian::TermIterator, Search::Xapian::TradWeight, Search::Xapian::ValueIterator, Search::Xapian::Weight, Search::Xapian::WritableDatabase, and http://www.xapian.org/.