NAME

Apache::Solr - Apache Solr (Lucene) extension

INHERITANCE

Apache::Solr is extended by
  Apache::Solr::JSON
  Apache::Solr::XML

SYNOPSIS

# use Log::Report mode => "DEBUG";
my $solr    = Apache::Solr->new(server => $url);
my $lwp     = $solr->agent;   # internal LWP::UserAgent

my $doc     = Apache::Solr::Document->new(...);
my $results = $solr->addDocument($doc);
$results or die $results->errors;

my $results = $solr->select(q => 'author:mark');
my $doc     = $results->selected(3);
print $doc->_author;

my $results = $solr->select(q => "really", hl => {fl=>'content'});
while(my $doc = $results->nextSelected)
{   my $hldoc = $results->highlighted($doc);
    print $hldoc->_content;
    ...
}

# based on Log::Report, hence (for communication errors and such)
use Log::Report;
dispatcher SYSLOG => 'default';  # now all warnings/error to syslog
try { $solr->select(...) }; print $@->wasFatal;

DESCRIPTION

Solr is a stand-alone full-text search engine (based on Lucene) with loads of features. This module tries to provide a high-level interface to the Solr server.

See http://wiki.apache.org/solr/ and http://lucene.apache.org/solr/

METHODS

Constructors

Apache::Solr->new(%options)

Create a client to connect to one "core" (collection) of the Solr server.

-Option        --Default
 agent           <created internally>
 autocommit      true
 core            undef
 format          'XML'
 retry_max       60
 retry_wait      5
 server          <required>
 server_version  <latest>
agent => LWP::UserAgent object

Agent which implements the communication between this client and the Solr server.

When you have multiple Apache::Solr objects in your program, you may want to share this agent, to share the connection. Since [0.94], this will happen automagically: the parameter defaults to the agent created for the previous object.

Do not forget to install LWP::Protocol::https if you need to connect via https.

autocommit => BOOLEAN

Commit all changes immediately unless specified differently.

core => NAME

Set the core name to be addressed by this client. When there is no core name specified, the core is selected by the server or already part of the URL.

You probably want to set up one core dedicated to testing and one for the live environment.

format => 'XML'|'JSON'

Communication format between client and server. You may also instantiate Apache::Solr::XML or Apache::Solr::JSON directly.

retry_max => COUNT

[1.09] When the server(-connection) persists in producing errors, it may not recover at all. Let's not block the main code. Of course, it may take considerable time for each error to show, so the communication failure can take much, much longer than retry_wait times retry_max seconds.

You can disable retries with '0'.

retry_wait => SECONDS

[1.09] When the connection to the Solr server fails, or when the server does not respond correctly, a retry is attempted after waiting a few seconds. You may use '0' to avoid waiting.
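
How retry_max and retry_wait work together can be pictured with a small loop. This is only a sketch in plain Perl, not the module's actual implementation; with_retries() and its calling convention are hypothetical names, and only the defaults (60 and 5) are taken from the option table above.

```perl
use strict;
use warnings;

# Hypothetical helper: call $action until it returns a true value, with at
# most $retry_max retries and $retry_wait seconds of sleep before each retry.
sub with_retries {
    my ($action, %args) = @_;
    my $retry_max  = $args{retry_max}  // 60;
    my $retry_wait = $args{retry_wait} // 5;

    my $attempts = 0;
    while (1) {
        $attempts++;
        my $result = $action->($attempts);
        return ($result, $attempts) if $result;

        # after a failure, $attempts - 1 retries have been used so far
        last if $attempts > $retry_max;
        sleep $retry_wait if $retry_wait > 0;
    }
    return (undef, $attempts);
}

# Fails twice, then succeeds; retry_wait => 0 avoids any waiting.
my ($ok, $tries) = with_retries(
    sub { my $n = shift; $n >= 3 ? 'answer' : undef },
    retry_max => 5, retry_wait => 0,
);
print "result=$ok after $tries attempts\n";
```

In this sketch, retry_max => 0 means a single attempt without retries. The total time spent can exceed retry_wait times retry_max, because each failing attempt takes time as well.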

server => URL

The location of the Solr server depends on the way the Java environment is set up. The URL is either a URI object or a string which can be instantiated as such.

server_version => VERSION

By default the latest version of the server software, currently 4.5. Try to get this setting right, because it will help you a lot in correct parameter use and support for the right features.

Accessors

$obj->agent()

Returns the LWP::UserAgent object which maintains the connection to the server.

$obj->autocommit( [BOOLEAN] )
$obj->core( [$core] )

Returns the $core; when that is not defined, the default core as set by new(core) is returned. May return undef.

$obj->server( [$uri|STRING] )

Returns the URI object which refers to the server base address. You need to clone() it before modifying. You may set a new value as STRING or $uri object.

$obj->serverVersion()

Returns the specified version of the Solr server software (by default the latest). Treat this version as string, to avoid rounding errors.
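
The rounding problem is easy to reproduce: as a number, 4.10 collapses into 4.1. The sketch below compares versions as dotted strings instead. compare_versions() is a hypothetical helper for illustration; it is not part of Apache::Solr, and the module may compare versions differently.

```perl
use strict;
use warnings;

# Hypothetical helper: compare two dotted version strings part by part.
# Returns -1, 0 or 1, like the <=> operator.
sub compare_versions {
    my ($left, $right) = @_;
    my @l = split /\./, $left;
    my @r = split /\./, $right;
    while (@l || @r) {
        # missing parts count as 0, so '4' equals '4.0'
        my $cmp = (shift(@l) // 0) <=> (shift(@r) // 0);
        return $cmp if $cmp;
    }
    return 0;
}

# As numbers, 4.10 is the same as 4.1 and sorts before 4.5.
print "numeric: ", (4.10 < 4.5 ? "4.10 is older" : "4.10 is newer"), "\n";

# As strings compared per part, 10 is larger than 5.
print "string:  ", (compare_versions('4.10', '4.5') > 0
    ? "4.10 is newer" : "4.10 is older"), "\n";
```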

Commands

$obj->queryTerms($terms)

Search for often used terms. See http://wiki.apache.org/solr/TermsComponent

$terms are passed to expandTerms() before being used.

Be warned: The result is not sorted when XML communication is used, even when you explicitly request it.

example:

my $r = $self->queryTerms(fl => 'subject', limit => 100);
if($r->success)
{   foreach my $hit ($r->terms('subject'))
    {   my ($term, $count) = @$hit;
        print "term=$term, count=$count\n";
    }
}

if(my $r = $self->queryTerms(fl => 'subject', limit => 100))
   ...
$obj->select( [\%options], @parameters )

Find information in the document collection.

This method has a HUGE number of parameters. These values are passed in the URI of the HTTP query to the Solr server. See expandSelect() for all the simplifications offered here. Sets of these parameters may need configuration help in the server as well.

[1.06] You may pass some options to process the selected results (the Apache::Solr::Result object initiation). For instance, sequential. For backwards compatibility reasons, they have to be passed in a HASH as optional first parameter.
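
The calling convention with an optional leading HASH can be sketched as follows. split_select_args() is a hypothetical stand-in written for this illustration; it only shows how such an argument list may be taken apart, not how select() does it internally.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the calling convention
#   $obj->select( [\%options], @parameters )
sub split_select_args {
    my @args = @_;

    # A leading HASH reference holds the result options; everything
    # after it is a parameter for the server.
    my $options = (@args && ref $args[0] eq 'HASH') ? shift @args : {};
    return ($options, \@args);
}

my ($opts, $params) = split_select_args({sequential => 1}, q => 'author:mark');
print "options: ", join(',', sort keys %$opts), "\n";
print "params:  @$params\n";
```

With the real client, the same shape of call would be $solr->select({sequential => 1}, q => 'author:mark').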

Updates

See http://wiki.apache.org/solr/UpdateXmlMessages. Missing are the atomic updates.

$obj->addDocument( <$doc|ARRAY>, %options )

Add one or more documents (Apache::Solr::Document objects) to the Solr database on the server.

-Option            --Default
 allowDups           <false>
 commit              <autocommit>
 commitWithin        undef
 overwrite           <true>
 overwriteCommitted  <not allowDups>
 overwritePending    <not allowDups>
allowDups => BOOLEAN

[removed since Solr 4.0] Use option overwrite.

commit => BOOLEAN
commitWithin => SECONDS

[Since Solr 3.4] Automatically translated into 'commit' for older servers. Currently, the resolution is milliseconds.

overwrite => BOOLEAN
overwriteCommitted => BOOLEAN

[removed since Solr 4.0] Use option overwrite.

overwritePending => BOOLEAN

[removed since Solr 4.0] Use option overwrite.

$obj->commit(%options)
-Option        --Default
 expungeDeletes  <false>
 softCommit      <false>
 waitFlush       <true>
 waitSearcher    <true>
expungeDeletes => BOOLEAN

[since Solr 1.4]

softCommit => BOOLEAN

[since Solr 4.0]

waitFlush => BOOLEAN

[before Solr 1.4, removed in 4.0]

waitSearcher => BOOLEAN
$obj->delete(%options)

Remove one or more documents, based on id or query.

-Option       --Default
 commit         <autocommit>
 fromCommitted  true
 fromPending    true
 id             undef
 query          undef
commit => BOOLEAN

When specified, it indicates whether to commit (update the indexes) after the last delete. By default the value of new(autocommit).

fromCommitted => BOOLEAN

[deprecated since ?]

fromPending => BOOLEAN

[deprecated since ?]

id => ID|ARRAY-of-IDs

The expected content of the uniqueKey fields (usually named id) for the documents to be removed.

query => QUERY|ARRAY-of-QUERYs
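
For the XML format, a delete is sent as a small update message (see the UpdateXmlMessages page mentioned above). The sketch below builds such a message by hand, accepting a single value or an ARRAY for both id and query. delete_message() and xml_escape() are hypothetical helpers; how Apache::Solr serializes the request internally may differ, and the JSON format looks different.

```perl
use strict;
use warnings;

# Escape the characters which are special in XML text content.
sub xml_escape {
    my $text = shift;
    $text =~ s/&/&amp;/g;
    $text =~ s/</&lt;/g;
    $text =~ s/>/&gt;/g;
    return $text;
}

# Hypothetical helper: produce a Solr XML <delete> update message from
# id and/or query, each given as a single value or as an ARRAY.
sub delete_message {
    my %args = @_;
    my @parts;
    foreach my $tag (qw/id query/) {
        my $value = $args{$tag};
        next unless defined $value;
        my @values = ref $value eq 'ARRAY' ? @$value : ($value);
        push @parts, "<$tag>" . xml_escape($_) . "</$tag>" for @values;
    }
    return '<delete>' . join('', @parts) . '</delete>';
}

print delete_message(id => [5, 7], query => 'author:mark'), "\n";
```
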
$obj->extractDocument(%options)

Call the Solr Tika built-in to have the server translate various kinds of structured documents into Solr searchable documents. This component is also called "Solr Cell".

The %options are mostly passed on as attributes to the server call, but there are a few more. You need to pass either a file or string with data.

See http://wiki.apache.org/solr/ExtractingRequestHandler

-Option      --Default
 commit        new(autocommit)
 content_type  <from> filename
 file          undef
 string        undef
commit => BOOLEAN

[0.94] commit the document to the database.

content_type => MIME
file => FILENAME|FILEHANDLE

Either file or string must be used.

string => STRING|SCALAR

The document provided as normal text or a reference to raw text. You may also specify the file option with a filename.

example:

my $r = $solr->extractDocument(file => 'design.pdf'
  , literal_id => 'host');
$obj->optimize(%options)
-Option      --Default
 maxSegments   1
 softCommit    <false>
 waitFlush     <true>
 waitSearcher  <true>
maxSegments => INTEGER

[since Solr 1.3]

softCommit => BOOLEAN

[since Solr 4.0]

waitFlush => BOOLEAN

[before Solr 1.4, removed from 4.0]

waitSearcher => BOOLEAN
$obj->rollback()

[solr 1.4]

Core management

See http://lucidworks.lucidimagination.com/display/solr/Configuring+solr.xml The CREATE, SWAP, ALIAS, and RENAME actions are not yet supported, because they are not very useful, it seems.

$obj->coreReload( [$core] )

[0.94] Load a new core (on the server) from the configuration of this core. While the new core is initializing, the existing one will continue to handle requests. When the new Solr core is ready, it takes over and the old core is unloaded.

-Option--Default
 core    <this core>
core => NAME

example:

my $result = $solr->coreReload;
$result or die $result->errors;
$obj->coreStatus()

[0.94] Returns a HASH with information about this core. There is no description about the exact structure and interpretation of this data.

-Option--Default
 core    <this core>
core => NAME

example:

my $result = $solr->coreStatus;
$result or die $result->errors;

use Data::Dumper;
print Dumper $result->decoded->{status};
$obj->coreUnload(%options)

Removes a core from Solr. Active requests will continue to be processed, but no new requests will be sent to the named core. If a core is registered under more than one name, only the given name is removed.

-Option--Default
 core    <this core>
core => NAME

Helpers

Parameter pre-processing

Many parameters are passed to the server. The syntax of the communication protocol is not optimal for the end-user: it is too verbose and depends on the Solr server version.

General rules:

  • you can group them on prefix

  • use underscore as alternative to dots: less quoting needed

  • boolean values in Perl will get translated into 'true' and 'false'

  • when an ARRAY (or LIST) is passed, the order of the parameters is preserved
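
The rules above can be sketched in a few lines of plain Perl. This is a simplified illustration, not the module's implementation: expand_params() is a hypothetical function, the list of boolean parameters is an assumption made for this example (the real module decides that per parameter and server version), and enabling a component (like facet=true) is not modeled.

```perl
use strict;
use warnings;

# Parameters which this sketch treats as boolean (an assumption).
my %is_boolean = map { ($_ => 1) } qw/facet.missing terms.lower.incl/;

sub expand_params {
    my @pairs = @_;
    my @out;
    while (@pairs) {
        my ($key, $value) = splice @pairs, 0, 2;
        (my $name = $key) =~ s/_/./g;    # underscore as alternative to dot

        if (ref $value eq 'HASH') {
            # grouped on prefix; a HASH has no order, so sort the keys
            unshift @pairs,
                map { ("$name.$_" => $value->{$_}) } sort keys %$value;
        }
        elsif (ref $value eq 'ARRAY') {
            # repeated parameter, order preserved
            unshift @pairs, map { ($name => $_) } @$value;
        }
        elsif ($is_boolean{$name}) {
            push @out, [ $name => ($value ? 'true' : 'false') ];
        }
        else {
            push @out, [ $name => $value ];
        }
    }
    return @out;
}

my @expanded = expand_params(
    facet            => { limit => -1, missing => 1, field => [qw/cat inStock/] },
    terms_lower_incl => 0,
);
print join('&', map { join '=', @$_ } @expanded), "\n";
```

The sketch returns name and value pairs without URL encoding; that step is left to whatever builds the final address.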

$obj->deprecated($message)

Produce a warning $message about deprecated parameters with the indicated server version.

$obj->expandExtract(PAIRS|ARRAY)

Used by extractDocument().

[0.93] If the key is literal or literals, then the keys in the value HASH (or ARRAY of PAIRS) get 'literal.' prepended. "Literals" are fields you add yourself to the Solr Cell output. Unless extractOnly, you need to specify the 'id' literal.

[0.94] You can also use fmap, boost, and resource with a HASH (or ARRAY-of-PAIRS). [0.97] The value in each PAIR may be a SCALAR (ref string) which circumvents some copying.

example:

my $result = $solr->extractDocument(string => $document
   , resource_name => $fn, extractOnly => 1
   , literals => { id => 5, b => 'tic' }, literal_xyz => 42
   , fmap => { id => 'doc_id' }, fmap_subject => 'mysubject'
   , boost => { abc => 3.5 }, boost_xyz => 2.0);
$obj->expandSelect(PAIRS)

The select() method accepts many, many parameters. These are passed to modules in the server, which need configuration before being usable.

Besides the common parameters, like 'q' (query) and 'rows', there are parameters for various (pluggable) backends, usually prefixed by the backend abbreviation.

  • expand

  • facet -> http://wiki.apache.org/solr/SimpleFacetParameters

  • hl (highlight) -> http://wiki.apache.org/solr/HighlightingParameters

  • mlt (more like this) -> https://solr.apache.org/guide/8_11/morelikethis.html

  • stats -> http://wiki.apache.org/solr/StatsComponent

  • suggest -> https://solr.apache.org/guide/8_11/suggester.html

  • group -> http://wiki.apache.org/solr/FieldCollapsing

You may use WebService::Solr::Query to construct the query ('q').

example:

my @r = $solr->expandSelect
  ( q => 'inStock:true', rows => 10
  , facet => {limit => -1, field => [qw/cat inStock/], mincount => 1}
  , f_cat_facet => {missing => 1}
  , hl    => {}
  , mlt   => { fl => 'manu,cat', mindf => 1, mintf => 1 }
  , stats => { field => [ 'price', 'popularity' ] }
  , group => { query => 'price:[0 TO 99.99]', limit => 3 }
  );

# becomes (one line)
...?rows=10&q=inStock:true
  &facet=true&facet.limit=-1&facet.field=cat
     &f.cat.facet.missing=true&facet.mincount=1&facet.field=inStock
  &mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1
  &stats=true&stats.field=price&stats.field=popularity
  &group=true&group.query=price:[0+TO+99.99]&group.limit=3
$obj->expandTerms(PAIRS|ARRAY)

Used by queryTerms() only.

example:

my @t = $solr->expandTerms('terms.lower.incl' => 'true');
my @t = $solr->expandTerms([lower_incl => 1]);   # same

my $r = $self->queryTerms(fl => 'subject', limit => 100);
$obj->ignored($message)

Produce a warning $message about parameters which will get ignored because they were not yet supported by the indicated server version.

$obj->removed($message)

Produce a warning $message about parameters which will not be passed on, because they were removed from the indicated server version.

Other helpers

$obj->endpoint($action, %options)

Compute the address to be called (for HTTP)

-Option--Default
 core    new(core)
 params  []
core => NAME

If no core is specified, the default of the server is addressed.

params => HASH|ARRAY-of-pairs

The order of the parameters will be preserved when an ARRAY of parameters is passed; you never know for a HASH.
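
Why the ARRAY form matters can be shown with a small address builder. This is a sketch under assumptions: build_endpoint() and uri_escape_simple() are hypothetical helpers, the server address and core name are made up, and the server/core/action layout of the path is assumed rather than taken from the module.

```perl
use strict;
use warnings;

# Percent-encode everything outside the unreserved set (ASCII input assumed).
sub uri_escape_simple {
    my $text = shift;
    $text =~ s/([^A-Za-z0-9\-_.~])/sprintf '%%%02X', ord $1/ge;
    return $text;
}

# Hypothetical endpoint builder: server base, optional core, action, params.
sub build_endpoint {
    my ($server, $action, %options) = @_;
    my $params = $options{params} || [];

    # A HASH has no reliable order; sorting its keys gives stable output.
    # An ARRAY of pairs is used exactly in the order given.
    my @pairs = ref $params eq 'HASH'
      ? (map { ($_ => $params->{$_}) } sort keys %$params)
      : @$params;

    my $url = join '/', grep { defined $_ } $server, $options{core}, $action;

    my @query;
    while (@pairs) {
        my ($key, $value) = splice @pairs, 0, 2;
        push @query, uri_escape_simple($key) . '=' . uri_escape_simple($value);
    }
    $url .= '?' . join('&', @query) if @query;
    return $url;
}

print build_endpoint('http://localhost:8983/solr', 'select',
    core => 'my-core', params => [ q => 'author:mark', rows => 10 ]), "\n";
```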

DETAILS

Comparison with other implementations

Compared to WebService::Solr

WebService::Solr is a good module, with a lot of miles. The main difference is that Apache::Solr has much more abstraction.

  • simplified parameter syntax, improving readability

  • real Perl-level boolean parameters, not 'true' and 'false'

  • warnings for deprecated and ignored parameters

  • smart result object with built-in trace and timing

  • hidden paging of results

  • flexible logging framework (Log::Report)

  • both-way XML or both-way JSON, not requests in XML and answers in JSON

  • access to plugins like terms and tika

  • no Moose

SEE ALSO

This module is part of Apache-Solr distribution version 1.09, built on December 06, 2022. Website: http://perl.overmeer.net/CPAN/

LICENSE

Copyrights 2012-2022 by [Mark Overmeer]. For other contributors see ChangeLog.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See http://dev.perl.org/licenses/
