NAME
SWISH::HiLiter - simple interface to SWISH::API and Search::Tools
SYNOPSIS
    use SWISH::API;
    use SWISH::HiLiter;
    use Search::Tools::UTF8;

    my $query   = "foo OR bar";
    my $swish   = SWISH::API->new( 'my_index' );
    my $hiliter = SWISH::HiLiter->new(
        swish => $swish,
        query => $query,
    );

    my $results = $swish->query( $query );

    while ( my $result = $results->next_result ) {
        my $path  = $result->Property( "swishdocpath" );
        my $title = $hiliter->light(
            to_utf8( $result->Property( "swishtitle" ) )
        );
        my $snip = $hiliter->light(
            $hiliter->snip(
                to_utf8( $result->Property( "swishdescription" ) )
            )
        );
        my $rank = $result->Property( "swishrank" );
        my $file = $result->Property( "swishreccount" );
        print join("\n", $file, $path, $title, $rank, $snip );
    }
DESCRIPTION
SWISH::HiLiter is a simple interface to Search::Tools. It is designed to work specifically with the SWISH::API module for searching Swish-e indexes and displaying snippets of highlighted text from the stored Swish-e properties.
SWISH::HiLiter is NOT a drop-in replacement for the highlighting modules that come with the Swish-e distribution. Instead, it is intended to be used when programming with SWISH::API.
REQUIREMENTS
Search::Tools 0.25 or later.
If you intend to use full-page highlighting, also install HTML::Parser and its required modules.
Perl 5.8.3 or later.
SWISH::API 0.04 or later.
METHODS
new()
Create a SWISH::HiLiter object. The new() method requires a hash of parameter key/values. Available parameters include:
- swish
  A SWISH::API object, version 0.03 or newer (stem() requires 0.04 or newer). [ Required ]
- query
  The query string you want highlighted. [ Required ]
- colors
  A reference to an array of HTML color names.
- occur
  How many query matches to display when running snip(). See also Search::Tools::Snipper.
- max_chars
  Maximum number of characters to return in snip(). See also Search::Tools::Snipper.
- noshow
  Bashful setting. If snip() fails to match any of your query (as can happen if the match is beyond the range of SwishDescription as set in your index), don't show anything. The default is to show the first max_chars characters of the text.
  See the "dump" algorithm in Search::Tools::Snipper.
- snipper
  A Search::Tools::Snipper object. If you do not provide one, one will be created for you. The snip() method delegates to Search::Tools::Snipper. The snipper() method can get/set the internal Snipper object.
  See Search::Tools::Snipper for a description of the different snipping algorithms.
- hiliter
  A Search::Tools::HiLiter object. If you do not provide one, one will be created for you. The light() method delegates to Search::Tools::HiLiter. The hiliter() method can get/set the internal HiLiter object.
- escape
  Your text is assumed not to contain HTML markup, so it is HTML-escaped by default. If you have included markup in your text and want it left as-is, set 'escape' to 0. Highlighting should still work, but snip() might break.
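As a rough sketch of how the optional parameters above might be combined, the following constructs a highlighter with a custom color list and snippet settings. The index name 'my_index' is the placeholder from the SYNOPSIS, and the color names and numeric values are arbitrary choices for illustration, not recommended defaults.

```perl
use strict;
use warnings;
use SWISH::API;
use SWISH::HiLiter;

# 'my_index' is a placeholder index name, as in the SYNOPSIS.
my $swish = SWISH::API->new('my_index');

my $hiliter = SWISH::HiLiter->new(
    swish     => $swish,                    # required
    query     => 'foo OR bar',              # required
    colors    => [ 'yellow', 'lightgreen' ],    # illustrative HTML color names
    occur     => 2,                         # illustrative value
    max_chars => 200,                       # illustrative value
    noshow    => 1,    # show nothing when snip() finds no match
);
```

Leaving 'escape' at its default keeps the HTML-escaping described above; set it to 0 only when the stored text is already markup that you trust.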
init
Called internally.
snip( text )
Returns extracted snippets from text that include terms from the query.
light( text )
Returns highlighted text. See new() for ways to control context, length, etc.
stem( word )
Returns the stemmed version of a word. This only works if the first index in your SWISH::API object was built with Fuzzy Mode.
This method is just a wrapper around SWISH::API::Fuzzify.
The stem() method is called internally by the Search::Tools::QueryParser.
NOTE: stem() requires SWISH::API version 0.04 or newer. If you have an older SWISH::API, first consider upgrading (0.03 is very old); failing that, set no_stemmer in new() to turn off stemming.
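A minimal sketch of calling stem() directly, assuming the placeholder index 'my_index' was built with Fuzzy Mode and that SWISH::API 0.04 or newer is installed. The word 'running' is an arbitrary example, and the result depends on the stemmer configured in the index.

```perl
use strict;
use warnings;
use SWISH::API;
use SWISH::HiLiter;

# 'my_index' is a placeholder; it is assumed to use Fuzzy Mode.
my $swish   = SWISH::API->new('my_index');
my $hiliter = SWISH::HiLiter->new(
    swish => $swish,
    query => 'running',
);

# Ask the index's stemmer for the stem of a single word.
my $stemmed = $hiliter->stem('running');
print "running => $stemmed\n";
```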
set_query( query )
Set the query in the highlighting object. Called automatically by new() if 'query' is present in the new() call.
You should only call set_query() if you are trying to re-use a SWISH::HiLiter object, as when under a persistent environment like mod_perl or in a loop.
Like query(), returns the internal Search::Tools::Query object representing the query.
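The re-use case described above might look like the following sketch, where one SWISH::HiLiter object serves several queries in a loop. The index name and the query strings are placeholders, and the result handling mirrors the SYNOPSIS.

```perl
use strict;
use warnings;
use SWISH::API;
use SWISH::HiLiter;
use Search::Tools::UTF8;

# 'my_index' and the query strings are placeholders.
my $swish   = SWISH::API->new('my_index');
my $hiliter = SWISH::HiLiter->new(
    swish => $swish,
    query => 'foo',
);

for my $query ( 'foo OR bar', 'baz' ) {

    # Point the existing highlighter at the new query
    # instead of constructing a fresh object each time.
    $hiliter->set_query($query);

    my $results = $swish->query($query);
    while ( my $result = $results->next_result ) {
        my $desc = to_utf8( $result->Property('swishdescription') );
        print $hiliter->light( $hiliter->snip($desc) ), "\n";
    }
}
```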
setq( query )
For pre-0.04 compatibility, setq() is an alias to set_query().
LIMITATIONS
If your text contains HTML markup and 'escape' is set to 0, snip() may fail to return valid HTML. I don't consider this a bug, but it is listed here in case it happens to you.
Stemming and regular expression building consider only the first index's header values from your SWISH::API object. If the header values differ between indexes (for example, WordCharacters is defined differently), be aware that only the first index from SWISH::API::IndexNames is used.
REMINDER: Use HTML::HiLiter to highlight full HTML pages; use SWISH::HiLiter to highlight plain text and smaller HTML chunks.
AUTHOR
Peter Karman, karman@cray.com
Thanks to the Swish-e developers, in particular Bill Moseley for graciously sharing time, advice and code examples.
Comments and suggestions are welcome.
COPYRIGHT
###############################################################################
# CrayDoc 4
# Copyright (C) 2004 Cray Inc swpubs@cray.com
#
# This program is free software; you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation; either version 2 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
###############################################################################