NAME

SWISH::HiLiter - simple interface to SWISH::API and Search::Tools

SYNOPSIS

  use SWISH::API;
  use SWISH::HiLiter;
  use Search::Tools::UTF8;
  
  my $query   = "foo OR bar";
  my $swish   = SWISH::API->new( 'my_index' );
  my $hiliter = SWISH::HiLiter->new( 
    swish => $swish, 
    query => $query,
  );
     
  my $results = $swish->query( $query );
  
  while ( my $result = $results->next_result ) {

    my $path  = $result->Property( "swishdocpath" );
    my $title = $hiliter->light(
                  to_utf8( $result->Property( "swishtitle" ) )
                );
    my $snip  = $hiliter->light(
                  $hiliter->snip(
                    to_utf8( $result->Property( "swishdescription" ) )
                  )
                );
    my $rank  = $result->Property( "swishrank" );
    my $file  = $result->Property( "swishreccount" );

    # one field per line; the trailing newline separates results
    print join( "\n", $file, $path, $title, $rank, $snip ), "\n";

  }
   

DESCRIPTION

SWISH::HiLiter is a simple interface to Search::Tools. It is designed to work specifically with the SWISH::API module for searching Swish-e indexes and displaying snippets of highlighted text from the stored Swish-e properties.

SWISH::HiLiter is NOT a drop-in replacement for the highlighting modules that come with the Swish-e distribution. Instead, it is intended to be used when programming with SWISH::API.

REQUIREMENTS

  • Search::Tools 0.25 or later.

    If you intend to use full-page highlighting, also install HTML::Parser and its prerequisites.

  • Perl 5.8.3 or later.

  • SWISH::API 0.04 or later.

METHODS

new()

Create a SWISH::HiLiter object. The new() method takes a hash of key/value parameters. Available parameters include:

swish

A SWISH::API object, version 0.03 or newer (stem() needs 0.04 or newer; see stem() below). [ Required ]

query

The query string you want highlighted. [ Required ]

colors

A reference to an array of HTML color names.

occur

How many query matches to display when running snip(). See also Search::Tools::Snipper.

max_chars

Maximum number of characters to return in snip(). See also Search::Tools::Snipper.

noshow

Bashful setting. If snip() fails to match any of your query (as can happen if the match is beyond the range of SwishDescription as set in your index), don't show anything. The default is to show the first max_chars of the text.

See the "dump" algorithm in Search::Tools::Snipper.

snipper

A Search::Tools::Snipper object. If you do not provide one, one will be created for you. The snip() method delegates to Search::Tools::Snipper. The snipper() method can get/set the internal Snipper object.

See Search::Tools::Snipper for a description of the different snipping algorithms.

hiliter

A Search::Tools::HiLiter object. If you do not provide one, one will be created for you. The light() method delegates to Search::Tools::HiLiter. The hiliter() method can get/set the internal HiLiter object.

escape

Your text is assumed not to contain HTML markup and so it is HTML-escaped by default. If you have included markup in your text and want it left as-is, set 'escape' to 0. Highlighting should still work, but snip() might break.
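As a rough sketch, the optional parameters above might be combined as follows. The index name 'my_index' is the placeholder from the SYNOPSIS; the color names and numeric values are illustrative assumptions, not recommended defaults.

```perl
use strict;
use warnings;
use SWISH::API;
use SWISH::HiLiter;

# 'my_index' is the placeholder index name used in the SYNOPSIS.
my $swish = SWISH::API->new('my_index');

my $hiliter = SWISH::HiLiter->new(
    swish     => $swish,                        # required
    query     => 'foo OR bar',                  # required
    colors    => [ 'yellow', 'lightgreen' ],    # illustrative HTML color names
    occur     => 2,      # illustrative: how many matches snip() displays
    max_chars => 200,    # illustrative snippet size
    noshow    => 1,      # show nothing when snip() finds no match
    escape    => 1,      # HTML-escape the text (documented as the default)
);
```

Any parameter other than swish and query can be left out, in which case the module's own defaults apply.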

init

Called internally.

snip( text )

Returns extracted snippets from text that include terms from the query.

light( text )

Returns highlighted text. See new() for ways to control context, length, etc.

stem( word )

Returns the stemmed version of a word. This only works if the first index in your SWISH::API object uses Fuzzy Mode.

This method is just a wrapper around SWISH::API::Fuzzify.

The stem() method is called internally by the Search::Tools::QueryParser.

NOTE: stem() requires SWISH::API version 0.04 or newer. If you have an older SWISH::API, first consider upgrading (0.03 is very old), and second, set no_stemmer in new() to turn off stemming.
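A minimal sketch of calling stem() directly, assuming an index built with Fuzzy Mode enabled. The index name 'my_index' is the SYNOPSIS placeholder and the word 'running' is only an illustration. The result depends on the stemmer configured in your index, so no output is shown here.

```perl
use strict;
use warnings;
use SWISH::API;
use SWISH::HiLiter;

# 'my_index' is the placeholder index name used in the SYNOPSIS.
my $swish   = SWISH::API->new('my_index');
my $hiliter = SWISH::HiLiter->new( swish => $swish, query => 'running' );

# The stemmed form is determined by the first index's Fuzzy Mode.
my $stem = $hiliter->stem('running');
print "stem: $stem\n" if defined $stem;
```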

set_query( query )

Set the query in the highlighting object. Called automatically by new() if 'query' is present in the new() call.

You should only call set_query() if you are trying to re-use a SWISH::HiLiter object, as when under a persistent environment like mod_perl or in a loop.

Like query(), set_query() returns the internal Search::Tools::Query object representing the query.
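Re-using one object across several queries might look like the sketch below. It assumes the same placeholder index as the SYNOPSIS; the query strings are made up for illustration.

```perl
use strict;
use warnings;
use SWISH::API;
use SWISH::HiLiter;
use Search::Tools::UTF8;

# 'my_index' is the placeholder index name used in the SYNOPSIS.
my $swish   = SWISH::API->new('my_index');
my $hiliter = SWISH::HiLiter->new( swish => $swish, query => 'foo' );

# Illustrative queries; each pass re-points the same highlighter.
for my $query ( 'foo OR bar', 'baz' ) {
    $hiliter->set_query($query);

    my $results = $swish->query($query);
    while ( my $result = $results->next_result ) {
        my $text = to_utf8( $result->Property('swishdescription') );
        print $hiliter->light( $hiliter->snip($text) ), "\n";
    }
}
```

Building the SWISH::HiLiter object once and calling set_query() per request avoids repeating the setup work in new(), which is the point of re-use under mod_perl or inside a loop.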

setq( query )

For pre-0.04 compatibility, setq() is an alias for set_query().

LIMITATIONS

If your text contains HTML markup and escape is set to 0, snip() may fail to return valid HTML. I don't consider this a bug, but I list it here in case it happens to you.

Stemming and regular expression building consider only the first index's header values from your SWISH::API object. If those header values differ between indexes (for example, WordCharacters is defined differently), be aware that only the first index from SWISH::API::IndexNames is used.
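To see which index will supply the header values, something like the following sketch may help. The file names 'index_a' and 'index_b' are hypothetical, and the IndexNames and HeaderValue method names are assumed from the SWISH::API documentation; check the spelling against your installed version.

```perl
use strict;
use warnings;
use SWISH::API;

# Two hypothetical index files, opened together as a space-separated list.
my $swish = SWISH::API->new('index_a index_b');

# Compare a header that affects highlighting across all open indexes.
my @indexes = $swish->IndexNames;
for my $name (@indexes) {
    my $chars = $swish->HeaderValue( $name, 'WordCharacters' );
    print "$name: ", ( defined $chars ? $chars : '(undefined)' ), "\n";
}

# Only the first index's header values are consulted by SWISH::HiLiter.
print "header values taken from: $indexes[0]\n";
```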

REMINDER: Use HTML::HiLiter to highlight full HTML pages; use SWISH::HiLiter to highlight plain text and smaller HTML chunks.

AUTHOR

Peter Karman, karman@cray.com

Thanks to the Swish-e developers, in particular Bill Moseley for graciously sharing time, advice and code examples.

Comments and suggestions are welcome.

COPYRIGHT

###############################################################################
#    CrayDoc 4
#    Copyright (C) 2004 Cray Inc swpubs@cray.com
#
#    This program is free software; you can redistribute it and/or modify
#    it under the terms of the GNU General Public License as published by
#    the Free Software Foundation; either version 2 of the License, or
#    (at your option) any later version.
#
#    This program is distributed in the hope that it will be useful,
#    but WITHOUT ANY WARRANTY; without even the implied warranty of
#    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
#    GNU General Public License for more details.
#
#    You should have received a copy of the GNU General Public License
#    along with this program; if not, write to the Free Software
#    Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
###############################################################################

SEE ALSO

HTML::HiLiter, SWISH::API, Search::Tools