NAME

Search::FreeText::LexicalAnalysis::Stop - lexicon interface to a stop list

DESCRIPTION

A filter which provides stop list filtering. The stop list is usually predefined, but words can be added to it or removed from it by subclassing and overriding the initialize() method.

Note that this stop list filter is case-insensitive. This is deliberate, but you can override it by defining your own subclass if you like. Quite how a case-sensitive stop list would work, I don't really know.

SYNOPSIS

my $stop = Search::FreeText::LexicalAnalysis::Stop->new();
my $words = $stop->process($oldwords);

METHODS

$self->initialize();

Called when the lexicon system is initialised. This method actually creates and stores the stop list, and can be overridden if needed.

$self->process($oldwords);

Called to process a reference to an array of words. Returns a reference to an array of the words that remain once the stop words have been removed, ready for further processing.
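The array-reference-in, array-reference-out contract of process() can be illustrated with a small standalone sketch. This is not the module's own code: the function name filter_stop_words and the four-word stop list are hypothetical, chosen only to show case-insensitive removal of stop words.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the filter's process() step; not the module's code.
# Takes a hash of lower-case stop words and a reference to an array of words,
# and returns a reference to a new array with the stop words removed.
sub filter_stop_words {
    my ($stop, $oldwords) = @_;
    # Lower-casing each word before the lookup makes the match case-insensitive.
    return [ grep { !exists $stop->{lc $_} } @$oldwords ];
}

# Illustrative stop list, keyed in lower case.
my %stop = map { $_ => 1 } qw(the a of and);

my $words = filter_stop_words(\%stop, [qw(The History of Aberdeen)]);
print join(' ', @$words), "\n";    # prints "History Aberdeen"
```

The surviving words keep their original case; only the comparison against the stop list is lower-cased.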

$self->get_stop_list();

Called to return the stop list as a single string containing the stop words. The string can also include comments, as lines beginning with a '#' character. You might want to override this method, for example, to read the stop list from a file.
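As a rough illustration of the string format, the following sketch turns a stop list string into a lookup hash. It assumes that words are separated by whitespace, which the text above does not state. The helper name parse_stop_list is hypothetical and is not part of the module.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a stop list string into a lookup hash.
# Assumption: words are separated by whitespace; lines beginning with
# '#' are comments and are skipped.
sub parse_stop_list {
    my ($text) = @_;
    my %stop;
    foreach my $line (split /\n/, $text) {
        next if $line =~ /^#/;                       # skip comment lines
        $stop{lc $_} = 1 foreach split ' ', $line;   # store words in lower case
    }
    return \%stop;
}

my $stop = parse_stop_list("# articles\na an the\n# conjunctions\nand or\n");
print join(' ', sort keys %$stop), "\n";    # prints "a an and or the"
```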

The default method will pick up a stop list from the -stoplist parameter to the main Search::FreeText object, if one has been supplied.

You can also add extra lines and special words to the stop list, or remove some words from it, by overriding this method in a subclass and calling the default method from within it.
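A minimal sketch of that subclassing pattern follows. So that it runs without the module installed, it uses a stand-in base class, My::BaseStop, in place of Search::FreeText::LexicalAnalysis::Stop. The package names, the sample words, and the one-word-per-line layout are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Stand-in for Search::FreeText::LexicalAnalysis::Stop, so the sketch runs
# without the module installed. Its stop list is purely illustrative.
package My::BaseStop;

sub new { my ($class) = @_; return bless {}, $class; }

sub get_stop_list { return "# default list\nthe\na\nof\n"; }

# Hypothetical subclass that extends and trims the default list.
package My::Stop;
our @ISA = ('My::BaseStop');

sub get_stop_list {
    my ($self) = @_;
    my $list = $self->SUPER::get_stop_list();          # call the default method
    $list =~ s/^of\n//m;                               # remove an existing word
    $list .= "# project-specific words\nhowever\n";    # add extra lines
    return $list;
}

package main;

print My::Stop->new()->get_stop_list();
```

In real use the subclass would presumably inherit from Search::FreeText::LexicalAnalysis::Stop instead of the stand-in, and the edits to the string would need to match whatever layout the default list actually uses.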

AUTHOR

Stuart Watt <S.N.K.Watt@rgu.ac.uk>

Copyright (c) 2003 The Robert Gordon University. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.