NAME
Search::FreeText::LexicalAnalysis::Stem - lexicon interface to Lingua::Stem
DESCRIPTION
A filter that uses Lingua::Stem to apply the Porter stemming algorithm. It can be included in a search system as part of both indexing and query processing.
The filter wraps Lingua::Stem rather than calling it directly, because Lingua::Stem turns nonwords into absolutely nothing at all. To overcome this, we stem only the words, and then merge the nonwords back in alongside the stemmed words.
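The stem-then-merge idea can be sketched as follows. This is a minimal illustration, not the module's actual code. The helper name stem_preserving_nonwords, the letters-and-apostrophes test for what counts as a word, and the toy stemmer are all assumptions made for the example. The toy stemmer stands in for Lingua::Stem so the sketch runs without it, and it is assumed to return exactly one entry per input word.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: stem only the entries that
# look like words and leave everything else in its original position.
sub stem_preserving_nonwords {
    my ($stem_list, $oldwords) = @_;

    # Record the positions of entries made up solely of letters/apostrophes
    # (an assumed definition of "word" for this sketch).
    my @positions = grep { $oldwords->[$_] =~ /^[A-Za-z']+$/ } 0 .. $#{$oldwords};

    # Stem just those entries in a single batch call.
    my $stemmed = $stem_list->(@{$oldwords}[@positions]);

    # Merge: start from a copy of the input, then overwrite the word slots.
    my @result = @{$oldwords};
    @result[@positions] = @{$stemmed};
    return \@result;
}

# Toy stand-in stemmer (assumption): lowercases and strips a trailing
# "ing" or "s". A real stemmer would be plugged in here instead.
my $toy_stemmer = sub {
    return [ map { my $w = lc $_; $w =~ s/(?:ing|s)$//; $w } @_ ];
};

my $words = stem_preserving_nonwords(
    $toy_stemmer, [ 'Searching', '42', 'documents', 'x-ray' ]);
print join(' ', @{$words}), "\n";    # prints "search 42 document x-ray"
```

Passing the stemmer in as a code reference keeps the merge logic separate from the stemming library, so the nonword handling can be checked on its own.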
SYNOPSIS
my $stemmer = Search::FreeText::LexicalAnalysis::Stem->new();
my $words = $stemmer->process($oldwords);   # $oldwords is a reference to an array of words
METHODS
- $self->initialize();
  Called when the lexicon system is initialised. This method actually creates and stores the stemmer, and can be overridden if needed.
- $self->process($oldwords);
  Takes a reference to an array of words and returns a reference to an array of stemmed words for further processing. Words that cannot be stemmed are left in place. Wrapping Lingua::Stem this way costs a little performance, but these are real words for indexing, so we must not lose them.
AUTHOR
Stuart Watt <S.N.K.Watt@rgu.ac.uk>
Copyright (c) 2003 The Robert Gordon University. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.