NAME

Search::FreeText::LexicalAnalysis - basic lexical analyser for the open search system

DESCRIPTION

An open lexical analysis processor that you can customise either by subclassing it or by adding your own filters. Each filter is called with a reference to an array of words and returns a reference to a new array of words. This call is the filter's process method; the base class Search::FreeText::LexicalAnalysisProcess defines the protocol for each step in the pipeline.
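
As a rough illustration of the filter protocol described above, the following minimal sketch defines a filter whose process method takes a reference to an array of words and returns a reference to a new array. The class name MyLexicalAnalysis::Lowercase and its constructor are hypothetical; the class is kept standalone so that it runs without the distribution installed, whereas a real filter would presumably subclass Search::FreeText::LexicalAnalysisProcess.

```perl
use strict;
use warnings;

# Hypothetical filter class, not part of Search::FreeText. A real filter
# would presumably inherit from Search::FreeText::LexicalAnalysisProcess;
# the constructor arguments here are an assumption.
package MyLexicalAnalysis::Lowercase;

sub new {
    my ($class, %args) = @_;
    return bless { %args }, $class;
}

# Takes a reference to an array of words and returns a reference to a
# new array, leaving the input array untouched.
sub process {
    my ($self, $words) = @_;
    return [ map { lc $_ } @$words ];
}

package main;

my $filter = MyLexicalAnalysis::Lowercase->new();
my $result = $filter->process([ 'Open', 'SEARCH', 'system' ]);
print join(' ', @$result), "\n";    # prints "open search system"
```

Returning a fresh array rather than modifying the input in place keeps each step independent of the others.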

SYNOPSIS

 use Search::FreeText::LexicalAnalysis;

 # Selects default filters
 my $lexicalizer = Search::FreeText::LexicalAnalysis->new();

 # Or: selects named filters only
 my $lexicalizer = Search::FreeText::LexicalAnalysis->new(
     -filters => [ qw(MyLexicalAnalysis::Heuristics
                      Search::FreeText::LexicalAnalysis::Tokenize
                      Search::FreeText::LexicalAnalysis::Stop
                      Search::FreeText::LexicalAnalysis::Stem) ]);

 my $words = $lexicalizer->process($text);

METHODS

new Search::FreeText::LexicalAnalysis( -search => searchmod [, -filters => FilterList] );

This is the main constructor for a lexical analyser. The -search parameter supplies the search object instance, which is passed in turn to each of the filters so that they can consult it for any additional data they need.

You can use the -filters initialisation key to pass a list of filter classes. By default, the filters implement stemming, a reasonably complete stop list, and a few heuristics that tighten up searching. The order of the filters matters, and the default order is:

Heuristics: Pattern-level heuristics that work on whole strings, implemented by default by Search::FreeText::LexicalAnalysis::Heuristics.
Tokenize: Splits a set of strings into an array of words. Implemented by default by Search::FreeText::LexicalAnalysis::Tokenize. Before this, strings represent documents; after this, they represent words, which is why its position in the list of filters is important.
Stop: Passes the array of words through a stop list filter, removing words that are likely to be irrelevant. Implemented by default by Search::FreeText::LexicalAnalysis::Stop.
Stem: Passes the array of words through a stemmer. Implemented by default by Search::FreeText::LexicalAnalysis::Stem, which in turn uses Lingua::Stem.
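
To show why the order matters, here is a small standalone sketch of the four stages as plain subroutines. None of this is the module's own code: the hyphen-joining heuristic, the tiny stop list, and the crude plural-stripping "stemmer" are all made-up stand-ins (the real Stem filter uses Lingua::Stem). The point is only that the heuristic must see whole strings, so it has to run before tokenizing.

```perl
use strict;
use warnings;

# Illustrative stop list only; the real one is far more complete.
my %stop = map { $_ => 1 } qw(the a of and);

# Heuristics: operates on whole strings (here: joins hyphenated words).
sub heuristics {
    my ($strings) = @_;
    return [ map { my $s = $_; $s =~ s/(\w)-(\w)/$1$2/g; $s } @$strings ];
}

# Tokenize: strings in, individual lowercase words out.
sub tokenize {
    my ($strings) = @_;
    my @words;
    for my $string (@$strings) {
        push @words, grep { length $_ } split /\W+/, lc $string;
    }
    return \@words;
}

# Stop: drops words found in the stop list.
sub stop {
    my ($words) = @_;
    return [ grep { !$stop{$_} } @$words ];
}

# Stem: crude stand-in that strips a trailing "s"; not Lingua::Stem.
sub stem {
    my ($words) = @_;
    return [ map { my $w = $_; $w =~ s/s$//; $w } @$words ];
}

my $data = [ 'The free-text index of the documents' ];
$data = $_->($data) for (\&heuristics, \&tokenize, \&stop, \&stem);
print join(' ', @$data), "\n";    # prints "freetext index document"
```

If tokenize ran first in this sketch, "free-text" would already have been split into "free" and "text" before the heuristic could see the hyphen.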

$self->initialize();

Initializes the lexical analyser, loading any modules that are needed for the list of filters.

$self->process(words...);

Passes the list of words to the filters as a pipeline. The array of words usually starts as a single string containing all the words, and one of the filters (Tokenize) turns this into an array of individual words. This allows some processing before words are split, as well as the usual stemming and stoplisting afterwards.
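
The following sketch shows one way the initialize and process steps could fit together. It is an assumption-laden toy, not the module's implementation: the Toy::* packages, the internal hash keys, and the way -search is handed to each filter are all invented for illustration.

```perl
use strict;
use warnings;

# Two hypothetical filters following the arrayref-in, arrayref-out protocol.
package Toy::Tokenize;
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub process { my ($self, $in) = @_; return [ map { split ' ', $_ } @$in ] }

package Toy::Upcase;
sub new     { my ($class, %args) = @_; return bless { %args }, $class }
sub process { my ($self, $in) = @_; return [ map { uc $_ } @$in ] }

# Hypothetical driver standing in for the lexical analyser.
package Toy::Analyser;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        filters => $args{-filters} || [],
        search  => $args{-search},
    }, $class;
    $self->initialize();
    return $self;
}

# Loads each filter class if it is not already defined, then instantiates it.
sub initialize {
    my ($self) = @_;
    $self->{pipeline} = [];
    for my $class (@{ $self->{filters} }) {
        unless ($class->can('new')) {
            (my $file = "$class.pm") =~ s{::}{/}g;
            require $file;
        }
        push @{ $self->{pipeline} }, $class->new(-search => $self->{search});
    }
    return $self;
}

# Feeds the output of each filter into the next one.
sub process {
    my ($self, @words) = @_;
    my $words = [ @words ];
    $words = $_->process($words) for @{ $self->{pipeline} };
    return $words;
}

package main;

my $analyser = Toy::Analyser->new(-filters => [ qw(Toy::Tokenize Toy::Upcase) ]);
my $words = $analyser->process('open search system');
print join('|', @$words), "\n";    # prints "OPEN|SEARCH|SYSTEM"
```

Here the input starts as a single string and only becomes a list of individual words once the tokenizing filter has run, mirroring the behaviour described for process.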

AUTHOR

Stuart Watt <S.N.K.Watt@rgu.ac.uk>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
