NAME
WordNet::BestStem -- get the best guess stem of a word.
VERSION
0.2.2
SYNOPSIS
use WordNet::BestStem qw( best_stem );
my $best = best_stem( 'roses', {V=>1} );
DESCRIPTION
WordNet::BestStem assumes that the stem of a word is the form with the highest occurrence frequency in a text corpus. This is not always true, but for certain purposes it may be justifiable to treat the most frequent form as the stem.
best_stem finds a word's variant forms and returns the form (with its part-of-speech) that has the highest frequency according to ICFinder's "information content file", which comes by default with WordNet but can be customized.
ICFinder has frequency counts for the n and v parts-of-speech but not for a or r. When a or r is involved, the number of senses for each part-of-speech is used instead of the frequency of the word and part-of-speech pair to choose the form.
Alternatively, best_stem can use a custom word variant frequency table.
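The frequency-based choice described above can be sketched in plain Perl. This is a minimal illustration, not the module's implementation: the %fre table and the most_frequent_form helper are hypothetical, and the "form#pos" key layout is an assumption (it is not necessarily the format expected by the FRE option). The counts are the ones shown in this document's usage examples.

```perl
use strict;
use warnings;

# Hypothetical frequency table keyed by "form#pos" (layout is an assumption).
my %fre = (
    'rose#n' => 5,
    'rise#v' => 17,
);

# Hypothetical helper: return the candidate with the highest count
# as a (form, pos, frequency) list, or an empty list if none is known.
sub most_frequent_form {
    my ($fre, @candidates) = @_;
    my ($best) = sort { $fre->{$b} <=> $fre->{$a} }
                 grep { exists $fre->{$_} } @candidates;
    return unless defined $best;
    my ($form, $pos) = split /#/, $best;
    return ($form, $pos, $fre->{$best});
}

# "rose" can be the noun "rose" or the past tense of the verb "rise".
my @result = most_frequent_form(\%fre, 'rose#n', 'rise#v');
print "@result\n";    # rise v 17
```

Because the verb reading has the higher count, the sketch picks "rise", which matches the best_stem('rose') result shown under METHODS.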
METHODS
best_stem
Returns in list context the best guess stem form, part-of-speech, and frequency; returns in scalar context the stem form.
*Note: At the moment WordNet does not have variant forms for very high frequency words such as "what", "the", "would". best_stem returns an empty string in such cases.
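The two calling contexts, and the empty-string case from the note above, can be illustrated with a stand-in function. The sketch below does not call WordNet: fake_best_stem and its %known table are hypothetical and only mimic the return convention described here, with values taken from this document's usage examples. What the real best_stem returns in list context for a word without variant forms is not stated here, so the empty list used below is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for best_stem: same return convention, canned data.
my %known = ( roses => [ 'rose', 'n', 5 ] );

sub fake_best_stem {
    my ($word) = @_;
    my $hit = $known{$word};
    if (!$hit) {
        # No variant forms known: empty string in scalar context.
        # The empty list in list context is an assumption.
        return wantarray ? () : '';
    }
    # List context: (stem, part-of-speech, frequency); scalar context: stem only.
    return wantarray ? @$hit : $hit->[0];
}

my ($stem, $pos, $fre) = fake_best_stem('roses');   # list context
my $only_stem          = fake_best_stem('roses');   # scalar context
print "$stem $pos $fre\n";   # rose n 5
print "$only_stem\n";        # rose

# Fall back to the original word when no stem is found.
my @stems = map { scalar( fake_best_stem($_) ) || $_ } qw( the roses );
print "@stems\n";            # the rose
```

The `scalar( ... ) || $_` idiom forces scalar context so that an empty string falls through to the original word.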
Default options (case insensitive):
V => 0, # verbose. for debugging / checking
FRE => undef, # % ref to custom word variant frequency table
Usage:
use WordNet::BestStem qw( best_stem );
print best_stem('misgivings'); # misgiving n 8
print best_stem('roses'); # rose n 5
print best_stem('rose'); # rise v 17
Compared to WordNet::stem,
use WordNet::QueryData;
use WordNet::stem;
my $WN = WordNet::QueryData->new();
my $stemmer = WordNet::stem->new($WN);
print $stemmer->stemWord('misgivings'); # misgiving
print $stemmer->stemWord('roses'); # rose
print $stemmer->stemWord('rose'); # rose rise
Compared to Lingua::Stem::En,
use Lingua::Stem::En qw( stem );
$stems = stem( { -words => ['misgivings'] } );
print @$stems; # misgiv
$stems = stem( { -words => ['roses'] } );
print @$stems; # rose
$stems = stem( { -words => ['rose'] } );
print @$stems; # rose
deluxe_stems
Uses contextual information, i.e. the appearances of word forms in the paragraph / corpus, to help choose the stem form.
Default options (case insensitive):
V => 0,
FRE => undef, # % ref to custom word variant frequency table
STEM => undef, # % ref to stem_of{string} table per best_stem
Usage:
use WordNet::BestStem qw( deluxe_stems );
my $stemmed_text = deluxe_stems \@text;
or in list context
# ref to @, %, %, %
my ($stemmed, $stem_of, $stem_fre, $str_fre) = deluxe_stems \@paragraph;
For two paragraphs / sentences,
a) beautiful roses i would like a long stem rose
b) he thinks that average salary rose in the last few years
With deluxe_stems,
$a_ = deluxe_stems \@a;
print "@$a_\n";
# beautiful rose i would like a long stem rose
# he think that average salary rise in the last few year
Compared to best_stem,
@a_ = map { scalar( best_stem $_ ) || $_ } @a;
print "@a_\n";
# beautiful rose i would like a long stem rise
# he think that average salary rise in the last few year
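One plausible reading of why the two results differ is that context can override the globally most frequent form: in paragraph (a) the word "roses" supports the noun "rose", while in paragraph (b) nothing does, so the frequency-based guess "rise" stands. The sketch below illustrates that idea only. It is not the module's algorithm, and %default_stem, %candidates and contextual_stems are hypothetical names with hand-written data copied from the outputs above.

```perl
use strict;
use warnings;

# Hypothetical data: the frequency-based guess for each word, and the
# candidate stems of the one ambiguous word in the examples.
my %default_stem = ( roses => 'rose', rose => 'rise',
                     thinks => 'think', years => 'year' );
my %candidates   = ( rose => [ 'rose', 'rise' ] );

sub contextual_stems {
    my (@words) = @_;

    # First pass: collect the stems suggested by unambiguous words.
    my %seen;
    for my $w (@words) {
        next if $candidates{$w};
        $seen{ $default_stem{$w} // $w }++;
    }

    # Second pass: for an ambiguous word, prefer a candidate already seen
    # in the context; otherwise keep the frequency-based guess.
    return map {
        my $w = $_;
        my ($ctx) = grep { $seen{$_} } @{ $candidates{$w} || [] };
        $ctx // $default_stem{$w} // $w;
    } @words;
}

my @a = qw( beautiful roses i would like a long stem rose );
my @b = qw( he thinks that average salary rose in the last few years );
print join( ' ', contextual_stems(@a) ), "\n";
# beautiful rose i would like a long stem rose
print join( ' ', contextual_stems(@b) ), "\n";
# he think that average salary rise in the last few year
```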
DEPENDENCIES
WordNet ( http://wordnet.princeton.edu )
WordNet::QueryData
WordNet::Similarity::ICFinder
AUTHOR
~~~~~~~~~~~~ ~~~~~ ~~~~~~~~ ~~~~~ ~~~ `` ><(((">
Copyright (C) 2009 Maggie J. Xiong < maggiexyz users.sourceforge.net >
All rights reserved. There is no warranty. You are allowed to redistribute this software / documentation under the same terms as Perl itself.