NAME
Lingua::EN::WSD::CorpusBased - Word Sense Disambiguation using a domain corpus
SYNOPSIS
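use WordNet::QueryData;
use Lingua::EN::WSD::CorpusBased;
use Lingua::EN::WSD::CorpusBased::Corpus;
# $corpusFile and $term are supplied by the caller
my $wn = WordNet::QueryData->new;
my $corpus = Lingua::EN::WSD::CorpusBased::Corpus->new('corpus' => $corpusFile,
                                                       'wnref' => $wn);
my $wsd = Lingua::EN::WSD::CorpusBased->new('wnref' => $wn,
                                            'cref' => $corpus);
print join(', ', @{$wsd->wsd($term)});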
DESCRIPTION
This module disambiguates word senses based on a domain corpus. It relies on the assumption that within one corpus, only one sense of a word is used. For each sense of a term, we count how often the synonyms belonging to that sense occur in the corpus. The sense with the highest count is taken to be the right one.
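The counting idea described above can be sketched without WordNet or the corpus module. This is only an illustration under stated assumptions: the toy corpus, the sense labels 'financial' and 'river', and the helper names count_occurrences and best_senses are all hypothetical and are not part of this module's API.

```perl
use strict;
use warnings;

# Toy corpus: pre-tokenized, lower-cased sentences (illustrative data only).
my @corpus = (
    [qw(the depository raised its interest rate)],
    [qw(customers trust the lender)],
    [qw(the river flooded its shore)],
);

# Hypothetical sense inventory for the term "bank":
# sense label => synonyms that belong to that sense.
my %senses = (
    financial => [qw(depository lender)],
    river     => [qw(riverbank shore)],
);

# Count how often a single word occurs in the whole corpus.
sub count_occurrences {
    my ($word, $corpus) = @_;
    my $n = 0;
    for my $sentence (@$corpus) {
        $n += grep { $_ eq $word } @$sentence;
    }
    return $n;
}

# Sum the synonym counts per sense and return a reference to the list
# of senses that share the highest total.
sub best_senses {
    my ($senses, $corpus) = @_;
    my %weight;
    for my $sense (keys %$senses) {
        $weight{$sense} = 0;
        $weight{$sense} += count_occurrences($_, $corpus)
            for @{ $senses->{$sense} };
    }
    my ($max) = sort { $b <=> $a } values %weight;
    my @best = grep { $weight{$_} == $max } keys %weight;
    return [ sort @best ];
}

print join(', ', @{ best_senses(\%senses, \@corpus) }), "\n";   # financial
```

In this toy data the 'financial' synonyms occur twice and the 'river' synonyms once, so only 'financial' is returned.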
Corpus
The corpus is managed by a separate module, Lingua::EN::WSD::CorpusBased::Corpus. It stores the corpus and provides fast access to its sentences. You should read the documentation of the corpus module, since it expects the corpus to be in a preprocessed state.
METHODS
- new
-
Creates a new object. Takes the following named arguments:
wnref A reference to a WordNet::QueryData object. Obligatory.
cref A reference to a Corpus object. Obligatory.
debug A switch for the debug mode of the object. Optional, default: 0.
stem If you set this switch to 1, the term in question will be lemmatized using the stem module. If set to 0, only the original term will be sent to WordNet. In this case, it is possible that no WordNet entry is found for the term, which makes the wsd method return an empty list. Optional, default: 1.
strict Controls whether the algorithm returns all senses or no sense when all senses are weighted equally. This happens especially if the terms are not mentioned in the corpus at all (in which case I would recommend a larger corpus). Optional, default: 0.
hyponyms Controls whether hyponyms are used in addition to synonyms. Optional, default: 1.
hypernyms Controls whether hypernyms are used in addition to synonyms. Optional, default: 1.
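The tie handling that 'strict' governs can be illustrated with a small self-contained sketch. Note the assumptions: the documentation does not say which value of 'strict' yields which behaviour, so the sketch assumes that a true value means "return no sense on a complete tie". The function name select_senses and the weight hash are hypothetical and not part of this module.

```perl
use strict;
use warnings;

# $weights: hash ref mapping a sense label to its weight.
# $strict:  ASSUMED semantics (not confirmed by the module documentation):
#           true  = return no sense when all senses are weighted equally,
#           false = return all of them.
sub select_senses {
    my ($weights, $strict) = @_;
    my @senses = sort keys %$weights;
    return [] unless @senses;

    my ($max) = sort { $b <=> $a } values %$weights;
    my @best  = grep { $weights->{$_} == $max } @senses;

    # A complete tie (for example, every count is 0) gives no evidence
    # for preferring one sense. A single sense is not treated as a tie.
    my $all_equal = @best == @senses && @senses > 1;
    return [] if $all_equal && $strict;
    return \@best;
}

print scalar @{ select_senses({ a => 0, b => 0 }, 1) }, "\n";      # 0
print join(', ', @{ select_senses({ a => 0, b => 0 }, 0) }), "\n"; # a, b
print join(', ', @{ select_senses({ a => 3, b => 1 }, 1) }), "\n"; # a
```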
- wsd
-
Performs the word sense disambiguation. Returns a reference to a list of the senses that seem most probable for the given term. This list can be empty, depending on your setting for 'strict'. Takes one parameter.
term The term you want to disambiguate. Required.
- debug
-
Returns true if the object is in debug mode.
Internal Methods
- init
-
Internal method. Prepares the object for a disambiguation run. It is called automatically by the method wsd and has to be called before any call to sense, because it does some preprocessing. Takes one parameter.
term The term in question. Required.
- sense
-
Internal method. Iterates over all senses of the given term and returns a reference to a list of the best senses.
- v
-
Internal method. Calculates the weight for a synset as sense of the given term.
- count
-
Internal method. A wrapper for the corresponding method of the corpus object. Returns the number of occurrences.
- hyponyms
-
Internal method. Returns a list of hyponyms (synsets) for the word given as argument.
- hypernyms
-
Internal method. Returns a list of hypernyms (synsets) for the word given as argument.
- synonyms
-
Internal method. Returns a list of synonyms for the word given as argument. The returned list contains words, not synsets.
- synsets
-
Internal method. Returns a reference to a list of synsets for the given term. This list covers all parts of speech (as long as they are defined in WordNet).
- term_replace
-
Internal method. Returns the term in question after replacing the last word with the second argument. The returned string has underscores instead of spaces.
- head
-
Internal method. Returns the grammatical head of the term. In case of multi-word expressions, this is the last word of the expression, otherwise it's the word itself.
- term
-
Internal method. Returns the term in question with underscores instead of spaces.
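The three string helpers described above (term_replace, head and term) can be approximated as plain functions. This is a rough sketch, not the module's code: the function names normalize_term, head_of and replace_head are hypothetical, and the sketch assumes a non-empty term whose words are separated by whitespace.

```perl
use strict;
use warnings;

# Term with underscores instead of spaces (compare: term).
sub normalize_term {
    my ($term) = @_;
    return join '_', split ' ', $term;
}

# Grammatical head: the last word of a multi-word expression,
# or the word itself for a single word (compare: head).
sub head_of {
    my ($term) = @_;
    my @words = split ' ', $term;
    return $words[-1];
}

# Replace the last word and join with underscores (compare: term_replace).
sub replace_head {
    my ($term, $replacement) = @_;
    my @words = split ' ', $term;
    $words[-1] = $replacement;
    return join '_', @words;
}

print normalize_term('savings bank'), "\n";             # savings_bank
print head_of('savings bank'), "\n";                    # bank
print replace_head('savings bank', 'depository'), "\n"; # savings_depository
```

Replacing the head with a synonym, hyponym or hypernym in this way yields the multi-word variants whose occurrences can then be looked up in the corpus.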
BUGS
None so far. If you find some, please report them to me, reiter@cpan.org.
TODO
A lot more useful debug output
More detailed documentation
Making more methods externally useful, allowing a more flexible use of the module.
COPYRIGHT
Copyright (c) 2006 by Nils Reiter.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.