NAME

Lingua::EN::WSD::CorpusBased - Word Sense Disambiguation using a domain corpus

SYNOPSIS

use WordNet::QueryData;
use Lingua::EN::WSD::CorpusBased;
use Lingua::EN::WSD::CorpusBased::Corpus;

my $wn = WordNet::QueryData->new;
my $corpus = Lingua::EN::WSD::CorpusBased::Corpus->new('corpus' => $corpusFile,
                         'wnref' => $wn);

my $wsd = Lingua::EN::WSD::CorpusBased->new('wnref' => $wn,
                   'cref' => $corpus);

print join(', ', @{$wsd->wsd($term)});

DESCRIPTION

This module disambiguates word senses using a domain corpus. It works on the assumption that, within a single corpus, only one sense of a word is used. For each sense of the term, the module counts how often the synonyms belonging to that sense occur in the corpus (optionally including hyponyms and hypernyms, see the arguments of new). The sense with the highest count is taken to be the right one.
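
The counting idea can be illustrated with a minimal, self-contained sketch. It does not use this module or WordNet: the sense labels, the synonym lists and the three-sentence corpus are toy data invented for the example, and count_occurrences and sense_weight are hypothetical helper names, not part of the module's API.

```perl
use strict;
use warnings;

# Toy stand-ins (assumptions): each sense label maps to the words that
# express it, and the corpus is a plain list of lower-cased sentences.
my %synonyms_of = (
    finance => [ 'bank', 'depository financial institution' ],
    river   => [ 'bank', 'riverbank', 'riverside' ],
);
my @corpus = (
    'the riverbank was muddy after the flood',
    'we walked along the riverside at dusk',
    'the bank of the river is steep here',
);

# Number of sentences in which $word occurs as a whole word or phrase.
sub count_occurrences {
    my ($word, $sentences) = @_;
    return scalar grep { /\b\Q$word\E\b/ } @$sentences;
}

# Weight of a sense: the summed corpus counts of its synonyms.
sub sense_weight {
    my ($sense, $sentences) = @_;
    my $weight = 0;
    $weight += count_occurrences($_, $sentences) for @{ $synonyms_of{$sense} };
    return $weight;
}

# Pick the sense with the highest weight (ties broken alphabetically here).
my %weight = map { $_ => sense_weight($_, \@corpus) } keys %synonyms_of;
my ($best) = sort { $weight{$b} <=> $weight{$a} || $a cmp $b } keys %weight;
print "$best\n";
```

In this toy corpus the word shared by both senses contributes equally to each, so the decision is made by the synonyms that only one sense has.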

Corpus

The corpus is managed by a separate module, Lingua::EN::WSD::CorpusBased::Corpus. It stores the corpus and allows fast access to its sentences. Read the documentation of the corpus module before use, since it expects the corpus to be in a preprocessed state.

METHODS

new

Creates a new object. Takes the following named arguments:

wnref A reference to a WordNet::QueryData object. Required.

cref A reference to a Corpus object. Required.

debug A switch for the debug mode of the object. Optional, default: 0.

stem If you set this switch to 1, the term in question will be lemmatized using the stem module. If set to 0, only the original term is sent to WordNet. In that case WordNet may have no entry for the term, and the wsd method then returns an empty list. Optional, default: 1.

strict Controls whether the algorithm returns all senses or no sense when all senses are weighted equally. This happens in particular when the terms do not occur in the corpus at all (in which case a larger corpus is recommended). Optional, default: 0.

hyponyms Controls whether hyponyms are used in addition to synonyms. Optional, default: 1.

hypernyms Controls whether hypernyms are used in addition to synonyms. Optional, default: 1.
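
The effect of the strict switch on ties can be sketched as follows. This is an illustration only, not the module's code: best_senses is a hypothetical helper, the weights are invented, and the mapping assumed here (strict set to 1 returns no sense on a complete tie, strict set to 0 returns all of them) is a plausible reading of the description above rather than something the documentation states explicitly.

```perl
use strict;
use warnings;
use List::Util qw(max min);

# Hypothetical helper: pick the top-weighted senses from a hash of
# sense => weight. Assumption: with $strict set, a complete tie among
# several senses yields no sense; otherwise all tied senses are returned.
sub best_senses {
    my ($weights, $strict) = @_;
    my @senses = sort keys %$weights;
    return [] unless @senses;

    my $top      = max(values %$weights);
    my $all_tied = @senses > 1 && $top == min(values %$weights);
    return [] if $strict && $all_tied;

    # Keep every sense that reaches the top weight.
    return [ grep { $weights->{$_} == $top } @senses ];
}

my %unseen = (river => 0, finance => 0);   # term never occurs in the corpus
my %clear  = (river => 3, finance => 1);   # one sense clearly dominates

print scalar @{ best_senses(\%unseen, 0) }, "\n";        # lenient: tied senses kept
print scalar @{ best_senses(\%unseen, 1) }, "\n";        # strict: nothing returned
print join(', ', @{ best_senses(\%clear, 1) }), "\n";    # unaffected by strict
```

A single sense is not treated as a tie in this sketch; how the module itself handles that case is not documented here.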

wsd

The method that performs the word sense disambiguation. Returns a reference to a list of the senses that seem most probable for the given term. The list can be empty, depending on your settings for 'strict' and 'stem'. Takes one parameter:

term The term you want to disambiguate. Required.

debug

Returns true if the object is in debug mode.

Internal Methods

init

Internal method. Prepares the object for a disambiguation run. It is called automatically by the wsd method. Because it does some preprocessing, it must be called before any call to sense. Takes one parameter:

term The term in question. Required.

sense

Internal method. Iterates over all senses of the given term and returns a reference to a list of the best senses.

v

Internal method. Calculates the weight for a synset as sense of the given term.

count

Internal method. A wrapper for the corresponding method of the corpus object. Returns the number of occurrences.

hyponyms

Internal method. Returns a list of hyponyms (synsets) for the word given as argument.

hypernyms

Internal method. Returns a list of hypernyms (synsets) for the word given as argument.

synonyms

Internal method. Returns a list of synonyms for the word given as argument. The returned list contains words, not synsets.

synsets

Internal method. Returns a reference to a list of synsets for the given term. This list covers all parts of speech for which the term is defined in WordNet.

term_replace

Internal method. Returns the term in question after replacing the last word with the second argument. The returned string has underscores instead of spaces.

Internal method. Returns the grammatical head of the term. In case of multi-word expressions, this is the last word of the expression, otherwise it's the word itself.

term

Internal method. Returns the term in question with underscores instead of spaces.
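
The three string operations described above (underscore form of the term, grammatical head, and replacement of the last word) can be sketched in a few lines. The function names normalize_term, head_of and replace_last_word are hypothetical and chosen for this example only; they are not the module's internal method names, and the sketch assumes a non-empty term.

```perl
use strict;
use warnings;

# Hypothetical helpers, named for this sketch only.

# The term with underscores instead of spaces.
sub normalize_term {
    my ($term) = @_;
    (my $normalized = $term) =~ s/\s+/_/g;
    return $normalized;
}

# Grammatical head: the last word of a multi-word term, else the term itself.
sub head_of {
    my ($term) = @_;
    my @words = split ' ', $term;
    return $words[-1];
}

# Replace the last word of the term, then convert spaces to underscores.
sub replace_last_word {
    my ($term, $replacement) = @_;
    my @words = split ' ', $term;
    $words[-1] = $replacement;
    return normalize_term(join ' ', @words);
}

print normalize_term('junior high school'), "\n";
print head_of('junior high school'), "\n";
print replace_last_word('junior high school', 'academy'), "\n";
```

Replacing the head is what makes it possible to look up variants of a multi-word term, for example with a synonym of its last word substituted.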

BUGS

None so far. If you find any, please report them to me at reiter@cpan.org.

TODO

  • A lot more useful debug output

  • More detailed documentation

  • Making more methods externally useful, allowing a more flexible use of the module.

COPYRIGHT

Copyright (c) 2006 by Nils Reiter.

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.