NAME
WordNet::Extend::Locate - Perl modules for locating where in WordNet a lemma should be inserted.
SYNOPSIS
Basic Usage Example
use WordNet::Extend::Locate;
my $locate = WordNet::Extend::Locate->new();
$locate->stopList('(the|is|at)');
$locate->setCleanUp(1);
$locate->preProcessing();
$locate->toggleCompareGlosses(1,1,0);
$locate->setBonus(25);
$locate->toggleRefineSense(0);
print "Finding location for 'dog noun withdef.1 man's best friend'\n";
my @location = @{ $locate->locate("dog\tnoun\twithdef.1\tman's best friend") };
print "Location found: @location\n";
DESCRIPTION
Introduction
WordNet is a widely used resource in NLP and other research areas. One drawback is the long gap between updates: WordNet was last updated and released in December 2006, and no further updates are planned. WordNet::Extend::Locate aims to help users decide where a new lemma should be inserted into WordNet by offering several methods for finding a location. Users can pass the suggested location to WordNet::Extend::Insert, or treat it as a starting point and choose their own location.
Methods
The following methods are defined in this package:
Public methods
- $obj->new()
-
The constructor for WordNet::Extend::Locate objects.
Parameters: none.
Return value: the new blessed object
- $obj->getError()
-
Allows the caller to check whether any errors have occurred. Returns an array ($error, $errorString), where an $error value of 1 represents a warning, 2 represents an error, and $errorString describes the problem. For example, if a user forgets to run preProcessing() before a method that relies on it, $error would be 2 and $errorString would mention that preProcessing() had not been run.
Parameter: None
Returns: array of the form ($error, $errorString).
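The return convention above can be handled along these lines. This is only a sketch: describe_error() is a hypothetical helper, not part of the module, and it assumes that any code other than 1 or 2 means nothing went wrong, which the description above does not state.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of WordNet::Extend::Locate): turns the
# ($error, $errorString) pair described above into a printable message.
# Assumption: any code other than 1 or 2 means that nothing went wrong.
sub describe_error {
    my ($error, $errorString) = @_;
    return "warning: $errorString" if $error == 1;
    return "error: $errorString"   if $error == 2;
    return 'ok';
}

# With a real object this pair would come from $locate->getError().
my ($error, $errorString) = (2, 'preProcessing() has not been run');
print describe_error($error, $errorString), "\n";
```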
- $obj->locateFile($input_file, $output_file)
-
Attempts to locate the best WordNet position for each word in the input file and writes the results to the output file.
Parameters: the locations of the input file and the output file, in that order
Returns: nothing
- $obj->locate($wordPosGloss)
-
Takes a single lemma with its gloss and returns the best insertion point in WordNet.
Parameter: lemma string in the format 'word\tpos\titem-id\tdef'. NOTE: the fields must be separated by tab characters (\t) only, not spaces.
Returns: reference to an array in the format (item-id, WordNet sense, operation)
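The tab-separated input format can be built and taken apart as in the sketch below. build_lemma_string() and parse_lemma_string() are hypothetical helpers written for illustration; they are not part of the module and say nothing about how locate() parses its argument internally.

```perl
use strict;
use warnings;

# Hypothetical helper: joins the four fields with tab characters only.
sub build_lemma_string {
    my ($word, $pos, $item_id, $definition) = @_;
    return join("\t", $word, $pos, $item_id, $definition);
}

# Hypothetical helper: splits a lemma string back into its four fields.
sub parse_lemma_string {
    my ($line) = @_;
    # A limit of 4 keeps any later tab characters inside the definition.
    my ($word, $pos, $item_id, $definition) = split /\t/, $line, 4;
    return ($word, $pos, $item_id, $definition);
}

my $input  = build_lemma_string('dog', 'noun', 'withdef.1', "man's best friend");
my @fields = parse_lemma_string($input);
print join(' | ', @fields), "\n";   # dog | noun | withdef.1 | man's best friend
```

Building the string with join() avoids accidentally typing spaces between the fields.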
- $obj->stopList($newStopList)
-
Sets a new stop list, given in regex form.
Parameter: the new stop list as a regex alternation of the form (w1|w2|...|wn)
Returns: nothing
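To show what a stop list in this form can do to a gloss, here is a minimal sketch. remove_stop_words() is a hypothetical helper, and the word boundaries and case-insensitive matching are choices made for this example; the module may apply its stop list differently.

```perl
use strict;
use warnings;

# Hypothetical helper: removes stop words from a gloss using an alternation
# of the form (w1|w2|...|wn). Word boundaries and case-insensitivity are
# assumptions made for this sketch.
sub remove_stop_words {
    my ($gloss, $stop_list) = @_;
    my $stop_re = qr/\b$stop_list\b/i;
    $gloss =~ s/$stop_re//g;
    $gloss =~ s/\s+/ /g;          # collapse the gaps left behind
    $gloss =~ s/^\s+|\s+$//g;     # trim both ends
    return $gloss;
}

print remove_stop_words('the dog is at the door', '(the|is|at)'), "\n";   # dog door
```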
- $obj->setCleanUp($switch)
-
Allows the user to toggle whether or not glosses should be cleaned up.
Parameter: 0 or 1 to turn clean up off or on respectively
Returns: nothing
- $obj->addCleanUp($cleanUp)
-
Allows the user to add their own regex for cleaning up the glosses.
Parameter: Regex representing the cleanup the user wants performed.
Returns: Nothing
- $obj->preProcessing()
-
Greatly speeds up the program by making as many external calls as possible up front and storing the results for later use.
Parameter: none
Returns: nothing
- $obj->processLemma(@inLemma)
-
Determines where the out-of-vocabulary (OOV) lemma should be inserted into WordNet and returns the result.
Parameter: the lemma to be inserted in array form (lemma, part-of-speech, item-id, definition, def source)
Returns: chosen lemma in array form (item-id, WordNet sense, operation)
- $obj->toggleCompareGlosses($hype,$hypo,$syns)
-
Toggles which glosses are used by scoreSense(). By default, the sense's own gloss and the glosses of its hypernyms, hyponyms, and synsets are all used. This method turns the hypernym, hyponym, and synset glosses on or off through three parameters, 1 for on and 0 for off. Example: toggleCompareGlosses(0,0,0) turns all three off.
Parameters: 0 or 1 for each of the hypernym, hyponym, and synset comparisons, in that order.
Returns: nothing
- $obj->setBonus($bonus)
-
Allows the user to set the bonus that will be used when scoring lemmas that contain the new lemma.
Parameter: the multiplier that should be used in calculating the bonus.
Returns: nothing
sub setBonus()
{
    my $base = 0;

    # checks if method entered by object.
    if(scalar @_ == 2)
    {
        $base = 1;
    }

    $bonus = $_[$base];
}
- $obj->scoreSense(@inLemma, $compareSense)
-
Serves as a wrapper method to facilitate the main program by directing it to the currently chosen scoring method. By default the average highest scoring method is chosen. This can be changed with setScoreMethod().
Parameters: the in lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.
Returns: a score of how related the in lemma is to the compareSense.
- $obj->setScoreMethod($scoreMethod)
-
Allows the user to choose which scoring method is used by default when running the program from the top. Options are: 'baseline' and 'BwS' (the baseline system with stemming and lemmatization). More options will be listed here as they are added.
Parameter: the chosen scoring method
Returns: nothing.
- $obj->Similarity(@inLemma, $compareSense)
-
Calculates a score for the passed sense and returns that score.
Parameters: the in lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.
Returns: a score of how related the in lemma is to the compareSense.
- $obj->BwS(@inLemma, $compareSense)
-
Calculates a score for the passed sense and returns that score. This is a modified baseline() method which adds stemming to the data.
Parameters: the in lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.
Returns: a score of how related the in lemma is to the compareSense.
- $obj->baseline(@inLemma, $compareSense)
-
Calculates a score for the passed sense and returns that score. This method is a wrapper for simpleScoreSense() that makes sure no stemming or lemmatization is applied in preProcessing().
Parameters: the in lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.
Returns: a score of how related the in lemma is to the compareSense.
- $obj->word2VecCompare(@inLemma)
-
Calculates a score for the passed sense using the gensim Word2Vec model trained on Google News vectors.
Parameters: the in lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.
Returns: a score of how related the in lemma is to the compareSense.
- $obj->setConfidenceValue()
-
Allows the user to set the confidence value for word2VecCompare(). The confidence value is the cutoff for the similarity score: scores below it are dropped. This aims to increase accuracy but will reduce recall.
Parameter: the new confidence value (default 0)
Returns: Nothing
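The cutoff behaviour described above can be sketched as follows. filter_by_confidence() is a hypothetical helper, and the candidate names and scores are made up for illustration; nothing here reflects the module's internal data structures.

```perl
use strict;
use warnings;

# Hypothetical helper: keeps only the candidates whose similarity score is
# not below the confidence value. Candidates are returned in sorted order.
sub filter_by_confidence {
    my ($confidence, %score_for) = @_;
    return grep { $score_for{$_} >= $confidence } sort keys %score_for;
}

# Made-up candidate senses and similarity scores.
my %score_for = ('dog#n#1' => 0.82, 'cat#n#1' => 0.40, 'wolf#n#1' => 0.65);
my @kept = filter_by_confidence(0.5, %score_for);
print "@kept\n";   # dog#n#1 wolf#n#1
```

Raising the cutoff leaves fewer candidates, which is where the loss of recall mentioned above comes from.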
- $obj->simpleScoreSense(@inLemma, $compareSense)
-
Calculates a score for the passed sense and returns that score. This is the baseline system that was submitted for SemEval-2016 Task 14. The algorithm scores by word overlap with the lemma's gloss and with the glosses of the lemma's hypernyms and hyponyms.
Parameters: the in lemma in array form (lemma, part-of-speech, item-id, definition, def source) and the sense that the lemma is being compared to.
Returns: a score of how related the in lemma is to the compareSense.
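Word-overlap scoring of the kind described above can be illustrated with a short sketch. overlap_score() is a hypothetical helper that counts distinct shared words between two glosses; it is not the module's implementation, and it leaves out stop words, clean-up, and the bonus.

```perl
use strict;
use warnings;

# Hypothetical helper: counts the distinct words shared by two glosses,
# ignoring case. Words are split on whitespace only.
sub overlap_score {
    my ($new_gloss, $candidate_gloss) = @_;
    my %in_new = map { lc($_) => 1 } split ' ', $new_gloss;
    my %seen;
    my $score = 0;
    for my $word (map { lc } split ' ', $candidate_gloss) {
        next if $seen{$word}++;       # count each candidate word once
        $score++ if $in_new{$word};
    }
    return $score;
}

# Made-up glosses; the shared words are "a", "domestic", and "animal".
print overlap_score('domestic animal kept as a pet',
                    'a domestic animal that barks'), "\n";   # 3
```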
- $obj->getExtendedGloss($compareSense)
-
Calculates the extended gloss based on which glosses are toggled and returns an array containing the full glosses.
Parameter: the sense which the extended gloss is based on
Returns: an array which contains the extended gloss
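How the toggles could shape an extended gloss is sketched below. extended_gloss() is a hypothetical helper and all gloss data is made up; the real method looks the glosses up in WordNet and may order or combine them differently.

```perl
use strict;
use warnings;

# Hypothetical helper: assembles an extended gloss from whichever related
# glosses are switched on. The sense's own gloss is always included.
sub extended_gloss {
    my ($glosses, $use_hype, $use_hypo, $use_syns) = @_;
    my @extended = ($glosses->{sense});
    push @extended, @{ $glosses->{hypernyms} } if $use_hype;
    push @extended, @{ $glosses->{hyponyms} }  if $use_hypo;
    push @extended, @{ $glosses->{synset} }    if $use_syns;
    return @extended;
}

# Made-up gloss data for a single candidate sense.
my %glosses = (
    sense     => 'a domesticated canine',
    hypernyms => ['a carnivorous mammal'],
    hyponyms  => ['a small dog kept as a pet', 'a dog used for hunting'],
    synset    => ['domestic dog'],
);

# Same switches as toggleCompareGlosses(1,1,0) in the synopsis.
my @extended = extended_gloss(\%glosses, 1, 1, 0);
print scalar(@extended), " glosses\n";   # 4 glosses
```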
- $obj->toggleRefineSense($toggle)
-
Allows user to toggle refineSense() on/off.
Parameter: 0 or 1 to toggle the refine sense method on or off respectively in the processLemma method.
Returns: nothing
- $obj->refineSense(@inLemma, $highSense)
-
Refines the chosen sense by determining which numbered sense should be chosen.
Parameters: the in lemma in the form (lemma, part-of-speech, item-id, definition, def source) and the sense that currently best matches the in lemma.
Returns: the new highest scoring sense