NAME

Lingua::YaTeA::TermCandidate - Perl extension for Term Candidate

SYNOPSIS

use Lingua::YaTeA::TermCandidate;
Lingua::YaTeA::TermCandidate->new();

DESCRIPTION

This module implements a representation of a term candidate. Each term candidate is described by its identifier (ID), an internal key (KEY), the minimal head of the term candidate (HEAD), the list of its word components (WORDS), its list of occurrences (OCCURRENCES), its reliability (RELIABILITY), its status as a term (TERM_STATUS), a reference to the original phrase in the corpus (ORIGINAL_PHRASE), the associated weights, which can be considered as relevancy measures (WEIGHTS), its root node (ROOT), and a flag indicating whether the term candidate is a maximal noun phrase (MNP_STATUS). Depending on the configuration, a phrase recognised as a term candidate may or may not be a term; the default value of TERM_STATUS is 0. The default value of MNP_STATUS is 0; a term candidate is considered a maximal noun phrase if at least one of its occurrences is a maximal noun phrase.

The key of the term candidate is the concatenation of the inflected form, the postag list and the lemma (separated by the character '~').
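The key format described above can be sketched without the module itself. The following is a minimal illustration, not the module's implementation: the word records, the helper names join_field and build_key, and the choice of a single space to join the per-word values are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical word records; the field names IF, POS and LF are assumptions
# chosen for this sketch, not the module's internal layout.
my @words = (
    { IF => 'terminological', POS => 'JJ',  LF => 'terminological' },
    { IF => 'resources',      POS => 'NNS', LF => 'resource' },
);

# Hypothetical helper: concatenate one field over all words.
# Joining with a single space is an assumption of this sketch.
sub join_field {
    my ($field, @w) = @_;
    return join ' ', map { $_->{$field} } @w;
}

# Key = inflected form ~ postag list ~ lemma, as described above.
sub build_key {
    my (@w) = @_;
    return join '~', map { join_field($_, @w) } qw(IF POS LF);
}

my $key = build_key(@words);
print "$key\n";
```

Splitting such a key on '~' gives back the three components, which is why the separator must not occur inside any of them.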

METHODS

new()

new();

The method creates a new term candidate object.

setRoot()

setRoot();

The method sets the ROOT field and returns it.

getRoot()

getRoot();

The method returns the ROOT field.

setTermStatus()

setTermStatus();

The method sets the TERM_STATUS field and returns it.

getTermStatus()

getTermStatus();

The method returns the TERM_STATUS field.

isTerm()

isTerm();

This method indicates whether the term candidate has the term status or not.

getLength()

getLength();

This method returns the number of words composing the phrase.

addWord()

addWord($node, $wordlist);

This method adds the word referred to by the node $node, taken from the word list $wordlist.

addOccurrence()

addOccurrence($term_occurrence);

This method adds the term occurrence ($term_occurrence) to the current term candidate and records whether it is a maximal noun phrase (field MNP_STATUS).
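The relation between occurrences, MNP_STATUS and the frequency can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the module's code: SketchCandidate is a hypothetical package, and modelling an occurrence as a hash reference with an is_mnp flag is an assumption made only for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, not Lingua::YaTeA::TermCandidate itself.
package SketchCandidate;

sub new {
    my ($class) = @_;
    return bless { OCCURRENCES => [], MNP_STATUS => 0 }, $class;
}

# Each occurrence is modelled here as a hash reference with an 'is_mnp'
# flag; that representation is an assumption of this sketch.
sub addOccurrence {
    my ($self, $occurrence) = @_;
    push @{ $self->{OCCURRENCES} }, $occurrence;
    # One maximal-noun-phrase occurrence is enough to set the status.
    $self->{MNP_STATUS} = 1 if $occurrence->{is_mnp};
    return;
}

sub getMNPStatus { return $_[0]->{MNP_STATUS} }

# Frequency is the number of stored occurrences.
sub getFrequency { return scalar @{ $_[0]->{OCCURRENCES} } }

package main;

my $candidate = SketchCandidate->new();
$candidate->addOccurrence({ is_mnp => 0 });
$candidate->addOccurrence({ is_mnp => 1 });
$candidate->addOccurrence({ is_mnp => 0 });

printf "frequency=%d mnp=%d\n",
    $candidate->getFrequency(), $candidate->getMNPStatus();
```

In this sketch the status stays at 1 once set, even if later occurrences are not maximal noun phrases, which matches the "at least one occurrence" rule given in the DESCRIPTION.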

addOccurrences()

addOccurrences($term_occurrence_list);

This method adds the term occurrences from the list $term_occurrence_list (which is a reference to an array).

getKey()

getKey();

This method returns the key of the term candidate.

getID()

getID();

This method returns the identifier of the term candidate.

getMNPStatus()

getMNPStatus();

This method indicates whether the term candidate is a maximal noun phrase or not.

editKey()

editKey($string);

This method modifies the key of the current term candidate by adding the string $string to it.

setHead()

setHead();

This method sets the minimal head of the term candidate by searching for it in the parse tree of the phrase.

getHead()

getHead();

This method returns the minimal head of the term candidate.

setWeight()

setWeight($weight_name, $weight);

This method sets the weight $weight_name to the value $weight.

getWeight()

getWeight($weight_name);

This method returns the value of the weight $weight_name.

setWeights()

setWeights($weight_list);

This method sets a list of weights given by the hash table $weight_list, where each key is a weight name and each value is the corresponding weight value.

getWeights()

getWeights();

This method returns the list of weights, i.e. a hash table where each key is a weight name and each value is the corresponding weight value.
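The weight accessors described above can be pictured as operations on a name-to-value hash. The sketch below uses a hypothetical stand-in package, SketchWeights, rather than the module itself. The weight names, the choice to merge rather than replace in setWeights, and returning a hash reference from getWeights are all assumptions of this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, not Lingua::YaTeA::TermCandidate itself.
package SketchWeights;

sub new {
    my ($class) = @_;
    return bless { WEIGHTS => {} }, $class;
}

# Set a single weight by name.
sub setWeight {
    my ($self, $weight_name, $weight) = @_;
    $self->{WEIGHTS}{$weight_name} = $weight;
    return;
}

# Set several weights from a hash reference (name => value).
# Merging into the existing weights is an assumption of this sketch.
sub setWeights {
    my ($self, $weight_list) = @_;
    $self->{WEIGHTS}{$_} = $weight_list->{$_} for keys %{$weight_list};
    return;
}

sub getWeight {
    my ($self, $weight_name) = @_;
    return $self->{WEIGHTS}{$weight_name};
}

# Returned as a hash reference here; that is an assumption.
sub getWeights { return $_[0]->{WEIGHTS} }

sub getWeightNames {
    my ($self) = @_;
    my @names = sort keys %{ $self->{WEIGHTS} };
    return @names;
}

package main;

my $tc = SketchWeights->new();
$tc->setWeight('frequency', 12);                        # hypothetical weight name
$tc->setWeights({ tfidf => 0.5, reliability => 3 });    # hypothetical weight names

my @names = $tc->getWeightNames();
print join(', ', @names), "\n";
print $tc->getWeight('tfidf'), "\n";
```

The names are sorted in this sketch only to make the output stable; the text above does not state any ordering.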

getWeightNames()

getWeightNames();

The method returns the list of the weight names that are instantiated, as an array.

getWords()

getWords();

The method returns the list of the words that are components of the term candidate.

getWord()

getWord($index);

The method returns the word at position $index in the list of the components of the term candidate.

getOccurrences()

getOccurrences();

This method returns the list of the occurrences of the term candidate, as an array reference.

buildLinguisticInfos()

buildLinguisticInfos($tagset);

The method returns the inflected form, the postag list and the lemma of the term candidate as an array (each piece of information is the concatenation of the corresponding word information).

getIF()

getIF();

The method returns the inflected form of the term candidate.

getLF()

getLF();

The method returns the canonical form (lemma) of the term candidate.

getPOS()

getPOS();

The method returns the list of the part-of-speech tags of the term candidate.

getFrequency()

getFrequency();

The method returns the frequency of the term candidate, i.e. the number of occurrences of the term candidate.

setReliability()

setReliability($reliability);

The method sets the reliability of the term candidate.

getReliability()

getReliability();

The method returns the reliability of the term candidate.

getOriginalPhrase()

getOriginalPhrase();

The method returns the original phrase taken from the corpus.

SEE ALSO

Sophie Aubin and Thierry Hamon. Improving Term Extraction with Terminological Resources. In Advances in Natural Language Processing (5th International Conference on NLP, FinTAL 2006). pages 380-387. Tapio Salakoski, Filip Ginter, Sampo Pyysalo, Tapio Pahikkala (Eds). August 2006. LNAI 4139.

AUTHOR

Thierry Hamon <thierry.hamon@univ-paris13.fr> and Sophie Aubin <sophie.aubin@lipn.univ-paris13.fr>

COPYRIGHT AND LICENSE

Copyright (C) 2005 by Thierry Hamon and Sophie Aubin

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.6 or, at your option, any later version of Perl 5 you may have available.