NAME
Lingua::NATools::Dict - Perl extension to encapsulate Dict interface
SYNOPSIS
use Lingua::NATools::Dict;
$dic = Lingua::NATools::Dict::open("file.bin");
$dic->save($filename);
$dic->close;
$dic->add($dic2);
$dic->size();
$dic->exists($id);
$dic->occ($id);
$dic->vals($id);
$dic->for_each( sub{ ... } );
DESCRIPTION
The Dict files (with extension .bin) created by NATools are mappings from identifiers of words in one corpus to identifiers of words in another corpus. Thus, all operations performed by this module use identifiers instead of words.
You can open the dictionary using
$dic = Lingua::NATools::Dict::open("dic.bin");
Then, all operations are available as methods, in an OO fashion. After using the dictionary, do not forget to close it using
$dic->close().
The add method receives a dictionary object and adds it to the current contents. Note that both dictionaries need to be congruent with respect to word identifiers. After adding, do not forget to save, if you wish, with
$dic->save("new.dic.bin");
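The add-then-save sequence can be wrapped in a small helper. This is a minimal sketch: merge_and_save is a hypothetical name, both arguments are assumed to be objects returned by Lingua::NATools::Dict::open over congruent word identifiers, and add is assumed to modify the receiving dictionary in place, as the text above suggests.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): add $extra to $main
# and write the result to the file named $out.
# Assumption: $main and $extra come from Lingua::NATools::Dict::open()
# and share the same word identifiers.
sub merge_and_save {
    my ($main, $extra, $out) = @_;
    $main->add($extra);    # accumulate $extra into $main
    $main->save($out);     # persist the merged dictionary
    return $main;
}
```

The caller is still responsible for calling close on both dictionaries afterwards.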
The size method returns the total number of words in the corpus (the sum of all word occurrences). To get the number of occurrences of a specific word, use the occ method, passing the word identifier as a parameter.
To check whether an identifier exists in the dictionary, use the exists method, which returns a boolean value.
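As an illustration of how size, occ and exists fit together, the sketch below computes the relative frequency of a word identifier. The name relative_frequency is hypothetical, and $dic is assumed to be an object returned by Lingua::NATools::Dict::open.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): share of the corpus
# taken by the word with identifier $id.
sub relative_frequency {
    my ($dic, $id) = @_;
    return 0 unless $dic->exists($id);   # unknown identifier
    my $total = $dic->size();            # sum of all word occurrences
    return 0 unless $total;              # avoid division by zero
    return $dic->occ($id) / $total;
}
```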
The vals method returns a hash of probable translations for the supplied identifier AS AN ARRAY REFERENCE. The hash contains as keys the identifiers of the possible translations, and as values their probability of being a translation.
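Since the description mentions both a hash and an array reference, the following sketch accepts either shape and ranks the candidates by probability. The name ranked_translations is hypothetical, and the layout of the array reference (a flat list of identifier/probability pairs) is an assumption, not something the documentation above confirms.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): turn the result of
# vals into a list of [ $id, $probability ] pairs, most probable first.
# Assumption: an array reference holds flat id/probability pairs.
sub ranked_translations {
    my ($vals) = @_;
    my %prob = ref($vals) eq 'HASH' ? %$vals : @$vals;
    return map  { [ $_, $prob{$_} ] }
           sort { $prob{$b} <=> $prob{$a} || $a <=> $b }   # ties: lower id first
           keys %prob;
}
```

A call such as ranked_translations($dic->vals($id)) would then yield the candidates in decreasing order of probability.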
Finally, the for_each method lets you iterate over all the words in the dictionary. It receives a function reference as its argument.
$dic->for_each( sub{ ... } );
Each time the function is called, the following is passed as @_:
word => $id , occ => $occ , vals => $vals
where $id is the word identifier, $occ is the result of calling occ with that word, and $vals is the result of calling vals with that word.
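Because the callback receives its arguments as key/value pairs, it can copy @_ into a hash and read the fields by name. The sketch below collects the identifiers of words that occur at least a given number of times; frequent_ids is a hypothetical name, and $dic is assumed to be an object returned by Lingua::NATools::Dict::open.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): identifiers of all
# words whose occurrence count is at least $min.
sub frequent_ids {
    my ($dic, $min) = @_;
    my @ids;
    $dic->for_each(sub {
        my %entry = @_;    # word => $id, occ => $occ, vals => $vals
        push @ids, $entry{word} if $entry{occ} >= $min;
    });
    return @ids;
}
```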
SEE ALSO
See perl(1) and NATools documentation.
AUTHOR
Alberto Manuel Brandao Simoes, <albie@alfarrabio.di.uminho.pt>
COPYRIGHT AND LICENSE
Copyright 2002-2012 by NATURA Project http://natura.di.uminho.pt
This library is free software; you can redistribute it and/or modify it under the GNU General Public License 2, which you should find in the parent directory. This module should be distributed together with the whole NATools package, with the respective copyright notice.