NAME
Lingua::PT::PLN - Perl extension for NLP of the Portuguese Language
SYNOPSIS
use Lingua::PT::PLN;
# occurrence counter
%o = oco("file");
oco({num=>1,output=>"outfile"},"file");
$p = accent($phrase); ## mark word accent of all words
$w = syllable($word);
$w = wordaccent($word);
DESCRIPTION
This is a module for Natural Language Processing of the Portuguese language.
Because you are processing Portuguese, you must use a correct locale.
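A minimal sketch of selecting a Portuguese locale before calling the module's functions, using the core POSIX module. The locale names in @candidates and the helper pick_locale are assumptions for illustration; which names exist depends on the system (see "locale -a").

```perl
use strict;
use warnings;
use POSIX qw(setlocale LC_CTYPE);

# Candidate locale names are assumptions; availability varies by system.
my @candidates = ('pt_PT.ISO-8859-1', 'pt_PT', 'pt_BR.ISO-8859-1', 'pt_PT.UTF-8');

# Hypothetical helper: try each candidate and return the first one accepted.
sub pick_locale {
    for my $name (@candidates) {
        my $set = setlocale(LC_CTYPE, $name);   # returns undef on failure
        return $set if defined $set;
    }
    return;   # no Portuguese locale installed
}

my $locale = pick_locale();
print defined $locale
    ? "Using locale $locale\n"
    : "No Portuguese locale found; accented letters may be misclassified\n";
```

Adding "use locale;" in the calling code additionally makes Perl's own case mapping and character classes follow the selected locale.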
Occurrence counting: oco
Counts word occurrences in a string or a set of files. Returns a hash with the counts, or creates a sorted file with the results.
This function optionally takes a hash of options as its first argument, where you can specify:
- num => 1
  means the output should be sorted by number of occurrences;
- alpha => 1
  means the output should be sorted lexicographically;
- output => "f"
  means the output will be written to the file "f";
- from => "string"
  means that the next argument (after the option hash) is a string to be used as input;
- from => "file"
  means that the remaining arguments are filenames to be used as input. This is the default;
- encoding => "utf8"
  forces UTF-8 encoding (the default is latin1);
- ignorexml => 1
  XML tags are stripped;
- ignorecase => 1
  all words are lower-cased;
- log => 1
  produces logarithmic output. Output values lie between 0 and log(1000000), about 13.8.
  Use log => 20 to obtain values between 0 and 20.
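Perl's built-in log is the natural logarithm, so the upper bound of the logarithmic scale can be checked directly. The sketch below also shows one way a value could be mapped onto a 0..20 range; the helper rescale and its linear mapping are assumptions for illustration and are not taken from the module's source.

```perl
use strict;
use warnings;

# Upper bound of the logarithmic scale: natural log of one million.
my $max = log(1_000_000);
printf "%.2f\n", $max;    # 13.82

# Hypothetical helper: linearly rescale a value from 0..$max to 0..$top.
# The linear mapping is an assumption, not the module's documented behaviour.
sub rescale {
    my ($value, $top) = @_;
    return $value * $top / $max;
}

printf "%.2f\n", rescale(log(1000), 20);    # 10.00
```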
Examples:
oco({num=>1,output=>"f"}, "f1","f2");
# sort by number of occurrences
# store output in file "f"
# process files "f1" and "f2"
oco({alpha=>1,output=>"f"}, "f1","f2");
# sort lexicographically
# store output in file "f"
# process files "f1" and "f2"
%oc = oco("f1","f2");
# return a hash with the occurrences
# use "f1" and "f2" as input files
%oc = oco( {from=>"string"},"text in a string");
# use a string as input
# return a hash with the occurrences
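To make the options concrete, here is a self-contained sketch that mimics counting from a string and then orders the result in the two ways described above. count_words is a hypothetical stand-in, not the module's implementation: the tokenization rule (runs of letters), the tag-stripping regex, and the descending order used for the count-based sort are all assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for oco({from => "string", ...}, $text).
# Tokenization as runs of letters is an assumption for illustration.
sub count_words {
    my ($opt, $text) = @_;
    $text =~ s/<[^>]*>/ /g if $opt->{ignorexml};    # replace XML tags with a space
    $text = lc $text       if $opt->{ignorecase};   # fold to lower case
    my %count;
    $count{$1}++ while $text =~ /(\p{L}+)/g;
    return %count;
}

my %oc = count_words({ ignorecase => 1, ignorexml => 1 },
                     '<p>O gato viu o outro gato.</p>');

# In the spirit of num => 1: most frequent first, ties broken alphabetically.
my @by_num = sort { $oc{$b} <=> $oc{$a} || $a cmp $b } keys %oc;

# In the spirit of alpha => 1: lexicographic order.
my @by_alpha = sort keys %oc;

print "$_\t$oc{$_}\n" for @by_num;
```

The same sorting idioms apply to a hash returned by oco itself, assuming it maps each word to its count.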
syllable
my $sylls = syllable( $word )
Returns the word with its syllables separated by "|".
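Because the result is a single "|"-delimited string, it can be turned into a list with split. In this sketch $sylls is a hand-written sample in the documented format, not actual output of syllable().

```perl
use strict;
use warnings;

# Hand-written sample in the documented "|"-separated format (an assumption,
# not real output from syllable()).
my $sylls = 'com|pu|ta|dor';

# "|" is a regex metacharacter, so it must be escaped in the split pattern.
my @parts = split /\|/, $sylls;

printf "%d syllables: %s\n", scalar @parts, join(' - ', @parts);
```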
accent
my $accent = accent( $phrase )
Returns the phrase with the syllables separated by "|" and accents marked with the character ":".
wordaccent
Returns the word split into syllables, with the accent character marked.
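A short sketch of locating the stressed syllable in such a string. The sample value is hand-written, and where exactly ":" sits inside the stressed syllable is an assumption; the code only relies on the marker appearing somewhere in that syllable.

```perl
use strict;
use warnings;

# Hand-written sample (an assumption, not real output from wordaccent()).
my $marked = 'com|pu|ta|dor:';

my @parts = split /\|/, $marked;

# Index of the first syllable that carries the ":" accent marker, if any.
my ($stressed) = grep { $parts[$_] =~ /:/ } 0 .. $#parts;

if (defined $stressed) {
    (my $plain = $parts[$stressed]) =~ tr/://d;   # drop the marker for display
    printf "stressed syllable %d of %d: %s\n",
        $stressed + 1, scalar @parts, $plain;
}
```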
compacta
compara
AUTHOR
Projecto Natura (http://natura.di.uminho.pt)
Alberto Simoes (albie@alfarrabio.di.uminho.pt)
José João Almeida (jj@di.uminho.pt)
Paulo Rocha (paulo.rocha@di.uminho.pt)
SEE ALSO
Lingua::PT::PLNbase(3pm), perl(1), cqp(1)