NAME
Lingua::YALI::Identifier - Module for language identification with custom models.
VERSION
version 0.010
SYNOPSIS
This module is a generalization of Lingua::YALI::LanguageIdentifier and can identify any document class, depending on the models it is given.
use Lingua::YALI::Builder;
use Lingua::YALI::Identifier;
# create models
my $builder_a = Lingua::YALI::Builder->new(ngrams=>[2]);
$builder_a->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
$builder_a->store("model_a.2_all.gz", 2);
my $builder_b = Lingua::YALI::Builder->new(ngrams=>[2]);
$builder_b->train_string("bbbbbb bbbb bbbb bbb bbbb bbbb bbb");
$builder_b->store("model_b.2_all.gz", 2);
# create identifier and load models
my $identifier = Lingua::YALI::Identifier->new();
$identifier->add_class("a", "model_a.2_all.gz");
$identifier->add_class("b", "model_b.2_all.gz");
# identify strings
my $result1 = $identifier->identify_string("aaaaaaaaaaaaaaaaaaa");
print $result1->[0]->[0] . "\t" . $result1->[0]->[1];
# prints "a", a tab and the score 1
my $result2 = $identifier->identify_string("bbbbbbbbbbbbbbbbbbb");
print $result2->[0]->[0] . "\t" . $result2->[0]->[1];
# prints "b", a tab and the score 1
More examples are presented in Lingua::YALI::Examples.
METHODS
BUILD
Initializes internal variables.
# create identifier
my $identifier = Lingua::YALI::Identifier->new();
add_class
my $added = $identifier->add_class($label, $model);
Adds the model stored in the file $model under the label $label and returns whether it was added (1) or not (0), e.g. 0 when a class with that label is already registered.
print $identifier->add_class("a", "model.a1.gz") . "\n";
# prints 1
print $identifier->add_class("a", "model.a2.gz") . "\n";
# prints 0 - class "a" was already added
remove_class
my $removed = $identifier->remove_class($label);
Removes the model for the label $label and returns whether it was removed (1) or not (0), e.g. 0 when no class with that label is registered.
$identifier->add_class("a", "model.a1.gz");
print $identifier->remove_class("a") . "\n";
# prints 1
print $identifier->remove_class("a") . "\n";
# prints 0 - class "a" was already removed
get_classes
my $classes = $identifier->get_classes();
Returns a reference to an array of all registered class labels.
identify_file
my $result = $identifier->identify_file($file)
Identifies the class of the file $file.
It returns undef if $file is undef. It croaks if the file $file does not exist or is not readable. Otherwise, see the method "identify_handle" for the format of the result.
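The argument checks described for identify_file can be sketched in plain Perl. This is a hypothetical helper, open_checked_file, not part of Lingua::YALI and not the module's actual code: it only mirrors the documented behaviour (undef in, undef out; croak on a missing or unreadable file) and returns the opened handle, where the real method would go on to call identify_handle.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of Lingua::YALI) mirroring the argument
# checks documented for identify_file. The real method continues by
# classifying the file via identify_handle; this sketch just returns
# the opened handle.
sub open_checked_file {
    my ($file) = @_;

    # An undefined file name gives undef instead of an error.
    return undef unless defined $file;

    # A missing or unreadable file is a fatal error.
    croak("File $file does not exist or is not readable")
        unless -e $file && -r _;

    open(my $fh, '<', $file) or croak("Cannot open $file: $!");
    return $fh;
}

# undef in, undef out.
my $none = open_checked_file(undef);
print defined $none ? "defined\n" : "undef\n";    # prints "undef"

# A missing file croaks; eval traps the error.
my $ok = eval { open_checked_file('/no/such/file.txt'); 1 };
print $ok ? "no error\n" : "croaked: $@";
```

Because croak dies, callers that want to continue after a bad path should wrap the call in eval, as the last lines do.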
identify_string
my $result = $identifier->identify_string($string)
Identifies the class of the string $string.
It returns undef if $string is undef. Otherwise, see the method "identify_handle" for the format of the result.
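Since identify_string defers to identify_handle, one plausible way to bridge a string to a file handle in Perl is an in-memory filehandle opened on a scalar reference. This is an assumption about the approach, not taken from the module's source; the sketch uses hypothetical stand-ins (count_chars_from_handle in place of identify_handle, count_chars_from_string in place of identify_string) that merely count characters.

```perl
use strict;
use warnings;

# Hypothetical stand-in for identify_handle: it only counts the
# characters it reads from the handle.
sub count_chars_from_handle {
    my ($fh) = @_;
    my $count = 0;
    while (my $line = <$fh>) {
        $count += length $line;
    }
    return $count;
}

# Hypothetical stand-in for identify_string: undef stays undef,
# anything else is read through an in-memory filehandle.
sub count_chars_from_string {
    my ($string) = @_;
    return undef unless defined $string;
    open(my $fh, '<', \$string) or die "Cannot open in-memory handle: $!";
    return count_chars_from_handle($fh);
}

print count_chars_from_string("aaaa\nbbb"), "\n";    # prints 8
```

Opening on \$string avoids writing a temporary file, so the same handle-based code path serves both strings and files.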
identify_handle
my $result = $identifier->identify_handle($fh)
Identifies the class of the data read from the file handle $fh.
It returns undef if $fh is undef. It croaks if $fh is not a file handle. Otherwise it returns an array reference in the format [ ['class1', score1], ['class2', score2], ...], sorted by score in descending order, so the most probable class comes first.
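The result format described above can be consumed as in the following sketch. The scores are made up for illustration and are not real output of identify_handle; the input is deliberately unsorted so that the sort line shows the documented ordering (for real output, which is already sorted, it changes nothing).

```perl
use strict;
use warnings;

# Illustrative data in the documented shape [ [class, score], ... ].
# The scores are made up, and the list is deliberately unsorted.
my $result = [ [ 'b', 0.15 ], [ 'a', 0.85 ] ];

# identify_handle returns undef for an undefined handle, so guard first.
die "no result\n" unless defined $result;

# Order by score, highest first: the documented order of the result.
my @ranked = sort { $b->[1] <=> $a->[1] } @{$result};

# The most probable class is the first pair.
my ($best_class, $best_score) = @{ $ranked[0] };
print "$best_class\t$best_score\n";    # prints "a", a tab, "0.85"

# List every class with its score.
for my $pair (@ranked) {
    printf "%s\t%.2f\n", $pair->[0], $pair->[1];
}
```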
SEE ALSO
An identifier with pretrained models for language identification is Lingua::YALI::LanguageIdentifier.
The builder for these models is Lingua::YALI::Builder.
Source code is available at https://github.com/martin-majlis/YALI.
AUTHOR
Martin Majlis <martin@majlis.cz>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Martin Majlis.
This is free software, licensed under:
The (three-clause) BSD License