NAME
Lingua::YALI::Identifier - Module for language identification with custom models.
VERSION
version 0.010_04
SYNOPSIS
This module identifies languages using models provided by the user. If you want to use pretrained models, use Lingua::YALI::LanguageIdentifier.
Models trained on texts from a specific domain outperform general ones on that domain.
use Lingua::YALI::Builder;
use Lingua::YALI::Identifier;
# create models
my $builder_a = Lingua::YALI::Builder->new(ngrams=>[2]);
$builder_a->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
$builder_a->store("model_a.2_all.gz", 2);
my $builder_b = Lingua::YALI::Builder->new(ngrams=>[2]);
$builder_b->train_string("bbbbbb bbbb bbbb bbb bbbb bbbb bbb");
$builder_b->store("model_b.2_all.gz", 2);
# create identifier and load models
my $identifier = Lingua::YALI::Identifier->new();
$identifier->add_class("a", "model_a.2_all.gz");
$identifier->add_class("b", "model_b.2_all.gz");
# identify strings
my $result1 = $identifier->identify_string("aaaaaaaaaaaaaaaaaaa");
print $result1->[0]->[0] . "\t" . $result1->[0]->[1] . "\n";
# prints out a 1
my $result2 = $identifier->identify_string("bbbbbbbbbbbbbbbbbbb");
print $result2->[0]->[0] . "\t" . $result2->[0]->[1] . "\n";
# prints out b 1
More examples are presented in Lingua::YALI::Examples.
METHODS
BUILD
Initializes internal variables.
# create identifier
my $identifier = Lingua::YALI::Identifier->new();
add_class
my $added = $identifier->add_class($class, $model);
Adds the model stored in file $model under class $class and returns whether it was added.
print $identifier->add_class("a", "model.a1.gz") . "\n";
# prints out 1
print $identifier->add_class("a", "model.a2.gz") . "\n";
# prints out 0 - class a was already added
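Because add_class reports whether the class was added, several models can be registered in one pass while collecting the classes that were skipped. The following is a minimal sketch, not part of Lingua::YALI: the helper name add_classes and the model file names in the usage note are assumptions made for illustration, and the helper relies only on the return value described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Lingua::YALI).
# Registers every class => model file pair on the given identifier and
# returns the list of classes for which add_class returned a false value,
# for example because the class had already been added.
sub add_classes {
    my ($identifier, %model_for) = @_;

    my @skipped;
    for my $class (sort keys %model_for) {
        my $added = $identifier->add_class($class, $model_for{$class});
        push @skipped, $class unless $added;
    }

    return @skipped;
}
```

With an identifier created as in the SYNOPSIS, calling add_classes($identifier, a => "model_a.2_all.gz", b => "model_b.2_all.gz") would return an empty list if both classes were new.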
remove_class
my $removed = $identifier->remove_class($class);
Removes the model for class $class and returns whether it was removed.
$identifier->add_class("a", "model.a1.gz");
print $identifier->remove_class("a") . "\n";
# prints out 1
print $identifier->remove_class("a") . "\n";
# prints out 0 - class a was already removed
get_classes
my $classes = $identifier->get_classes();
Returns a reference to an array of all registered classes.
identify_file
my $result = $identifier->identify_file($file);
Identifies the class for file $file.
It returns undef if $file is undef.
It croaks if the file $file does not exist or is not readable.
Otherwise, see method "identify_handle" for details.
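Since identify_file croaks on a missing or unreadable file, a caller that processes many files may want to trap the exception rather than abort. The following is a minimal sketch under that assumption: the helper name identify_file_or_undef is hypothetical and not part of Lingua::YALI, and it uses only the identify_file method described above.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Lingua::YALI).
# Calls identify_file inside eval so that a croak becomes a warning.
# Returns the result on success and undef (in scalar context) on failure.
sub identify_file_or_undef {
    my ($identifier, $file) = @_;

    my $result;
    my $ok = eval {
        $result = $identifier->identify_file($file);
        1;
    };
    if (!$ok) {
        my $error = $@ || 'unknown error';
        warn "Could not identify file: $error";
        return;
    }

    return $result;
}
```

Note that undef is also the documented result for an undefined $file, so a caller that needs to tell the two cases apart should check $file before calling the helper.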
identify_string
my $result = $identifier->identify_string($string);
Identifies the class for string $string.
It returns undef if $string is undef.
Otherwise, see method "identify_handle" for details.
identify_handle
my $result = $identifier->identify_handle($fh);
Identifies the class for file handle $fh.
It returns undef if $fh is undef.
It croaks if $fh is not a file handle.
Otherwise, it returns an array reference in the format [ ['class1', score1], ['class2', score2], ... ] sorted by score in descending order, so the most probable class comes first.
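The result structure can be walked like any nested array reference. The following sketch shows one way to print every class with its score and to pick the most probable class. The helper name format_result is hypothetical, and the sample data is made up to match the documented format; it does not come from a real model.

```perl
use strict;
use warnings;

# Hypothetical helper (not provided by Lingua::YALI).
# Turns a result in the format [ ['class1', score1], ... ] into
# one "class<TAB>score" line per class, keeping the given order.
sub format_result {
    my ($result) = @_;

    # identify_* methods return undef for undefined input
    return '' unless defined $result;

    return join '', map { sprintf("%s\t%.3f\n", $_->[0], $_->[1]) } @{$result};
}

# Made-up sample data in the documented format, best class first
my $sample = [ [ 'a', 0.75 ], [ 'b', 0.25 ] ];

print format_result($sample);

# The first pair holds the most probable class and its score
my ($best_class, $best_score) = @{ $sample->[0] };
print "best: $best_class\n";
```

Passing the return value of identify_string, identify_file, or identify_handle instead of $sample should work the same way, as long as the result is in the format described above.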
SEE ALSO
The identifier with pretrained models for language identification is Lingua::YALI::LanguageIdentifier.
The builder for these models is Lingua::YALI::Builder.
There is also a command-line tool, yali-identifier, with similar functionality.
Source code is available at https://github.com/martin-majlis/YALI.
AUTHOR
Martin Majlis <martin@majlis.cz>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Martin Majlis.
This is free software, licensed under:
The (three-clause) BSD License