NAME

Lingua::YALI::Identifier - Module for language identification with custom models.

VERSION

version 0.009_03

SYNOPSIS

This module is a generalization of Lingua::YALI::LanguageIdentifier and can identify any document class for which a model has been supplied.

use Lingua::YALI::Builder;
use Lingua::YALI::Identifier;

# create models
my $builder_a = Lingua::YALI::Builder->new(ngrams=>[2]);
$builder_a->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
$builder_a->store("model_a.2_all.gz", 2);

my $builder_b = Lingua::YALI::Builder->new(ngrams=>[2]);
$builder_b->train_string("bbbbbb bbbb bbbb bbb bbbb bbbb bbb");
$builder_b->store("model_b.2_all.gz", 2);

# create identifier and load models
my $identifier = Lingua::YALI::Identifier->new();
$identifier->add_class("a", "model_a.2_all.gz");
$identifier->add_class("b", "model_b.2_all.gz");

# identify strings
my $result1 = $identifier->identify_string("aaaaaaaaaaaaaaaaaaa");
print $result1->[0]->[0] . "\t" . $result1->[0]->[1] . "\n";
# prints out a 1

my $result2 = $identifier->identify_string("bbbbbbbbbbbbbbbbbbb");
print $result2->[0]->[0] . "\t" . $result2->[0]->[1] . "\n";
# prints out b 1

More examples are presented in Lingua::YALI::Examples.

METHODS

BUILD

Initializes internal variables.

# create identifier
my $identifier = Lingua::YALI::Identifier->new();

add_class

my $added = $identifier->add_class($label, $model);

Adds the model stored in file $model under label $label. Returns 1 if the model was added and 0 if the label was already registered.

print $identifier->add_class("a", "model.a1.gz") . "\n";
# prints out 1
print $identifier->add_class("a", "model.a2.gz") . "\n";
# prints out 0 - class a was already added

remove_class

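my $removed = $identifier->remove_class($label);

Removes the model registered under label $label. Returns 1 if a model was removed and 0 if no model was registered under that label.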

$identifier->add_class("a", "model.a1.gz");
print $identifier->remove_class("a") . "\n";
# prints out 1
print $identifier->remove_class("a") . "\n";
# prints out 0 - class a was already removed

get_classes

my $classes = $identifier->get_classes();

Returns the labels of all registered classes as an array reference.
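
The following minimal sketch lists the registered labels in a stable order. It assumes that get_classes returns an array reference, which is an inference from the signature above and not a confirmed detail. The helper name registered_labels is hypothetical and is not part of Lingua::YALI.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YALI): returns the registered
# class labels in alphabetical order.
# ASSUMPTION: get_classes returns an array reference.
sub registered_labels {
    my ($identifier) = @_;
    my $classes = $identifier->get_classes();
    my @sorted  = sort @{ $classes // [] };    # tolerate an undef return
    return @sorted;
}
```

With an identifier set up as in the SYNOPSIS, print join(", ", registered_labels($identifier)) . "\n"; would list the labels added through add_class.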

identify_file

my $result = $identifier->identify_file($file)

Identifies class for file $file.

  • It returns undef if $file is undef.

  • It croaks if the file $file does not exist or is not readable.

  • Otherwise, see method "identify_handle" for details.
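
Because identify_file croaks on a missing or unreadable file, callers that process many files may want to trap the error. The following is a minimal sketch of one way to do that with eval. The wrapper name try_identify_file is hypothetical and is not part of Lingua::YALI.

```perl
use strict;
use warnings;

# Hypothetical wrapper (not part of Lingua::YALI): returns the result of
# identify_file, or undef if the call croaks (for example because the
# file does not exist or is not readable).
sub try_identify_file {
    my ($identifier, $file) = @_;
    my $result;
    # eval traps the exception; the trailing 1 marks a successful call.
    my $ok = eval { $result = $identifier->identify_file($file); 1 };
    unless ($ok) {
        warn "Could not identify file: $@";
        return;
    }
    return $result;
}
```

A call such as my $result = try_identify_file($identifier, "input.txt"); then yields undef both for an undefined file name and for a file that cannot be read, so the caller needs only one check.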

identify_string

my $result = $identifier->identify_string($string)

Identifies class for string $string.

  • It returns undef if $string is undef.

  • Otherwise, see method "identify_handle" for details.
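
As an illustration of handling the undef case, the following minimal sketch counts how often each class is ranked first over a list of strings. The helper name label_counts is hypothetical and is not part of Lingua::YALI; it relies only on the documented return values of identify_string.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YALI): identifies each string
# and counts how many times each label is the most probable class.
sub label_counts {
    my ($identifier, @strings) = @_;
    my %count;
    for my $string (@strings) {
        my $result = $identifier->identify_string($string);
        # Skip undef results (undef input) and empty result lists.
        next unless defined $result && @{$result};
        $count{ $result->[0]->[0] }++;
    }
    return \%count;
}
```

The returned hash reference maps each label to its count, so $counts->{a} would be the number of strings whose best class was "a".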

identify_handle

my $result = $identifier->identify_handle($fh)

Identifies the class of the content read from file handle $fh.

  • It returns undef if $fh is undef.

  • It croaks if $fh is not a file handle.

  • Otherwise, it returns an array reference in the format [ ['class1', score1], ['class2', score2], ...], sorted by score in descending order, so the most probable class comes first.
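
The result format described above can be consumed as in the following minimal sketch. The helper name top_class and the sample scores are hypothetical and serve only as illustration; the code relies solely on the documented [ [label, score], ... ] layout.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YALI): returns the label and
# score of the most probable class, or an empty list when the result is
# undef or contains no classes.
sub top_class {
    my ($result) = @_;
    return unless defined $result && @{$result};
    my ($label, $score) = @{ $result->[0] };
    return ($label, $score);
}

# Hand-written sample in the documented format (scores are made up),
# already sorted by descending score.
my $sample = [ [ 'a', 0.75 ], [ 'b', 0.25 ] ];

my ($label, $score) = top_class($sample);
print "$label\t$score\n";
# prints out a 0.75
```

The same helper can be applied to the return value of identify_file, identify_string, or identify_handle, since all three are documented to share this format.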

SEE ALSO

AUTHOR

Martin Majlis <martin@majlis.cz>

COPYRIGHT AND LICENSE

This software is Copyright (c) 2012 by Martin Majlis.

This is free software, licensed under:

The (three-clause) BSD License