NAME
Lingua::YALI::Builder - Constructs models for document identification.
VERSION
version 0.006
SYNOPSIS
This module creates models for Lingua::YALI::Identifier.
Creating bigram and trigram models from a string:
use Lingua::YALI::Builder;
my $builder = Lingua::YALI::Builder->new(ngrams=>[2, 3]);
$builder->train_string("aaaaa aaaa aaa aaa aaa aaaaa aa");
$builder->store("model_a.2_4.gz", 2, 4);
$builder->store("model_a.2_all.gz", 2);
$builder->store("model_a.3_all.gz", 3);
$builder->store("model_a.4_all.gz", 4); # croaks: 4-grams were not requested in ngrams
More examples are presented in Lingua::YALI::Examples.
METHODS
BUILD
BUILD()
Constructs a Builder. It also removes duplicates from ngrams.
my $builder = Lingua::YALI::Builder->new(ngrams=>[2, 3, 4]);
get_ngrams
my $ngrams = $builder->get_ngrams()
Returns a reference to an array of all n-gram sizes that will be used during training.
my $builder = Lingua::YALI::Builder->new(ngrams=>[2, 3, 4, 2, 3]);
my $ngrams = $builder->get_ngrams();
print join(", ", @$ngrams) . "\n";
# prints out 2, 3, 4
get_max_ngram
my $max_ngram = $builder->get_max_ngram()
Returns the highest n-gram size that will be used during training.
my $builder = Lingua::YALI::Builder->new(ngrams=>[2, 3, 4]);
print $builder->get_max_ngram() . "\n";
# prints out 4
train_file
my $used_bytes = $builder->train_file($file)
Trains the classifier on file $file.
It returns undef if $file is undef. It croaks if the file $file does not exist or is not readable. Otherwise it returns the number of bytes used for training.
For more details, see the method "train_handle".
train_string
my $used_bytes = $builder->train_string($string)
Trains the classifier on string $string.
It returns undef if $string is undef. Otherwise it returns the number of bytes used for training.
For more details, see the method "train_handle".
train_handle
my $used_bytes = $builder->train_handle($fh)
Trains the classifier on file handle $fh.
It returns undef if $fh is undef. It croaks if $fh is not a file handle. Otherwise it returns the number of bytes used for training.
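The three training methods share one convention: undef input yields undef, invalid input croaks, and anything else yields a byte count. The sketch below shows one way a caller might tell those outcomes apart. The helper try_train and the file name no_such_file.txt are hypothetical and are not part of Lingua::YALI; only new, train_string and train_file are taken from this documentation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Lingua::YALI): runs a training call,
# traps any croak, and returns the result together with the error text.
sub try_train {
    my ($code) = @_;
    my $result = eval { $code->() };
    my $error  = $@;
    return ($result, $error);
}

# Only exercise the real module when it is installed.
if (eval { require Lingua::YALI::Builder; 1 }) {
    my $builder = Lingua::YALI::Builder->new(ngrams => [2, 3]);

    # undef input: documented to return undef without croaking.
    my ($bytes, $err) = try_train(sub { $builder->train_string(undef) });
    print "undef string: ", (defined $bytes ? $bytes : 'undef'), "\n";

    # Missing file (placeholder name): documented to croak.
    ($bytes, $err) = try_train(sub { $builder->train_file('no_such_file.txt') });
    print "missing file: $err\n" if $err;
}
```

Wrapping the call in eval keeps a single unreadable file from aborting a longer training run; whether that is desirable depends on the caller.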
store
my $stored_count = $builder->store($file, $ngram, $count)
Stores the trained model with at most $count $ngram-grams to file $file. If $count is not specified, all $ngram-grams are stored.
It croaks if incorrect parameters are passed.
It returns the number of n-grams stored.
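To make the role of $ngram and $count concrete, here is a minimal, self-contained sketch of counting character n-grams and keeping at most a given number of them. It is not the module's implementation: the function names count_ngrams and top_ngrams are hypothetical, and it assumes that the most frequent n-grams are the ones kept, with ties broken alphabetically. The library's own text preprocessing, ordering and file format may differ.

```perl
use strict;
use warnings;

# Count every character n-gram of length $n in $text.
sub count_ngrams {
    my ($text, $n) = @_;
    my %freq;
    for my $i (0 .. length($text) - $n) {
        $freq{ substr($text, $i, $n) }++;
    }
    return \%freq;
}

# Keep at most $count n-grams, most frequent first (assumption);
# ties are broken alphabetically. If $count is undef, keep all of them.
sub top_ngrams {
    my ($freq, $count) = @_;
    my @sorted = sort { $freq->{$b} <=> $freq->{$a} || $a cmp $b } keys %$freq;
    if (defined $count && $count < @sorted) {
        @sorted = @sorted[0 .. $count - 1];
    }
    return \@sorted;
}

my $freq = count_ngrams("aaaaa aaaa aaa aaa aaa aaaaa aa", 2);
my $top  = top_ngrams($freq, 2);
print scalar(@$top), " bigrams kept: ", join(", ", map { "'$_'" } @$top), "\n";
```

In this sketch the number of kept n-grams plays the role of the value that store returns, and omitting the limit corresponds to calling store without $count.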
SEE ALSO
The identifier for these models is Lingua::YALI::Identifier.
The source code is available at https://github.com/martin-majlis/YALI.
AUTHOR
Martin Majlis <martin@majlis.cz>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Martin Majlis.
This is free software, licensed under:
The (three-clause) BSD License