NAME

Algorithm::AM::Batch - Classify items in batch mode

VERSION

version 3.13

SYNOPSIS

use Algorithm::AM::Batch;
my $dataset = dataset_from_file(path => 'finnverb', format => 'nocommas');
my $batch = Algorithm::AM::Batch->new(
  training_set => $dataset,
  # print the result of each classification as it is produced
  end_test_hook => sub {
    my ($batch, $test_item, $result) = @_;
    print $test_item->comment . ' ' . $result->result . "\n";
  }
);
my @results = $batch->classify_all($dataset);

DESCRIPTION

Batch provides a way to classify entire data sets by repeatedly calling classify with the provided configuration. Hooks are also provided so that the training set and classification parameters can be changed over time. All of the action happens in "classify_all".

EXPORTS

When this module is imported, it also imports the following:

Algorithm::AM
Algorithm::AM::Result
Algorithm::AM::DataSet: Also imports the "dataset_from_file" function from Algorithm::AM::DataSet.
Algorithm::AM::DataSet::Item: Also imports the "new_item" function from Algorithm::AM::DataSet::Item.
Algorithm::AM::BigInt: Also imports the "bigcmp" function from Algorithm::AM::BigInt.

METHODS

`new`

Creates a new object instance. This method takes named parameters which call the methods described in the relevant documentation sections. The only required parameter is "training_set", which should be an instance of Algorithm::AM::DataSet, and which provides a pool of items to be used for training during classification. All of the accepted parameters are listed below:

"training_set"
"repeat"
"probability"
"max_training_items"
"exclude_nulls"
"exclude_given"
"linear"

`training_set`

Returns the dataset used for training.

`test_set`

Returns the test set currently providing the source of items to "classify_all". Before and after classify_all, this returns undef, and so is only useful when called from inside one of the hook subroutines.

`repeat`

Determines how many times each individual test item will be analyzed. As the analogical modeling algorithm is deterministic, it only makes sense to use this if the training set is modified somehow during each iteration, i.e., via "probability" or "training_item_hook". The default value is 1.

`probability`

Get/set the probability that any one training item will be included among the training items used during classification. The default is 1.

`max_training_items`

Get/set the maximum number of items considered for addition to the training set. Note that this is the number considered, not the number actually added, so when combined with "probability" or "training_item_hook" your training set could be smaller than the number specified.
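
The difference between items considered and items added can be illustrated with a small standalone sketch. It does not use Algorithm::AM and is not the module's implementation: the select_training_items helper is a hypothetical name, and the choice to consider the first items of the pool in order is an assumption made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::AM. It considers at most
# $max items from @$pool (assumption: taken in order from the front) and
# keeps each considered item with probability $prob.
sub select_training_items {
    my ($pool, $max, $prob) = @_;
    my $limit = @$pool < $max ? scalar @$pool : $max;
    my (@kept, @excluded);
    for my $item (@{$pool}[0 .. $limit - 1]) {
        if (rand() < $prob) {
            push @kept, $item;
        }
        else {
            push @excluded, $item;
        }
    }
    return (\@kept, \@excluded);
}

my @pool = map { "item$_" } 1 .. 100;
my ($kept, $excluded) = select_training_items(\@pool, 20, 0.5);

# 20 items were considered; the number kept is at most 20
printf "considered %d, kept %d\n", @$kept + @$excluded, scalar @$kept;
```

With a probability of 1 every considered item is kept, so the limit and the resulting training set size coincide; with any lower value the training set is usually smaller than the limit.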

`exclude_nulls`

This is passed directly to the new method of Algorithm::AM during each classification in the "classify_all" method.

`exclude_given`

This is passed directly to the new method of Algorithm::AM during each classification in the "classify_all" method.

`linear`

This is passed directly to the new method of Algorithm::AM during each classification in the "classify_all" method.

`classify_all`

Using the analogical modeling algorithm, this method classifies the items in the given test set and returns a list of Result objects.

Log::Any is used to log information about the current progress and timing. The statistical summary, analogical set, and gang summary (without items listed) are logged at the info level, and the full gang summary with items listed is logged at the debug level.

Hooks are provided to the user for monitoring or modifying classification configuration. These hooks may be passed into the object constructor or set via one of the accessor methods. Batch classification proceeds as follows:

call begin_hook
loop all test set items
  call begin_test_hook
  repeat X times, where X is specified by the "repeat" setting
    call begin_repeat_hook
    create a training set:
      for each item in the provided training set, up to max_training_items
        exclude the item with probability 1 - probability
        exclude the item if specified via training_item_hook
    classify the item with the given training set
    call end_repeat_hook
  call end_test_hook
call end_hook

The Batch object itself is passed to these hooks, so the user is free to change settings such as "probability" or "max_training_items", or even add training data, at any point. Other information is passed to these hooks as well, as detailed in the method documentation.
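
The order in which the hooks fire can be traced with a standalone sketch. It is not the module's code: the run_batch function and the %hooks table are hypothetical, the hooks take no arguments here, and the training set selection and classification steps are left as a comment.

```perl
use strict;
use warnings;

# Each stand-in hook only records its own name, so @trace shows call order.
my @trace;
my %hooks = map {
    my $name = $_;
    ($name => sub { push @trace, $name });
} qw(begin_hook begin_test_hook begin_repeat_hook
     end_repeat_hook end_test_hook end_hook);

# Hypothetical driver mirroring the outline above.
sub run_batch {
    my ($test_items, $repeat) = @_;
    $hooks{begin_hook}->();
    for my $test_item (@$test_items) {
        $hooks{begin_test_hook}->();
        for my $iteration (1 .. $repeat) {
            $hooks{begin_repeat_hook}->();
            # training set selection and classification would happen here
            $hooks{end_repeat_hook}->();
        }
        $hooks{end_test_hook}->();
    }
    $hooks{end_hook}->();
    return;
}

# Two test items, each analyzed twice
run_batch(['a', 'b'], 2);
print join("\n", @trace), "\n";
```

For two test items and a repeat setting of 2, the trace contains one begin_hook and one end_hook, two begin_test_hook/end_test_hook pairs, and four begin_repeat_hook/end_repeat_hook pairs.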

`begin_hook`

$batch->begin_hook(sub {
  my ($batch) = @_;
  $batch->probability(.5);
});

This hook is called first thing in the "classify_all" method, and is given the Batch object instance.

`begin_test_hook`

$batch->begin_test_hook(sub {
  my ($batch, $test_item) = @_;
  $batch->probability(.5);
  print $test_item->comment . "\n";
});

This hook is called by "classify_all" before any iterations of classification start for each test item. It is provided with the Batch object instance and the test item.

`begin_repeat_hook`

$batch->begin_repeat_hook(sub {
  my ($batch, $test_item, $iteration) = @_;
  $batch->probability(.5);
  print $test_item->comment . "\n";
  print "I'm on iteration $iteration\n";
});

This hook is called during "classify_all" at the beginning of each iteration of classification of a test item. It is provided with the Batch object instance, the test item, and the iteration number, which will vary between 1 and the setting for "repeat".

`training_item_hook`

$batch->training_item_hook(sub {
  my ($batch, $test_item, $iteration, $training_item) = @_;
  $batch->probability(.5);
  print $test_item->comment . "\n";
  print "I'm on iteration $iteration\n";
  if($training_item->comment eq 'include me!'){
    return 1;
  }else{
    return 0;
  }
});

This hook is called by "classify_all" while populating a training set during each iteration of classification. It is provided with the Batch object instance, the test item, the iteration number, and an item which may be included in the training set. If the return value is true, then the item will be included in the training set; otherwise, it will not.
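
The true/false return convention can be seen in isolation with the following sketch, which drops any training item whose comment matches that of the test item. It runs without Algorithm::AM: items are plain hashes rather than Algorithm::AM::DataSet::Item objects, the batch argument is passed as undef, and the filtering loop is a stand-in for what "classify_all" does internally.

```perl
use strict;
use warnings;

# Plain hashes stand in for item objects (an assumption of this sketch).
my @training = (
    { comment => 'walk', class => 'regular' },
    { comment => 'talk', class => 'regular' },
    { comment => 'sing', class => 'irregular' },
);
my $test_item = { comment => 'talk' };

# Same argument order and return convention as training_item_hook:
# a true value keeps the candidate, a false value excludes it.
my $hook = sub {
    my ($batch, $test, $iteration, $candidate) = @_;
    return $candidate->{comment} ne $test->{comment};
};

# Stand-in for the training set construction step
my (@kept, @excluded);
for my $candidate (@training) {
    if ($hook->(undef, $test_item, 1, $candidate)) {
        push @kept, $candidate;
    }
    else {
        push @excluded, $candidate;
    }
}

print 'kept: ', join(', ', map { $_->{comment} } @kept), "\n";
print 'excluded: ', join(', ', map { $_->{comment} } @excluded), "\n";
```

Here 'walk' and 'sing' are kept and 'talk' is excluded, since its comment matches the test item.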

`end_repeat_hook`

$batch->end_repeat_hook(sub {
  my ($batch, $test_item, $iteration, $excluded_items, $result) = @_;
  $batch->probability(.5);
  print $test_item->comment . "\n";
  print "I finished iteration $iteration\n";
  print 'I excluded ' . scalar @$excluded_items .
    " items from training\n";
  print ${$result->statistical_summary};
});

This hook is called during "classify_all" at the end of each iteration of classification of a test item. It is provided with the Batch object instance, the test item, the iteration number, an array ref containing training items excluded from the training set, and the result object returned by classify.

`end_test_hook`

$batch->end_test_hook(sub {
  my ($batch, $test_item, @results) = @_;
  $batch->probability(.5);
  print $test_item->comment . "\n";
  my $iterations = @results;
  my $correct = 0;
  for my $result (@results){
    $correct++ if $result->result ne 'incorrect';
  }
  print 'Item ' . $test_item->comment .
    " correct $correct/$iterations times\n";
});

This hook is called by "classify_all" after all classifications of a single item are finished. It is provided with the Batch object instance, the test item, and a list of the Result objects returned by "classify" in Algorithm::AM during each iteration of classification.

`end_hook`

$batch->end_hook(sub {
  my ($batch, @results) = @_;
  for my $result(@results){
    print ${$result->statistical_summary};
  }
});

This hook is called after all classifications are finished. It is provided with the Batch object instance as well as a list of all of the Result objects returned by "classify" in Algorithm::AM.

AUTHOR

Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>

COPYRIGHT AND LICENSE

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.


