NAME

Algorithm::AM::Project - Manage data used by Algorithm::AM

VERSION

version 2.36

new

Creates a new Project object. The first argument is an optional path to the project directory, and it may be followed by named arguments. Currently the only accepted named argument is commas, which is required.

A project directory should contain the data set, the test set, and the outcome file (named, not surprisingly, data, test, and outcome). Each line of the data and test files should represent a single exemplar. The required format of each line depends on the value of the commas parameter. commas => 'yes' indicates the following style:

    outcome   ,   v a r i a b l e s   ,   spec

where commas are used to separate the outcome, exemplar variables and spec (or comment), and spaces are used to separate the exemplar variables. commas => 'no' indicates the following style:

    outcome variables spec

where spaces separate the outcome, the variables and the spec, and the variables are written together as a single token in which each character is one variable (so the token variables above still denotes the variables v, a, r, and so on).

Any other value for the commas parameter will result in an exception.
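To make the two line styles concrete, here is a minimal sketch of how one line of a data or test file could be split into its outcome, variables and spec. The helper parse_exemplar_line is hypothetical and not part of Algorithm::AM::Project. It assumes leading and trailing whitespace is insignificant and that the spec is everything after the second separator.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Algorithm::AM::Project): parse one line of a
# data or test file into (outcome, arrayref of variables, spec).
sub parse_exemplar_line {
    my ($line, $commas) = @_;
    $line =~ s/^\s+|\s+$//g;    # trim leading/trailing whitespace (an assumption)
    if ($commas eq 'yes') {
        # outcome , v a r i a b l e s , spec
        my ($outcome, $vars, $spec) = split /\s*,\s*/, $line, 3;
        die "could not parse line: $line\n" unless defined $spec;
        return ($outcome, [split ' ', $vars], $spec);    # spaces separate variables
    }
    elsif ($commas eq 'no') {
        # outcome variables spec, with one character per variable
        my ($outcome, $vars, $spec) = split ' ', $line, 3;
        die "could not parse line: $line\n" unless defined $spec;
        return ($outcome, [split //, $vars], $spec);     # every character is a variable
    }
    die "commas parameter must be 'yes' or 'no'\n";
}

my ($outcome, $vars, $spec) =
    parse_exemplar_line('outcome   ,   v a r i a b l e s   ,   spec', 'yes');
print "$outcome | @$vars | $spec\n";
```

Both styles yield the same nine variables for the examples above. Only the comma style allows multi-character variable values, because whitespace rather than character position separates them.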

The outcome file should have the same number of lines as the data file, and each line should have the outcome of the item on the same line in the data file. The format of the outcome file is like this:

    A V-i
    B a-oi
    C tV-si

where each line contains an outcome in a "short" and then "long" form, separated by whitespace.
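The outcome file format can be read with a short loop. This is a sketch under stated assumptions: read_outcomes is a hypothetical helper, the short form is taken to be the first whitespace-separated token, and the rest of the line is treated as the long form. Lines are kept in file order so that line n still corresponds to line n of the data file.

```perl
use strict;
use warnings;

# Hypothetical helper: read "short long" outcome pairs, one per line.
sub read_outcomes {
    my ($fh) = @_;
    my (@short, @long);
    while (my $line = <$fh>) {
        chomp $line;
        my ($s, $l) = split ' ', $line, 2;    # first token is the short form
        die "could not parse outcome line: $line\n" unless defined $l;
        $l =~ s/\s+$//;                        # drop trailing whitespace
        push @short, $s;
        push @long,  $l;
    }
    return (\@short, \@long);
}

# Read the example above from an in-memory filehandle.
my $text = "A V-i\nB a-oi\nC tV-si\n";
open my $fh, '<', \$text or die "cannot open in-memory file: $!";
my ($short, $long) = read_outcomes($fh);
print "$short->[$_] => $long->[$_]\n" for 0 .. $#$short;
```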

If the test or outcome files are missing, the data file will be used. In the case of a missing test file, test items will be taken from the data file and each classified using all of the other items in the data set. If the outcome file is missing, the outcome strings located in the data file will be used for both long and short outcome values.
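The fallback for a missing test file amounts to leave-one-out classification. The sketch below shows only how each test/training split is formed. It does not perform any classification, and leave_one_out_splits is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical sketch: with no test file, each data item is classified
# against every *other* item in the data set.
sub leave_one_out_splits {
    my (@items) = @_;
    my @splits;
    for my $i (0 .. $#items) {
        my @training = @items[grep { $_ != $i } 0 .. $#items];
        push @splits, { test => $items[$i], training => \@training };
    }
    return @splits;
}

my @splits = leave_one_out_splits(qw(a b c));
print "test $_->{test} against @{ $_->{training} }\n" for @splits;
```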

When this constructor is called, all project files are read and checked for errors. Possible errors in your files include the following:

  • Your project path does not exist or does not contain a data file.

  • Not all of the items in your test and data files have the same number of variables.

  • The number of items in your outcome file does not match the number of items in your data file.

  • TODO: A line from your data, test or outcome file could not be parsed.
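The first three checks in the list above are straightforward to express. The following is a hypothetical sketch, not the module's own code. It assumes each item has already been parsed into an arrayref of variable values, and it passes undef for the outcome count when there is no outcome file.

```perl
use strict;
use warnings;

# Hypothetical sketch of the consistency checks listed above.
# $data and $test are arrayrefs of items; each item is an arrayref of variables.
sub check_project {
    my ($data, $test, $num_outcomes) = @_;
    die "no data items\n" unless @$data;
    my $expected = @{ $data->[0] };    # variable count of the first data item
    for my $item (@$data, @$test) {
        die sprintf("expected %d variables, found %d\n", $expected, scalar @$item)
            unless @$item == $expected;
    }
    die "outcome file has $num_outcomes lines but data file has "
        . scalar(@$data) . "\n"
        if defined $num_outcomes && $num_outcomes != @$data;
    return 1;
}

print check_project([[qw(a b)], [qw(c d)]], [[qw(e f)]], 2) ? "ok\n" : "failed\n";
```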

base_path

Returns the path of the directory containing the project files.

results_path

Returns the path of the file where classification results are to be printed. Currently this is amcpresults inside the project directory.

num_variables

Returns the number of variables contained in a single exemplar in the project.

num_exemplars

Returns the number of items in the data (training) set.

get_exemplar_data

Returns the data variables for the exemplar at the given index. The return value is an arrayref containing the string value for each variable.

get_exemplar_spec

Returns the spec of the exemplar at the given index.

get_exemplar_outcome

Returns the outcome of the exemplar at the given index.

num_test_items

Returns the number of test items in the project's test file (or in the data file, if there is no test file).

get_test_item

Returns the test item at the given index. The return value has the structure [outcome, [data], spec], where [data] contains the variable values.
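Because the item is a nested array reference, it unpacks naturally into three variables. To keep the example self-contained, the item below is a made-up literal in the documented [outcome, [data], spec] shape. With a real project it would come from get_test_item, as the commented loop suggests.

```perl
use strict;
use warnings;

# With a real project one might write (not run here):
#   for my $i (0 .. $project->num_test_items - 1) {
#       my $item = $project->get_test_item($i);
#       ...
#   }
# Made-up item in the documented [outcome, [data], spec] shape:
my $item = ['A', ['v', 'a', 'r'], 'example spec'];

my ($outcome, $data, $spec) = @$item;    # $data is itself an arrayref
printf "outcome=%s variables=%s spec=%s\n", $outcome, join(',', @$data), $spec;
```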

num_outcomes

Returns the number of different outcomes contained in the data.

get_outcome

Returns the "long" outcome string stored at the given index in outcomelist.

var_format

Returns a format string for printing the variables of a data item.

spec_format

Returns a format string for printing a spec string from the data set.

outcome_format

Returns (and/or sets) a format string for printing a "long" outcome.

data_format

Returns the format string for printing the number of data items.
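The exact format strings returned by these methods are not specified here. The sketch below only illustrates one common way to build such strings: size the field to the widest value so that columns line up. Both width_format and count_format are hypothetical helpers, and the module's real formats may differ.

```perl
use strict;
use warnings;
use List::Util qw(max);

# Hypothetical: left-aligned %s format wide enough for the longest string,
# in the spirit of spec_format/outcome_format.
sub width_format {
    my @strings = @_;
    my $width = max map { length } @strings;
    return "%-${width}s";
}

# Hypothetical: right-aligned %d format wide enough for a count,
# in the spirit of data_format.
sub count_format {
    my ($count) = @_;
    return '%' . length($count) . 'd';
}

my $fmt = width_format('V-i', 'a-oi', 'tV-si');    # "%-5s"
printf "[$fmt]\n", 'V-i';                           # prints "[V-i  ]"
printf "[%s]\n", sprintf(count_format(120), 7);     # prints "[  7]"
```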

short_outcome_index

Returns the index of the given "short" outcome in outcomelist, or -1 if it is not in the list.

This is obviously not very transparent, as outcomelist is only accessible via a private method. In the future this will be done away with.
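The lookup described above is a linear search with a -1 sentinel. Here is a minimal sketch over a plain array. The standalone function and the sample @outcomelist are illustrative assumptions, not the module's private data.

```perl
use strict;
use warnings;

# Hypothetical sketch: position of a short outcome in a list, or -1 if absent.
sub short_outcome_index {
    my ($outcomes, $short) = @_;
    for my $i (0 .. $#$outcomes) {
        return $i if $outcomes->[$i] eq $short;
    }
    return -1;
}

my @outcomelist = qw(A B C);
print short_outcome_index(\@outcomelist, 'B'), "\n";    # prints 1
print short_outcome_index(\@outcomelist, 'Z'), "\n";    # prints -1
```

If lookups are frequent, a hash mapping each short outcome to its index avoids rescanning the list on every call.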

add_data

Adds the arguments as a new data exemplar. There are four required arguments: an array ref containing the data variables, the spec, the short outcome string, and the long outcome string.

add_test

Adds a test item to the project. The arguments are the same as for add_data.
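The argument order shared by add_data and add_test is easiest to see in a toy class. ToyProject below is a hypothetical stand-in, not Algorithm::AM::Project. It stores each item as a hash and copies the variables arrayref so that later changes by the caller do not leak into the stored item.

```perl
use strict;
use warnings;

# Hypothetical stand-in showing the documented argument order:
# (variables arrayref, spec, short outcome, long outcome).
package ToyProject;

sub new {
    my ($class) = @_;
    return bless { data => [], test => [] }, $class;
}

sub _make_item {
    my ($vars, $spec, $short, $long) = @_;
    return { vars => [@$vars], spec => $spec, short => $short, long => $long };
}

sub add_data { my $self = shift; push @{ $self->{data} }, _make_item(@_); return }
sub add_test { my $self = shift; push @{ $self->{test} }, _make_item(@_); return }

sub num_exemplars  { scalar @{ $_[0]{data} } }
sub num_test_items { scalar @{ $_[0]{test} } }

package main;

my $p = ToyProject->new;
$p->add_data([qw(v a r)], 'first item', 'A', 'V-i');
$p->add_test([qw(x y z)], 'test item',  'B', 'a-oi');
print $p->num_exemplars, ' ', $p->num_test_items, "\n";    # prints "1 1"
```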

AUTHOR

Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>

COPYRIGHT AND LICENSE

This software is copyright (c) 2013 by Royal Skousen.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.