NAME
Algorithm::AM::Project - Manage data used by Algorithm::AM
VERSION
version 2.35
new
Creates a new Project object. Pass in the path to the project directory followed by any named arguments (currently only the required commas parameter is accepted).
A project directory should contain the data set, the test set, and the outcome file (named, not surprisingly, data, test, and outcome). Each line of the data and test files should represent a single exemplar. The required format of each line depends on the value of the commas parameter.
commas => 'yes' indicates the following style:
outcome , v a r i a b l e s , spec
where commas are used to separate the outcome, exemplar variables and spec (or comment), and spaces are used to separate the exemplar variables.
commas => 'no' indicates the following style:
outcome variables spec
where spaces separate the outcome, variables and spec, and the exemplar variables are each a single character (so the above variables would still be v, a, r, etc.).
Any other value for the commas parameter will result in an exception.
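The two line styles described above can be illustrated with a minimal sketch. The parse_exemplar_line helper below is hypothetical and is not part of Algorithm::AM::Project; it only shows how a single line could be split under each value of commas, assuming the fields are laid out exactly as described.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's own parser): split one data or
# test line into its outcome, an arrayref of variables, and its spec.
sub parse_exemplar_line {
    my ($line, $commas) = @_;
    chomp $line;
    my ($outcome, $vars, $spec);
    my @variables;
    if ($commas eq 'yes') {
        # Commas separate the three fields; spaces separate the variables.
        ($outcome, $vars, $spec) = split /\s*,\s*/, $line, 3;
        die "Could not parse line: $line\n" unless defined $vars;
        @variables = split ' ', $vars;
    }
    elsif ($commas eq 'no') {
        # Spaces separate the three fields; each character is one variable.
        ($outcome, $vars, $spec) = split ' ', $line, 3;
        die "Could not parse line: $line\n" unless defined $vars;
        @variables = split //, $vars;
    }
    else {
        die "commas parameter must be 'yes' or 'no'\n";
    }
    return ($outcome, \@variables, $spec);
}

my ($outcome, $variables, $spec) =
    parse_exemplar_line('A , 3 1 2 , myComment', 'yes');
print "$outcome: @$variables ($spec)\n";
```

Both 'A , 3 1 2 , myComment' with commas => 'yes' and 'A 312 myComment' with commas => 'no' yield the same three variables in this sketch.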
The outcome file should have the same number of lines as the data file, and each line should have the outcome of the item on the same line in the data file. The format of the outcome file is like this:
A V-i
B a-oi
C tV-si
where each line contains an outcome in a "short" and then "long" form, separated by whitespace.
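As a rough illustration of this format, the sketch below splits outcome lines into parallel lists of short and long forms. The parse_outcome_lines helper is hypothetical and assumes only what is stated above: two whitespace-separated fields per line.

```perl
use strict;
use warnings;

# Hypothetical helper: split outcome-file lines into parallel lists of
# "short" and "long" outcome strings.
sub parse_outcome_lines {
    my @lines = @_;
    my (@short, @long);
    for my $line (@lines) {
        my ($short, $long) = split ' ', $line, 2;
        # Drop a trailing newline or spaces left on the last field.
        $long =~ s/\s+\z// if defined $long;
        die "Could not parse outcome line: $line\n"
            unless defined $short && defined $long && length $long;
        push @short, $short;
        push @long,  $long;
    }
    return (\@short, \@long);
}

my ($short_list, $long_list) =
    parse_outcome_lines('A V-i', 'B a-oi', 'C tV-si');
print "$short_list->[$_] => $long_list->[$_]\n" for 0 .. $#$short_list;
```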
If the test or outcome files are missing, the data file will be used. In the case of a missing test file, test items will be taken from the data file and each classified using all of the other items in the data set. If the outcome file is missing, the outcome strings located in the data file will be used for both long and short outcome values.
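The missing-test-file behavior amounts to a leave-one-out arrangement: each data item is held out in turn and classified against the rest. A minimal sketch of that bookkeeping follows; training_indices_for is a hypothetical name and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: for the item at $index, return the indices of all
# other data items, i.e. the items that would be used to classify it.
sub training_indices_for {
    my ($index, $num_items) = @_;
    return grep { $_ != $index } 0 .. $num_items - 1;
}

# Made-up data lines in the commas => 'no' style.
my @data = ('A 312 first', 'B 122 second', 'A 311 third');
for my $i (0 .. $#data) {
    my @train = training_indices_for($i, scalar @data);
    print "item $i is classified using items @train\n";
}
```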
When this constructor is called, all project files are read and checked for errors. Possible errors in your files include the following:
Your project path does not exist or does not contain a data file.
The items in your test and data files do not all have the same number of variables.
The number of items in your outcome file does not match the number of items in your data file.
TODO: A line from your data, test or outcome file could not be parsed.
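Two of the checks listed above, a consistent variable count and a matching outcome count, could look roughly like the following. This is a sketch only: check_project and its error messages are hypothetical and are not the messages produced by the module.

```perl
use strict;
use warnings;

# Hypothetical helper: die unless every exemplar has the same number of
# variables and there is exactly one outcome per data item.
# $data is an arrayref of arrayrefs of variables; $outcomes is an arrayref.
sub check_project {
    my ($data, $outcomes) = @_;
    die "No data items found\n" unless @$data;
    my $expected = scalar @{ $data->[0] };
    for my $i (0 .. $#$data) {
        my $found = scalar @{ $data->[$i] };
        die "Item $i has $found variables; expected $expected\n"
            unless $found == $expected;
    }
    die 'Found ' . scalar(@$outcomes) . ' outcomes for '
        . scalar(@$data) . " data items\n"
        unless @$outcomes == @$data;
    return $expected;
}

# The second item is one variable short, so the check fails.
my $ok = eval { check_project([[qw(3 1 2)], [qw(1 2)]], ['A', 'B']); 1 };
print "Error: $@" unless $ok;
```

Wrapping the call in eval, as shown, is one common way to catch an exception thrown during validation.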
base_path
Returns the path of the directory containing the project files.
results_path
Returns the path of the file where classification results are to be printed. Currently this is amcpresults inside of the project directory.
num_variables
Returns the number of variables contained in a single exemplar in the project.
num_exemplars
Returns the number of items in the data (training) set.
get_exemplar_data
Returns the data variables for the exemplar at the given index. The return value is an arrayref containing the string value for each variable.
get_exemplar_spec
Returns the spec of the exemplar at the given index.
get_exemplar_outcome
Returns the outcome of the exemplar at the given index.
num_test_items
Returns the number of test items in the project test or data file.
get_test_item
Returns the test item at the given index. The structure of the return value is [outcome, [data], spec], where [data] contains the variable values.
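A value with this shape can be unpacked with ordinary list assignment. The sketch below uses a stand-in arrayref rather than a real project, and describe_test_item is a hypothetical helper.

```perl
use strict;
use warnings;

# Hypothetical helper: unpack a value shaped like [outcome, [data], spec]
# and render it as a single string.
sub describe_test_item {
    my ($item) = @_;
    my ($outcome, $data, $spec) = @$item;
    return sprintf 'outcome=%s variables=%s spec=%s',
        $outcome, join(' ', @$data), $spec;
}

# Stand-in value; with a real project this would come from get_test_item.
my $test_item = ['A', [qw(3 1 2)], 'myComment'];
print describe_test_item($test_item), "\n";
```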
num_outcomes
Returns the number of different outcomes contained in the data.
get_outcome
Returns the "long" outcome string contained at a given index in outcomelist.
var_format
Returns (and/or sets) a format string for printing the variables of a data item.
spec_format
Returns (and/or sets) a format string for printing a spec string from the data set.
outcome_format
Returns (and/or sets) a format string for printing a "long" outcome.
data_format
Returns (and/or sets) the format string for printing the number of data items.
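The documentation does not say what these format strings look like. Assuming they are sprintf-style formats (an assumption, not something stated here), one plausible way to derive such a format is to pad to the longest string to be printed. The format_for helper below is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build a left-justified sprintf format wide enough
# for the longest of the given strings. Assumes sprintf-style formats.
sub format_for {
    my @strings = @_;
    my $width = 0;
    for my $string (@strings) {
        $width = length($string) if length($string) > $width;
    }
    return "%-${width}s";
}

my @long_outcomes = ('V-i', 'a-oi', 'tV-si');
my $format = format_for(@long_outcomes);
printf "[$format]\n", $_ for @long_outcomes;
```

Padding every value to a common width keeps columns aligned when results are printed one item per line.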
short_outcome_index
Returns the index of the given "short" outcome in outcomelist.
This is obviously not very transparent, as outcomelist is only accessible via a private method. In the future this will be done away with.
AUTHOR
Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Royal Skousen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.