NAME
analogize - classify data with AM from the command line
VERSION
version 3.10
SYNOPSIS
analogize --format <format> [--exemplars <file>] [--test <file>] [--project <dir>] [--print <config_info,statistical_summary,analogical_set_summary,gang_summary,gang_detailed>] [--help]
DESCRIPTION
Classify data with analogical modeling from the command line. Required arguments are format and either exemplars or project. You can use old AM::Parallel projects (a directory containing data and test files) or specify individual data and test files. By default, only the accuracy of the predicted outcomes is printed. More detail may be printed using the print option.
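As a rough illustration of the project layout, the sketch below inspects a directory and reports how it would be expected to be treated, based only on the description of the project option under OPTIONS. The function describe_project is a hypothetical helper written for this example; it is not part of analogize and does not call it.

```shell
#!/bin/sh
# describe_project is a hypothetical helper, not part of analogize.
# It applies the rules stated for the project option: a project is a
# directory holding a file named "data"; a file named "test" is optional.
describe_project() {
    dir=$1
    if [ ! -f "$dir/data" ]; then
        echo "not a project: $dir/data is missing"
        return 1
    fi
    if [ -f "$dir/test" ]; then
        # Both files present: test items are classified against the exemplars.
        echo "classify items in $dir/test against $dir/data"
    else
        # No test file: the documentation says leave-one-out is used.
        echo "leave-one-out over $dir/data"
    fi
}
```

Calling describe_project with the path of a project directory prints one line describing the expected mode and returns a non-zero status when the data file is absent.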
OPTIONS
format
-
specify either commas or nocommas format for exemplar and test data files (= should be used for "null" variables). See "dataset_from_file" in Algorithm::AM::DataSet for details on the two formats.
exemplars, data or train
-
path to the file containing the exemplar/training data
project
-
path to an AM::Parallel-style project (ignores 'outcome' file); this should be a directory containing a file called data, which holds the known exemplars, and optionally a file called test, which holds the test exemplars. If the test file does not exist, then a leave-one-out scheme is used for testing using the exemplars in the data file.
test
-
path to the file containing the test data. If none is specified, performs leave-one-out classification with the exemplar set.
print
-
reports to print, separated by commas (be careful not to add spaces between report names!). For example,
--print analogical_set_summary,gang_summary
would print analogical sets and gang summaries. Available options are:
config_info
-
Describes the configuration used and some simple information about the data, such as cardinality.
statistical_summary
-
A statistical summary of the classification results, including all predicted outcomes with their scores and percentages and the total score for all outcomes. Whether the predicted class is correct, incorrect, or a tie is also included, if the test item had a known class.
analogical_set_summary
-
The analogical set, showing all items that contributed to the predicted outcome, along with the amount contributed by each item (score and percentage overall).
gang_summary
-
A summary of the gang effects on the outcome prediction.
gang_detailed
-
Same as gang_summary, but also includes lists of exemplars for each gang.
include_given
-
Allow a test item to be included in the data set during classification. If false (default), test items will be removed from the dataset during classification.
include_nulls
-
Treat null variables in a test item as regular variables. If false (default), these variables will be excluded and not considered during classification.
linear
-
Calculate scores using occurrences (linearly) instead of using pointers (quadratically).
help or ?
-
print help message
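The difference between the two scoring modes selected by the linear option can be pictured with a little arithmetic. The sketch below assumes the usual description of analogical modeling, in which a supracontext holding n exemplars contributes n occurrences under linear counting and n * n pointers under quadratic counting (each exemplar pointing to every exemplar in the supracontext). It is not taken from the analogize source, and supracontext_score is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical illustration only; not part of analogize.
# Prints the total score contributed by one supracontext of n exemplars.
supracontext_score() {
    mode=$1   # "linear" or "quadratic"
    n=$2      # number of exemplars in the supracontext
    case "$mode" in
        linear)    echo "$n" ;;            # one occurrence per exemplar
        quadratic) echo $((n * n)) ;;      # n pointers per exemplar
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac
}

supracontext_score linear 3      # prints 3
supracontext_score quadratic 3   # prints 9
```

Under this assumption, larger supracontexts carry disproportionately more weight in the default (quadratic) mode than in the linear one.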
EXAMPLES
This distribution comes with a sample dataset in the datasets/soybean directory. Data exemplars are in data and a single test exemplar is in test. The files are in the commas format. The following two commands are equivalent and will analyze the test exemplar and output a summary of gang effects to gang.txt:
analogize --exemplars datasets/soybean/data --test datasets/soybean/test --format commas --print gang_summary > gang.txt
analogize --project datasets/soybean --format commas --print gang_summary > gang.txt
The resulting files are best viewed in a text editor with word wrap turned off.
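Because the value passed to the print option must not contain spaces, it can help to assemble it programmatically when several reports are wanted. The following is a minimal sketch: join_reports is a hypothetical helper and reports.txt an arbitrary output name, neither of which comes with analogize. The analogize call uses only options documented above and runs only if the command and the sample dataset are available.

```shell
#!/bin/sh
# join_reports is a hypothetical helper, not part of analogize.
# It joins report names with commas and rejects empty names or names
# containing whitespace, so the resulting value holds no stray spaces.
join_reports() {
    result=""
    for name in "$@"; do
        case "$name" in
            ""|*[[:space:]]*)
                echo "invalid report name: '$name'" >&2
                return 1 ;;
        esac
        if [ -z "$result" ]; then
            result=$name
        else
            result="$result,$name"
        fi
    done
    printf '%s\n' "$result"
}

reports=$(join_reports analogical_set_summary gang_summary)

# Run the classification only when analogize and the sample data exist.
if command -v analogize >/dev/null 2>&1 && [ -d datasets/soybean ]; then
    analogize --project datasets/soybean --format commas --print "$reports" > reports.txt
fi
```

Passing the joined value in double quotes keeps it a single argument even if the helper is later extended.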
AUTHOR
Theron Stanford <shixilun@yahoo.com>, Nathan Glenn <garfieldnate@gmail.com>
COPYRIGHT AND LICENSE
This software is copyright (c) 2013 by Royal Skousen.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.