NAME
ME.wrapper.pl - a wrapper around Statistics::MaxEntropy and Statistics::Candidates
SYNOPSIS
ME.wrapper.pl --help
--debug
--i_events <filename>
--i_candidates <filename>
--i_dump <filename>
--o_events <filename>
--o_candidates <filename>
--o_parameters <filename>
--special <filename>
--o_dump <filename>
--integer
--KL_max_it <integer>
--NEWTON_max_it <integer>
--KL_min <float>
--NEWTON_min <float>
--nr_to_add <integer>
--SAMPLE <integer>
--GIS
--IIS
--MC
--CORPUS
--ENUM
DESCRIPTION
ME.wrapper.pl is a command-line interface to Statistics::MaxEntropy and Statistics::Candidates. The wrapper and its command line options provide an easy-to-use and transparent connection to the MaxEntropy modules. Below we explain the meaning of the options.
COMMAND LINE ARGUMENTS
We explain the command line options and state at which moment they are applied or executed. For this, we assume the main program of ME.wrapper.pl to have the following form:
prologue();
run();
epilogue();
If both candidates and events are specified, the feature induction algorithm is called. If only events are specified, a scaling algorithm is called (GIS by default).
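The dispatch rule above can be sketched in a few lines of Perl. This is only an illustration, not the wrapper's actual code: the helper name choose_action and its return strings are hypothetical, the hash keys simply mirror the option names, and three points are assumptions rather than documented behaviour: that a dump given with --i_dump counts as "events", that nothing is run when no events are available, and that --IIS wins if both --GIS and --IIS are given.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of ME.wrapper.pl): decide what run()
# would do, given a hash of parsed command line options.
sub choose_action {
    my ($opt) = @_;

    # Assumption: events may come from --i_events or from --i_dump.
    my $has_events     = defined $opt->{i_events} || defined $opt->{i_dump};
    my $has_candidates = defined $opt->{i_candidates};

    # Assumption: without events there is nothing to do.
    return 'nothing' unless $has_events;

    # Candidates and events: feature induction.
    return 'feature induction' if $has_candidates;

    # Only events: scaling, with GIS as the default algorithm.
    # Assumption: --IIS takes precedence if both flags are set.
    return $opt->{IIS} ? 'scale (IIS)' : 'scale (GIS)';
}

print choose_action({ i_events => 'events.txt' }), "\n";
print choose_action({ i_events => 'events.txt', i_candidates => 'cands.txt' }), "\n";
```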
--integer
-
Specifies whether the feature functions should be interpreted as binary or integer functions.
--KL_max_it integer
-
(set in prologue) The maximum number of iterations performed by the scaling algorithms.
--NEWTON_max_it integer
-
(set in prologue) The maximum number of iterations in Newton's method (IIS only).
--KL_min float
-
(set in prologue) The minimum improvement in Kullback-Leibler divergence that a new scaling iteration should bring. If an iteration improves the divergence by less than this value, scaling is stopped.
--NEWTON_min float
-
(set in prologue) The minimum difference between the new x and the old x in Newton's method (IIS only).
--nr_to_add integer
-
(used in run) Passed to the feature induction algorithm (if called). It states the number of candidates that should be added.
--SAMPLE integer
-
(used in run) Passed to the feature induction algorithm (if called). It determines the size of the Monte Carlo sample. Only makes sense if --MC is set.
--GIS
-
(used in run) Sets the scaling algorithm to Generalised Iterative Scaling.
--IIS
-
(used in run) Sets the scaling algorithm to Improved Iterative Scaling.
--MC
-
(used in run) Sets the sampling method to Monte Carlo. See also the --SAMPLE option.
--CORPUS
-
(used in run) Tells the scaling algorithm to consider the event space a good sample (risky: overtraining).
--ENUM
-
(used in run) For scaling, the complete event space (all bitvectors) is enumerated. This is done in memory, so beware!
--help
-
(done in prologue) Exits after showing the name of the program, and the list of command line options.
--debug
-
(set in prologue) Tells the MaxEntropy and Candidates modules to output a lot of text.
--i_events filename
-
(done in prologue) The events are read from <filename>.
--i_candidates filename
-
(done in prologue) The candidates are read from <filename>.
--i_dump filename
-
(done in prologue) An event space is read from the dump in <filename>. This option overrides the --i_events option.
--o_events filename
-
(done in epilogue) The events (including candidates that were added) are written to <filename>.
--o_candidates filename
-
(done in epilogue) The candidates (if present) are written to <filename>. Only candidates that were not added to the event space are written.
--o_parameters filename
-
(done in epilogue) The parameters are written to <filename>.
--special filename
-
(done in epilogue) The parameters are written to <filename> in a special format I like.
--o_dump filename
-
(done in epilogue) The event space is dumped to <filename>. It can be read in again using --i_dump (the next time you use ME.wrapper.pl).
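To illustrate how --KL_max_it and --KL_min interact, here is a minimal sketch of a scaling driver loop. It is not taken from Statistics::MaxEntropy: the function scale_until_converged and the code reference $step (which is assumed to perform one scaling iteration and return the current Kullback-Leibler divergence) are hypothetical names introduced for this example only.

```perl
use strict;
use warnings;

# Hypothetical driver loop (not the module's implementation).
# $step      : code reference, does one scaling iteration and returns
#              the current KL divergence (an assumption of this sketch)
# $kl_max_it : value of --KL_max_it
# $kl_min    : value of --KL_min
sub scale_until_converged {
    my ($step, $kl_max_it, $kl_min) = @_;
    my $previous;
    my $iterations = 0;
    while ($iterations < $kl_max_it) {
        my $kl = $step->();
        $iterations++;
        # Stop once an iteration changes the divergence by less than --KL_min.
        last if defined $previous && abs($previous - $kl) < $kl_min;
        $previous = $kl;
    }
    return $iterations;
}

# Toy stand-in for a scaling step: the divergence halves on every call.
my $toy_kl = 1.0;
my $count  = scale_until_converged(sub { $toy_kl /= 2; return $toy_kl }, 100, 0.001);
print "stopped after $count iterations\n";
```

With a small --KL_max_it the loop ends on the iteration limit; with a larger limit it ends as soon as the improvement drops below --KL_min.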
BUGS
The options --MC, --CORPUS, and --ENUM should be combined into a single option that takes a parameter, for instance --sample_type [corpus, enum, mc].
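A possible shape for such a combined option, sketched with the core Getopt::Long module. The option name --sample_type and its three values come from the suggestion above; everything else (the helper name parse_sample_type, the error messages, returning undef when the option is absent) is an assumption made for illustration and does not describe existing behaviour of ME.wrapper.pl.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# The three values proposed for --sample_type.
my %VALID_SAMPLE_TYPE = map { $_ => 1 } qw(corpus enum mc);

# Hypothetical helper: parse --sample_type from a list of arguments
# and check that its value is one of corpus, enum or mc.
sub parse_sample_type {
    my @args = @_;
    my $type;
    GetOptionsFromArray(\@args, 'sample_type=s' => \$type)
        or die "bad command line\n";
    return unless defined $type;    # option not given
    $type = lc $type;
    die "unknown sample type '$type'\n" unless $VALID_SAMPLE_TYPE{$type};
    return $type;
}

my $chosen = parse_sample_type('--sample_type', 'mc');
print "sample type: $chosen\n";
```

A single option with a value makes the three sampling methods mutually exclusive by construction, which the separate --MC, --CORPUS and --ENUM flags cannot guarantee.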
SEE ALSO
perl(1), Statistics::SparseVector(3), Statistics::Candidates(3), Statistics::MaxEntropy(3).
VERSION
Version 0.2.
AUTHOR
COPYRIGHT
ME.wrapper.pl comes with ABSOLUTELY NO WARRANTY and may be copied only under the terms of the GNU Library General Public License (version 2, or later), which may be found in the distribution.