NAME
Treex::Tool::Parser::MSTperl::TrainerBase
VERSION
version 0.07298
DESCRIPTION
Trains on correctly parsed sentences, thereby creating and tuning the model. Uses single-best MIRA (McDonald et al., 2005, Proc. HLT/EMNLP).
Comments in mathematical notation at the ends of some lines of the source code correspond to the pseudocode description of MIRA given by McDonald et al.
FIELDS
- config
Reference to the instance of Treex::Tool::Parser::MSTperl::Config.
METHODS
sumUpdateWeight is the number by which each change of a feature weight is multiplied before it is added to the sum of the weights, so that at the end of the algorithm the sum corresponds to its formal definition, which is the sum of all weights after each of the updates. sumUpdateWeight is a member of a sequence going from N*T down to 1, where N is the number of iterations ("number_of_iterations" in Treex::Tool::Parser::MSTperl::FeaturesControl, 10 by default) and T is the number of sentences in the training data. N*T is thus the number of inner iterations, i.e. the number of times mira_update() is called.
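The equivalence described above can be checked on toy numbers. The following is a minimal sketch, not the module's code: the update values and the variable names $naive_sum and $weighted_sum are made up for illustration, and a single feature weight is tracked over four inner iterations.

```perl
use strict;
use warnings;

# Hypothetical toy data: one update of a single weight per inner iteration,
# so N*T = 4 here. These values do not come from the module.
my @updates          = ( 0.5, -0.2, 0.1, 0.3 );
my $inner_iterations = scalar @updates;

# Formal definition: add the current weight to the sum after every update.
my ( $weight, $naive_sum ) = ( 0, 0 );
for my $update (@updates) {
    $weight    += $update;
    $naive_sum += $weight;
}

# Shortcut: multiply each update by the number of inner iterations
# yet to be performed (counting the current one), going from N*T down to 1.
my $weighted_sum    = 0;
my $sumUpdateWeight = $inner_iterations;
for my $update (@updates) {
    $weighted_sum += $update * $sumUpdateWeight;
    $sumUpdateWeight--;
}

printf "naive: %.2f, weighted: %.2f\n", $naive_sum, $weighted_sum;
```

Both sums agree up to floating-point rounding, because an update made when k inner iterations remain is present in exactly k of the later weight snapshots.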
my ( $features_diff_1, $features_diff_2, $features_diff_count ) = features_diff( $features_1, $features_2 );
Compares the features of two parses of a sentence. Each of the arguments ($features_1, $features_2) is a reference to an array of strings representing the features; the same feature may be present repeatedly, and all occurrences of the same feature are summed together.
Features that appear the same number of times in both parses are disregarded.
The first two returned values ($features_diff_1, $features_diff_2) are array references: $features_diff_1 contains the features that appear in the first parse ($features_1) more often than in the second parse ($features_2), and vice versa for $features_diff_2. Each feature is included as many times as the difference in its number of occurrences, e.g. if the feature TAG|tag:NN|NN appears 5 times in the first parse and 8 times in the second parse, then $features_diff_2 will contain 'TAG|tag:NN|NN', 'TAG|tag:NN|NN', 'TAG|tag:NN|NN'.
The third returned value ($features_diff_count) is the number of feature occurrences in which the parses differ, i.e. $features_diff_count = scalar(@$features_diff_1) + scalar(@$features_diff_2).
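The behaviour described for features_diff can be illustrated with a small stand-alone re-implementation. This is only a sketch under assumptions: the name features_diff_sketch, the sorted output order, and the extra feature TAG|tag:VB|NN are hypothetical, and the module's actual code may be organised differently.

```perl
use strict;
use warnings;

# Illustrative re-implementation, not the module's actual code.
sub features_diff_sketch {
    my ( $features_1, $features_2 ) = @_;

    # Positive balance: more frequent in parse 1; negative: in parse 2.
    my %balance;
    $balance{$_}++ for @$features_1;
    $balance{$_}-- for @$features_2;

    my ( @diff_1, @diff_2 );
    for my $feature ( sort keys %balance ) {
        my $count = $balance{$feature};
        # Repeat the feature once per surplus occurrence;
        # a balance of 0 means the feature is disregarded.
        push @diff_1, ($feature) x $count  if $count > 0;
        push @diff_2, ($feature) x -$count if $count < 0;
    }
    return ( \@diff_1, \@diff_2, @diff_1 + @diff_2 );
}

# The 5-versus-8 case from the text, plus one feature shared equally.
my @parse_1 = ( ('TAG|tag:NN|NN') x 5, 'TAG|tag:VB|NN' );
my @parse_2 = ( ('TAG|tag:NN|NN') x 8, 'TAG|tag:VB|NN' );

my ( $diff_1, $diff_2, $diff_count ) =
    features_diff_sketch( \@parse_1, \@parse_2 );

print "diff_2: @$diff_2\n";
print "count:  $diff_count\n";
```

Keeping a single signed balance per feature avoids building two separate count tables and makes the "disregard equal counts" rule fall out of the zero case.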
update_feature_weight( $model, $feature, $update, $sumUpdateWeight )
Updates the weight of $feature by $update (which may be positive or negative) and also adds $update multiplied by $sumUpdateWeight to the feature's sum of updates (which is later used to avoid overtraining). $sumUpdateWeight is simply the count of inner iterations yet to be performed, which removes the need to update the sum on each inner iteration.
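A minimal sketch of this bookkeeping follows. It assumes a model represented by two plain hashes (weights and weight_sums) and a function named update_feature_weight_sketch; both are hypothetical stand-ins, since the real model class and its accessors are not described here.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the model: two plain hashes keyed by feature.
my %model = ( weights => {}, weight_sums => {} );

sub update_feature_weight_sketch {
    my ( $model, $feature, $update, $sumUpdateWeight ) = @_;

    # The current weight changes by the update itself.
    $model->{weights}{$feature} += $update;

    # The sum receives the update once for every inner iteration
    # yet to be performed, all in a single step.
    $model->{weight_sums}{$feature} += $update * $sumUpdateWeight;
    return;
}

# Three inner iterations in total; the feature is updated in the first two.
update_feature_weight_sketch( \%model, 'TAG|tag:NN|NN', 1.0,  3 );
update_feature_weight_sketch( \%model, 'TAG|tag:NN|NN', -0.5, 2 );

printf "weight: %.1f, sum: %.1f\n",
    $model{weights}{'TAG|tag:NN|NN'},
    $model{weight_sums}{'TAG|tag:NN|NN'};
```

In this toy run the weight after the three inner iterations is 1.0, 0.5 and 0.5, which adds up to the same 2.0 that the sum holds, even though the third iteration never touched the feature.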
AUTHORS
Rudolf Rosa <rosa@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.