NAME
Treex::Tool::Parser::MSTperl::TrainerUnlabelled
VERSION
version 0.11336
DESCRIPTION
Trains on correctly parsed sentences, thereby creating and tuning the model. Uses single-best MIRA (McDonald et al., 2005, Proc. HLT/EMNLP).
FIELDS
- parser
-
Reference to an instance of Treex::Tool::Parser::MSTperl::Parser which is used for the training.
- model
-
Reference to an instance of Treex::Tool::Parser::MSTperl::ModelUnlabelled which is being trained.
METHODS
sumUpdateWeight is a number by which each change of the feature weights is multiplied before it is added to the sum of the weights, so that at the end of the algorithm the sum corresponds to its formal definition, which is the sum of all weights after each of the updates. sumUpdateWeight is a member of a sequence going from N*T to 1, where N is the number of iterations ("number_of_iterations" in Treex::Tool::Parser::MSTperl::FeaturesControl, 10 by default) and T is the number of sentences in the training data. N*T is thus the number of inner iterations, i.e. how many times mira_update() is called.
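The weighting described above can be checked on a toy example. The following sketch is not taken from the module and uses made-up update values for a single feature. It compares the naive bookkeeping (add the current weight to the sum after every inner iteration) with the shortcut (add each change multiplied by the number of inner iterations still to go, counting the current one).

```perl
use strict;
use warnings;

# Hypothetical changes applied to one feature weight in successive
# inner iterations (0 means the feature was not touched).
my @updates          = ( 2, 0, -1, 3 );
my $inner_iterations = scalar @updates;    # plays the role of N*T

# Naive: add the current weight to the sum after every inner iteration.
my ( $weight, $naive_sum ) = ( 0, 0 );
for my $update (@updates) {
    $weight    += $update;
    $naive_sum += $weight;
}

# Shortcut: touch the sum only when the weight changes, scaling the
# change by sumUpdateWeight, which counts down from N*T to 1.
my $shortcut_sum    = 0;
my $sumUpdateWeight = $inner_iterations;
for my $update (@updates) {
    $shortcut_sum += $update * $sumUpdateWeight if $update != 0;
    $sumUpdateWeight--;
}

print "naive: $naive_sum, shortcut: $shortcut_sum\n";    # both are 9
```

Both sums agree because a change made with k inner iterations remaining is present in exactly k of the per-iteration weight snapshots.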
- $trainer->train($training_data);
-
Trains the model, using the settings from config and the training data in the form of a reference to an array of parsed sentences (Treex::Tool::Parser::MSTperl::Sentence), which can be obtained from Treex::Tool::Parser::MSTperl::Reader.
- $self->mira_update($sentence_correct_parse, $sentence_best_parse, $sumUpdateWeight)
-
Performs one MIRA (Margin-Infused Relaxed Algorithm) update on one sentence from the training data. Its input is the correct parse of the sentence (from the training data) and the best-scoring parse created by the parser.
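For orientation, the textbook single-best MIRA step can be sketched as follows. This is an assumption about the general algorithm, not a copy of what mira_update() does internally: the function name mira_step_size, the hash-based feature difference, the loss value, and the second feature name are all hypothetical. The step is the smallest change of the weights that makes the correct parse outscore the best parse by at least the loss.

```perl
use strict;
use warnings;

# Textbook single-best MIRA step size (illustrative, not the module's code).
#   $score_correct, $score_best: model scores of the two parses
#   $loss: number of errors in the best parse
#   $diff: hashref, feature => (count in correct parse - count in best parse)
sub mira_step_size {
    my ( $score_correct, $score_best, $loss, $diff ) = @_;

    # Squared norm of the feature difference vector.
    my $norm_sq = 0;
    $norm_sq += $_**2 for values %$diff;
    return 0 if $norm_sq == 0;

    # How far the current margin falls short of the required loss.
    my $violation = $loss - ( $score_correct - $score_best );
    return $violation > 0 ? $violation / $norm_sq : 0;
}

# Toy data: the second feature name is made up for illustration.
my %weights = ( 'TAG|tag:NN|NN' => 0.5, 'TAG|tag:NN|VB' => 1.0 );
my %diff    = ( 'TAG|tag:NN|NN' => 1,   'TAG|tag:NN|VB' => -1 );

my $step = mira_step_size( 0.5, 1.0, 1, \%diff );
$weights{$_} += $step * $diff{$_} for keys %diff;

my $margin = $weights{'TAG|tag:NN|NN'} - $weights{'TAG|tag:NN|VB'};
printf "step %.2f, margin after update %.2f\n", $step, $margin;
```

In this toy case the step is 0.75 and the margin after the update equals the loss of 1, which is exactly the constraint the update is meant to satisfy.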
- my ( $features_diff_1, $features_diff_2, $features_diff_count ) = features_diff( $features_1, $features_2 );
-
Compares features of two parses of a sentence, where the features ($features_1, $features_2) are each represented as a reference to an array of strings (the same feature may be present repeatedly; all occurrences of the same feature are counted together). Features that appear the same number of times in both parses are disregarded.
The first two returned values ($features_diff_1, $features_diff_2) are array references, $features_diff_1 containing features that appear in the first parse ($features_1) more often than in the second parse ($features_2), and vice versa for $features_diff_2. Each feature is contained as many times as is the difference in the number of occurrences, e.g. if the feature TAG|tag:NN|NN appears 5 times in the first parse and 8 times in the second parse, then $features_diff_2 will contain 'TAG|tag:NN|NN', 'TAG|tag:NN|NN', 'TAG|tag:NN|NN'.
The third returned value ($features_diff_count) is a count of features in which the parses differ, i.e. $features_diff_count = scalar(@$features_diff_1) + scalar(@$features_diff_2).
- update_feature_weight( $model, $feature, $update, $sumUpdateWeight )
-
Updates the weight of $feature by $update (which may be positive or negative) and also adds $update multiplied by $sumUpdateWeight to the sum of updates of the feature (which is later used to avoid overtraining). $sumUpdateWeight is simply the count of inner iterations yet to be performed, which eliminates the need to update the sum on each inner iteration.
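The two helper routines described above, the feature comparison and the weight update, can be illustrated with a minimal sketch. It is written from the descriptions only: the _sketch function names and the plain-hash model with weights and weights_sum keys are hypothetical stand-ins, not the module's actual ModelUnlabelled interface.

```perl
use strict;
use warnings;

# Sketch of the comparison described for features_diff().
sub features_diff_sketch {
    my ( $features_1, $features_2 ) = @_;

    # Positive balance: more frequent in the first parse;
    # negative balance: more frequent in the second parse.
    my %balance;
    $balance{$_}++ for @$features_1;
    $balance{$_}-- for @$features_2;

    my ( @diff_1, @diff_2 );
    for my $feature ( sort keys %balance ) {
        my $count = $balance{$feature};
        push @diff_1, ($feature) x $count        if $count > 0;
        push @diff_2, ($feature) x abs($count)   if $count < 0;
    }
    return ( \@diff_1, \@diff_2, scalar(@diff_1) + scalar(@diff_2) );
}

# Sketch of the update described for update_feature_weight(),
# using a hypothetical hash-based model.
sub update_feature_weight_sketch {
    my ( $model, $feature, $update, $sumUpdateWeight ) = @_;
    $model->{weights}{$feature}     += $update;
    $model->{weights_sum}{$feature} += $update * $sumUpdateWeight;
    return;
}

# The example from the text: 5 occurrences versus 8 occurrences.
my $feature = 'TAG|tag:NN|NN';
my ( $diff_1, $diff_2, $diff_count ) =
    features_diff_sketch( [ ($feature) x 5 ], [ ($feature) x 8 ] );
print "diff_count: $diff_count\n";    # 3, all of them in @$diff_2

# One update with 7 inner iterations still to be performed.
my $model = {};
update_feature_weight_sketch( $model, $feature, -0.5, 7 );
print "weight: $model->{weights}{$feature}, sum: $model->{weights_sum}{$feature}\n";
```

Features with a zero balance are skipped entirely, which matches the rule that features appearing equally often in both parses are disregarded.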
AUTHORS
Rudolf Rosa <rosa@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.