NAME

Treex::Tool::Parser::MSTperl::TrainerBase

VERSION

version 0.07298

DESCRIPTION

Trains on correctly parsed sentences and thereby creates and tunes the model. Uses single-best MIRA (McDonald et al., 2005, Proc. HLT/EMNLP).

The mathematical-looking comments at the ends of some lines of the source code correspond to the pseudocode description of MIRA provided by McDonald et al.

FIELDS

config: Reference to the instance of Treex::Tool::Parser::MSTperl::Config.

METHODS

sumUpdateWeight is the number by which each change of a feature weight is multiplied before it is added to the sum of the weights, so that at the end of the algorithm the sum matches its formal definition: the sum of all weights after each of the updates. sumUpdateWeight takes its values from a sequence going from N*T down to 1, where N is the number of iterations ("number_of_iterations" in Treex::Tool::Parser::MSTperl::FeaturesControl, 10 by default) and T is the number of sentences in the training data. N*T is thus the number of inner iterations, i.e. the number of times mira_update() is called.
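The equivalence described above can be checked on a toy example. The following is a minimal sketch with made-up values ($N, $T and @updates are hypothetical and track a single feature weight); it is not the module's code, only an illustration of why scaling each update by a counter running from N*T down to 1 gives the same result as re-adding the weight to the sum after every inner iteration.

```perl
use strict;
use warnings;

# Hypothetical toy values, not taken from the module.
my $N = 2;    # number of iterations (passes over the training data)
my $T = 3;    # number of training sentences
my @updates = ( 0.5, -0.25, 0, 1, 0, -0.5 );    # one weight change per inner iteration

# Formal definition: add the current weight to the sum after every update.
my ( $weight, $naive_sum ) = ( 0, 0 );
for my $update (@updates) {
    $weight    += $update;
    $naive_sum += $weight;
}

# Shortcut: count each update once, scaled by the number of inner
# iterations it still influences (N*T, N*T-1, ..., 1).
my $sumUpdateWeight = $N * $T;
my $weighted_sum    = 0;
for my $update (@updates) {
    $weighted_sum += $update * $sumUpdateWeight;
    $sumUpdateWeight--;
}

print "naive: $naive_sum, weighted: $weighted_sum\n";
```

Both loops arrive at the same sum, but the second one only needs to touch a weight when it actually changes.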

my ( $features_diff_1, $features_diff_2, $features_diff_count ) = features_diff( $features_1, $features_2 );

Compares the features of two parses of a sentence. Each set of features ($features_1, $features_2) is represented as a reference to an array of feature strings; the same feature may be present repeatedly, and all occurrences of the same feature are summed together.

Features that appear the same number of times in both parses are disregarded.

The first two returned values ($features_diff_1, $features_diff_2) are array references. $features_diff_1 contains the features that appear in the first parse ($features_1) more often than in the second parse ($features_2), and vice versa for $features_diff_2. Each feature is included as many times as the difference in its number of occurrences, e.g. if the feature TAG|tag:NN|NN appears 5 times in the first parse and 8 times in the second parse, then $features_diff_2 will contain 'TAG|tag:NN|NN', 'TAG|tag:NN|NN', 'TAG|tag:NN|NN'.

The third returned value ($features_diff_count) is the number of features in which the parses differ, i.e. $features_diff_count = scalar(@$features_diff_1) + scalar(@$features_diff_2).
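The behaviour described above can be sketched as follows. This is an illustrative reimplementation under the stated description, not the module's actual code; the name features_diff_sketch, the sorted output order and the sample feature 'TAG|tag:DT|NN' are assumptions made for the example.

```perl
use strict;
use warnings;

# Sketch of the described behaviour; the real method may differ in details
# such as the order of the returned features.
sub features_diff_sketch {
    my ( $features_1, $features_2 ) = @_;

    # Net count per feature: +1 for each occurrence in the first parse,
    # -1 for each occurrence in the second parse.
    my %net;
    $net{$_}++ for @$features_1;
    $net{$_}-- for @$features_2;

    my ( @diff_1, @diff_2 );
    for my $feature ( sort keys %net ) {
        my $net_count = $net{$feature};
        if ( $net_count > 0 ) {
            push @diff_1, ($feature) x $net_count;
        }
        elsif ( $net_count < 0 ) {
            push @diff_2, ($feature) x abs($net_count);
        }
        # a net count of 0 means equal occurrences: the feature is disregarded
    }

    return ( \@diff_1, \@diff_2, scalar(@diff_1) + scalar(@diff_2) );
}

# The example from the text: 5 occurrences versus 8 occurrences.
my @parse_1 = ( ('TAG|tag:NN|NN') x 5, 'TAG|tag:DT|NN' );
my @parse_2 = ( ('TAG|tag:NN|NN') x 8, 'TAG|tag:DT|NN' );
my ( $diff_1, $diff_2, $diff_count ) = features_diff_sketch( \@parse_1, \@parse_2 );
print "diff_1: @$diff_1\ndiff_2: @$diff_2\ncount: $diff_count\n";
```

Here the hypothetical feature 'TAG|tag:DT|NN' occurs once in each parse and therefore appears in neither returned array.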

update_feature_weight( $model, $feature, $update, $sumUpdateWeight )

Updates the weight of $feature by $update (which may be positive or negative). It also adds $update, multiplied by $sumUpdateWeight, to the feature's sum of updates, which is later used for overtraining avoidance. $sumUpdateWeight is simply the count of inner iterations yet to be performed; scaling the update by it eliminates the need to update the sum on each inner iteration.
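A minimal sketch of this bookkeeping is shown below. The model is replaced by a plain hash with the hypothetical keys weights and weights_summed, and the function name update_feature_weight_sketch is likewise made up; the real model class and its accessors are not shown here and may look quite different.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the model: two plain hashes keyed by feature.
my %model = ( weights => {}, weights_summed => {} );

sub update_feature_weight_sketch {
    my ( $model, $feature, $update, $sumUpdateWeight ) = @_;

    # current weight of the feature
    $model->{weights}{$feature} += $update;

    # contribution of this update to the sum over all remaining inner iterations
    $model->{weights_summed}{$feature} += $update * $sumUpdateWeight;

    return;
}

# Toy run with 4 inner iterations in total: the feature changes in the
# first one (4 iterations still to go) and in the third one (2 to go).
update_feature_weight_sketch( \%model, 'TAG|tag:NN|NN', 1,    4 );
update_feature_weight_sketch( \%model, 'TAG|tag:NN|NN', -0.5, 2 );

my $current = $model{weights}{'TAG|tag:NN|NN'};
my $summed  = $model{weights_summed}{'TAG|tag:NN|NN'};
print "current: $current, summed: $summed\n";
```

In this toy run the weight is 1, 1, 0.5, 0.5 after the four inner iterations, which adds up to the same summed value that the two scaled updates produce.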

AUTHORS

Rudolf Rosa <rosa@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
