NAME

Treex::Tool::Parser::MSTperl::Config

VERSION

version 0.07298

DESCRIPTION

Handles the configuration of the parser.

FIELDS

Fields

field_names (ArrayRef[Str])

Field names, ordered by field index (for converting a field index to a field name).

field_names_hash (HashRef[Str])

Maps each field name to 1, making it easy to check whether a field name exists.

field_indexes (HashRef[Str])

Maps each field name to its index in field_names (for converting a field name to a field index).
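The relation between the three fields can be illustrated with a minimal Perl sketch. The field names below are hypothetical, and the sketch builds plain variables rather than the module's actual attributes:

```perl
use strict;
use warnings;

# Hypothetical field names, as they might be listed in the config file.
my @field_names = qw(ord form lemma tag parent);

# field_names_hash: every field name maps to 1, for quick existence checks.
my %field_names_hash = map { $_ => 1 } @field_names;

# field_indexes: every field name maps to its position in @field_names.
my %field_indexes = map { $field_names[$_] => $_ } 0 .. $#field_names;

# Index -> name goes through the array, name -> index through the hash.
print "field 2 is '$field_names[2]'\n";          # prints: field 2 is 'lemma'
print "'tag' is field $field_indexes{tag}\n";    # prints: 'tag' is field 3
print "has 'foo'? ", ($field_names_hash{foo} ? 'yes' : 'no'), "\n";  # prints: has 'foo'? no
```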

Settings

The config file (usually config.txt) is in YAML format.

Lines beginning with # are comments and are ignored, as are empty lines and lines containing only whitespace characters.
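As an illustration of the rule above, here is a minimal sketch that filters out comment and blank lines. It is not the module's own reading code (a YAML parser normally discards such lines on its own); the sub name significant_lines and the sample config lines are hypothetical:

```perl
use strict;
use warnings;

# Keep only the lines that carry settings: drop lines whose first
# character is '#' (comments) and lines that are empty or whitespace-only.
sub significant_lines {
    my @lines = @_;
    return grep { !/^#/ && /\S/ } @lines;
}

# Hypothetical config lines.
my @config_lines = (
    "# a comment",
    "field_names: [ord, form, lemma, tag, parent]",
    "   ",
    "",
    "number_of_iterations: 10",
);
print "$_\n" for significant_lines(@config_lines);
```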

Some settings are ignored in parsing mode (i.e. when not training): use_edge_features_cache is turned off and number_of_iterations is irrelevant.

The following settings are read from the configuration file (see also the file itself, where the options are richly commented):

Basic Settings

field_names

Lowercase names of the fields in the input file (the data fields in the input file must be separated by tabs). Use only the characters [a-z0-9_] and always include at least one letter. Names must be unique, so give names even to unused fields.
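The naming rules above can be checked mechanically. A minimal sketch, assuming a hypothetical helper check_field_names (the module itself may validate names differently, or not at all):

```perl
use strict;
use warnings;

# Hypothetical helper: check field names against the rules above
# (only [a-z0-9_], at least one letter, no duplicates).
# Returns a list of problems; an empty list means the names are fine.
sub check_field_names {
    my @names = @_;
    my (@problems, %seen);
    for my $name (@names) {
        push @problems, "'$name' contains characters outside [a-z0-9_]"
            unless $name =~ /\A[a-z0-9_]+\z/;
        push @problems, "'$name' contains no letter"
            unless $name =~ /[a-z]/;
        # Report a duplicate only on its second occurrence.
        push @problems, "'$name' is used more than once"
            if $seen{$name}++ == 1;
    }
    return @problems;
}

print "$_\n" for check_field_names(qw(ord form lemma Tag 123 form));
```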

root_field_values

Field values to set for the (technical) root node.

parent_ord

Name of the field containing the ord of the node's parent (also called "head" or "governing node").

number_of_iterations, labeller_number_of_iterations

How many times the trainer (Tagger::MSTperl::Trainer) should go through all the training data (default is 10).
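A rough sketch of what number_of_iterations controls: each iteration is one full pass over the training data. The settings hash, the sentences and the loop body are hypothetical placeholders, not the trainer's actual code:

```perl
use strict;
use warnings;

# Hypothetical settings; number_of_iterations falls back to the default of 10.
my %settings   = ();
my $iterations = $settings{number_of_iterations} // 10;

my @training_sentences = ('sentence 1', 'sentence 2', 'sentence 3');

my $visits = 0;
for my $iteration (1 .. $iterations) {
    for my $sentence (@training_sentences) {
        # An online trainer would typically parse $sentence with the current
        # weights here and update them where the result differs from gold.
        $visits++;
    }
}
print "$iterations passes, $visits sentence visits\n";  # prints: 10 passes, 30 sentence visits
```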

use_edge_features_cache, labeller_use_edge_features_cache

Turns the use of the edge_features_cache on (1) or off (0). Default is 0.

Turn the cache on (1) when training on a machine with plenty of RAM or on small training data: it uses a lot of memory but speeds up training considerably (by approx. 30% to 50%). If you need to save RAM, turn it off (0).
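The trade-off described above is ordinary memoization: the features of each edge are computed once and reused on later passes over the data. A minimal sketch, with hypothetical sub names and feature strings, assuming the cache is keyed by the (parent, child) pair; the module's real cache may be structured differently:

```perl
use strict;
use warnings;

# Hypothetical feature extractor; counts how often it actually runs.
my $extractions = 0;
sub extract_edge_features {
    my ($parent, $child) = @_;
    $extractions++;
    return ["parent=$parent", "child=$child", "pair=$parent|$child"];
}

# Cache keyed by edge: costs memory, saves recomputation on every pass.
my %edge_features_cache;
sub edge_features {
    my ($parent, $child, $use_cache) = @_;
    return extract_edge_features($parent, $child) unless $use_cache;
    my $key = "$parent\t$child";
    $edge_features_cache{$key} //= extract_edge_features($parent, $child);
    return $edge_features_cache{$key};
}

# Three passes over the same two edges, as in three training iterations.
for my $pass (1 .. 3) {
    edge_features(@$_, 1) for ([0, 1], [1, 2]);
}
print "extractions with cache: $extractions\n";   # prints: extractions with cache: 2
```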

Features Settings

features, labeller_features

Feature codes to use in the unlabelled/labelled parser. See Treex::Tool::Parser::MSTperl::FeaturesControl for details.

METHODS

Settings

The best source of information about all possible settings is the configuration file itself (usually called config.txt), which is richly commented and also contains real examples.

my $config = Treex::Tool::Parser::MSTperl::Config->new(config_file => 'file.config')

Reads the configuration file (in YAML format) and applies the settings.

See file samples/sample.config.

field_name2index ($field_name)

Fields are referred to by name in the config files but by index in the code, hence this conversion function; the reverse conversion (index to name) is provided by the field_names field.
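A standalone sketch of the name-to-index conversion, also showing how the field named by parent_ord might be read from a tab-separated input line. The field layout, the sample line and the choice to die on an unknown name are assumptions; the real method is called on a Config object (e.g. $config->field_name2index($field_name)) and its behavior for unknown names is not described here:

```perl
use strict;
use warnings;

# Hypothetical field layout; in the module it comes from the config file.
my @field_names   = qw(ord form lemma tag parent);
my %field_indexes = map { $field_names[$_] => $_ } 0 .. $#field_names;

# Name -> index; this sketch dies on an unknown field name.
sub field_name2index {
    my ($field_name) = @_;
    die "Unknown field name '$field_name'\n"
        unless exists $field_indexes{$field_name};
    return $field_indexes{$field_name};
}

# Assumed setting: parent_ord names the field 'parent'.
my $parent_ord = 'parent';
my @fields     = split /\t/, "3\tcat\tcat\tNN\t2";

my $ord    = $fields[ field_name2index('ord') ];
my $parent = $fields[ field_name2index($parent_ord) ];
print "parent of node $ord is $parent\n";   # prints: parent of node 3 is 2
```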

AUTHORS

Rudolf Rosa <rosa@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.