NAME

Treex::Tool::Parser::MSTperl::Sentence

VERSION

version 0.11336

DESCRIPTION

Represents a sentence, either parsed or unparsed. Contains an array of nodes that represent the words in the sentence.

The nodes are ordered: the ord of each node is its 1-based position in the sentence. The ord value 0 is reserved for the (technical) sentence root.

FIELDS

id (Int)

An integer id unique to each sentence, where a sentence is understood in its proper sense as a sequence of tokens; copies of the same sentence therefore share the same id.

nodes (ArrayRef[Treex::Tool::Parser::MSTperl::Node])

(A reference to) an array of nodes (Treex::Tool::Parser::MSTperl::Node) of the sentence.

A node represents both a token of the sentence (usually a word) and, if the sentence has been parsed, a node in the parse tree of the sentence.

nodes_with_root (ArrayRef[Treex::Tool::Parser::MSTperl::Node])

A copy of the nodes field with a root node (Treex::Tool::Parser::MSTperl::RootNode) added at the beginning. As the root node's ord is 0 by definition, the position of each node in this array corresponds exactly to its ord.
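
The relation between array position and ord can be illustrated with a minimal sketch. It does not use the module itself: plain hashes stand in for Node and RootNode objects, and the variable names are assumptions made for the illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-ins: plain hashes instead of Node / RootNode objects.
my @nodes = map { +{ form => $_ } } qw(Tom is big);

# Assign 1-based ords according to the position in the sentence.
$nodes[$_]{ord} = $_ + 1 for 0 .. $#nodes;

# Prepend a technical root whose ord is 0.
my $root            = { form => 'ROOT', ord => 0 };
my @nodes_with_root = ( $root, @nodes );

# The index in @nodes_with_root now equals the ord of the node stored there.
for my $i ( 0 .. $#nodes_with_root ) {
    die "ord mismatch at index $i" unless $nodes_with_root[$i]{ord} == $i;
}
```

With this layout, looking up a node by its ord is a plain array access and needs no offset arithmetic.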

edges (Maybe[ArrayRef[Treex::Tool::Parser::MSTperl::Edge]])

If the sentence is parsed (i.e. the nodes know their parents), this field contains (a reference to) an array of all edges (Treex::Tool::Parser::MSTperl::Edge) in the parse tree of the sentence.

This field is set by the sub fill_fields_after_parse.

If the sentence is not parsed, this field is undef.

features (Maybe[ArrayRef[Str]])

If the sentence is parsed, this field contains (a reference to) an array of all features of all edges in the parse tree of the sentence. If some of the features are repeated in the sentence (i.e. they are present in several edges or even repeated in one edge), they are repeated here as well, i.e. this is not a set in the mathematical sense but a (generally unordered) list.
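
The "list, not set" behaviour can be pictured with a small sketch. The edge hashes and the feature strings (feat_a, feat_b, feat_c) are invented placeholders, not real MSTperl features, and the code is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical edges, each carrying its own list of feature strings.
my @edges = (
    { features => [ 'feat_a', 'feat_b' ] },
    { features => [ 'feat_b', 'feat_c', 'feat_c' ] },
);

# Concatenate the per-edge lists without removing duplicates.
my @features = map { @{ $_->{features} } } @edges;

# Count how often each feature occurs in the sentence-level list.
my %count;
$count{$_}++ for @features;
```

Here @features has five elements, with feat_b and feat_c each present twice, so the number of occurrences of a feature is preserved.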

This field is set by the sub fill_fields_after_parse.

If the sentence is not parsed, this field is undef.

METHODS

Constructor

my $sentence = Treex::Tool::Parser::MSTperl::Sentence->new( id => 12, nodes => [$node1, $node2, $node3, ...]);

Creates a new sentence. The id must be unique (but copies of the same sentence are to share the same id). It is used for edge signature generation ("signature" in Treex::Tool::Parser::MSTperl::Edge) in edge features caching (and therefore does not have to be set if caching is disabled).

The order of the nodes denotes their order in the sentence, starting from the node with ord 1. The technical root (Treex::Tool::Parser::MSTperl::RootNode) is not to be included, as it is generated automatically in the constructor. The ords of the nodes ("ord" in Treex::Tool::Parser::MSTperl::Node) do not have to (and actually shouldn't) be filled in. If they are, they are checked, and a warning is issued on STDERR if they do not correspond to the position of the nodes in the array. If they are not, they are filled in automatically during the sentence creation.
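
The ord handling described above can be sketched roughly as follows. This is an illustration only, not the module's code: nodes are plain hashes, fill_or_check_ords is a hypothetical helper name, and leaving a mismatching ord unchanged is an assumption (the text above only says that a warning is issued).

```perl
use strict;
use warnings;

# Hypothetical helper: fill in missing ords, warn about mismatching ones.
# Returns the number of mismatches found.
sub fill_or_check_ords {
    my ($nodes) = @_;
    my $mismatches = 0;
    for my $i ( 0 .. $#{$nodes} ) {
        my $expected = $i + 1;          # ords are 1-based
        my $node     = $nodes->[$i];
        if ( !defined $node->{ord} ) {
            $node->{ord} = $expected;   # not set: fill in automatically
        }
        elsif ( $node->{ord} != $expected ) {
            warn "node at position $expected has ord $node->{ord}\n";
            $mismatches++;              # assumption: the ord is left as is
        }
    }
    return $mismatches;
}

my @nodes = ( { form => 'Tom' }, { form => 'is', ord => 5 }, { form => 'big' } );
my $mismatches = fill_or_check_ords( \@nodes );
```

In this example the first and third node receive ords 1 and 3, while the second node triggers one warning on STDERR.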

Other fields (nodes_with_root, edges and features) should usually not be set. nodes_with_root is set automatically during sentence creation (and any value set to it is discarded). edges and features are to be set only if the sentence is parsed (i.e. the nodes know their parents, see "parent" in Treex::Tool::Parser::MSTperl::Node and "parentOrd" in Treex::Tool::Parser::MSTperl::Node) by calling the fill_fields_after_parse method.

So, if the sentence is already parsed, you should call the fill_fields_after_parse method immediately after creation of the sentence.

my $unparsed_sentence_copy = $sentence->copy_nonparsed();

Creates a new instance of the same sentence with the same id and with copies of the nodes but without any parsing information (like after calling clear_parse). The nodes are copied by calling "copy_nonparsed" in Treex::Tool::Parser::MSTperl::Node.

Action methods

$sentence->setChildParent(5, 3)

Sets the parent of the node with the first ord to be the node with the second ord, e.g. here, the 3rd node becomes the parent of the 5th node. It only sets the parent and parentOrd fields in the child node (i.e. it does not create or modify any edges).

When all nodes' parents have been set, fill_fields_after_parse can be called.
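
The effect of setChildParent on the child node can be mimicked with plain data structures. The sketch below is not the module's implementation: set_child_parent is a hypothetical helper, and nodes are plain hashes indexed by ord.

```perl
use strict;
use warnings;

# Hypothetical nodes (root at index 0), initially without parents.
my @nodes_with_root =
    map { +{ ord => $_, parent => undef, parentOrd => 0 } } 0 .. 5;

# Hypothetical helper: only the child node is modified, no edges are built.
sub set_child_parent {
    my ( $nodes, $child_ord, $parent_ord ) = @_;
    my $child = $nodes->[$child_ord];
    $child->{parentOrd} = $parent_ord;
    $child->{parent}    = $nodes->[$parent_ord];
    return;
}

# The 3rd node becomes the parent of the 5th node.
set_child_parent( \@nodes_with_root, 5, 3 );
```

The argument order (child first, parent second) follows the call shown above.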

$sentence->fill_fields_after_parse()

Fills the fields of the sentence and fields of its nodes which can be filled only for a sentence that has already been parsed (i.e. if the nodes' parent or parentOrd fields are filled).

The fields filled by this subroutine are edges and features for the sentence and, for each node of the sentence, whichever of parent and parentOrd is not yet set.

$sentence->clear_parse()

Roughly the inverse of the fill_fields_after_parse method. It clears the edges and features fields and also unsets the parents of all nodes (by setting their parent field to undef and parentOrd to 0).
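
The round trip between a filled and a cleared sentence can be sketched as below. This is a simplified model under stated assumptions: the sentence and its nodes are plain hashes, fill_edges and clear_parse are hypothetical helpers, and the features field is left out for brevity.

```perl
use strict;
use warnings;

# Hypothetical parsed sentence "Tom is big" as plain hashes.
my %sentence = (
    nodes => [
        { ord => 1, form => 'Tom', parent => undef, parentOrd => 2 },
        { ord => 2, form => 'is',  parent => undef, parentOrd => 0 },
        { ord => 3, form => 'big', parent => undef, parentOrd => 2 },
    ],
    edges => undef,
);

# Build one (child, parent) edge per node from the parentOrd fields.
sub fill_edges {
    my ($s) = @_;
    $s->{edges} = [
        map { +{ child => $_->{ord}, parent => $_->{parentOrd} } }
            @{ $s->{nodes} }
    ];
    return;
}

# Drop the edges and reset the parent information of every node.
sub clear_parse {
    my ($s) = @_;
    $s->{edges} = undef;
    for my $node ( @{ $s->{nodes} } ) {
        $node->{parent}    = undef;
        $node->{parentOrd} = 0;
    }
    return;
}

fill_edges( \%sentence );
my $edge_count = scalar @{ $sentence{edges} };
clear_parse( \%sentence );
```

After clear_parse the sketch is back in the unparsed state: edges is undef and every parentOrd is 0.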

Information methods

$sentence->len()

Returns the length of the sentence, i.e. the number of nodes in the sentence. Each node corresponds to one word (more precisely, one token).

$sentence->count_errors_attachement($correct_sentence)

Compares the parse tree of the sentence with its correct parse tree, represented by an instance of the same sentence containing its correct parse.

An error is considered to be an incorrectly assigned governing node. So, the parents of all nodes (obviously not including the root node) are compared, and each difference is counted as one error. The number of errors therefore ranges from 0 to the length of the sentence.
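
The counting rule can be sketched on plain arrays of parent ords (one entry per non-root node). The helper name count_attachment_errors and the array representation are assumptions for the illustration; the module itself compares two sentence objects.

```perl
use strict;
use warnings;

# Hypothetical helper: count positions where the parent ords differ.
sub count_attachment_errors {
    my ( $predicted, $gold ) = @_;
    die "sentences differ in length\n" unless @{$predicted} == @{$gold};
    return scalar grep { $predicted->[$_] != $gold->[$_] } 0 .. $#{$predicted};
}

# "Tom is big": the gold parse attaches "big" to "is" (ord 2),
# the predicted parse attaches it to "Tom" (ord 1).
my $gold      = [ 2, 0, 2 ];
my $predicted = [ 2, 0, 1 ];
my $errors    = count_attachment_errors( $predicted, $gold );    # 1
```

An identical parse yields 0 errors, and a parse that is wrong for every node yields as many errors as there are nodes.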

$sentence->count_errors_labelling($correct_sentence)

Compares the labelling of the sentence with its correct labelling, represented by an instance of the same sentence containing the correct labels.

An error is considered to be an incorrectly assigned label. So, the labels of all edges (technically stored in the child nodes) are compared, and each difference is counted as one error. The number of errors therefore ranges from 0 to the length of the sentence.

$sentence->getNodeByOrd(6)

Returns the node with this ord (it can also be the root node if the ord is 0) or undef if the ord is out of range.
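
A lookup with this contract could be sketched as follows, assuming the nodes are stored in an array indexed by ord (as in nodes_with_root). The helper get_node_by_ord and the hash-based nodes are hypothetical, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: array lookup with an explicit range check.
sub get_node_by_ord {
    my ( $nodes_with_root, $ord ) = @_;
    # Negative indices would silently count from the end of a Perl array,
    # so they are rejected explicitly.
    return undef if $ord < 0 || $ord > $#{$nodes_with_root};
    return $nodes_with_root->[$ord];
}

my @nodes_with_root = map { +{ ord => $_ } } 0 .. 3;

my $root    = get_node_by_ord( \@nodes_with_root, 0 );     # the root node
my $third   = get_node_by_ord( \@nodes_with_root, 3 );     # the last word
my $missing = get_node_by_ord( \@nodes_with_root, 6 );     # undef
```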

$sentence->toString()

Returns the forms of the nodes joined by spaces (i.e. the sentence as text, but with a space between every two adjacent tokens).

$sentence->toParentOrdsArray()

Returns (a reference to) an array of node parent ords, e.g. for the sentence "Tom is big", where "is" is a child of the root node and "Tom" and "big" are children of "is", this method returns [2, 0, 2].
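
The "Tom is big" example can be reproduced with plain hashes standing in for the nodes. The sketch only mirrors the described output of toString and toParentOrdsArray; it does not call the module.

```perl
use strict;
use warnings;

# Hypothetical nodes of "Tom is big" (the root node is not included).
my @nodes = (
    { form => 'Tom', parentOrd => 2 },
    { form => 'is',  parentOrd => 0 },
    { form => 'big', parentOrd => 2 },
);

# Forms joined by single spaces, as described for toString.
my $text = join ' ', map { $_->{form} } @nodes;       # "Tom is big"

# One parent ord per node, in sentence order.
my $parent_ords = [ map { $_->{parentOrd} } @nodes ]; # [2, 0, 2]
```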

AUTHORS

Rudolf Rosa <rosa@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.