NAME
Treex::Tool::Parser::MSTperl::Sentence
VERSION
version 0.11949
DESCRIPTION
Represents a sentence, both parsed and unparsed. Contains an array of nodes which represent the words in the sentence.
The nodes are ordered; the ord of a node is its 1-based position in the sentence. The ord value 0 is reserved for the (technical) sentence root.
FIELDS
- id (Int)
-
An integer id unique for each sentence (in its proper sense, where a sentence is a sequence of tokens, i.e. the id stays the same for copies of the same sentence).
- nodes (ArrayRef[Treex::Tool::Parser::MSTperl::Node])
-
(A reference to) an array of nodes (Treex::Tool::Parser::MSTperl::Node) of the sentence. A node represents both a token of the sentence (usually a word) and a node in the parse tree of the sentence (if the sentence has been parsed).
- nodes_with_root (ArrayRef[Treex::Tool::Parser::MSTperl::Node])
-
Copy of the nodes field with a root node (Treex::Tool::Parser::MSTperl::RootNode) added at the beginning. As the root node's ord is 0 by definition, the position of each node in this array corresponds exactly to its ord.
- edges (Maybe[ArrayRef[Treex::Tool::Parser::MSTperl::Edge]])
-
If the sentence is parsed (i.e. the nodes know their parents), this field contains (a reference to) an array of all edges (Treex::Tool::Parser::MSTperl::Edge) in the parse tree of the sentence.
This field is set by the fill_fields_after_parse sub. If the sentence is not parsed, this field is undef.
- features (Maybe[ArrayRef[Str]])
-
If the sentence is parsed, this field contains (a reference to) an array of all features of all edges in the parse tree of the sentence. If some of the features are repeated in the sentence (i.e. they are present in several edges or even repeated in one edge), they are repeated here as well, i.e. this is not a set in the mathematical sense but a (generally unordered) list.
This field is set by the fill_fields_after_parse sub. If the sentence is not parsed, this field is undef.
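The relation between nodes and nodes_with_root can be illustrated with a minimal sketch. It uses plain hashes in place of the real Node and RootNode objects, which is an assumption made only to keep the example self-contained; the key names form and ord simply mirror the terms used above.

```perl
use strict;
use warnings;

# Plain hashes stand in for Node objects (an illustrative assumption;
# the real classes carry more fields).
my @nodes = map { +{ form => $_ } } qw(Tom is big);

# Ords are the 1-based positions of the nodes in the sentence.
$nodes[$_]{ord} = $_ + 1 for 0 .. $#nodes;

# nodes_with_root: the same nodes with the technical root (ord 0) prepended.
my $root            = { form => 'ROOT', ord => 0 };
my @nodes_with_root = ( $root, @nodes );

# In nodes_with_root, the array index of every node equals its ord.
for my $i ( 0 .. $#nodes_with_root ) {
    printf "%d %s\n", $i, $nodes_with_root[$i]{form};
}
```

Because index and ord coincide in nodes_with_root, a lookup by ord needs no offset arithmetic, whereas in nodes the node with ord n sits at index n - 1.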
METHODS
Constructor
- my $sentence = Treex::Tool::Parser::MSTperl::Sentence->new( id => 12, nodes => [$node1, $node2, $node3, ...]);
-
Creates a new sentence. The id must be unique (but copies of the same sentence are to share the same id). It is used for edge signature generation ("signature" in Treex::Tool::Parser::MSTperl::Edge) in edge features caching (and therefore does not have to be set if caching is disabled).
The order of the nodes denotes their order in the sentence, starting from the node with ord 1, i.e. the technical root (Treex::Tool::Parser::MSTperl::RootNode) is not to be included as it is generated automatically in the constructor. The ords of the nodes ("ord" in Treex::Tool::Parser::MSTperl::Node) do not have to (and actually shouldn't) be filled in. If they are, they are checked and a warning is issued on STDERR if they do not correspond to the position of the nodes in the array. If they are not, they are filled in automatically during the sentence creation.
Other fields (nodes_with_root, edges and features) should usually not be set. nodes_with_root is set automatically during sentence creation (and any value set to it is discarded). edges and features are to be set only if the sentence is parsed (i.e. the nodes know their parents, see "parent" in Treex::Tool::Parser::MSTperl::Node and "parentOrd" in Treex::Tool::Parser::MSTperl::Node) by calling the fill_fields_after_parse method.
So, if the sentence is already parsed, you should call the fill_fields_after_parse method immediately after creation of the sentence.
- my $unparsed_sentence_copy = $sentence->copy_nonparsed();
-
Creates a new instance of the same sentence with the same id and with copies of the nodes but without any parsing information (like after calling clear_parse). The nodes are copied by calling "copy_nonparsed" in Treex::Tool::Parser::MSTperl::Node.
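The ord handling described for the constructor can be sketched as follows. This is not the module's code: check_or_fill_ords is a hypothetical helper working on plain hashes, and leaving a mismatching ord unchanged after the warning is an assumption, since the text above only says that a warning is issued.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): fills in missing ords and
# warns about ords that disagree with the position in the array.
# Returns the number of mismatches found.
sub check_or_fill_ords {
    my ($nodes) = @_;
    my $mismatches = 0;
    for my $i ( 0 .. $#{$nodes} ) {
        my $expected = $i + 1;    # ords are 1-based; 0 belongs to the root
        my $node     = $nodes->[$i];
        if ( !defined $node->{ord} ) {
            $node->{ord} = $expected;    # fill in automatically
        }
        elsif ( $node->{ord} != $expected ) {
            # Assumption: only warn, do not overwrite the given value.
            warn "ord $node->{ord} does not match position $expected\n";
            $mismatches++;
        }
    }
    return $mismatches;
}

my @nodes = (
    { form => 'Tom' },              # ord missing, will be filled in
    { form => 'is', ord => 2 },     # ord given and correct
    { form => 'big', ord => 5 },    # ord given but wrong, triggers a warning
);
my $mismatches = check_or_fill_ords( \@nodes );
```

Leaving the ords unset and letting them be filled in from the array order avoids the mismatch case entirely, which is why the constructor documentation recommends it.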
Action methods
- $sentence->setChildParent(5, 3)
-
Sets the parent of the node with the first ord to be the node with the second ord, e.g. here, the 3rd node is the parent of the 5th node. It only sets the parent and parentOrd fields in the child node (i.e. it does not create or modify any edges).
When all nodes' parents have been set, fill_fields_after_parse can be called.
- $sentence->fill_fields_after_parse()
-
Fills the fields of the sentence and fields of its nodes which can be filled only for a sentence that has already been parsed (i.e. if the nodes' parent or parentOrd fields are filled).
The fields which are filled by this subroutine are edges and features for the sentence and parent or parentOrd for each of the sentence nodes which do not have the field set.
- $sentence->clear_parse()
-
This is roughly the inverse of the fill_fields_after_parse method. It clears the edges and features fields and also unsets the parents of all nodes (by setting their parent field to undef and parentOrd to 0).
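The life cycle of the parse information (set parents, fill the edges, clear again) can be sketched with plain data structures. The subs set_child_parent, fill_edges and clear_parse below are hypothetical stand-ins written for this illustration, not the module's methods; edges are reduced to [parent ord, child ord] pairs, and features are left out.

```perl
use strict;
use warnings;

# A three-word sentence plus the root; index in nodes_with_root equals ord.
# Plain hashes are an illustrative assumption, not the module's classes.
my @nodes_with_root = map { +{ ord => $_, parentOrd => 0 } } 0 .. 3;
my %sentence = ( nodes_with_root => \@nodes_with_root, edges => undef );

# Only records the parent in the child node; no edges are touched.
sub set_child_parent {
    my ( $s, $child_ord, $parent_ord ) = @_;
    $s->{nodes_with_root}[$child_ord]{parentOrd} = $parent_ord;
    return;
}

# Builds one edge per non-root node once all parents are known.
sub fill_edges {
    my ($s) = @_;
    my @n = @{ $s->{nodes_with_root} };
    $s->{edges} = [ map { [ $_->{parentOrd}, $_->{ord} ] } @n[ 1 .. $#n ] ];
    return;
}

# Drops the edges and resets every parentOrd to 0.
sub clear_parse {
    my ($s) = @_;
    $s->{edges} = undef;
    $_->{parentOrd} = 0 for @{ $s->{nodes_with_root} };
    return;
}

set_child_parent( \%sentence, 1, 2 );
set_child_parent( \%sentence, 2, 0 );
set_child_parent( \%sentence, 3, 2 );
fill_edges( \%sentence );
my $edge_count = scalar @{ $sentence{edges} };

clear_parse( \%sentence );
```

The separation mirrors the text above: assigning parents is cheap and local to the child node, while the sentence-level fields are derived in a single pass afterwards.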
Information methods
- $sentence->len()
-
Returns length of the sentence, i.e. number of nodes in the sentence. Each node corresponds to one word (one token to be more precise).
- $sentence->count_errors_attachement($correct_sentence)
-
Compares the parse tree of the sentence with its correct parse tree, represented by an instance of the same sentence containing its correct parse.
An error is considered to be an incorrectly assigned governing node. So, the parents of all nodes (obviously not including the root node) are compared and if they are different, it is counted as an error. This leads to a minimum number of errors equal to 0 and maximum number equal to the length of the sentence.
- $sentence->count_errors_labelling($correct_sentence)
-
Compares the labelling of the sentence with its correct labelling, represented by an instance of the same sentence containing the correct labels.
An error is considered to be an incorrectly assigned label. So, the labels of all edges (technically stored in the child nodes) are compared and if they are different, it is counted as an error. This leads to a minimum number of errors equal to 0 and maximum number equal to the length of the sentence.
- $sentence->getNodeByOrd(6)
-
Returns the node with this ord (it can also be the root node if the ord is 0) or undef if the ord is out of range.
- $sentence->toString()
-
Returns forms of the nodes joined by spaces (i.e. the sentence as a text but with a space between each two adjacent tokens).
- $sentence->toParentOrdsArray()
-
Returns (a reference to) an array of node parent ords, i.e. for the sentence "Tom is big", where "is" is a child of the root node and "Tom" and "big" are children of "is", this method returns [2, 0, 2].
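The attachment error count can be illustrated on arrays of parent ords such as the one returned by toParentOrdsArray. The sub count_attachment_errors below is a hypothetical helper that works on two such arrays; it is a sketch of the comparison described above, not the module's implementation, and the "guess" parse is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: counts positions where the two parent ord arrays
# differ. Both arrays must describe the same sentence (same length).
sub count_attachment_errors {
    my ( $parents, $correct_parents ) = @_;
    die "sentences differ in length\n"
        if @{$parents} != @{$correct_parents};
    return scalar grep { $parents->[$_] != $correct_parents->[$_] }
        0 .. $#{$parents};
}

# "Tom is big": "is" hangs on the root, "Tom" and "big" hang on "is".
my $gold = [ 2, 0, 2 ];

# An invented parse in which "big" is wrongly attached to "Tom".
my $guess = [ 2, 0, 1 ];

my $errors = count_attachment_errors( $guess, $gold );
print "attachment errors: $errors\n";
```

As in the description of count_errors_attachement, the result ranges from 0 (identical trees) to the length of the sentence (every node attached to the wrong parent).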
AUTHORS
Rudolf Rosa <rosa@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.