NAME
Treex::Block::W2A::Tag - universal block for PoS tagging and lemmatization
VERSION
version 2.20151102
SYNOPSIS
# ==== from command line (W2A::Tag in scenario) ====
echo "Hello there" | treex -t \
W2A::Tag module=Treex::Tool::Tagger::Simple::XY lemmatize=1 \
Write::CoNLLX
# ==== creating a derived class ====
package Treex::Block::W2A::XY::TagSimple;
use Moose; use Treex::Core::Common;
use Treex::Tool::Tagger::Simple::XY;
extends 'Treex::Block::W2A::Tag';
# If the tool needs a model, set a default
has model => (is => 'ro', default => 'data/models/tagger/simple/xy.model');
# Override the builder, so $self->tagger is an instance of Treex::Tool::Tagger::Simple::XY
sub _build_tagger {
    my ($self) = @_;
    # $self->_args is a hashref of all parameters passed to this block (from scenario).
    # The tool usually needs just model, but this way it is easy to add new parameters
    # (e.g. mem=1g) to the tool without changing this block.
    $self->_args->{model} = $self->model;
    return Treex::Tool::Tagger::Simple::XY->new($self->_args);
}
1; # add POD and that's all :-)
DESCRIPTION
This class serves two purposes:
- It is a base class for all other PoS tagging blocks.
- It can be used directly in a scenario by specifying the name of the tagger tool in the parameter module.
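As a rough illustration of the second purpose, the sketch below shows one way a class name passed as a string could be turned into a tool instance. It is not the actual implementation of this block: the package My::Tool::Tagger::Demo and the function build_tagger are hypothetical names invented for this example, and the demo class is defined inline so the code runs without Treex installed.

```perl
use strict;
use warnings;

# Hypothetical tool class, defined inline so the sketch runs without Treex.
package My::Tool::Tagger::Demo;

sub new {
    my ($class, $args) = @_;
    # Store a shallow copy of the parameters in the new object.
    return bless { %{ $args || {} } }, $class;
}

package main;

# Hypothetical helper: build a tool instance from a class name given as a string.
sub build_tagger {
    my ($module, $args) = @_;

    # Accept only plain package names, because the name is used in a string eval.
    die "Invalid module name '$module'\n"
        unless $module =~ /^\w+(?:::\w+)*$/;

    # Load the class from disk unless it is already defined in this process.
    unless ($module->can('new')) {
        eval "require $module; 1" or die "Cannot load $module: $@";
    }

    # All parameters are passed to the tool as one hashref.
    return $module->new($args);
}

my $tagger = build_tagger('My::Tool::Tagger::Demo', { lemmatize => 1 });
print ref($tagger), "\n";
```

Passing the whole hashref through means that tool-specific parameters reach the tool without the caller having to know about them, which matches the approach used in the derived-class example in the SYNOPSIS.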
Lemmatization
Some taggers do lemmatization together with tagging. Some cannot lemmatize. Some can choose whether to lemmatize or not, and in that case the tagger may use fewer resources when not lemmatizing. Therefore, this block (and derived classes) has the parameter lemmatize:
- if set to 0, no lemmas are filled in the trees (even if returned by the tagger tool);
- if set to 1, the tagger should either lemmatize all sentences or fail (via log_fatal) during initialization if it does not support lemmatization.
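The two settings of lemmatize can be sketched as follows. This is a toy model of the contract described above, not code from Treex: My::ToyTagger, its can_lemmatize flag, and the annotate function are hypothetical, plain die stands in for log_fatal, and the assumption that the tool returns a pair of array references (tags, lemmas) is made only for this example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a tagger tool; not part of Treex.
package My::ToyTagger;

sub new {
    my ($class, %args) = @_;
    my $self = bless {
        lemmatize     => $args{lemmatize}     ? 1 : 0,
        can_lemmatize => $args{can_lemmatize} ? 1 : 0,
    }, $class;

    # lemmatize=1 on a tool without lemmatization support fails during
    # initialization (die stands in for log_fatal here).
    die "This tagger does not support lemmatization\n"
        if $self->{lemmatize} && !$self->{can_lemmatize};

    return $self;
}

# Returns two array references: tags and lemmas (toy values only).
sub tag_sentence {
    my ($self, $forms) = @_;
    my (@tags, @lemmas);
    for my $form (@$forms) {
        push @tags,   ($form =~ /^[A-Z]/ ? 'CAP' : 'X');
        push @lemmas, lc $form;
    }
    return (\@tags, \@lemmas);
}

package main;

# Hypothetical helper: fill tags always, lemmas only if lemmatize is set.
sub annotate {
    my ($tagger, $lemmatize, @forms) = @_;
    my ($tags, $lemmas) = $tagger->tag_sentence(\@forms);
    return map {
        my %node = (form => $forms[$_], tag => $tags->[$_]);
        # With lemmatize=0 the lemma is dropped even though the tool returned one.
        $node{lemma} = $lemmas->[$_] if $lemmatize;
        \%node;
    } 0 .. $#forms;
}
```

For example, annotate(My::ToyTagger->new(can_lemmatize => 1), 0, qw(Hello there)) yields nodes with form and tag but no lemma key, while My::ToyTagger->new(lemmatize => 1, can_lemmatize => 0) dies before any sentence is processed.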
SEE ALSO
COPYRIGHT AND LICENCE
Copyright 2011-2012 Martin Popel
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.