NAME

Treex::Manual::FAQ - Frequently asked questions about Treex

VERSION

version 0.08324

FREQUENTLY ASKED QUESTIONS

General

Is Treex (API, data formats,...) stable?

Is Treex platform independent?

Why should I use Treex and not other NLP frameworks such as GATE?

My application in Treex is slow. What shall I do?

I noticed that Treex requires online access to a data repository. Can Treex work offline?

Why should I use Treex even if it does not support the language I am interested in?

I found a bug in Treex. What should I do?

Write a minimal test case that reproduces the bug, include the Treex version, ...

I'd like to create a commercial application using some Treex modules. Should I expect any licensing problems?

Installation

Using Treex

How can I load a parallel corpus into Treex?

Suppose you have an English and a French corpus, sentence aligned and stored as plain text with one sentence per line. You can read them with the Read::AlignedSentences reader:

Read::AlignedSentences en=sample_en.txt fr=sample_fr.txt
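Because the two files are sentence aligned with one sentence per line, line i of one file should correspond to line i of the other. A cheap sanity check before loading is therefore to confirm that both files have the same number of lines. The following is a minimal standalone Perl sketch of such a check; it uses no Treex API, and the helper names count_lines and same_length as well as the script name check_alignment.pl are hypothetical.

```perl
use strict;
use warnings;

# Count the lines of a UTF-8 text file (hypothetical helper, not part of Treex).
sub count_lines {
    my ($path) = @_;
    open my $fh, '<:encoding(utf-8)', $path or die "Cannot open $path: $!";
    my $n = 0;
    $n++ while <$fh>;    # a final line without "\n" is counted too
    close $fh;
    return $n;
}

# True if both files contain the same number of lines (sentences).
sub same_length {
    my ($first, $second) = @_;
    return count_lines($first) == count_lines($second);
}

# Usage: perl check_alignment.pl sample_en.txt sample_fr.txt
if (@ARGV == 2) {
    my ($en, $fr) = @ARGV;
    print same_length($en, $fr)
        ? "OK: line counts match\n"
        : "Mismatch: check the sentence alignment\n";
}
```

Equal line counts do not prove the alignment is correct, but a mismatch reliably signals that something is off.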

Contributing to Treex

Can I add new attributes to nodes?

For the official attributes (defined in the Treex PML schema), there are accessor methods, e.g. for lemma:

my $old_lemma = $anode->lemma;
$anode->set_lemma('new');

If you want to use your own new attributes, you can use so-called wild attributes:

$node->wild->{name_of_my_new_attribute} = $value;
$value = $node->wild->{name_of_my_new_attribute};

Side note

You can also use the methods get_attr and set_attr to access any attribute:

$node->set_attr('my_new_attribute', 'my_value');
print $node->get_attr('my_new_attribute'); # prints "my_value"

However, when the document is saved (to a *.treex file), only attributes described in the Treex PML schema (stored in treex/lib/Treex/Core/share/tred_extension/treex/resources) are saved. Wild attributes are saved as well.

Consequently, do not expect a non-schema attribute (other than a wild attribute) set in one block to be accessible in another block. It works when both blocks run in the same scenario, but not when the document is saved by one scenario and loaded by another, and such failures are hard to debug.

Note that for temporary information you can also use a separate hash instead of new attributes:

my %my_new_attribute_of;
$my_new_attribute_of{$node} = 'my_value';
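This works because Perl stringifies a reference used as a hash key (to something like FakeNode=HASH(0x...)), which is unique for every live object. The sketch below demonstrates the pattern with a hypothetical stand-in class FakeNode (not a Treex class) so that it runs without Treex; with real nodes the hash is used the same way. Note that the hash stores only a string, not a reference, so it does not keep the node alive; if a node is destroyed and its memory address reused, an old entry could match a new object, so clear the hash once the information is no longer needed.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Treex node, used only to get distinct objects.
package FakeNode;
sub new {
    my ($class, %args) = @_;
    my $self = {%args};
    return bless $self, $class;
}
sub form { return $_[0]{form} }

package main;

my @nodes = map { FakeNode->new(form => $_) } qw(John loves Mary);

# Temporary per-node information kept outside the node objects.
my %is_capitalized_of;
for my $node (@nodes) {
    # The key is the stringified reference, so each object gets its own entry.
    $is_capitalized_of{$node} = $node->form =~ /^[A-Z]/ ? 1 : 0;
}

for my $node (@nodes) {
    printf "%s: %d\n", $node->form, $is_capitalized_of{$node};
}
```

Running this prints "John: 1", "loves: 0" and "Mary: 1", one per line.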

How can a block/tool automatically download some resources?

It is quite common for a Treex block or tool (Treex::Tool::...) to need a pre-trained model, database, dictionary, etc. These files can be huge (several GiB), so we cannot store them in the SVN repository or upload them to CPAN. Instead, we store them in the Treex shared data directory: either in the resources subdirectory (for treebanks, corpora and other officially published resources) or in the models subdirectory (for pre-trained statistical models).

These files are downloaded automatically from the UFAL server when they are first needed. To enable this, override the get_required_share_files method in your block so that it returns a list of file names, e.g.:

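my $input_file = 'data/models/tagger/my_tagger/en_penntb.model';

# Files listed here are downloaded to the share directory when first needed.
sub get_required_share_files {
    return ($input_file);
}

sub BUILD {
    my ($self) = @_;
    open my $I, '<:encoding(utf-8)', "$ENV{TMT_ROOT}/share/$input_file"
        or die "Cannot open $input_file: $!";
    # now load the model ...
}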

If you want to use shared files in a tool:

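package Treex::Tools;    # use your tool's package name here
use Moose;               # BUILD is called by Moose during object construction
use Treex::Core::Resource;

my $input_file = 'data/models/tagger/my_tagger/en_penntb.model';

sub BUILD {
  my ($self) = @_;
  # Download the file from the share repository unless it is already present.
  Treex::Core::Resource::require_file_from_share($input_file, ref($self));
  # now load the model ...
}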

The second parameter of require_file_from_share is the name of the tool that needs the file. It is purely informational and is printed when the file is downloaded.
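Conceptually, "download on first use" is a cache check: if the file is already in the local share directory, use it; otherwise fetch it once and store it there. The standalone sketch below illustrates only that idea. It is not the code of Treex::Core::Resource; the sub name ensure_share_file is hypothetical, and the download step is a caller-supplied callback rather than a real request to the UFAL server.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical illustration of "download on first use"; not Treex's implementation.
#   $share_dir - local share directory
#   $rel_path  - file name relative to the share directory
#   $requester - name of the block/tool, used only in the log message
#   $fetch     - callback that writes the file to the given absolute path
sub ensure_share_file {
    my ($share_dir, $rel_path, $requester, $fetch) = @_;
    my $full = File::Spec->catfile($share_dir, $rel_path);
    return $full if -e $full;    # already downloaded earlier, nothing to do
    make_path(dirname($full));
    print STDERR "Downloading $rel_path (needed by $requester)\n";
    $fetch->($full);
    die "Download of $rel_path failed\n" unless -e $full;
    return $full;
}
```

Calling ensure_share_file twice for the same file invokes the callback only once; the second call returns the cached path immediately.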

Using ttred

What does the name ttred mean?

TrEd is a tree editor developed by Petr Pajas; see http://ufal.mff.cuni.cz/~pajas/tred. ttred is a Treex-modified TrEd capable of displaying *.treex files. In fact, it is just a lightweight wrapper that runs tred with the path to the pre-installed Treex extension.

How can I change the order in which zones (trees) are shown in ttred?

Press c and drag and drop the zones in the matrix. (c is a shortcut for a macro, which you can also find in the menu: Macros - all modes - treex_mode - Configuration.) You can set both the horizontal and the vertical position, which is handy, e.g., for word-aligned corpora, where you usually want the aligned trees placed above each other so that the alignment links are not too long.

AUTHOR

Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>

Martin Popel <popel@ufal.mff.cuni.cz>

David Mareček <marecek@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.