--watch option

SYNOPSIS:

 touch timestamp.file
 treex --watch=timestamp.file my.scen &
 # (or omit the & and open another terminal)
 # After all documents are processed, treex keeps running, watching timestamp.file.
 # You can modify any modules/blocks and then touch timestamp.file:
 # all modified modules will be reloaded (the number of reloaded modules is printed).
 # The document reader is restarted, so it starts reading the first file again.
 # To exit this "watching loop", either rm timestamp.file or press Ctrl+C.
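The mechanism can be pictured as a polling loop. The following is a rough, hypothetical sketch in plain shell (not treex's actual implementation): it detects that the timestamp file was touched by comparing its modification time between two points:

```shell
# Hypothetical sketch of the --watch idea (not treex's actual code):
# detect that the timestamp file was touched by comparing its mtime.
ts=$(mktemp)                       # stands in for timestamp.file
old=$(stat -c %Y "$ts" 2>/dev/null || stat -f %m "$ts")   # GNU stat, BSD fallback
sleep 1                            # mtime has one-second resolution
touch "$ts"                        # simulates the user's "touch timestamp.file"
new=$(stat -c %Y "$ts" 2>/dev/null || stat -f %m "$ts")
if [ "$new" -gt "$old" ]; then
  echo "timestamp changed: reload modified modules, restart the reader"
fi
rm -f "$ts"
```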

BENEFITS:

* Much faster development cycles (e.g., most of the time of en-cs translation is spent on loading).
* Works around non-deterministic problems with loading NER::Stanford: using --watch it is loaded on all jobs once and then does not have to be reloaded.

TODO:

* Modules are just reloaded; no constructors are called yet.

NAME

Treex::Core::Run + treex - applying Treex blocks and/or scenarios to data

VERSION

version 2.20160629

SYNOPSIS

In bash:

> treex myscenario.scen -- data/*.treex
> treex My::Block1 My::Block2 -- data/*.treex

In Perl:

use Treex::Core::Run q(treex);
treex([qw(myscenario.scen -- data/*.treex)]);
treex([qw(My::Block1 My::Block2 -- data/*.treex)]);

DESCRIPTION

Treex::Core::Run allows you to apply a block, a scenario, or a mixture of the two to a set of data files. It is designed to be used primarily from the bash command line, via a thin front-end script called treex. However, the same list of arguments can be passed as an array reference to the function treex() imported from Treex::Core::Run.

Note that this module supports distributed processing (Linux only!): simply add the -p switch. The treex method then creates a Treex::Core::Parallel::Head object, which extends Treex::Core::Run with parallel processing functionality.

There are two ways to process the data in parallel. By default, an SGE cluster's qsub is expected to be available. If you have no cluster but want to parallelize the computation at least on a multicore machine, add the --local switch.
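The --local fan-out can be pictured with a generic shell idiom. This is only an illustration of distributing work across local worker processes (hypothetical file names; not treex internals):

```shell
# Generic illustration of local multi-process fan-out (not treex's code):
# distribute a list of input files across two worker processes.
printf '%s\n' doc1.treex doc2.treex doc3.treex doc4.treex \
  | xargs -P 2 -n 1 echo "processing"
```

With -P 2, two workers run concurrently, so output order is not deterministic, but every file is processed exactly once.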

SUBROUTINES

treex

Creates a new runner and runs the scenario given in the parameters.

USAGE

usage: treex [-?dEehjLmpqSstv] [long options...] scenario [-- treex_files]
scenario is a sequence of blocks or *.scen files
options:
	-h -? --usage --help                Prints this usage information.
	-s --save                           save all documents
	-q --quiet                          Warning, info and debug messages
	                                    are suppressed. Only fatal errors
	                                    are reported.
	--cleanup                           Delete all temporary files.
	-e STR --error_level STR            Possible values: ALL, DEBUG,
	                                    INFO, WARN, FATAL
	-L STR --language STR --lang STR    shortcut for adding
	                                    "Util::SetGlobal language=xy" at
	                                    the beginning of the scenario
	-S STR --selector STR               shortcut for adding
	                                    "Util::SetGlobal selector=xy" at
	                                    the beginning of the scenario
	-t --tokenize                       shortcut for adding
	                                    "Read::Sentences W2A::Tokenize"
	                                    at the beginning of the scenario
	                                    (or W2A::XY::Tokenize if used
	                                    with --lang=xy)
	--watch STR                         re-run the scenario when the given
	                                    file is changed (see the --watch
	                                    option section above)
	-d --dump_scenario                  Just dump (print to STDOUT) the
	                                    given scenario and exit.
	--dump_required_files               Just dump (print to STDOUT) files
	                                    required by the given scenario
	                                    and exit.
	--cache STR                         Use cache. Required memory is
	                                    specified in format
	                                    memcached,loading. Numbers are in
	                                    GB.
	-v --version                        Print treex and perl version
	-E STR --forward_error_level STR    messages with this level or
	                                    higher will be forwarded from the
	                                    distributed jobs to the main
	                                    STDERR
	-p --parallel                       Parallelize the task on SGE
	                                    cluster (using qsub).
	-j INT --jobs INT                   Number of jobs for
	                                    parallelization, default 10.
	                                    Requires -p.
	--local                             Run jobs locally (might help with
	                                    multi-core machines). Requires -p.
	--priority INT                      Priority for qsub, an integer in
	                                    the range -1023 to 0 (or 1024 for
	                                    admins), default=-100. Requires
	                                    -p.
	--memory STR -m STR --mem STR       How much memory should be
	                                    allocated for cluster jobs,
	                                    default=2G. Requires -p.
	                                    Translates to "qsub -hard -l
	                                    mem_free=$mem -l h_vmem=2*$mem -l
	                                    act_mem_free=$mem". Use --mem=0
	                                    and --qsub to set your own SGE
	                                    settings (e.g. if act_mem_free is
	                                    not available).
	--name STR                          Prefix of submitted jobs.
	                                    Requires -p. Translates to "qsub
	                                    -N $name-jobname".
	--queue STR                         SGE queue. Translates to "qsub -q
	                                    $queue".
	--qsub STR                          Additional parameters passed to
	                                    qsub. Requires -p. See --priority
	                                    and --mem. You can use e.g.
	                                    --qsub="-q *@p*,*@s*" to use just
	                                    machines p* and s*. Or e.g.
	                                    --qsub="-q *@!(twi*|pan*)" to
	                                    skip twi* and pan* machines.
	--workdir STR                       working directory for temporary
	                                    files in parallelized processing;
	                                    directories can be created
	                                    automatically using patterns:
	                                    {NNN} is replaced by an ordinal
	                                    number zero-padded to the number
	                                    of Ns, and {XXXX} is replaced by
	                                    a random string whose length
	                                    equals the number of Xs (min. 4).
	                                    If not specified, directories
	                                    such as 001-cluster-run,
	                                    002-cluster-run, etc. are created.
	--survive                           Continue collecting jobs' outputs
	                                    even if some of them crashed
	                                    (risky, use with care!).
	--jobindex INT                      Not to be used manually. If the
	                                    number of jobs is set to J and
	                                    the modulo to M, only the I-th
	                                    files fulfilling I mod J == M are
	                                    processed.
	--outdir STR                        Not to be used manually. Directory
	                                    for collecting standard and error
	                                    outputs in parallelized
	                                    processing.
	--server STR                        Not to be used manually. Used to
	                                    point parallel jobs to the head.
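Two of the conventions above can be sketched concretely (hypothetical values; not treex's implementation): the {NNN} pattern of --workdir is an ordinal zero-padded to the number of Ns, and --jobindex distributes input files by index modulo the number of jobs:

```shell
# --workdir pattern: {NNN} (three Ns) -> ordinal zero-padded to width 3.
run=7
printf 'workdir: %03d-cluster-run\n' "$run"    # -> workdir: 007-cluster-run

# --jobindex: with J jobs, job M handles files whose index I satisfies I mod J == M.
J=3; M=1
for I in 0 1 2 3 4 5 6 7; do
  if [ $((I % J)) -eq "$M" ]; then
    echo "job $M processes file $I"            # files 1, 4, 7
  fi
done
```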

AUTHORS

Zdeněk Žabokrtský <zabokrtsky@ufal.mff.cuni.cz>

Martin Popel <popel@ufal.mff.cuni.cz>

Martin Majliš

Ondřej Dušek <odusek@ufal.mff.cuni.cz>

COPYRIGHT AND LICENSE

Copyright © 2011-2014 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.