NAME
Treex::Block::Read::BaseAlignedReader - abstract ancestor for parallel-corpora document readers
VERSION
version 2.20151102
SYNOPSIS
# in scenarios
Read::MyAlignedFormat en=english.txt de=german.txt
# Zones can differ also in selectors, any number of zones can be read
Read::MyAlignedFormat en_ref=ref1,ref2 en_moses=mos1,mos2 en_tectomt=tmt1,tmt2
DESCRIPTION
This class serves as a common ancestor for document readers that read several zones at once, usually parallel sentences in two (or more) languages. The readers take parameters named after the zones; the value of each parameter is a space- or comma-separated list of filenames to be loaded into the given zone. The class is designed to implement the Treex::Core::DocumentReader interface.
In derived classes you need to define the next_document method, and you can use the next_filenames and new_document methods.
ATTRIBUTES
- any parameter in the form of a valid zone_label
  A space- or comma-separated list of filenames, or - for STDIN.
- file_stem (optional)
  How to name the loaded documents. This attribute is saved to the same-named attribute of each document, and document writers use it to decide where to save the files.
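As an illustration of the parameter format described above, the following plain-Perl sketch splits zone parameters into per-zone filename lists. It does not use Treex: the helper names parse_zone_params and split_zone_label are hypothetical, and the check that every zone lists the same number of files is an assumption made for this example, not a statement about the module's internals.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): turn reader parameters such as
#   en_ref => 'ref1,ref2'
# into a hash of zone label => array ref of filenames.
sub parse_zone_params {
    my (%params) = @_;
    my %files_for;
    my $count;
    for my $zone_label ( sort keys %params ) {
        my $value = $params{$zone_label};
        $value =~ s/^\s+|\s+$//g;             # trim surrounding whitespace
        my @files = split /[ ,]+/, $value;    # space or comma separated

        # Assumption: every zone lists the same number of files, so that
        # the i-th files of all zones can be loaded into one document.
        $count //= scalar @files;
        die "Zone '$zone_label' has " . @files . " files, expected $count\n"
            if @files != $count;
        $files_for{$zone_label} = \@files;
    }
    return \%files_for;
}

# A zone label is a language code, optionally followed by "_" and a selector.
sub split_zone_label {
    my ($zone_label) = @_;
    my ( $language, $selector ) = split /_/, $zone_label, 2;
    return ( $language, $selector // '' );
}

my $files_for = parse_zone_params(
    en_ref   => 'ref1,ref2',
    en_moses => 'mos1 mos2',
);
my ( $language, $selector ) = split_zone_label('en_moses');
print "$language / $selector: @{ $files_for->{en_moses} }\n";
```

In this sketch a value of - is kept as an ordinary list entry, so the code that opens the files would be the place to map it to STDIN.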
METHODS
- next_document
  This method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)
- next_filenames
  Returns a hashref of filenames (full paths) to be loaded. The keys of the hash are zone labels, the values are the filenames.
- new_document($load_from?)
  Returns a new empty document with the attributes loaded_from, file_stem, file_number and path pre-filled; their values are guessed from current_filenames.
- current_filenames
  Returns the filenames most recently returned by next_filenames.
- number_of_documents
  Returns the number of documents that will be read by this reader.
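The contract between these methods can be sketched without Treex. The classes MockAlignedReader and MockSentenceReader below are hypothetical plain-Perl stand-ins, and the returned "document" is only a hash with an illustrative loaded_from key; a real reader would extend this class and build its document with new_document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for an aligned reader (plain Perl, not Treex).
package MockAlignedReader;

sub new {
    my ( $class, %files_for ) = @_;    # zone label => array ref of filenames
    my ($any_zone) = sort keys %files_for;
    my $self = {
        files_for   => \%files_for,
        file_number => 0,
        total       => defined $any_zone ? scalar @{ $files_for{$any_zone} } : 0,
        current     => undef,
    };
    return bless $self, $class;
}

# Number of documents = number of files listed for each zone.
sub number_of_documents { return $_[0]{total} }

# Filenames most recently returned by next_filenames.
sub current_filenames { return $_[0]{current} }

# Returns { zone label => filename } for the next document,
# or nothing once all files have been consumed.
sub next_filenames {
    my ($self) = @_;
    my $i = $self->{file_number};
    return if $i >= $self->{total};
    $self->{file_number}++;
    my %names = map { $_ => $self->{files_for}{$_}[$i] } keys %{ $self->{files_for} };
    return $self->{current} = \%names;
}

# Must be overridden in derived classes, mirroring the contract above.
sub next_document { die "next_document must be overridden\n" }

package MockSentenceReader;
our @ISA = ('MockAlignedReader');

# Derived reader: fetch the next filenames and build one "document" from them.
sub next_document {
    my ($self) = @_;
    my $filenames = $self->next_filenames() or return;
    return { loaded_from => join( ',', map { $filenames->{$_} } sort keys %{$filenames} ) };
}

package main;

my $reader = MockSentenceReader->new( en => [ 'e1', 'e2' ], de => [ 'd1', 'd2' ] );
while ( my $doc = $reader->next_document() ) {
    print "$doc->{loaded_from}\n";
}
```

The loop ends when next_document returns nothing, which in this sketch happens as soon as next_filenames has run out of files.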
SEE ALSO
Treex::Block::Read::BaseReader, Treex::Block::Read::BaseAlignedTextReader
AUTHOR
Martin Popel
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.