NAME

Treex::Block::Read::BaseReader - abstract ancestor for document readers

VERSION

version 0.06513_1

DESCRIPTION

This class serves as a common ancestor for document readers that have a parameter from containing a space- or comma-separated list of filenames to be loaded. It is designed to implement the Treex::Core::DocumentReader interface.

In derived classes, you need to define the next_document method; to implement it, you can use the next_filename and new_document methods.
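This division of work can be illustrated with a minimal, self-contained sketch in core Perl. MiniBaseReader and MyFileReader are hypothetical stand-ins. The base class only imitates the documented behaviour of next_filename, current_filename and new_document, with a plain hash standing in for a document. It is not the actual Treex::Block::Read::BaseReader code. The derived class defines nothing but next_document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the base class: it imitates the documented
# next_filename/current_filename/new_document behaviour only.
package MiniBaseReader;

sub new {
    my ( $class, %args ) = @_;
    return bless { filenames => $args{filenames} // [], index => 0, current => undef }, $class;
}

# Return the next filename, or undef once the list is exhausted.
sub next_filename {
    my ($self) = @_;
    return if $self->{index} >= @{ $self->{filenames} };
    $self->{current} = $self->{filenames}[ $self->{index}++ ];
    return $self->{current};
}

# The filename most recently returned by next_filename.
sub current_filename { return $_[0]{current} }

# A plain hash stands in for a new, empty document.
sub new_document {
    my ($self) = @_;
    return { loaded_from => $self->current_filename };
}

# Hypothetical derived reader: next_document is the only method it defines.
package MyFileReader;
our @ISA = ('MiniBaseReader');

sub next_document {
    my ($self) = @_;
    my $filename = $self->next_filename();
    return unless defined $filename;    # no more files => no more documents
    my $doc = $self->new_document();
    # A real reader would now parse $filename and fill $doc.
    return $doc;
}

package main;

my $reader = MyFileReader->new( filenames => [ 'a.txt', 'b.txt' ] );
while ( my $doc = $reader->next_document() ) {
    print "$doc->{loaded_from}\n";    # prints "a.txt", then "b.txt"
}
```

Returning undef from next_document once the files run out lets the caller drive the reader with a simple while loop.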

ATTRIBUTES

from (required, if filelist is not set)

space- or comma-separated list of filenames, or - for STDIN (if you use this method via the API, you can specify filenames instead)
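A value of from could be turned into a list of inputs roughly as follows. This is a sketch in core Perl that assumes commas and any run of whitespace are interchangeable separators. split_from and open_input are illustrative names, not functions provided by Treex.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): split a "from" value on commas
# and/or whitespace and drop empty fields; "-" is kept as a filename.
sub split_from {
    my ($from) = @_;
    return grep { length } split /[\s,]+/, $from;
}

# Hypothetical helper: "-" means STDIN, anything else is opened for reading.
sub open_input {
    my ($name) = @_;
    return \*STDIN if $name eq '-';
    open my $fh, '<', $name or die "Cannot open $name: $!";
    return $fh;
}

my @files = split_from(' a.txt, b.txt  c.txt,-');
print join( '|', @files ), "\n";    # prints "a.txt|b.txt|c.txt|-"
```

The grep step is needed because split returns a leading empty field when the string starts with a separator.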

filelist (required, if from is not set)

path to a file that contains a list of files to be read (one per line)
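Reading such a filelist might look like the sketch below. read_filelist is a hypothetical helper. Trimming surrounding whitespace and skipping blank lines are assumptions, not documented behaviour. The demo writes a temporary list with the core File::Temp module so that the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper (not Treex's implementation): one filename per line.
sub read_filelist {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open filelist $path: $!";
    my @files;
    while ( my $line = <$fh> ) {
        chomp $line;
        $line =~ s/^\s+|\s+$//g;    # trim surrounding whitespace (assumption)
        push @files, $line if length $line;    # skip blank lines (assumption)
    }
    close $fh;
    return @files;
}

# Demo: write a small filelist to a temporary file and read it back.
my ( $tmp_fh, $tmp_name ) = tempfile( UNLINK => 1 );
print {$tmp_fh} "a.txt\n\n  b.txt  \n";
close $tmp_fh;
my @files = read_filelist($tmp_name);
print "@files\n";    # prints "a.txt b.txt"
```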

file_stem (optional)

How to name the loaded documents. This attribute is saved to the same-named attribute in the documents, and document writers use it to decide where to save the files.

filenames (internal)

array of filenames to be loaded, automatically initialized from the attribute from

METHODS

next_document

This method must be overridden in derived classes. (The implementation in this class just issues a fatal error.)
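The usual Perl idiom for such an abstract method is a base implementation that dies. In this hypothetical sketch, AbstractReader and ForgetfulReader are illustrative classes. A plain die stands in for whatever fatal-error mechanism the real class uses. The error is caught with eval to show what happens when a subclass forgets to override next_document.

```perl
use strict;
use warnings;

package AbstractReader;    # hypothetical stand-in, not Treex's class

sub new { return bless {}, shift }

# Abstract method: any subclass that does not override it fails loudly.
sub next_document {
    my ($self) = @_;
    die ref($self) . " must override next_document\n";
}

package ForgetfulReader;    # hypothetical subclass that forgot to override it
our @ISA = ('AbstractReader');

package main;

my $ok  = eval { ForgetfulReader->new->next_document; 1 };
my $msg = $ok ? "no error\n" : "fatal error: $@";
print $msg;    # prints "fatal error: ForgetfulReader must override next_document"
```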

next_filename

returns the next filename (full path) to be loaded (from the list specified in the attribute from)

new_document($load_from?)

Returns a new empty document with the attributes loaded_from, file_stem, file_number and path pre-filled; their values are guessed from current_filename.
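One plausible way to guess file_stem and path from a filename uses the core File::Basename module. The guess_attributes helper below is hypothetical, and the real new_document may apply different rules. Here the file number is simply passed in rather than computed.

```perl
use strict;
use warnings;
use File::Basename qw(fileparse);

# Hypothetical sketch: strip the directory and the last extension to get a
# stem. Compound extensions such as .treex.gz keep their inner part here.
sub guess_attributes {
    my ( $filename, $file_number ) = @_;
    my ( $stem, $path ) = fileparse( $filename, qr/\.[^.]*/ );
    return {
        loaded_from => $filename,
        file_stem   => $stem,
        path        => $path,
        file_number => $file_number,    # supplied by the caller in this sketch
    };
}

my $attrs = guess_attributes( 'data/sample.txt', 1 );
print "$attrs->{file_stem} $attrs->{path}\n";    # prints "sample data/"
```

For a name without a directory, fileparse reports the path as "./".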

current_filename

returns the last filename returned by next_filename

is_next_document_for_this_job

Is the document that will be returned by next_document supposed to be processed by this job? This is relevant only in parallel processing, where each job has a different $jobnumber assigned.
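The documentation does not say how documents are distributed among jobs. The sketch below assumes a simple round-robin scheme, with documents and jobs both numbered from 1. is_doc_for_job and its arguments are hypothetical, and the rule is an assumption, not necessarily what Treex does.

```perl
use strict;
use warnings;

# Hypothetical round-robin rule (an assumption): document $doc_number
# belongs to job (($doc_number - 1) % $jobs) + 1.
sub is_doc_for_job {
    my ( $doc_number, $jobnumber, $jobs ) = @_;
    return ( ( $doc_number - 1 ) % $jobs ) + 1 == $jobnumber;
}

# With 3 jobs, job 2 gets documents 2, 5, 8, ...
my @mine = grep { is_doc_for_job( $_, 2, 3 ) } 1 .. 9;
print "@mine\n";    # prints "2 5 8"
```

Under any such scheme, each document must belong to exactly one job, so that no document is skipped or processed twice.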

number_of_documents

Returns the number of documents that will be read by this reader. If is_one_doc_per_file returns true, then the number of documents equals the number of files given in from. Otherwise, this method returns undef.
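The rule for number_of_documents fits in a few lines. CountingReader is a hypothetical stand-in, not Treex's code. Its is_one_doc_per_file simply returns a flag supplied to the constructor.

```perl
use strict;
use warnings;

# Hypothetical stand-in mirroring the documented rule.
package CountingReader;

sub new {
    my ( $class, %args ) = @_;
    return bless {
        filenames        => $args{filenames} // [],
        one_doc_per_file => $args{one_doc_per_file},
    }, $class;
}

# In this sketch the answer is simply a flag supplied by the caller.
sub is_one_doc_per_file { return $_[0]{one_doc_per_file} }

# Known count only when every file yields exactly one document, else undef.
sub number_of_documents {
    my ($self) = @_;
    return $self->is_one_doc_per_file ? scalar @{ $self->{filenames} } : undef;
}

package main;

my $per_file = CountingReader->new( filenames => [qw(a b c)], one_doc_per_file => 1 );
my $unknown  = CountingReader->new( filenames => [qw(a b c)], one_doc_per_file => 0 );
my $n        = $per_file->number_of_documents;
my $count    = $unknown->number_of_documents;
my $status   = defined $count ? "known\n" : "undef\n";
print "$n\n";     # prints "3"
print $status;    # prints "undef"
```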

SEE ALSO

Treex::Block::Read::BaseTextReader, Treex::Block::Read::Text

AUTHOR

Martin Popel

COPYRIGHT AND LICENSE

Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
