NAME
Treex::Core::DocumentReader - interface for all document readers
VERSION
version 2.20210102
DESCRIPTION
Document readers are the Treex mechanism for loading documents to be processed by Treex. The documents can be stored in files (in various formats), read from STDIN, retrieved from a socket, etc.
METHODS
To be implemented
These methods must be implemented in classes that consume this role.
- next_document
Return the next document (Treex::Core::Document).
- number_of_documents
Total number of documents that will be produced by this reader. If the number is unknown in advance, undef should be returned.
Already implemented
- is_current_document_for_this_job
Is the document that was most recently returned by $self->next_document() supposed to be processed by this job? Job indices and document numbers are 1-based, so e.g. for
  jobs = 5, jobindex = 3 we want to load documents with numbers 3, 8, 13, 18, ...
  jobs = 5, jobindex = 5 we want to load documents with numbers 5, 10, 15, 20, ...
i.e. those documents where (doc_number-1) % jobs == (jobindex-1).
- next_document_for_this_job
Returns the next document that should be processed by this job. If jobindex is set, documents are selected "modulo number of jobs". See is_current_document_for_this_job.
- number_of_documents_per_this_job
Total number of documents that will be produced by this reader for this job. It is computed from number_of_documents, jobindex and jobs.
- restart
Start reading again from the first document. This implementation just sets the attribute doc_number to zero. You can add additional behavior using the Moose after 'restart' construct.
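The job-selection rule described for is_current_document_for_this_job can be illustrated with a small stand-alone sketch in core Perl. It does not use Treex, and it is not the module's actual source: the helpers belongs_to_job and documents_per_job are hypothetical names introduced only for this illustration. The second helper shows one way a per-job document count could be derived from a known total, assuming the same 1-based numbering.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Treex): true if the 1-based document
# number $doc_number should be processed by the 1-based job $jobindex
# out of $jobs jobs, following the rule
# (doc_number-1) % jobs == (jobindex-1).
sub belongs_to_job {
    my ( $doc_number, $jobs, $jobindex ) = @_;
    return ( ( $doc_number - 1 ) % $jobs == ( $jobindex - 1 ) ) ? 1 : 0;
}

# Hypothetical helper (not part of Treex): number of documents out of
# $total that fall to job $jobindex. Returns undef when the total is
# unknown, mirroring the convention of number_of_documents.
sub documents_per_job {
    my ( $total, $jobs, $jobindex ) = @_;
    return if !defined $total;
    my $rest = $total % $jobs;               # documents left after full rounds
    my $full = ( $total - $rest ) / $jobs;   # full rounds over all jobs
    return $full + ( $rest >= $jobindex ? 1 : 0 );
}

my @job3 = grep { belongs_to_job( $_, 5, 3 ) } 1 .. 20;
my @job5 = grep { belongs_to_job( $_, 5, 5 ) } 1 .. 20;
print "jobs = 5, jobindex = 3: @job3\n";    # 3 8 13 18
print "jobs = 5, jobindex = 5: @job5\n";    # 5 10 15 20
print "18 documents, job 3 of 5: ", documents_per_job( 18, 5, 3 ), "\n";    # 4
```

With 18 documents and 5 jobs, job 3 receives documents 3, 8, 13 and 18, so the count is 4, while job 5 receives only 5, 10 and 15.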
SEE ALSO
Treex::Block::Read::Sentences Treex::Block::Read::Text Treex::Block::Read::Treex
AUTHOR
Martin Popel <popel@ufal.mff.cuni.cz>
COPYRIGHT AND LICENSE
Copyright © 2011 by Institute of Formal and Applied Linguistics, Charles University in Prague
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.