Documentation
a gentle introduction to the DTA::TokWrap distribution
table of contents for perl module DTA::TokWrap
top-level tokenizer wrapper for DTA XML documents
add <c> elements to DTA XML documents
splice standoff //s and //w records from .t.xml into original TEI .chr.xml files
convert //c/@xml:id attributes to page-local encoding
file formats used by dta-tokwrap utilities
get DDC-relevant attributes from DTA::TokWrap files
extract a header element from an XML file
make pagebreak index for a DTA source XML file
insert <p> elements to wrap //s/@pn attributes in DTA::TokWrap .t.xml files
make DDC/DTA-friendly TEI-headers
splice generic standoff data into base XML files
DTA::TokWrap: convert .t.xml to enriched .u.xml
convert DTA::TokWrap ddc-t-xml files to DDC-parseable format
Modules
DTA tokenizer wrappers: top level
DTA tokenizer wrappers: base class
DTA tokenizer wrappers: utilities for binary cx-file I/O
DTA tokenizer wrappers: document wrapper
DTA tokenizer wrappers: document wrapper: make-mode
DTA::TokWrap logging facility using Log::Log4perl
DTA tokenizer wrappers: base class for processor modules
DTA tokenizer wrappers: (bx0doc,tx) -> bxdata
DTA tokenizer wrappers: sxfile -> bx0doc
DTA tokenizer wrappers: dtatw-mkindex
DTA tokenizer wrappers: t.xml -> (s.xml, w.xml, a.xml) via external filter programs
DTA tokenizer wrappers: t.xml -> (s.xml, w.xml, a.xml) via XSL
DTA tokenizer wrappers: text <-> token alignment for decoded TCF
DTA tokenizer wrappers: TCF -> TEI+ws decoding via proxy document
DTA tokenizer wrappers: TCF[tei,text,tokens,sentences] -> TEI,text extraction
DTA tokenizer wrappers: TEI -> TCF encoding
DTA tokenizer wrappers: TCF text layer tokenization
DTA tokenizer wrappers: t -> t.xml
DTA tokenizer wrappers: t -> t.xml, pure-perl (slow, obsolete)
DTA tokenizer wrappers: tokenizer: default (NYI)
DTA tokenizer wrappers: auto tokenizer
DTA tokenizer wrappers: dtatw-tokenize-dummy
DTA tokenizer wrappers: http: external tokenizer via http (hack)
DTA tokenizer wrappers: tokenizer: dwds_tomasotath via command-line
DTA tokenizer wrappers: tokenizer: dwds_tomasotath v0.4.x via command-line
DTA tokenizer wrappers: tokenizer: dwds_tomasotath v0.5.x via command-line
DTA tokenizer wrappers: tokenizer: moot/waste via command-line
DTA tokenizer wrappers: tokenizer post-processing
DTA tokenizer wrappers: t.xml -> t.xml, via idsplice
DTA tokenizer wrappers: generic utilities
DTA tokenizer wrappers: package version constants
Provides
in DTA-TokWrap/TokWrap/Processor/addws.pm
in DTA-TokWrap/TokWrap/Processor/idsplice.pm
in DTA-TokWrap/TokWrap/Processor/tokenize/dwds_scanner.pm
in DTA-TokWrap/TokWrap/Processor/tokenize/dwds_scanner.pm
in DTA-TokWrap/TokWrap/Processor/tokenize/tomasotath_05x.pm
in DTA-TokWrap/TokWrap/Processor/tokenize/tomasotath_05x.pm