NAME

DTA::TokWrap::Processor::mkbx0 - DTA tokenizer wrappers: sxfile -> bx0doc

SYNOPSIS

use DTA::TokWrap::Processor::mkbx0;

$mbx0 = DTA::TokWrap::Processor::mkbx0->new(%opts);
$doc_or_undef = $mbx0->mkbx0($doc);

##-- debugging
$mbx0_or_undef = $mbx0->ensure_stylesheets();
$mbx0->dump_chain_stylesheet($filename_or_fh);
$mbx0->dump_hint_stylesheet($filename_or_fh);
$mbx0->dump_sort_stylesheet($filename_or_fh);

DESCRIPTION

DTA::TokWrap::Processor::mkbx0 provides an object-oriented DTA::TokWrap::Processor wrapper for hint insertion and serialization sort-key generation on a text-free "structure index" (*.sx) XML file.

Most users should use the high-level DTA::TokWrap wrapper class instead of using this module directly.

Constants

@ISA

DTA::TokWrap::Processor::mkbx0 inherits from DTA::TokWrap::Processor.

Constructors etc.

new
$mbx0 = $CLASS_OR_OBJ->new(%opts)

Constructor.

%opts, %$mbx0:

##-- Programs
rmns    => $path_to_xml_rm_namespaces, ##-- default: search
inplace => $bool,                      ##-- prefer in-place programs for search?
auto_xmlid => $bool,                   ##-- if true (default), @id attributes will be mapped to @xml:id
auto_prevnext => $bool,                ##-- if true (default), @prev|@next chains will be auto-sanitized
##
##-- Stylesheet: chain-serialization
chain_stylestr  => $stylestr,          ##-- xsl stylesheet string for chain-serialization
chain_stylesheet => $stylesheet,       ##-- compiled xsl stylesheet for chain-serialization
##
##-- Stylesheet: insert-hints (<seg> elements and their children are handled implicitly)
hint_sb_xpaths => \@xpaths,            ##-- add sentence-break hint (<s/>) for @xpath element open & close
hint_wb_xpaths => \@xpaths,            ##-- add word-break hint (<w/>) for @xpath element open & close
##
hint_stylestr  => $stylestr,           ##-- xsl stylesheet string
hint_stylesheet => $stylesheet,        ##-- compiled xsl stylesheet
##
##-- Stylesheet: mark-sortkeys (<seg> elements and their children are handled implicitly)
sortkey_attr => $attr,                 ##-- sort-key attribute (default: 'dta.tw.key')
sort_ignore_xpaths => \@xpaths,        ##-- ignore these xpaths
sort_addkey_xpaths => \@xpaths,        ##-- add new sort key for @xpaths
##
sort_stylestr  => $stylestr,           ##-- xsl stylesheet string
sort_stylesheet => $stylesheet,        ##-- compiled xsl stylesheet
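
As an illustration of the option list above, the following sketch assembles a small %opts hash and passes it to new(). The XPath values are hypothetical placeholders, and the assumption that plain element names are acceptable entries for the *_xpaths options is not confirmed by this documentation. The module is loaded conditionally so that the sketch also runs where DTA::TokWrap is not installed.

```perl
use strict;
use warnings;

# Option names come from the list above; every XPath value is a
# hypothetical placeholder chosen only for illustration.
my %opts = (
    inplace            => 1,
    auto_xmlid         => 1,               # documented default
    auto_prevnext      => 1,               # documented default
    hint_sb_xpaths     => [ 'div', 'p' ],  # hypothetical: sentence-break hints
    hint_wb_xpaths     => [ 'lb' ],        # hypothetical: word-break hints
    sortkey_attr       => 'dta.tw.key',    # documented default
    sort_ignore_xpaths => [ 'teiHeader' ], # hypothetical
);

# Load the processor only if it is available on this system.
my $mbx0;
if ( eval { require DTA::TokWrap::Processor::mkbx0; 1 } ) {
    $mbx0 = DTA::TokWrap::Processor::mkbx0->new(%opts);
}
else {
    warn "DTA::TokWrap::Processor::mkbx0 not installed; options only assembled\n";
}
```

Options that are left out fall back to the class defaults described below.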

defaults
%defaults = CLASS->defaults();

Static class-dependent defaults.

init
$mbx0 = $mbx0->init();

Dynamic object-dependent defaults.

Methods: XSL stylesheets

ensure_stylesheets
$mbx0_or_undef = $mbx0->ensure_stylesheets();

Ensures that required XSL stylesheets have been compiled.

hint_stylestr
$xsl_str = $mbx0->hint_stylestr();

Returns XSL stylesheet string for the 'insert-hints' transformation, which is responsible for inserting sentence- and token-break hints into the input *.sx document.

sort_stylestr
$xsl_str = $mbx0->sort_stylestr();

Returns XSL stylesheet string for the 'generate-sort-keys' transformation, which is responsible for inserting top-level serialization-segment keys into the input *.sx document.
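
A quick way to see what the two accessors above return is to print the size of each generated stylesheet string. This is a minimal sketch, assuming that new() can be called without options and that the accessors can be called on a fresh object; neither is stated explicitly here.

```perl
use strict;
use warnings;

# Accessor methods documented above.
my @accessors = qw(hint_stylestr sort_stylestr);

if ( eval { require DTA::TokWrap::Processor::mkbx0; 1 } ) {
    # Assumption: a processor built from defaults is enough to generate XSL.
    my $mbx0 = DTA::TokWrap::Processor::mkbx0->new();
    for my $method (@accessors) {
        my $xsl = $mbx0->$method();
        printf "%s: %d characters\n", $method, length( $xsl // '' );
    }
}
else {
    print "DTA::TokWrap::Processor::mkbx0 is not installed; nothing to inspect\n";
}
```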

dump_chain_stylesheet
$mbx0->dump_chain_stylesheet($filename_or_fh);

Dumps the generated 'serialize-chains' stylesheet to $filename_or_fh.

dump_hint_stylesheet
$mbx0->dump_hint_stylesheet($filename_or_fh);

Dumps the generated 'insert-hints' stylesheet to $filename_or_fh.

dump_sort_stylesheet
$mbx0->dump_sort_stylesheet($filename_or_fh);

Dumps the generated 'generate-sortkeys' stylesheet to $filename_or_fh.
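
The three dump methods can be combined to write all generated stylesheets into one directory for inspection. The sketch below is one possible arrangement: the output file names are hypothetical, and calling ensure_stylesheets() before dumping is a precaution rather than a documented requirement.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Temporary output directory; use CLEANUP => 0 to keep the dumped files.
my $dir = tempdir( CLEANUP => 1 );

# Map each documented dump method to a (hypothetical) output file name.
my %dump_to = (
    dump_chain_stylesheet => File::Spec->catfile( $dir, 'chain.xsl' ),
    dump_hint_stylesheet  => File::Spec->catfile( $dir, 'hint.xsl' ),
    dump_sort_stylesheet  => File::Spec->catfile( $dir, 'sort.xsl' ),
);

if ( eval { require DTA::TokWrap::Processor::mkbx0; 1 } ) {
    my $mbx0 = DTA::TokWrap::Processor::mkbx0->new();
    if ( $mbx0->ensure_stylesheets() ) {
        for my $method ( sort keys %dump_to ) {
            $mbx0->$method( $dump_to{$method} );
            print "wrote $dump_to{$method}\n";
        }
    }
    else {
        warn "ensure_stylesheets() returned undef; nothing dumped\n";
    }
}
else {
    warn "DTA::TokWrap::Processor::mkbx0 not installed; nothing dumped\n";
}
```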

Methods: top-level

mkbx0
$doc_or_undef = $CLASS_OR_OBJECT->mkbx0($doc);

Applies the XSL pipeline for hint insertion and sort-key generation to the "structure index" (*.sx) document of the DTA::TokWrap::Document object $doc.

Relevant %$doc keys:

sxfile  => $sxfile,  ##-- (input) structure index filename
bx0doc  => $bx0doc,  ##-- (output) preliminary block-index data (XML::LibXML::Document)
##
mkbx0_stamp0 => $f,  ##-- (output) timestamp of operation begin
mkbx0_stamp  => $f,  ##-- (output) timestamp of operation end
bx0doc_stamp => $f,  ##-- (output) timestamp of operation end
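
The key list above suggests a simple pattern for checking the result of mkbx0(): test the return value for undef, then read the output keys from $doc. The sketch below wraps that pattern in a hypothetical helper, run_mkbx0(). It assumes the stamps are numeric times in seconds, which the "$f" placeholders hint at but do not confirm. Local::FakeMkbx0 is a stand-in used only so that the sketch runs without DTA::TokWrap installed; a real call would pass a DTA::TokWrap::Processor::mkbx0 object and a DTA::TokWrap::Document.

```perl
use strict;
use warnings;

# Hypothetical helper: run mkbx0() and summarize the documented output keys.
sub run_mkbx0 {
    my ( $mbx0, $doc ) = @_;
    my $ok = $mbx0->mkbx0($doc);
    return unless defined $ok;    # mkbx0() is documented to return undef on failure
    return {
        have_bx0doc => defined( $doc->{bx0doc} ) ? 1 : 0,
        # Assumption: both stamps are numeric times in seconds.
        elapsed     => $doc->{mkbx0_stamp} - $doc->{mkbx0_stamp0},
    };
}

# Stand-in processor for demonstration only; it sets the same keys that
# the documentation lists but performs no XSL processing.
{
    package Local::FakeMkbx0;
    sub new { return bless {}, shift }
    sub mkbx0 {
        my ( $self, $doc ) = @_;
        return unless defined $doc->{sxfile};
        $doc->{mkbx0_stamp0} = 100.0;
        $doc->{bx0doc}       = '<stub/>';
        $doc->{mkbx0_stamp}  = $doc->{bx0doc_stamp} = 100.5;
        return $doc;
    }
}

my $doc     = { sxfile => 'example.sx' };    # hypothetical filename
my $summary = run_mkbx0( Local::FakeMkbx0->new, $doc );
printf "bx0doc present: %d, elapsed: %.1fs\n",
    $summary->{have_bx0doc}, $summary->{elapsed};
```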

SEE ALSO

DTA::TokWrap::Intro(3pm), dta-tokwrap.perl(1), ...

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2009-2018 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.14.2 or, at your option, any later version of Perl 5 you may have available.