NAME
DTA::CAB::Analyzer::Automaton - generic analysis automaton API
SYNOPSIS
use DTA::CAB::Analyzer::Automaton;
##========================================================================
## Constructors etc.
$obj = CLASS_OR_OBJ->new(%args);
$aut = $aut->clear();
##========================================================================
## Methods: Generic
$class = $aut->fstClass();
$class = $aut->labClass();
$bool = $aut->fstOk();
$bool = $aut->labOk();
##========================================================================
## Methods: I/O
$bool = $aut->ensureLoaded();
$aut = $aut->load(fst=>$fstFile, lab=>$labFile);
$aut = $aut->loadFst($fstfile);
$aut = $aut->loadLabels($labfile);
$aut = $aut->parseLabels();
##========================================================================
## Methods: Persistence: Perl
@keys = $class_or_obj->noSaveKeys();
$loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
##========================================================================
## Methods: Analysis
$bool = $anl->canAnalyze();
$doc = $anl->analyzeTypes($doc,\%types,\%opts);
DESCRIPTION
Globals
- Variable: @ISA
DTA::CAB::Analyzer::Automaton inherits from DTA::CAB::Analyzer.
Constructors etc.
- new
$aut = CLASS_OR_OBJ->new(%args);
Constructor.
%args, %$aut:
##-- Filename Options
fstFile => $filename,          ##-- source FST file (default: none)
labFile => $filename,          ##-- source labels file (default: none)
##
##-- Analysis Output
analyzeGet => $code,           ##-- accessor: coderef or string: source text (default=$DEFAULT_ANALYZE_GET; return undef for no analysis)
analyzeSet => $code,           ##-- accessor: coderef or string: set analyses (default=$DEFAULT_ANALYZE_SET)
wantAnalysisLo => $bool,       ##-- set to true to include 'lo' keys in analyses (default: true)
wantAnalysisLemma => $bool,    ##-- set to true to include 'lemma' keys in analyses (default: false)
##
##-- Analysis Options
eow => $sym,                   ##-- EOW symbol for analysis FST
check_symbols => $bool,        ##-- check for unknown symbols? (default=1)
labenc => $enc,                ##-- encoding of labels file (default='auto': utf8 if valid, else latin1)
auto_connect => $bool,         ##-- whether to call $result->_connect() after every lookup (default=0)
tolower => $bool,              ##-- if true, all input words will be bashed to lower-case (default=0)
tolowerNI => $bool,            ##-- if true, all non-initial characters of inputs will be lower-cased (default=0)
toupperI => $bool,             ##-- if true, initial character will be upper-cased (default=0)
bashWS => $str,                ##-- if defined, input whitespace will be bashed to '$str' (default='_')
attInput => $bool,             ##-- if true, respect AT&T lextools-style escapes in input (default=0)
attOutput => $bool,            ##-- if true, generate AT&T escapes in output (default=1)
allowTextRegex => $re,         ##-- if defined, only tokens with matching 'text' will be analyzed (default: none)
##                             : useful: /(?:^[[:alpha:]\-\x{ac}]*[[:alpha:]]+$)|(?:^[[:alpha:]]+[[:alpha:]\-\x{ac}]+$)/
##
##-- Analysis objects
fst => $gfst,                  ##-- (child classes only) e.g. a Gfsm::Automaton object (default=new)
lab => $lab,                   ##-- (child classes only) e.g. a Gfsm::Alphabet object (default=new)
labh => \%sym2lab,             ##-- (?) label hash: $sym2lab{$labSym} = $labId;
laba => \@lab2sym,             ##-- (?) label array: $lab2sym[$labId] = $labSym;
labc => \@chr2lab,             ##-- (?) chr-label array: $chr2lab[ord($chr)] = $labId; by Unicode char number (e.g. unpack('U0U*'))
##
##-- INHERITED from DTA::CAB::Analyzer
label => $label,               ##-- analyzer label (default: from analyzer class name)
typeKeys => \@keys,            ##-- type-wise keys to expand
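The input-normalization options and the suggested allowTextRegex value can be illustrated with a minimal, self-contained sketch. The helpers normalize_input() and want_analysis() are hypothetical (they are not part of DTA::CAB), and both the order in which the options are applied and the replacement of whole whitespace runs by bashWS are assumptions, not a description of this module's code.

```perl
use strict;
use warnings;

## Hypothetical helper (not part of DTA::CAB): apply the tolower, tolowerNI,
## toupperI and bashWS options to one input word.
## ASSUMPTION: options are applied in the order shown here, and each run of
## whitespace (rather than each single whitespace character) is replaced.
sub normalize_input {
  my ($text, $opts) = @_;
  $text = lc($text) if $opts->{tolower};
  if ($opts->{tolowerNI} && length($text) > 1) {
    ## keep the first character, lower-case all the others
    $text = substr($text, 0, 1) . lc(substr($text, 1));
  }
  $text = ucfirst($text) if $opts->{toupperI};
  $text =~ s/\s+/$opts->{bashWS}/g if defined($opts->{bashWS});
  return $text;
}

## The "useful" allowTextRegex value suggested in the option list:
## alphabetic words, possibly containing '-' or U+00AC (NOT SIGN).
our $ALLOW_TEXT_REGEX =
  qr/(?:^[[:alpha:]\-\x{ac}]*[[:alpha:]]+$)|(?:^[[:alpha:]]+[[:alpha:]\-\x{ac}]+$)/;

## Hypothetical filter: analyze a token only if its text matches $re
## (an undefined $re means "analyze everything").
sub want_analysis {
  my ($text, $re) = @_;
  return (!defined($re) || $text =~ $re) ? 1 : 0;
}

printf "%s\n", normalize_input("New  York",   { tolower => 1, bashWS => '_' });   # new_york
printf "%s\n", normalize_input("hELLO wORLD", { tolowerNI => 1, toupperI => 1 }); # Hello world
foreach my $w (qw(Haus Haus- 3D)) {
  printf "%-6s %s\n", $w, want_analysis($w, $ALLOW_TEXT_REGEX) ? 'analyze' : 'skip';
}
```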
- clear
$aut = $aut->clear();
Clears the object.
Methods: Generic
- fstClass
$class = $aut->fstClass();
Returns the default FST class for the loadFst() method. Used by subclasses.
- labClass
$class = $aut->labClass();
Returns the default alphabet class for the loadLabels() method. Used by subclasses.
- fstOk
$bool = $aut->fstOk();
Should return false iff the FST is undefined or "empty".
- labOk
$bool = $aut->labOk();
Should return false iff the alphabet (label set) is undefined or "empty".
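The following self-contained sketch shows how a subclass might fill in these hooks. The classes My::Toy::FST, My::Toy::Alphabet and My::Toy::Automaton are hypothetical stand-ins; a real subclass would presumably return classes such as Gfsm::Automaton and Gfsm::Alphabet from fstClass() and labClass(), and what counts as "empty" depends on those classes.

```perl
use strict;
use warnings;

## Hypothetical stand-in FST class: just a list of arcs.
package My::Toy::FST;
sub new   { my $class = shift; return bless { arcs => [] }, $class; }
sub nArcs { my $fst = shift; return scalar @{ $fst->{arcs} }; }

## Hypothetical stand-in alphabet class: a symbol => id map.
package My::Toy::Alphabet;
sub new  { my $class = shift; return bless { sym2id => {} }, $class; }
sub size { my $lab = shift; return scalar keys %{ $lab->{sym2id} }; }

## Hypothetical automaton subclass filling in the generic hooks.
package My::Toy::Automaton;
sub new      { my ($class, %args) = @_; my $self = { %args }; return bless $self, $class; }
sub fstClass { return 'My::Toy::FST'; }       ##-- default class for new FSTs
sub labClass { return 'My::Toy::Alphabet'; }  ##-- default class for new alphabets
## false iff the FST is undefined or has no arcs ("empty")
sub fstOk { my $aut = shift; return (defined($aut->{fst}) && $aut->{fst}->nArcs() > 0) ? 1 : 0; }
## false iff the alphabet is undefined or has no symbols ("empty")
sub labOk { my $aut = shift; return (defined($aut->{lab}) && $aut->{lab}->size() > 0) ? 1 : 0; }

package main;

my $aut = My::Toy::Automaton->new();
$aut->{fst} = $aut->fstClass()->new();   ##-- as a loader might do
$aut->{lab} = $aut->labClass()->new();
printf "fstOk=%d labOk=%d\n", $aut->fstOk(), $aut->labOk();   # fstOk=0 labOk=0
push @{ $aut->{fst}{arcs} }, [0, 1, 'a', 'a'];
$aut->{lab}{sym2id}{a} = 1;
printf "fstOk=%d labOk=%d\n", $aut->fstOk(), $aut->labOk();   # fstOk=1 labOk=1
```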
Methods: I/O
- ensureLoaded
$bool = $aut->ensureLoaded();
Ensures that the automaton data is loaded from the default files (the fstFile and labFile options).
- load
$aut = $aut->load(fst=>$fstFile, lab=>$labFile);
Loads the specified FST and labels files.
- loadFst
$aut = $aut->loadFst($fstfile);
Loads the automaton from $fstfile.
- loadLabels
$aut = $aut->loadLabels($labfile);
Loads labels from $labfile.
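A loader consistent with the descriptions of ensureLoaded(), load(), loadFst() and loadLabels() might look like the sketch below. Everything in it is hypothetical: the class My::Toy::Loader, the in-memory %FILES table standing in for files on disk, the nLoads counter, and the assumption that ensureLoaded() only loads data that is missing and whose default file name (fstFile, labFile) is defined.

```perl
use strict;
use warnings;

package My::Toy::Loader;

## Hypothetical in-memory "files" standing in for FST and labels files on disk.
our %FILES = (
  'toy.fst' => [ [0, 1, 'a', 'b'] ],   ##-- one arc: 0 --a:b--> 1
  'toy.lab' => { a => 1, b => 2 },     ##-- symbol => label id
);

sub new {
  my ($class, %args) = @_;
  my $self = { fstFile => undef, labFile => undef, nLoads => 0, %args };
  return bless $self, $class;
}

## loadFst($file), loadLabels($file): fetch the data and count the loads
sub loadFst {
  my ($aut, $file) = @_;
  $aut->{fst} = $FILES{$file} or die "cannot load FST from '$file'";
  $aut->{nLoads}++;
  return $aut;
}
sub loadLabels {
  my ($aut, $file) = @_;
  $aut->{lab} = $FILES{$file} or die "cannot load labels from '$file'";
  $aut->{nLoads}++;
  return $aut;
}

## load(fst=>$fstFile, lab=>$labFile): load whichever files are given
sub load {
  my ($aut, %files) = @_;
  $aut->loadFst($files{fst})    if defined($files{fst});
  $aut->loadLabels($files{lab}) if defined($files{lab});
  return $aut;
}

sub fstOk { my $aut = shift; return (defined($aut->{fst}) && @{ $aut->{fst} } > 0) ? 1 : 0; }
sub labOk { my $aut = shift; return (defined($aut->{lab}) && keys(%{ $aut->{lab} }) > 0) ? 1 : 0; }

## ensureLoaded(): ASSUMPTION: load default files only where data is missing
sub ensureLoaded {
  my $aut = shift;
  $aut->loadFst($aut->{fstFile})    if !$aut->fstOk() && defined($aut->{fstFile});
  $aut->loadLabels($aut->{labFile}) if !$aut->labOk() && defined($aut->{labFile});
  return ($aut->fstOk() && $aut->labOk()) ? 1 : 0;
}

package main;

my $aut = My::Toy::Loader->new(fstFile => 'toy.fst', labFile => 'toy.lab');
my $ok  = $aut->ensureLoaded();
print "ok=$ok loads=$aut->{nLoads}\n";   ##-- first call loads both files
$ok = $aut->ensureLoaded();
print "ok=$ok loads=$aut->{nLoads}\n";   ##-- second call loads nothing
```

Keeping ensureLoaded() idempotent in this way lets callers invoke it before every analysis without paying the loading cost more than once.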
- parseLabels
$aut = $aut->parseLabels();
Parses some information from a (newly loaded) alphabet: it sets up $aut->{labh}, $aut->{laba} and $aut->{labc}, and fixes encoding difficulties in $aut->{labh} and $aut->{laba}.
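The three lookup structures set up by parseLabels() can be sketched as follows, assuming an AT&T lextools-style labels file with one "SYMBOL ID" pair per line. The helpers decode_labels() and parse_labels() are hypothetical; decode_labels() mirrors the documented labenc='auto' rule (UTF-8 if valid, else Latin-1) using the core Encode module, and restricting labc to single-character symbols is an assumption.

```perl
use strict;
use warnings;
use Encode ();

## Hypothetical helper: decode raw labels-file bytes following the documented
## labenc='auto' rule (UTF-8 if valid, otherwise Latin-1).
sub decode_labels {
  my $bytes = shift;
  my $text  = eval {
    Encode::decode('UTF-8', $bytes, Encode::FB_CROAK() | Encode::LEAVE_SRC());
  };
  return defined($text) ? $text : Encode::decode('iso-8859-1', $bytes);
}

## Hypothetical helper: build labh/laba/labc from decoded label text.
## ASSUMPTION: only single-character symbols get an entry in labc.
sub parse_labels {
  my $text = shift;
  my (%labh, @laba, @labc);
  foreach my $line (split /\n/, $text) {
    next if $line =~ /^\s*$/;                     ##-- skip blank lines
    my ($sym, $id) = split ' ', $line;
    $labh{$sym} = $id;                            ##-- $sym2lab{$labSym} = $labId
    $laba[$id]  = $sym;                           ##-- $lab2sym[$labId]  = $labSym
    $labc[ord($sym)] = $id if length($sym) == 1;  ##-- $chr2lab[ord($chr)] = $labId
  }
  return { labh => \%labh, laba => \@laba, labc => \@labc };
}

my $labs = parse_labels(decode_labels("<eps> 0\na 1\nb 2\nfoo 3\n"));
printf "labh{foo}=%d laba[1]=%s labc[ord('b')]=%d\n",
  $labs->{labh}{foo}, $labs->{laba}[1], $labs->{labc}[ord('b')];
```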
Methods: Persistence: Perl
- noSaveKeys
@keys = $class_or_obj->noSaveKeys();
Returns a list of keys not to be saved.
This implementation returns:
qw(dict fst lab laba labc labh result)
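One way a persistence routine could honour noSaveKeys() is to serialize a shallow copy of the object with those keys removed, so that bulky runtime data (the FST, the alphabet and the derived label tables) is rebuilt after loading rather than stored. The helper saveable_copy() and the example object below are hypothetical, not the module's own save code.

```perl
use strict;
use warnings;

## The keys documented as returned by noSaveKeys()
sub noSaveKeys { return qw(dict fst lab laba labc labh result); }

## Hypothetical helper: shallow copy of %$obj without the no-save keys,
## i.e. the part of the object that would actually be serialized.
sub saveable_copy {
  my $obj  = shift;
  my %skip = map { ($_ => 1) } noSaveKeys();
  my %copy = map { ($_ => $obj->{$_}) } grep { !$skip{$_} } keys %$obj;
  return \%copy;
}

my $aut = {
  label   => 'morph',
  fstFile => 'morph.gfst',    ##-- hypothetical file name
  fst     => [ 'arc data' ],  ##-- placeholder for a loaded automaton
  labh    => { a => 1 },
};
print join(' ', sort keys %{ saveable_copy($aut) }), "\n";   # fstFile label
```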
- loadPerlRef
$loadedObj = $CLASS_OR_OBJ->loadPerlRef($ref);
Implicitly calls $obj->clear().
Methods: Analysis
- canAnalyze
$bool = $anl->canAnalyze();
Returns true if the analyzer can perform its function (e.g. data is loaded and non-empty). This implementation just returns:
($anl->labOk && $anl->fstOk)
- analyzeTypes
$doc = $anl->analyzeTypes($doc,\%types,\%opts);
Performs type-wise analysis of all (text) types in %types (= %{$doc->{types}}).
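Type-wise analysis means that each distinct token text is looked up once and the result is shared by all tokens of that type. The sketch below illustrates this with a coderef standing in for the FST lookup; the function analyze_types(), the default label 'aut', the toy lookup table, and the 'hi' and 'w' analysis keys are hypothetical, while the 'lo' key, wantAnalysisLo and allowTextRegex come from the option list above.

```perl
use strict;
use warnings;

## Hypothetical sketch of type-wise analysis (not the module's own code).
##  $types  : { $text => \%type, ... }, one entry per distinct token text
##  $lookup : coderef standing in for the FST lookup; returns ([$hi,$w], ...)
sub analyze_types {
  my ($types, $lookup, %opts) = @_;
  my $label = defined($opts{label}) ? $opts{label} : 'aut';
  foreach my $text (keys %$types) {
    ## allowTextRegex-style filter: skip types whose text does not match
    next if defined($opts{allowTextRegex}) && $text !~ $opts{allowTextRegex};
    my @analyses = map {
      my ($hi, $w) = @$_;
      +{ ($opts{wantAnalysisLo} ? (lo => $text) : ()), hi => $hi, w => $w };
    } $lookup->($text);
    ## one analysis list per type, shared by every token of that type
    $types->{$text}{$label} = \@analyses;
  }
  return $types;
}

## Toy lookup table standing in for a compiled FST.
my %toy    = (Haus => [ [ 'Haus/N', 0 ] ]);
my $lookup = sub { return @{ $toy{$_[0]} || [] } };

my %types = (Haus => {}, '3D' => {}, Xyz => {});
analyze_types(\%types, $lookup,
  label => 'morph', wantAnalysisLo => 1, allowTextRegex => qr/^[[:alpha:]]+$/);

foreach my $text (sort keys %types) {
  my $anl = $types{$text}{morph};
  printf "%-4s %s\n", $text,
    !defined($anl) ? '(skipped)'
    : @$anl        ? join(', ', map { "$_->{hi} <$_->{w}>" } @$anl)
    :                '(no analyses)';
}
```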
AUTHOR
Bryan Jurish <moocow@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2009-2019 by Bryan Jurish
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
SEE ALSO
dta-cab-analyze.perl(1), DTA::CAB::Analyzer(3pm), DTA::CAB::Chain(3pm), DTA::CAB(3pm), perl(1), ...