
NAME

DTA::CAB::Analyzer::Morph::SMOR - morphological analysis via Gfsm automata, for SMOR-style transducers (e.g. Zmorge)

SYNOPSIS

use DTA::CAB::Analyzer::Morph::SMOR;

$morph = DTA::CAB::Analyzer::Morph::SMOR->new(%args);
$morph->analyze($tok);

DESCRIPTION

DTA::CAB::Analyzer::Morph::SMOR is a subclass of DTA::CAB::Analyzer::Morph::Helsinki::DE suitable for use with SMOR-style transducers, including the Zmorge transducers produced by the SMORLemma grammar.

To build a GFSM transducer (zmorge.gfst) and label vocabulary (zmorge.lab) suitable for use with this module from one of the binary SFST-format transducers available at https://pub.cl.uzh.ch/users/sennrich/zmorge/, do something like the following (on Debian, at least):

sudo apt-get install sfst unzip wget sed gawk
wget https://pub.cl.uzh.ch/users/sennrich/zmorge/transducers/zmorge-20150315-smor_newlemma.a.zip
unzip zmorge-20150315-smor_newlemma.a.zip
fst-print zmorge-20150315-smor_newlemma.a | sed 's/ /_/g;' > zmorge.tfst
cat zmorge.tfst \
  | awk -F$'\t' '{ if (NF >= 4) { print $3 "\n" $4 } }' \
  | sed 's/^<>$//;' \
  | sort -u \
  | sed 's/^$/<>/;' \
  | awk '{print $1 "\t" NR-1}' \
  > zmorge.lab
gfsmcompile -z0 -l zmorge.lab zmorge.tfst | gfsminvert -z0 | gfsmarcsort -l -F zmorge.gfst
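
The label-extraction step in the middle of that recipe is easier to follow on a toy input. The following is a minimal sketch only: the sample arcs and the file names sample.tfst and sample.lab are made up for illustration and are not real fst-print output from Zmorge. It uses `-F'\t'` instead of the bash-specific `-F$'\t'` so that it also runs under a plain POSIX shell, and it sets LC_ALL=C purely to make the sort order reproducible.

```shell
# Toy run of the label-file pipeline on a hand-written sample (hypothetical data).
export LC_ALL=C                      # byte-wise collation, so sort output is reproducible
tmp=$(mktemp -d)

# Arc lines have four tab-separated fields: source, target, lower label, upper label.
# The last line has a single field and is skipped by the NF >= 4 test below.
printf '0\t1\ta\ta\n1\t2\t<>\t<+NN>\n2\n' > "$tmp/sample.tfst"

# 1. emit the lower and upper label of every arc, one per line
# 2. turn the symbol <> into an empty line, which sorts before everything else
# 3. sort and de-duplicate
# 4. turn the empty line back into <>
# 5. number the labels starting from 0
awk -F'\t' '{ if (NF >= 4) { print $3 "\n" $4 } }' "$tmp/sample.tfst" \
  | sed 's/^<>$//' \
  | sort -u \
  | sed 's/^$/<>/' \
  | awk '{ print $1 "\t" NR-1 }' \
  > "$tmp/sample.lab"

cat "$tmp/sample.lab"
```

On this sample the resulting file maps `<>` to 0, `<+NN>` to 1 and `a` to 2. The two sed calls around `sort -u` appear to exist so that `<>`, the empty-string (epsilon) symbol in SFST notation, always ends up with label ID 0 regardless of which other symbols occur.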

You can then test the compiled transducer with this module by running, for example:

dta-cab-analyze.perl -ac=Morph::SMOR -ao=fstFile=zmorge.gfst -ao=labFile=zmorge.lab -fc=text -w Vermittlungsgespräche

which should produce something like the following output:

Vermittlungsgespräche
	+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Dat>][<Sg>][<Old>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Gen>][<Pl>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittl[<~>]ungs[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Acc>][<Pl>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Dat>][<Sg>][<Old>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Gen>][<Pl>] <0>
	+[morph] Vermittlungsgespräch[_NN]=Vermittlung[<->]s[<#>]gespräch[<+NN>][<Neut>][<Nom>][<Pl>] <0>

AUTHOR

Bryan Jurish <moocow@cpan.org>

COPYRIGHT AND LICENSE

Copyright (C) 2021 by Bryan Jurish

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.24.1 or, at your option, any later version of Perl 5 you may have available.
