Changes for version 2.021

  • Features:
    • New features for the multi-argument agreement of Basque synthetic verbs:
      • absperson, absnumber, abspoliteness
      • ergperson, ergnumber, ergpoliteness, erggender
      • datperson, datnumber, datpoliteness, datgender
  • Drivers:
    • ET::Puudepank
    • EU::Conll
  • Interface:
    • FeatureStructure has new methods is_masculine(), is_feminine(), is_neuter(), is_common_gender(), is_negative(), is_affirmative(), is_auxiliary(), is_modal(), is_gerund(), is_conditional(), is_cardinal(), is_ordinal(), is_personal_pronoun().
    • New method Atom::merge_atoms() helps create a big atom to decode unnamed features.
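The Basque features give each verbal argument (absolutive, ergative, dative) its own person, number and politeness slot, plus gender for ergative and dative only, because one synthetic verb form can agree with several arguments at once. The Python sketch below models that flattening. The names `ArgumentAgreement`, `ROLE_FEATURES` and `agreement_features` and the value strings ('1', 'sing', 'fem') are hypothetical illustrations, not the Interset API.

```python
from dataclasses import dataclass, fields
from typing import Dict, Optional


@dataclass
class ArgumentAgreement:
    """Agreement values for one verbal argument (hypothetical helper type)."""
    person: Optional[str] = None
    number: Optional[str] = None
    politeness: Optional[str] = None
    gender: Optional[str] = None


# Feature names per argument role, following the changelog:
# there is no absgender, so gender is only valid for erg and dat.
ROLE_FEATURES = {
    "abs": {"person", "number", "politeness"},
    "erg": {"person", "number", "politeness", "gender"},
    "dat": {"person", "number", "politeness", "gender"},
}


def agreement_features(**roles: ArgumentAgreement) -> Dict[str, str]:
    """Flatten per-argument agreement into names such as 'ergperson'."""
    features: Dict[str, str] = {}
    for role, agreement in roles.items():
        allowed = ROLE_FEATURES.get(role)
        if allowed is None:
            raise ValueError(f"unknown argument role: {role!r}")
        for field in fields(agreement):
            value = getattr(agreement, field.name)
            if value is None:
                continue  # unset slots produce no feature
            if field.name not in allowed:
                raise ValueError(f"{role}{field.name} is not a defined feature")
            features[role + field.name] = value
    return features


# A hypothetical form agreeing with a 1sg ergative and a 3sg absolutive argument.
print(agreement_features(
    erg=ArgumentAgreement(person="1", number="sing"),
    abs=ArgumentAgreement(person="3", number="sing"),
))
# {'ergperson': '1', 'ergnumber': 'sing', 'absperson': '3', 'absnumber': 'sing'}
```

Keeping the role prefix in the feature name lets one flat feature structure hold all three agreement sets without nesting.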
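The new FeatureStructure predicates are convenience tests for a single feature value. This Python sketch shows the pattern only; it is not the Perl implementation. The feature and value each method checks (for example gender 'masc', numtype 'ord', mood 'cnd') are assumptions, as is the choice to let a feature hold several alternative values, so that is_masculine() is true when gender is either masc or fem.

```python
from typing import Dict, Iterable, List, Union

Value = Union[str, Iterable[str]]


class FeatureStructure:
    """Minimal stand-in for an Interset feature structure (hypothetical)."""

    def __init__(self, **features: Value) -> None:
        # A value is either one string or several alternatives, e.g. gender=["masc", "fem"].
        self._features: Dict[str, List[str]] = {
            name: [value] if isinstance(value, str) else list(value)
            for name, value in features.items()
        }

    def contains(self, feature: str, value: str) -> bool:
        """True if `value` is among the values of `feature`."""
        return value in self._features.get(feature, [])


# Method name -> (feature, value) it tests. These names are assumed for illustration.
_PREDICATES = {
    "is_masculine": ("gender", "masc"),
    "is_feminine": ("gender", "fem"),
    "is_neuter": ("gender", "neut"),
    "is_common_gender": ("gender", "com"),
    "is_negative": ("polarity", "neg"),
    "is_affirmative": ("polarity", "pos"),
    "is_auxiliary": ("verbtype", "aux"),
    "is_modal": ("verbtype", "mod"),
    "is_gerund": ("verbform", "ger"),
    "is_conditional": ("mood", "cnd"),
    "is_cardinal": ("numtype", "card"),
    "is_ordinal": ("numtype", "ord"),
    "is_personal_pronoun": ("prontype", "prs"),
}


def _make_predicate(name: str, feature: str, value: str):
    def predicate(self: FeatureStructure) -> bool:
        return self.contains(feature, value)
    predicate.__name__ = name
    predicate.__doc__ = f"True if {feature} includes {value!r}."
    return predicate


# Attach one method per table entry instead of writing thirteen near-identical defs.
for _name, (_feature, _value) in _PREDICATES.items():
    setattr(FeatureStructure, _name, _make_predicate(_name, _feature, _value))


fs = FeatureStructure(pos="noun", gender=["masc", "fem"], number="sing")
print(fs.is_masculine(), fs.is_feminine(), fs.is_neuter())  # True True False
```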
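Some tagsets list feature values without naming the feature (for example "masc|sg"), so a driver cannot tell which atom should decode each value. One way to read the merge_atoms() entry is that it unions the decoding tables of several atoms into one big atom. The Python sketch below follows that reading; `Atom`, `merge_atoms` and `decode_unnamed` here are hypothetical stand-ins, not Interset's actual classes or code, and the refusal to merge conflicting values is a design choice of this sketch.

```python
from typing import Dict

FeatureDict = Dict[str, str]


class Atom:
    """Hypothetical stand-in: maps surface values of one feature to Interset features."""

    def __init__(self, surface_feature: str, decode_map: Dict[str, FeatureDict]) -> None:
        self.surface_feature = surface_feature
        self.decode_map = {value: dict(fs) for value, fs in decode_map.items()}

    def decode(self, value: str) -> FeatureDict:
        # Unknown surface values decode to an empty feature set.
        return dict(self.decode_map.get(value, {}))


def merge_atoms(*atoms: Atom) -> Atom:
    """Union of the atoms' decoding tables (a sketch of the idea, not Interset's code)."""
    merged: Dict[str, FeatureDict] = {}
    for atom in atoms:
        for value, fs in atom.decode_map.items():
            if value in merged and merged[value] != fs:
                raise ValueError(f"surface value {value!r} is ambiguous across atoms")
            merged[value] = fs
    return Atom("+".join(a.surface_feature for a in atoms), merged)


def decode_unnamed(atom: Atom, feats: str, sep: str = "|") -> FeatureDict:
    """Decode a list of surface values that carry no feature names, e.g. 'masc|sg'."""
    result: FeatureDict = {}
    for value in feats.split(sep):
        result.update(atom.decode(value))
    return result


gender = Atom("gender", {"masc": {"gender": "masc"}, "fem": {"gender": "fem"}})
number = Atom("number", {"sg": {"number": "sing"}, "pl": {"number": "plur"}})
print(decode_unnamed(merge_atoms(gender, number), "fem|pl"))
# {'gender': 'fem', 'number': 'plur'}
```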

Modules

DZ Interset is a universal morphosyntactic feature set to which all tagsets of all corpora/languages can be mapped.
Atomic driver for a surface feature.
Definition of morphosyntactic features and their values.
A temporary envelope that provides access to the old (Interset 1.0) drivers from Interset 2.0.
Atomic driver for a surface feature.
The root class for all physical tagsets covered by DZ Interset 2.0.
Driver for the Arabic tagset of the CoNLL 2006 Shared Task.
Driver for the Arabic tagset of the CoNLL 2007 Shared Task.
Driver for the PADT 2.0 / ElixirFM Arabic positional tagset.
Driver for the Bulgarian tagset of the CoNLL 2006 Shared Task.
Driver for the Bengali tagset of the ICON 2009 and 2010 Shared Tasks, as used in the CoNLL data format.
Driver for the Catalan tagset of the CoNLL 2009 Shared Task.
Driver for the tagset of the Czech morphological analyzers Ajka and Majka (Masaryk University in Brno).
Driver for the tagset of the Czech National Corpus (Cesky narodni korpus).
Driver for the Czech tagset of the CoNLL 2006 and 2007 Shared Tasks.
Driver for the Czech tagset of the CoNLL 2009 Shared Task.
Driver for the Czech tagset of the Multext-EAST project.
Driver for the tagset of the Prague Dependency Treebank.
Driver for the Czech tagset of the Prague Spoken Corpus (Prazsky mluveny korpus).
Driver for the shortened Czech tagset of the Prague Spoken Corpus (Prazsky mluveny korpus).
Common code for drivers of tagsets from files in CoNLL 2006 format.
Driver for the Danish tagset of the CoNLL 2006 Shared Task (derived from the Danish Parole tagset).
Driver for the German tagset of the CoNLL 2006 Shared Task.
Driver for the German tagset of the CoNLL 2009 Shared Task.
Driver for the Stuttgart-Tuebingen Tagset of German.
Driver for the Greek tagset of the CoNLL 2007 Shared Task.
Driver for the English tagset of the CoNLL 2007 Shared Task.
Driver for the English tagset of the CoNLL 2009 Shared Task.
Driver for the tagset of the Penn Treebank.
Driver for the Spanish tagset of the CoNLL 2009 Shared Task.
Driver for the Estonian tagset from the Eesti keele puudepank (Estonian Language Treebank).
Driver for the tagset of the Basque Dependency Treebank in the CoNLL format.
Driver for the Croatian tagset of the Multext-EAST v4 project.
Driver for the Japanese IPADIC tagset.
Driver for the Google Universal Part-of-Speech Tagset.
Driver for the Universal Part-of-Speech Tagset, version 2014-10-01, part of Universal Dependencies.
Driver for the Universal Part-of-Speech Tagset + Universal Features, version 2014-10-01, part of Universal Dependencies.
Common code for drivers of tagsets of the Multext-EAST project.
Driver for the Portuguese tagset of the CINTIL corpus (Corpus Internacional do Portugues).
A trie-like structure for DZ Interset features and their values.
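The module list reflects the hub design stated in the first entry: each tagset driver decodes its tags into the universal feature set and encodes feature structures back, so converting between two tagsets means decoding with one driver and encoding with the other. This Python sketch uses two tiny hypothetical drivers (a handful of Penn Treebank tags and Universal POS tags); the feature names such as `nountype` and `verbform` are assumptions, not Interset's definitions.

```python
from typing import Callable, Dict

FeatureDict = Dict[str, str]

# Toy decoder for a few Penn Treebank tags (hypothetical, far from complete).
_PENN_TO_FEATURES: Dict[str, FeatureDict] = {
    "NN": {"pos": "noun", "number": "sing"},
    "NNS": {"pos": "noun", "number": "plur"},
    "NNP": {"pos": "noun", "nountype": "prop", "number": "sing"},
    "VB": {"pos": "verb", "verbform": "inf"},  # Penn VB is the base form
    "JJ": {"pos": "adj"},
}


def decode_penn(tag: str) -> FeatureDict:
    return dict(_PENN_TO_FEATURES[tag])


def encode_upos(fs: FeatureDict) -> str:
    """Toy encoder into Universal POS tags; unknown categories become X."""
    if fs.get("pos") == "noun":
        return "PROPN" if fs.get("nountype") == "prop" else "NOUN"
    return {"verb": "VERB", "adj": "ADJ"}.get(fs.get("pos", ""), "X")


def convert(tag: str,
            decode: Callable[[str], FeatureDict],
            encode: Callable[[FeatureDict], str]) -> str:
    """Tagset A -> universal features -> tagset B."""
    return encode(decode(tag))


print([convert(t, decode_penn, encode_upos) for t in ["NN", "NNP", "VB", "JJ"]])
# ['NOUN', 'PROPN', 'VERB', 'ADJ']
```

With n tagsets, the hub needs one decoder and one encoder per tagset (2n mappings) instead of a dedicated converter for each of the n(n-1) ordered pairs.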
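A trie over features and values can store the feature combinations a tagset permits, one trie level per feature in a fixed order. The Python sketch below assumes that use, including a greedy "nearest permitted structure" lookup of the kind an encoder might need when a requested combination does not exist in the target tagset. The class name, feature order and fallback rule are hypothetical and may differ from Interset's Trie.

```python
from typing import Dict, List, Sequence

FeatureDict = Dict[str, str]


class FeatureTrie:
    """Trie over permitted feature structures, one level per feature (a sketch)."""

    def __init__(self, order: Sequence[str]) -> None:
        self.order: List[str] = list(order)  # fixed feature order = trie depth
        self.root: dict = {}

    def add(self, fs: FeatureDict) -> None:
        node = self.root
        for feature in self.order:
            # An empty string stands for "feature not set".
            node = node.setdefault(fs.get(feature, ""), {})

    def contains(self, fs: FeatureDict) -> bool:
        node = self.root
        for feature in self.order:
            value = fs.get(feature, "")
            if value not in node:
                return False
            node = node[value]
        return True

    def nearest(self, fs: FeatureDict) -> FeatureDict:
        """Greedy walk: keep each requested value if permitted at this level,
        otherwise take the first permitted value stored there (no backtracking)."""
        if not self.root:
            raise ValueError("the trie is empty")
        node, result = self.root, {}
        for feature in self.order:
            value = fs.get(feature, "")
            if value not in node:
                value = next(iter(node))  # every stored path has full depth
            if value:
                result[feature] = value
            node = node[value]
        return result


trie = FeatureTrie(["pos", "gender", "number"])
trie.add({"pos": "noun", "gender": "masc", "number": "sing"})
trie.add({"pos": "noun", "gender": "fem", "number": "sing"})
trie.add({"pos": "adj"})
print(trie.nearest({"pos": "noun", "gender": "neut", "number": "plur"}))
# {'pos': 'noun', 'gender': 'masc', 'number': 'sing'}
```

The greedy walk is cheap but not optimal: an earlier fallback can rule out a better match deeper in the trie.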
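Several drivers above handle positional tagsets, where each character position of a fixed-length tag encodes one category. In the 15-character Prague Dependency Treebank tag "NNMS1-----A----", for example, the positions give part of speech, subtype, gender (M = masculine animate), number (S = singular), case (1 = nominative) and negation (A = affirmative), and "-" means "not applicable". This Python sketch decodes only a few positions; the output feature names (gender, animacy, polarity, ...) are assumptions and the value tables are partial.

```python
from typing import Dict

# The 15 positions of a PDT tag, in order.
PDT_POSITIONS = (
    "pos", "subpos", "gender", "number", "case", "possgender", "possnumber",
    "person", "tense", "grade", "negation", "voice", "reserve1", "reserve2", "var",
)

# Partial value tables for a few positions; output feature names are assumed.
_TABLES: Dict[str, Dict[str, Dict[str, str]]] = {
    "gender": {"M": {"gender": "masc", "animacy": "anim"},
               "I": {"gender": "masc", "animacy": "inan"},
               "F": {"gender": "fem"},
               "N": {"gender": "neut"}},
    "number": {"S": {"number": "sing"}, "P": {"number": "plur"}, "D": {"number": "dual"}},
    "case": {c: {"case": v} for c, v in zip("1234567",
             ["nom", "gen", "dat", "acc", "voc", "loc", "ins"])},
    "negation": {"A": {"polarity": "pos"}, "N": {"polarity": "neg"}},
}


def split_positional(tag: str) -> Dict[str, str]:
    """Map each non-'-' character of a 15-character tag to its position name."""
    if len(tag) != len(PDT_POSITIONS):
        raise ValueError(f"expected {len(PDT_POSITIONS)} characters, got {len(tag)}")
    return {name: ch for name, ch in zip(PDT_POSITIONS, tag) if ch != "-"}


def decode_pdt_subset(tag: str) -> Dict[str, str]:
    """Decode gender, number, case and negation; other positions are ignored."""
    fs: Dict[str, str] = {}
    for position, char in split_positional(tag).items():
        fs.update(_TABLES.get(position, {}).get(char, {}))
    return fs


print(decode_pdt_subset("NNMS1-----A----"))
# {'gender': 'masc', 'animacy': 'anim', 'number': 'sing', 'case': 'nom', 'polarity': 'pos'}
```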
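The Universal POS + Universal Features driver deals with tags that combine a UPOS tag with a FEATS value in the CoNLL-U style: Name=Value pairs separated by "|", several values of one feature separated by commas, "_" for no features, and pairs sorted by name case-insensitively. The Python sketch below parses and formats such tags. It assumes the tag string joins UPOS and FEATS with a tab; that convention and the function names are assumptions, not taken from the driver.

```python
from typing import Dict, List, Tuple

Features = Dict[str, List[str]]


def parse_upos_feats(tag: str) -> Tuple[str, Features]:
    """Split a hypothetical 'UPOS<TAB>FEATS' tag into (upos, features)."""
    upos, _, feats = tag.partition("\t")
    features: Features = {}
    if feats and feats != "_":
        for pair in feats.split("|"):
            name, sep, values = pair.partition("=")
            if not sep:
                raise ValueError(f"malformed feature: {pair!r}")
            features[name] = values.split(",")
    return upos, features


def format_upos_feats(upos: str, features: Features) -> str:
    """Inverse of parse_upos_feats, with names sorted case-insensitively."""
    if not features:
        return f"{upos}\t_"
    pairs = sorted(features.items(), key=lambda kv: kv[0].lower())
    return upos + "\t" + "|".join(f"{name}={','.join(sorted(vals))}" for name, vals in pairs)


upos, feats = parse_upos_feats("NOUN\tGender=Masc|Number=Sing")
print(upos, feats)  # NOUN {'Gender': ['Masc'], 'Number': ['Sing']}
```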