NAME

Treex::Tool::Tagger::MeCab - Perl wrapper for MeCab, a Japanese morphological analyzer implemented in C++

VERSION

version 0.13095

SYNOPSIS

use Treex::Tool::Tagger::MeCab;
my $tagger = Treex::Tool::Tagger::MeCab->new();
my $sentence = 'わたしは日本語を話します';
my @tokens = $tagger->process_sentence($sentence);

DESCRIPTION

This is a Perl wrapper for the MeCab tagger and tokenizer, which is implemented in C++. For each token, it generates a string of features, the first of which is the wordform, and it returns an array of these tokens for further processing.

INSTALLATION

Before installing MeCab, make sure you have properly installed the Treex-Core package (see Treex Installation), since it is a prerequisite for this module anyway. After installing Treex-Core, you can install MeCab using this Makefile (username "public", password "public"). Before running the Makefile, you must set the environment variable $TMT_ROOT to the location of your .treex directory.

You can also install MeCab manually, but then you must link the installation directory to ${TMT_ROOT}/share/installed_tools/tagger/MeCab/ (the location within the Treex share); otherwise, the modules will not be able to find the program.
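For a manual installation, the linking step can be sketched in Perl as follows. This is a minimal sketch, not part of the module: link_mecab_install is a hypothetical helper, and it assumes a Unix-like system where symlink() is available and that the target path given above is exactly where the tagger looks for MeCab.

```perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Basename qw(dirname);

# Hypothetical helper (not part of Treex): link a manually installed
# MeCab directory into the Treex share location named in INSTALLATION.
sub link_mecab_install {
    my ($mecab_dir) = @_;
    my $root = $ENV{TMT_ROOT}
        or die "TMT_ROOT is not set\n";
    -d $mecab_dir
        or die "MeCab directory '$mecab_dir' does not exist\n";

    # Trailing slash dropped: symlink() creates the link entry itself.
    my $target = "$root/share/installed_tools/tagger/MeCab";
    make_path(dirname($target));    # create missing parent directories
    return $target if -l $target;   # a symlink is already there: leave it
    die "'$target' already exists and is not a symlink\n" if -e $target;
    symlink($mecab_dir, $target)
        or die "Cannot create symlink '$target': $!\n";
    return $target;
}
```

For example, link_mecab_install('/opt/mecab') (a hypothetical install prefix) would create ${TMT_ROOT}/share/installed_tools/tagger/MeCab pointing at /opt/mecab. The helper refuses to overwrite an existing non-link entry, so an earlier installation made with the Makefile is not clobbered.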

METHODS

@tokens = $tagger->process_sentence($sentence);

Returns a list of tokens for the tokenized input. Each token is a string containing the wordform followed by its morphological categories, all separated by tab characters (\t).
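Since each token is a tab-separated string whose first field is the wordform, it can be unpacked with split. The sketch below assumes only that layout; parse_token is a hypothetical helper, not part of the module's API, and the number and meaning of the categories depend on the MeCab dictionary in use, which this documentation does not specify.

```perl
use strict;
use warnings;

# Hypothetical helper: split one token string into its wordform and
# the morphological categories that follow it.
sub parse_token {
    my ($token) = @_;
    # Limit -1 keeps trailing empty fields, so empty categories survive.
    my ($wordform, @features) = split /\t/, $token, -1;
    return { wordform => $wordform, features => \@features };
}
```

With the variables from the SYNOPSIS, parse_token($_) can be applied to each element of @tokens to get at the wordform and its categories separately.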

SEE ALSO

MeCab Home Page

AUTHOR

Dušan Variš <dvaris@seznam.cz>

COPYRIGHT AND LICENSE

Copyright © 2014 by Institute of Formal and Applied Linguistics, Charles University in Prague

This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.