NAME
Lingua::ZH::MMSEG - Mandarin Chinese segmentation
SYNOPSIS
#!/usr/bin/perl
use utf8;
use Lingua::ZH::MMSEG;
my $zh_string="現代漢語的複合動詞可分三個結構語意關係來探討";
# use the MMSEG algorithm
my @phrases = mmseg($zh_string);
# use the Forward Maximum Matching algorithm
my @fmm_phrases = fmm($zh_string);
DESCRIPTION
A problem in computational analysis of Chinese text is that there are no word boundaries in conventionally printed text. Since the word is such a fundamental linguistic unit, it is necessary to identify words in Chinese text so that higher-level analyses can be performed.
Lingua::ZH::MMSEG implements the MMSEG algorithm originally developed by Chih-Hao Tsai. The whole module is rewritten in pure Perl, and the phrase library comes from 新酷音 (New Chewing), forked from OpenFoundry.
INSTALL
To install this module, run:
cpanm Lingua::ZH::MMSEG
If you don't have cpanm, install it first:
curl -LO http://bit.ly/cpanm
chmod +x cpanm
sudo cp cpanm /usr/local/bin
USAGE
This module has no dependencies, so you can simply create a new Perl script as shown in SYNOPSIS.
FUNCTIONS
mmseg
my @phrases = mmseg($zh_string);
Uses the MMSEG algorithm to split a Chinese string into phrases and returns them as a list.
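To give a feel for how MMSEG resolves ambiguities, here is a minimal sketch of the chunk-based idea from Tsai's published description: look at every chunk of up to three words starting at the current position, prefer the chunk with the greatest total length, then the largest average word length, then the smallest variance of word lengths, and emit only the first word of the winner. This is not the module's actual code. The names mmseg_sketch, words_at, chunks_at, score and cmp_scores, and the toy dictionary, are hypothetical, and the fourth MMSEG rule (which needs character frequency data) is left out.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical toy dictionary; the real module ships its own phrase library.
my %dict    = map { $_ => 1 } qw(研究 研究生 生命 命 起源);
my $max_len = 3;    # longest dictionary entry, in characters

# All candidate words starting at $pos: any single character,
# plus every dictionary entry that matches there.
sub words_at {
    my ($text, $pos) = @_;
    my $rest = length($text) - $pos;
    return () if $rest <= 0;
    my @words = (substr($text, $pos, 1));
    for my $len (2 .. ($rest < $max_len ? $rest : $max_len)) {
        my $cand = substr($text, $pos, $len);
        push @words, $cand if exists $dict{$cand};
    }
    return @words;
}

# All chunks of up to three consecutive words starting at $pos.
sub chunks_at {
    my ($text, $pos) = @_;
    my @chunks;
    for my $w1 (words_at($text, $pos)) {
        my $p2     = $pos + length $w1;
        my @second = words_at($text, $p2);
        push @chunks, [$w1] unless @second;
        for my $w2 (@second) {
            my @third = words_at($text, $p2 + length $w2);
            push @chunks, [$w1, $w2] unless @third;
            push @chunks, [$w1, $w2, $_] for @third;
        }
    }
    return @chunks;
}

# Score tuple, compared left to right; larger is better in every slot:
# (total length, average word length, negated variance of word lengths).
sub score {
    my ($chunk) = @_;
    my @len   = map { length } @$chunk;
    my $total = 0;
    $total += $_ for @len;
    my $avg = $total / @len;
    my $var = 0;
    $var += ($_ - $avg)**2 for @len;
    return ($total, $avg, -$var / @len);
}

sub cmp_scores {
    my ($x, $y) = @_;
    for my $i (0 .. $#$x) {
        my $c = $x->[$i] <=> $y->[$i];
        return $c if $c;
    }
    return 0;
}

sub mmseg_sketch {
    my ($text) = @_;
    my @words;
    my $pos = 0;
    while ($pos < length $text) {
        my ($best, @best_score);
        for my $chunk (chunks_at($text, $pos)) {
            my @s = score($chunk);
            ($best, @best_score) = ($chunk, @s)
                if !$best || cmp_scores(\@s, \@best_score) > 0;
        }
        push @words, $best->[0];    # keep only the first word of the chunk
        $pos += length $best->[0];
    }
    return @words;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' / ', mmseg_sketch('研究生命起源')), "\n";
# prints: 研究 / 生命 / 起源
```

With this toy dictionary, both 研究生 + 命 + 起源 and 研究 + 生命 + 起源 cover all six characters in three words, so the first two rules tie. The variance rule then prefers the evenly sized split.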
fmm
my @phrases = fmm($zh_string);
Uses the forward maximum matching algorithm to split a Chinese string into phrases and returns them as a list. It has lower complexity than mmseg, but it cannot resolve segmentation ambiguities.
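The general technique can be sketched as follows: at each position, greedily take the longest dictionary entry that matches, falling back to a single character when nothing matches. This is an illustration, not the module's implementation. The name fmm_sketch and the toy dictionary are hypothetical.

```perl
use strict;
use warnings;
use utf8;

# Hypothetical toy dictionary; the real module ships its own phrase library.
my %dict    = map { $_ => 1 } qw(研究 研究生 生命 命 起源);
my $max_len = 3;    # longest dictionary entry, in characters

# Greedy forward maximum matching: try the longest window first and
# shrink it until a dictionary word (or a single character) remains.
sub fmm_sketch {
    my ($text) = @_;
    my @words;
    my $pos = 0;
    while ($pos < length $text) {
        my $rest = length($text) - $pos;
        my $len  = $rest < $max_len ? $rest : $max_len;
        $len-- while $len > 1 && !exists $dict{ substr($text, $pos, $len) };
        push @words, substr($text, $pos, $len);
        $pos += $len;
    }
    return @words;
}

binmode STDOUT, ':encoding(UTF-8)';
print join(' / ', fmm_sketch('研究生命起源')), "\n";
# prints: 研究生 / 命 / 起源
```

The output shows the kind of ambiguity the greedy approach cannot handle: it commits to 研究生 ("graduate student") and leaves 命 stranded, whereas the intended reading is 研究 / 生命 / 起源 ("research / life / origin").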
AUTHOR
Felix Ren-Chyan Chern (dryman) <idryman@gmail.com>