NAME
Lingua::YALI::LanguageIdentifier - Module for language identification.
VERSION
version 0.010_02
SYNOPSIS
This module identifies the language of a text and supports 122 languages.
use Lingua::YALI::LanguageIdentifier;
# create the identifier and register languages
my $identifier = Lingua::YALI::LanguageIdentifier->new();
$identifier->add_language("ces", "eng");
# identify a string
my $result = $identifier->identify_string("CPAN, the Comprehensive Perl Archive Network, is an archive of modules written in Perl.");
print "The most probable language is " . $result->[0]->[0] . ".\n";
# prints out: The most probable language is eng.
More examples are presented in Lingua::YALI::Examples.
METHODS
add_language
my $added_languages = $identifier->add_language(@languages);
Registers the languages @languages for identification and returns the number of newly added languages. Languages are identified by their ISO 639-3 codes.
It croaks when an unsupported language is used.
print $identifier->add_language("ces", "deu", "eng") . "\n";
# prints out 3
print $identifier->add_language("ces", "slk") . "\n";
# prints out 1 ("ces" was already registered, so only "slk" is new)
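Because an unsupported code makes the call croak, a caller may want to trap the error. The following is a minimal sketch, not the module's prescribed usage: it uses the method name from the heading above and assumes "xxx" is a code the module does not support (it does not appear under "LANGUAGES").

```perl
use strict;
use warnings;
use Lingua::YALI::LanguageIdentifier;

my $identifier = Lingua::YALI::LanguageIdentifier->new();

# "xxx" is a placeholder for an unsupported code (assumption for illustration).
# eval returns undef if the call croaks; the error message is left in $@.
my $added = eval { $identifier->add_language("ces", "xxx") };

if (!defined $added) {
    warn "Could not register languages: $@";
}
```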
remove_language
my $removed_languages = $identifier->remove_language(@languages);
Removes the languages @languages from identification and returns the number of languages removed.
It croaks when an unsupported language is used.
$identifier->add_language("ces", "deu", "eng");
print $identifier->remove_language("ces", "slk") . "\n";
# prints out 1 (only "ces" was registered)
print $identifier->remove_language("ces", "slk") . "\n";
# prints out 0
get_languages
my $languages = $identifier->get_languages();
Returns a reference to an array of all registered languages.
get_available_languages
my $languages = $identifier->get_available_languages();
Returns a reference to an array of all available languages. Currently there are 122 languages (see "LANGUAGES").
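The list of available languages can be used to filter codes before registering them, which avoids the croak on unsupported input. This is a sketch only: it assumes get_available_languages returns an array reference, and the candidate list (including the placeholder "xxx") is made up for illustration.

```perl
use strict;
use warnings;
use Lingua::YALI::LanguageIdentifier;

my $identifier = Lingua::YALI::LanguageIdentifier->new();

# Build a lookup table of supported codes
# (assumes the method returns an array reference).
my %available = map { $_ => 1 } @{ $identifier->get_available_languages() };

# Hypothetical candidate codes; "xxx" stands in for an unsupported one.
my @candidates = ("ces", "eng", "xxx");
my @wanted     = grep { $available{$_} } @candidates;

# Register only the codes that the module supports.
$identifier->add_language(@wanted);
print "Registered: @wanted\n";
```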
identify_file
my $result = $identifier->identify_file($file);
Identifies the language of the file $file.
For more details, see the "identify_file" method in Lingua::YALI::Identifier.
identify_string
my $result = $identifier->identify_string($string);
Identifies the language of the string $string.
For more details, see the "identify_string" method in Lingua::YALI::Identifier.
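The synopsis reads only the top candidate via $result->[0]->[0]. The sketch below walks the whole result, assuming (as described for Lingua::YALI::Identifier, and not confirmed here) that each entry is a two-element array reference holding a language code followed by its score, ordered from most to least probable.

```perl
use strict;
use warnings;
use Lingua::YALI::LanguageIdentifier;

my $identifier = Lingua::YALI::LanguageIdentifier->new();
$identifier->add_language("ces", "eng");

my $result = $identifier->identify_string(
    "CPAN, the Comprehensive Perl Archive Network, is an archive of modules written in Perl."
);

# Each entry is assumed to be [ $language_code, $score ].
for my $entry (@{$result}) {
    my ($language, $score) = @{$entry};
    print "$language\t$score\n";
}
```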
identify_handle
my $result = $identifier->identify_handle($fh);
Identifies the language of the data read from the file handle $fh.
For more details, see the "identify_handle" method in Lingua::YALI::Identifier.
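A handle does not have to come from a file on disk. As an illustrative sketch, the example below opens a read handle on an in-memory string using Perl's standard three-argument open; it assumes identify_handle accepts any readable lexical file handle, which this page does not state explicitly.

```perl
use strict;
use warnings;
use Lingua::YALI::LanguageIdentifier;

my $identifier = Lingua::YALI::LanguageIdentifier->new();
$identifier->add_language("ces", "eng");

my $text = "CPAN, the Comprehensive Perl Archive Network, "
         . "is an archive of modules written in Perl.";

# Open a read-only handle on the scalar instead of a file.
open(my $fh, "<", \$text) or die "Cannot open in-memory handle: $!";
my $result = $identifier->identify_handle($fh);
close($fh);

print "The most probable language is " . $result->[0]->[0] . ".\n";
```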
LANGUAGES
More details about supported languages may be found at http://ufal.mff.cuni.cz/~majlis/w2c/download.html.
afr - Afrikaans
als - Tosk Albanian
amh - Amharic
ara - Arabic
arg - Aragonese
arz - Egyptian Arabic
ast - Asturian
aze - Azerbaijani
bcl - Central Bicolano
bel - Belarusian
ben - Bengali
bos - Bosnian
bpy - Bishnupriya
bre - Breton
bug - Buginese
bul - Bulgarian
cat - Catalan
ceb - Cebuano
ces - Czech
chv - Chuvash
cos - Corsican
cym - Welsh
dan - Danish
deu - German
diq - Dimli (individual language)
ell - Modern Greek (1453-)
eng - English
epo - Esperanto
est - Estonian
eus - Basque
fao - Faroese
fas - Persian
fin - Finnish
fra - French
fry - Western Frisian
gan - Gan Chinese
gla - Scottish Gaelic
gle - Irish
glg - Galician
glk - Gilaki
guj - Gujarati
hat - Haitian
hbs - Serbo-Croatian
heb - Hebrew
hif - Fiji Hindi
hin - Hindi
hrv - Croatian
hsb - Upper Sorbian
hun - Hungarian
hye - Armenian
ido - Ido
ina - Interlingua (International Auxiliary Language Association)
ind - Indonesian
isl - Icelandic
ita - Italian
jav - Javanese
jpn - Japanese
kan - Kannada
kat - Georgian
kaz - Kazakh
kor - Korean
kur - Kurdish
lat - Latin
lav - Latvian
lim - Limburgan
lit - Lithuanian
lmo - Lombard
ltz - Luxembourgish
mal - Malayalam
mar - Marathi
mkd - Macedonian
mlg - Malagasy
mon - Mongolian
mri - Maori
msa - Malay (macrolanguage)
mya - Burmese
nap - Neapolitan
nds - Low German
nep - Nepali
new - Newari
nld - Dutch
nno - Norwegian Nynorsk
nor - Norwegian
oci - Occitan (post 1500)
oss - Ossetian
pam - Pampanga
pms - Piemontese
pnb - Western Panjabi
pol - Polish
por - Portuguese
que - Quechua
ron - Romanian
rus - Russian
sah - Yakut
scn - Sicilian
sco - Scots
slk - Slovak
slv - Slovenian
spa - Spanish
sqi - Albanian
srp - Serbian
sun - Sundanese
swa - Swahili (macrolanguage)
swe - Swedish
tam - Tamil
tat - Tatar
tel - Telugu
tgk - Tajik
tgl - Tagalog
tha - Thai
tur - Turkish
ukr - Ukrainian
urd - Urdu
uzb - Uzbek
vec - Venetian
vie - Vietnamese
vol - Volapük
war - Waray (Philippines)
wln - Walloon
yid - Yiddish
yor - Yoruba
zho - Chinese
SEE ALSO
A general version of this identifier is Lingua::YALI::Identifier.
Source code is available at https://github.com/martin-majlis/YALI.
AUTHOR
Martin Majlis <martin@majlis.cz>
COPYRIGHT AND LICENSE
This software is Copyright (c) 2012 by Martin Majlis.
This is free software, licensed under:
The (three-clause) BSD License