NAME
Lingua::LO::Romanize - Romanization of Lao language
VERSION
Version 0.10
SYNOPSIS
This module romanizes Lao text using the BGN/PCGN standard from 1966 (with some modifications, see below).
    use Lingua::LO::Romanize;

    my $foo = Lingua::LO::Romanize->new(text => 'ພາສາລາວ');
    my $bar = $foo->romanize;              # $bar will hold the string 'phasalao'
    $bar = $foo->romanize(hyphen => 1);    # $bar will hold the string 'pha-sa-lao'
DESCRIPTION
Lingua::LO::Romanize romanizes Lao text using the BGN/PCGN standard from 1966 (also known as the 'French style'), with some modifications for post-revolutionary spellings (spellings introduced from 1975). One such modification is that Lao words have to be spelled out in full. For example, 'ສະຫວັນນະເຂດ' will be romanized correctly into 'savannakhét', while the older spelling 'ສວັນນະເຂດ' will not be romanized correctly because it lacks the necessary characters.
Furthermore, 'ຯ' will be romanized to '...', Lao numbers will be 'romanized' to Arabic numbers (0, 1, 2, 3, etc.), and 'ໆ' will repeat the previous syllable. See below for more romanization rules.
Note that all characters are treated as UTF-8.
Romanization Rules
Consonants and vowels are generally romanized according to the following rules:
Consonants
- ກ: initial and final position 'k'
- ຂ: initial position 'kh'
- ຄ: initial position 'kh'
- ງ: initial and final position 'ng'
- ຈ: initial position 'ch'
- ສ: initial position 's'
- ຊ: initial position 'x'
- ຍ,ຽ: initial position 'gn', final position 'y'. Can also be a vowel. ຽ is not used in initial position.
- ດ: initial position 'd', final position 't'
- ຕ: initial position 't'
- ຖ: initial position 'th'
- ທ: initial position 'th'
- ນ: initial and final position 'n'
- ບ: initial position 'b', final position 'p'
- ປ: initial position 'p'
- ຜ: initial position 'ph'
- ຝ: initial position 'f'
- ພ: initial position 'ph'
- ຟ: initial position 'f'
- ມ: initial and final position 'm'
- ຢ: initial position 'y'
- ຣ,ຣ໌: initial and final position 'r'. ຣ໌ is rarely used, and only in the final position of words, for example 'ເບີຣ໌'.
- ລ,◌ຼ: initial position 'l'
- ວ: initial position 'v' or 'o', final position 'o', 'iou', or 'oua'. ວ can also be a vowel, depending on its position. At the beginning of a syllable, ວ should be romanized v. As the second character of a combination in initial position, ວ should be romanized o. At the end of a syllable, ວ is romanized as follows: the syllables ◌ິ ວ and ◌ີ ວ should be romanized iou; the syllable ◌ົ ວ (treated as a vowel) should be romanized oua; otherwise, ວ should be romanized o.
- ຫ: initial position 'h'. At the beginning of a syllable, the character ຫ unaccompanied by a vowel or tone mark and occurring immediately before ຍ gn, ນ n, ມ m, ຣ r, ລ l, or ວ v should generally not be romanized. Note that the character combinations ຫນ, ຫມ and ຫລ are often written in abbreviated form: ໜ n, ໝ m, and ຫຼ l, respectively. For example, ແຫນ is romanized hèn and ແໜ is romanized nè.
- ອ: initial position '-'. ອ can also be a vowel. At the beginning of a word, ອ should not be romanized. At the beginning of a syllable within a word, ອ should be romanized as a hyphen.
- ຮ: initial position 'h'
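The position-dependent entries above can be pictured as a small lookup table keyed by character and position. The following is a minimal sketch only: the `%CONSONANT` hash and the `romanize_consonant` function are hypothetical names made up for illustration, they cover just a few of the listed consonants, and they are not taken from the module's internals.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal Lao characters

# Hypothetical lookup table for illustration; a handful of the consonants
# listed above, with the romanization for each position they can occupy.
my %CONSONANT = (
    'ກ' => { initial => 'k',  final => 'k'  },
    'ງ' => { initial => 'ng', final => 'ng' },
    'ຍ' => { initial => 'gn', final => 'y'  },
    'ດ' => { initial => 'd',  final => 't'  },
    'ບ' => { initial => 'b',  final => 'p'  },
    'ຂ' => { initial => 'kh' },    # no final form is listed for this one
);

# Returns the romanization of $char in $position ('initial' or 'final'),
# or undef when the table has no entry for that character and position.
sub romanize_consonant {
    my ($char, $position) = @_;
    my $entry = $CONSONANT{$char} or return;
    return $entry->{$position};
}

print romanize_consonant('ດ', 'initial'), "\n";    # prints "d"
print romanize_consonant('ດ', 'final'),   "\n";    # prints "t"
```

The context-sensitive rules for ວ, ຫ and ອ need to look at neighbouring characters as well, so a plain table like this would not be enough for them.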
Vowels
'◌' represents any consonant character.
- ◌ະ,◌ັ,◌າ,◌າ: a
- ◌ິ,◌ິ,◌ີ,◌ີ: i
- ◌ຶ,◌ຶ,◌ື,◌ື: u
- ◌ຸ,◌ຸ,◌ູ,◌ູ: ou
- ເ◌ະ,ເ◌ັ,ເ◌,ເ◌: é
- ແ◌ະ,ແ◌ັ,ແ◌,ແ◌: è
- ໂ◌ະ,◌ົ,ໂ◌,ໂ◌: ô
- ເ◌າະ,◌ັອ,◌ໍ,◌ອ: o
- ◌ົວະ,◌ັວ,◌ົວ,◌ວ: oua
- ເ◌ ັຽະ,◌ັຽ,ເ◌ັຽ,◌ຽ: ia
- ເ◌ຶອະ,ເ◌ຶອ,ເ◌ືອ,ເ◌ືອ: ua
- ເ◌ິະ,ເ◌ິ,ເ◌ີ,ເ◌ື: eu
- ໄ◌,ໃ◌: ai
- ເ◌ົາ: ao
- ◌ຳ: am
Tones
Tonal marks (່້໊໋) are not romanized.
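Because tone marks are simply dropped, this rule amounts to deleting four code points (U+0EC8 to U+0ECB) from the text. A minimal sketch, assuming the input is already a decoded Perl character string; `strip_tone_marks` is a hypothetical helper name, not part of this module's API:

```perl
use strict;
use warnings;

# Removes the four Lao tone marks (U+0EC8 to U+0ECB) from a decoded string
# and returns the result; the caller's string is left untouched.
sub strip_tone_marks {
    my ($text) = @_;
    (my $stripped = $text) =~ tr/\x{0EC8}-\x{0ECB}//d;
    return $stripped;
}

# "\x{0E99}\x{0EC9}\x{0EB3}" is the word ນ້ຳ; the middle code point is a tone mark.
my $without_tone = strip_tone_marks("\x{0E99}\x{0EC9}\x{0EB3}");
```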
Numbers
The Lao numbers ໐, ໑, ໒, ໓, ໔, ໕, ໖, ໗, ໘, and ໙ are romanized to the Arabic numbers 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9.
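The Lao digits occupy the contiguous range U+0ED0 to U+0ED9, so the mapping can be sketched as a single transliteration. This is an illustration under the assumption that the input is a decoded character string; `lao_digits_to_arabic` is a hypothetical name and not a method of this module:

```perl
use strict;
use warnings;

# Maps the Lao digits U+0ED0..U+0ED9 onto 0..9 and leaves every other
# character unchanged. Works on a copy, so the argument is not modified.
sub lao_digits_to_arabic {
    my ($text) = @_;
    (my $converted = $text) =~ tr/\x{0ED0}-\x{0ED9}/0-9/;
    return $converted;
}

# "\x{0ED2}\x{0ED0}\x{0ED0}\x{0ED9}" is ໒໐໐໙
print lao_digits_to_arabic("\x{0ED2}\x{0ED0}\x{0ED0}\x{0ED9}"), "\n";    # prints "2009"
```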
Special characters
ໆ is romanized to repeat the previous syllable, for example ແຊວໆ → xèoxèo.
ຯ (the Lao ellipsis) is 'romanized' to '...'.
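One way to picture these two rules is as a pass over a list of tokens, where each token is either an already-romanized syllable or one of the two special characters. The token list and the `expand_special` function below are assumptions made for this sketch; how the module represents syllables internally is not described here.

```perl
use strict;
use warnings;
use utf8;    # the source contains literal Lao and accented characters

# Expands the special characters in a list of tokens. Each token is either
# an already-romanized syllable, the repetition mark 'ໆ', or the ellipsis 'ຯ'.
sub expand_special {
    my @tokens = @_;
    my @out;
    for my $token (@tokens) {
        if ($token eq 'ໆ') {
            # Repeat the previous syllable, if there is one.
            push @out, $out[-1] if @out;
        }
        elsif ($token eq 'ຯ') {
            push @out, '...';
        }
        else {
            push @out, $token;
        }
    }
    return @out;
}

my $joined = join '', expand_special('xèo', 'ໆ');    # 'xèoxèo'
```

A repetition mark with nothing before it is silently dropped in this sketch; that is a design choice of the example, not documented behaviour of the module.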
METHODS
new
Creates a new object. A Lao text string is required.

    my $foo = Lingua::LO::Romanize->new(text => 'ພາສາລາວ');

text
If a string is passed as an argument, it becomes the text to romanize.

    $foo->text('ເບຍ');

If no arguments are passed, an array reference of Lingua::LO::Romanize::Word objects built from the current text is returned.
all_words
Returns an array reference of Lingua::LO::Romanize::Word objects built from the current text.
romanize
Returns the current text as a romanized string. If hyphen is true, the syllables will be hyphenated.
    my $string = $foo->romanize;
    my $string_with_hyphen = $foo->romanize(hyphen => 1);
syllable_array
Returns the current text as an array of hash references. The key 'lao' represents the original syllable and 'romanized' the romanized syllable.
    foreach my $syllable ($foo->syllable_array) {
        my $lao_syllable = $syllable->{lao};
        my $romanized_syllable = $syllable->{romanized};
        ...
    }
AUTHOR
Joakim Lagerqvist, <jokke at cpan.org>
BUGS
Please report any bugs or feature requests to bug-lingua-lo-romanize at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Lingua-LO-Romanize. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
    perldoc Lingua::LO::Romanize
You can also look for information at:
- RT: CPAN's request tracker
- AnnoCPAN: Annotated CPAN documentation
- CPAN Ratings
- Search CPAN
COPYRIGHT & LICENSE
Copyright 2009 Joakim Lagerqvist, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.