NAME
Text::Metaphone::Amharic - The Metaphone Algorithm for Amharic.
SYNOPSIS
use utf8;
require Text::Metaphone::Amharic;
my $mphone = new Text::Metaphone::Amharic;
my @keys = $mphone->metaphone ( "ሥላሴ" );
foreach (@keys) {
print "$_\n";
}
my $key = $mphone->metaphone ( "ፀሐይ" );
print "key => $key\n";
$mphone->style ( "ipa" );
@keys = $mphone->metaphone ( "ሥላሴ" );
foreach (@keys) {
print "$_\n";
}
$mphone->style ( "ethiopic" );
:
:
The key "style" and Metaphone "granularity" can be set at import time:
use Text::Metaphone::Amharic ( style => "ipa", granularity => "high" );
at instantiation time:
my $mphone = new Text::Metaphone::Amharic ( style => "ipa", granularity => "high" );
or at any time thereafter:
$mphone->style ( "ethiopic" );
$mphone->granularity ( "low" );
DESCRIPTION
The Text::Metaphone::Amharic module is a reimplementation of the Amharic Metaphone algorithm of the Text::TransMetaphone package. This implementation uses an object-oriented interface and generates keys in Ethiopic script by default (see the STYLES section for other encoding options).
By default the keys are generated in "low" granularity mode which finds the most matches. The granularity section discusses the effects of the different levels.
Like Text::TransMetaphone::am, the terminal key returned in list context is a regular expression. Amharic character classes are applied in the RE key as per the conventions of Regexp::Ethiopic::Amharic.
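As a minimal sketch of handling the list-context result described above, the helper below separates the terminal regular expression key from the plain keys. The name split_keys and the placeholder key strings are hypothetical and not part of this module; with the module installed, the list would come from a call such as $mphone->metaphone ( "ሥላሴ" ).

```perl
use strict;
use warnings;
use utf8;

# Hypothetical helper (not part of Text::Metaphone::Amharic): split a
# list-context result into the plain keys and the terminal regex key.
sub split_keys {
    my @keys  = @_;
    my $regex = pop @keys;    # the last element is the regular expression key
    return ( \@keys, $regex );
}

# Placeholder values standing in for:
#   my @keys = $mphone->metaphone ( "ሥላሴ" );
my @keys = ( 'KEY-1', 'KEY-2', 'REGEX-KEY' );

my ( $plain, $regex ) = split_keys(@keys);
print "plain key: $_\n" for @$plain;
print "regex key: $regex\n";
```

Because the regex key follows Regexp::Ethiopic::Amharic conventions, it may need that module in scope before it can be compiled; treat this as an assumption to check against that module's documentation.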
GRANULARITY
The granularity parameter refers to the degree of reduction that occurs in the key generation. The granularity modes were created for investigative purposes. The most effective "low" level mode is the default.
"high"
The least coarse grain. "ወ" and "የ" are treated under consonant rules. The default IM correction (shift-slip condition) folds keys both upward and downward. The high granularity level generates the greatest number of keys. Each substitution causes a new key to be generated, so that the set of keys returned represents all possible permutations. The "high" level is the least aggressive in terms of text simplification and leads to the fewest matches. The "high" level is more useful for other types of analysis, such as distance comparison to the canonical word. Since both the canonical and error words have keys folded downward for all granularity levels during IM corrections, there is no particular advantage to the "high" level for the purpose of matching.
"medium"
An in-between grain. "ወ" and "የ" are treated under consonant rules. The default IM correction folds keys downward only. The keys generated represent a "lowest common denominator" that would be reducible from the "high" mode keys. More matches will be found at the lowest granularity level, but the risk of false matches becomes higher.
"low"
The default and most coarse, or aggressive, grain. "ወ" and "የ" are treated under vowel rules, that is, they are stripped out of the string except as the first character. Like the medium level, the default IM correction folds keys downward only, and the keys again are lowest common denominators of "high" mode keys. More matches will be found at the lowest granularity level, but the risk of false matches becomes higher.
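The "vowel rules" treatment at the low level can be pictured with a small sketch. This is an illustration only, not the module's implementation: the function name strip_glides is hypothetical, and it assumes that "ወ" and "የ" stand for their whole syllable families (U+12C8 to U+12CE and U+12E8 to U+12EE).

```perl
use strict;
use warnings;
use utf8;

# Illustrative only: NOT the module's implementation. Assumes "ወ" and "የ"
# stand for their whole syllable families (U+12C8..U+12CE, U+12E8..U+12EE).
sub strip_glides {
    my ($word) = @_;
    return $word if length($word) < 2;
    my $first = substr( $word, 0, 1 );    # the first character is always kept
    my $rest  = substr( $word, 1 );
    # Remove every member of the two families from the rest of the string.
    $rest =~ s/[\x{12C8}-\x{12CE}\x{12E8}-\x{12EE}]//g;
    return $first . $rest;
}

binmode STDOUT, ':encoding(UTF-8)';
print strip_glides("ወለወ"), "\n";
```

At the "high" and "medium" levels these characters would instead be kept and handled like any other consonant.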
STYLES
By default keys are returned with Ethiopic characters (UTF-8 encoding). If this is not your text "style" of choice, IPA symbols and SERA transliteration are also available. The text style can be set and reset at any time:
At Import Time:
use Text::Metaphone::Amharic ( style => "ipa" );
At Instantiation Time:
my $mphone = new Text::Metaphone::Amharic ( style => "sera" );
After Instantiation:
$mphone->style ( "ethio" );
A reverse method is also provided to convert an IPA or SERA symbol key into an equivalent Ethiopic sequence.
REQUIRES
COPYRIGHT
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
BUGS
None presently known.
AUTHOR
Daniel Yacob, dyacob@cpan.org
SEE ALSO
- http://daniel.yacob.name/papers/DanielYacob-ICESXV.pdf
- Text::TransMetaphone
- Included with this package: examples/amphone.pl, examples/ipa-phone.pl, examples/amphone-high.pl, examples/ipa-phone-high.pl, examples/granularity.pl, examples/matchtest.pl