NAME
Unicode::Confuse - Identify and replace Unicode confusables
SYNOPSIS
use Unicode::Confuse;
VERSION
This documents version 0.03 of Unicode-Confuse corresponding to git commit a5b1eee3cc6e892e7e08bb2d0530b6bfd87017eb released on Thu Apr 15 09:11:48 2021 +0900.
DESCRIPTION
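This Perl module identifies Unicode confusables and maps them to their canonical forms, using data derived from the Unicode confusables file (confusables.txt).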
FUNCTIONS
canonical
my $canonical = canonical ($c);
If $c is a confusable, this returns the canonical form of $c. If not, it returns the undefined value.
confusable
if (confusable ($c)) {
    # do something.
}
This tests whether $c is a confusable. It matches $c against the large regex in Unicode::Confuse::Regex.
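The two functions above can be combined as in the following minimal sketch. It assumes that the functions can be called by their fully qualified names, so it does not rely on any particular export list. The helper describe_char and the sample characters are hypothetical and are not part of the module.

```perl
use strict;
use warnings;
use Unicode::Confuse;

binmode STDOUT, ':encoding(UTF-8)';

# Hypothetical helper, not part of Unicode::Confuse: describe one character.
sub describe_char
{
    my ($c) = @_;
    my $code = sprintf ("U+%04X", ord ($c));
    # confusable () is documented as a yes/no check against the regex.
    if (! Unicode::Confuse::confusable ($c)) {
        return "$code is not listed as a confusable";
    }
    # canonical () is documented to return the undefined value for
    # non-confusables, so test with "defined", not truthiness.
    my $canonical = Unicode::Confuse::canonical ($c);
    if (! defined $canonical) {
        return "$code matches the regex but has no canonical form";
    }
    return "$code has canonical form '$canonical'";
}

# Sample inputs chosen for illustration: Cyrillic small letter a
# (U+0430) and Latin small letter a (U+0061).
for my $c ("\x{430}", "a") {
    print describe_char ($c), "\n";
}
```

Testing the return value of canonical with "defined" rather than in boolean context avoids misreading a canonical form such as "0", which Perl treats as false.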
DEPENDENCIES
- File::Slurper
This is used by the parsing module Unicode::Confuse::Parse. You do not need the parsing module to use this module with the built-in version of the data.
- JSON::Parse
This is used to parse the JSON-formatted file of confusables distributed with the module.
BUGS
- Unicode specifications
This module does not attempt to implement the Unicode requirements for software that handles confusables. In other words, it makes no claim to be "An implementation claiming conformance to this specification" as described in the text of the "Unicode Consortium specification".
- Data quality
The data in the Unicode confusables file is of mixed quality, with nearly identical or indistinguishable characters muddled together with things which are clearly quite different from one another.
SEE ALSO
In this distribution
The script make-confusables.pl, available only in the GitHub repository, makes the data files for this Perl distribution.
Unicode::Confuse::Parse is used to parse the data file (confusables.txt). It is used by make-confusables.pl.
Unicode::Confuse::Regex is generated by make-confusables.pl. It matches all confusables.
The unexported variable $Unicode::Confuse::data contains the complete confusable data. $data->{confusables} contains a map from confusables to the canonical form, and $data->{reverse} contains a map from the canonical form to an array containing the corresponding set of confusables, which may have only one member.
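The shape of this data can be illustrated with a small stand-in. The following sketch does not load the module or its real data; the three entries are hand-picked for illustration, and the way the reverse map is built here is an assumption about the layout, not the method used by make-confusables.pl.

```perl
use strict;
use warnings;

# Stand-in data shaped like the documented structure. These entries
# are illustrative only and are NOT read from the module.
my %confusables = (
    "\x{430}" => 'a',    # CYRILLIC SMALL LETTER A
    "\x{251}" => 'a',    # LATIN SMALL LETTER ALPHA
    "\x{43E}" => 'o',    # CYRILLIC SMALL LETTER O
);

# Build the reverse map: canonical form => array of its confusables.
my %reverse;
for my $c (sort keys %confusables) {
    push @{$reverse{$confusables{$c}}}, $c;
}

my $data = {
    confusables => \%confusables,
    reverse => \%reverse,
};

# Forward lookup: confusable to canonical form.
my $canonical = $data->{confusables}{"\x{430}"};

# Reverse lookup: canonical form to its set of confusables.
my @set = @{$data->{reverse}{$canonical}};
printf "%d confusables map to '%s'\n", scalar (@set), $canonical;
```

In this stand-in, the array for 'o' has a single member, which mirrors the remark above that a set may contain only one confusable.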
Unicode Consortium information
- Unicode Consortium specification
See http://www.unicode.org/reports/tr39 for the Unicode Consortium specification.
- Unicode data files
The following links point to the latest data files:
AUTHOR
Ben Bullock, <bkb@cpan.org>
COPYRIGHT & LICENCE
This package and associated files are copyright (C) 2021 Ben Bullock.
You can use, copy, modify and redistribute this package and associated files under the Perl Artistic Licence or the GNU General Public Licence.