NAME
cyrillic - Library for fast and easy Cyrillic text manipulation
SYNOPSIS
use cyrillic qw/866 win2dos convert locase upcase detect/;
print convert( 866, 1251, $str );
print convert( 'dos','win', \$str );
print win2dos $str;
DESCRIPTION
This module provides functions for converting Cyrillic strings from one charset to another and for changing them to upper or lower case without switching the locale. It also includes a routine for detecting single-byte charsets. Adding a new code page is easy: it only requires adding the corresponding code page string.
Supported charsets: ibm866, koi8-r, cp855, windows-1251, MacWindows, iso_8859-5, unicode, utf8.
If the first item in the import list is a code page number, the locale is switched to that code page.
FUNCTIONS
convert - convert between charsets
upcase - convert to upper case
locase - convert to lower case
upfirst - convert first char to upper case
lofirst - convert first char to lower case
detect - detect codepage number
charset - returns charset name for codepage number
Named converters may also be listed in the import list. For example:
use cyrillic qw/dos2win win2koi mac2dos ibm2dos/;
NOTE! Specializations (such as win2dos or utf2win) are faster to call than convert.
NOTE! Only the convert function and its specializations work with Unicode and UTF-8 strings. All other functions work only with single-byte charsets.
Names for use in named charset converters:
dos  ibm866        866
koi  koi8-r        20866
ibm  cp855         855
win  windows-1251  1251
mac  ms-cyrillic   10007
iso  iso_8859-5    28585
uni  Unicode
utf  UTF-8
The following rules apply to the converting functions:
VAR may be a SCALAR or a REF to a SCALAR.
If VAR is a REF to a SCALAR, the referenced SCALAR is converted in place.
If VAR is omitted, $_ is operated on.
If the function is called in void context and VAR is not a REF, the result is placed in $_.
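The rules above can be illustrated with a small stand-alone sketch. It is not the module's code: shout_sketch is a hypothetical stand-in that only upper-cases ASCII, and it is assumed here that an omitted VAR means $_ is converted in place, as the EXAMPLES section suggests.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a converter: it only upper-cases ASCII,
# but it follows the VAR rules described in this section.
sub shout_sketch {
    # VAR omitted -> work on $_ through a reference
    my $var = @_ ? $_[0] : \$_;

    if (ref $var eq 'SCALAR') {
        $$var = uc $$var;                    # REF to SCALAR -> converted in place
        return $$var;
    }

    my $result = uc $var;
    $_ = $result unless defined wantarray;   # void context -> result goes to $_
    return $result;
}

my $str  = 'abc';
my $copy = shout_sketch($str);   # $copy is 'ABC', $str is left untouched
shout_sketch(\$str);             # $str itself becomes 'ABC'
$_ = 'xyz';
shout_sketch();                  # $_ becomes 'XYZ'
```

The void-context check relies on Perl's built-in wantarray, which returns undef when the caller discards the result.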
CONVERSION METHODS
- convert SRC_CP, DST_CP, [VAR]
Converts VAR from the SRC_CP codepage to the DST_CP codepage and returns the converted string.
Converting Unicode or UTF-8 data requires the Unicode::String and Unicode::Map modules to be installed.
- upcase CODEPAGE, [VAR]
Converts VAR to uppercase using the CODEPAGE table and returns the converted string.
- locase CODEPAGE, [VAR]
Converts VAR to lowercase using the CODEPAGE table and returns the converted string.
- upfirst CODEPAGE, [VAR]
Converts the first character of VAR to uppercase using the CODEPAGE table and returns the converted string.
- lofirst CODEPAGE, [VAR]
Converts the first character of VAR to lowercase using the CODEPAGE table and returns the converted string.
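To show what these operations do at the byte level, here is a minimal sketch that does not use this library. It assumes Perl's core Encode module for the charset conversion and a hand-written tr table for cp866 case mapping. The helper names dos_to_win_sketch, upcase_866_sketch and hex_dump are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;
use Encode qw(from_to);

# Hypothetical helper: re-encode cp866 (DOS) octets as windows-1251 octets
# using the core Encode module, not this library.
sub dos_to_win_sketch {
    my ($octets) = @_;
    from_to($octets, 'cp866', 'cp1251');   # rewrites the local copy in place
    return $octets;
}

# Hypothetical helper: upper-case cp866 octets with a fixed byte table,
# so no locale is involved.
sub upcase_866_sketch {
    my ($octets) = @_;
    # a-z, cp866 lowercase A0-AF and E0-EF, and F1 (small io)
    # map to A-Z, cp866 uppercase 80-9F, and F0 (capital io)
    $octets =~ tr/a-z\xA0-\xAF\xE0-\xEF\xF1/A-Z\x80-\x9F\xF0/;
    return $octets;
}

# Print a byte string as space-separated hex pairs.
sub hex_dump { return join ' ', map { sprintf '%02X', ord($_) } split //, $_[0] }

my $dos = "\x8F\xE0\xA8\xA2\xA5\xE2";             # first word of the EXAMPLES string, in cp866
print hex_dump(dos_to_win_sketch($dos)), "\n";    # CF F0 E8 E2 E5 F2
print hex_dump(upcase_866_sketch($dos)), "\n";    # 8F 90 88 82 85 92
```

A fixed table like this is one way to change case "without locale switching"; whether the module's own tables are built the same way is not stated here.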
MAINTENANCE METHODS
- charset CODEPAGE
Returns the charset name for CODEPAGE.
- detect ARRAY
Detects the single-byte codepage of the data in ARRAY and returns the codepage number. If the codepage cannot be detected, returns an undefined value.
EXAMPLES
use cyrillic qw/convert locase upcase detect dos2win win2dos/;
$_ = "\x8F\xE0\xA8\xA2\xA5\xE2 \xF0\xA6\x88\xAA\x88!";
printf " dos: '%s'\n", $_;
upcase 866;
printf " upcase: '%s'\n", $_;
dos2win;
printf "dos2win: '%s'\n", $_;
win2dos;
printf "win2dos: '%s'\n", $_;
locase 866;
printf " locase: '%s'\n", $_;
printf " detect: '%s'\n", detect $_;
# CONVERTING TEST:
use cyrillic qw/utf2dos mac2utf dos2mac win2dos utf2win/;
$_ = "Хелло Ворльд!\n";
print "UTF-8: $_";
print " DOS: ", utf2dos mac2utf dos2mac win2dos utf2win $_;
# EQUIVALENT CALLS:
dos2win( $str ); # called in void context -> result placed in $_
$_ = dos2win( $str );
dos2win( \$str ); # called with a REF to the string -> converted in place
$str = dos2win( $str );
dos2win(); # called with the parameter omitted -> $_ is converted
dos2win( \$_ );
$_ = dos2win( $_ );
# EASY SWITCHING OF THE LOCALE CODEPAGE
use cyrillic qw/866/; # locale switched to Russian_Russia.866
use locale;
print $str =~ /(\w+)/;
no locale;
print $str =~ /(\w+)/;
FAQ
* Q: Why does the module say: Can't create Unicode::Map for 'koi8-r' charset!
A: Your Unicode::Map module cannot find the map file for the 'koi8-r' charset.
The Unicode::Map manual explains where to download this file
and how to install it on the system.
* Q: Why does perl say: "Undefined subroutine koi2win called"?
A: The function koi2win is a specialization of the function convert.
It is created only when its name is included in the import list.
AUTHOR
Albert MICHEEV <Albert@f80.n5049.z2.fidonet.org>
COPYRIGHT
Copyright (C) 2000, Albert MICHEEV
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.
AVAILABILITY
The latest version of this library is likely to be available from:
http://www.perl.com/CPAN
SEE ALSO
Unicode::String, Unicode::Map.