NAME
Locale::Country::Multilingual::Unicode - Recommended Usage with Unicode
VERSION
version 0.25
SYNOPSIS
use utf8;
use Encode::StdIO;
use Locale::Country::Multilingual {use_io_layer => 1};
my $lcm = Locale::Country::Multilingual->new;
$lcm->set_lang('de');
print $lcm->code2country('gb'), "\n";
DESCRIPTION
You are on a modern computer system that uses utf-8 encoding by default. Locale::Country::Multilingual uses language data that is in utf-8 too. Everything is fine... Really?
Try this in your favorite terminal:
> perl -le 'print "bäh!"'
bäh!
Uppercase it:
> LANG=en_US perl -Mlocale -le 'print uc "bäh!"'
BäH!
Wrong! It should have been BÄH! On latin1 systems, though, it works. The same goes for Österreich, the German and native name for Austria: if you run lc() on it, it won't change.
What happened is that you write files (and code) in utf-8, a multi-byte encoding, but Perl expects latin1 (iso-8859-1), a single-byte encoding, by default. Provided you use locale; together with an appropriate locale (here en_US) in your Perl program, a lowercase latin1 ä (0xe4) is turned into an uppercase Ä (0xc4) - but only if your input comes as latin1.
A utf-8 ä is encoded as the two bytes 0xc3, 0xa4. Therefore uc() does not detect the two-byte ä as a letter that could be uppercased.
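The byte-level difference described above can be sketched as follows. This is a minimal illustration, not part of the module: it uses the core Encode module, and the names $bytes, $chars and hex_dump are made up for the example. It contrasts uc() on undecoded utf-8 bytes with uc() on the decoded character string.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical helper: show each byte of a string as two hex digits.
sub hex_dump { join ' ', map { sprintf('%02x', ord($_)) } split(//, $_[0]) }

my $bytes = "b\xc3\xa4h!";             # "bäh!" as raw, undecoded utf-8 bytes
my $chars = decode('UTF-8', $bytes);   # the same text as a character string

my $upper_bytes = uc $bytes;                    # only the ASCII letters change
my $upper_chars = encode('UTF-8', uc $chars);   # ä becomes Ä, then back to utf-8

print length($bytes), " bytes, ", length($chars), " characters\n";
print hex_dump($upper_bytes), "\n";   # c3 a4 survives untouched
print hex_dump($upper_chars), "\n";   # c3 84 is the utf-8 encoding of Ä
```

Decoding first is what lets uc() see one letter instead of two unrelated bytes.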
Language files in Locale::Country::Multilingual are in utf-8.
To make everything work, the correct workflow is:
- use utf8;
This pragma tells Perl that all text in your code is actually in utf-8, so the Perl interpreter converts it into its internal string format correctly. Actually this is only necessary when you have literals that contain non-ASCII characters, e.g. when you code:
print "Dürüm Döner Kebap\n";
Even if your system does not use utf-8 by default, your Perl programs should be encoded in utf-8. Use an editor where you can set the encoding.
- Set encoding for input and output
By default Perl converts the internal string representation into latin1 for input and output. So the above print output would be broken on a non-latin1 system. For switching STDIN, STDOUT and STDERR to utf-8, you can write:
binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';
If your system uses another encoding, e.g. "euc-jp", you can switch a filehandle to that encoding with:
binmode FH, ':encoding(euc-jp)';
In a web application don't forget to set the output MIME type as well!
If output goes to a terminal:
use Encode::StdIO;
This module determines your terminal's encoding - even if it is something other than utf-8 - and sets the appropriate IO layers for the three standard IO handles.
- Set use_io_layer => 1
There are two places where this option can be specified: either in use or in new:
use Locale::Country::Multilingual {use_io_layer => 1};
my $lcm = Locale::Country::Multilingual->new(
    lang => 'de',
    use_io_layer => 1,
);
print uc $lcm->code2country('gb'), "\n";
That should print
VEREINIGTES KÖNIGREICH GROSSBRITANNIEN UND NORDIRLAND
Wow! Even the "ß" has been converted correctly into "SS".
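The first two steps of the workflow can be tried without the module installed. The sketch below is an illustration under stated assumptions, not the module's own code: the variable names are invented, an in-memory filehandle stands in for a real output file, and it uses ':encoding(UTF-8)', the validating spelling of the utf-8 layer, where the text above uses ':utf8'.

```perl
use strict;
use warnings;
use utf8;    # step 1: the literals below are utf-8 in the source file

my $country = "Österreich";
my $word    = "Großbritannien";

# Step 2: encode on output. The in-memory handle is only a stand-in
# so that the resulting bytes can be inspected afterwards.
my $buffer = '';
open(my $out, '>:encoding(UTF-8)', \$buffer)
    or die "cannot open in-memory handle: $!";
print {$out} lc($country), "\n";   # Ö is lowercased to ö
print {$out} uc($word), "\n";      # ß is uppercased to SS
close($out);

# The same layer on STDOUT, assuming a utf-8 terminal.
binmode(STDOUT, ':encoding(UTF-8)');
print lc($country), "\n";
print uc($word), "\n";
```

Because the strings are character strings before lc() and uc() run, and are encoded only when they leave the program, the case conversion no longer depends on the system locale.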
SEE ALSO
AUTHOR
Bernhard Graf graf(a)cpan,org
COPYRIGHT & LICENSE
This text is in the public domain.
AUTHORS
Bernhard Graf <graf@cpan.org>
Fayland Lam <fayland@gmail.com>
Greg Oschwald <oschwald@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Fayland Lam.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.