NAME

Locale::Country::Multilingual::Unicode - Recommended Usage with Unicode

VERSION

version 0.25

SYNOPSIS

use utf8;
use Encode::StdIO;
use Locale::Country::Multilingual {use_io_layer => 1};

my $lcm = Locale::Country::Multilingual->new;

$lcm->set_lang('de');
print $lcm->code2country('gb'), "\n";

DESCRIPTION

You are on a modern computer system that uses utf-8 encoding by default. Locale::Country::Multilingual uses language data that is in utf-8 too. Everything is fine... Really?

Try this in your favorite terminal:

> perl -le 'print "bäh!"'
bäh!

Uppercase it:

> LANG=en_US perl -Mlocale -le 'print uc "bäh!"'
BäH!

Wrong! It should have been BÄH! (on latin1 systems it does work). The same goes for Österreich, the German and native name for Austria: if you run lc() on it, it won't change.

What happened is that you write files (and code) in utf-8, a multi-byte encoding, but Perl by default expects latin1 (iso-8859-1), a single-byte encoding. Provided you use locale; together with an appropriate locale (here en_US) in your Perl program, a lowercase latin1 ä (0xe4) is turned into an uppercase Ä (0xc4), but only if your input comes as latin1.

A utf-8 ä is encoded as 0xc3, 0xa4. Therefore uc() does not detect the two-byte ä as a letter that could be uppercased.
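The difference between bytes and characters can be reproduced without a terminal or a locale. The following is a minimal sketch using the core Encode module; the variable names are illustrative only, and the utf-8 bytes are written as escapes so the result does not depend on how the script file itself is encoded.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# The utf-8 bytes of "bäh!": ä is the two bytes 0xc3 0xa4.
my $bytes = "b\xc3\xa4h!";

# On the raw bytes, uc() only changes the ASCII letters b and h.
my $upper_bytes = uc $bytes;

# Decoding turns 0xc3 0xa4 into the single character U+00E4 (ä) ...
my $chars = decode('UTF-8', $bytes);

# ... which uc() recognizes as a letter and maps to U+00C4 (Ä).
my $upper_chars = uc $chars;

printf "%d bytes, %d characters\n", length $bytes, length $chars;

# Encode again before printing, because STDOUT has no encoding layer here.
print encode('UTF-8', $upper_bytes), " (bytes were only partly uppercased)\n"
    if $upper_bytes ne encode('UTF-8', $upper_chars);
print encode('UTF-8', $upper_chars), "\n";
```

The first line of output reports 5 bytes and 4 characters, which is exactly the mismatch described above.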

Language files in Locale::Country::Multilingual are in utf-8.

To make everything work, the correct workflow is:

use utf8;

This pragma tells Perl that all text in your code is actually in utf-8, so the Perl interpreter converts it into its internal string format correctly. Strictly speaking, this is only necessary when you have literals that contain non-ASCII characters, e.g. when you write:

print "Dürüm Döner Kebap\n";

Even if your system does not use utf-8 by default, your Perl programs should be encoded in utf-8. Use an editor where you can set the encoding.
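What the pragma changes can be seen by measuring the same literal with and without it. This is only a sketch, and it assumes the script file is saved as utf-8; the variable names are made up for the illustration. Since utf8 is a lexically scoped pragma, it can be switched on and off per block.

```perl
use strict;
use warnings;

# With the pragma, the literal is read as characters: D, ü, r, ü, m.
my $with_pragma = do { use utf8; length "Dürüm" };

# Without it, each ü is seen as its two utf-8 bytes.
my $without_pragma = do { no utf8; length "Dürüm" };

print "with utf8: $with_pragma, without utf8: $without_pragma\n";
```

If the file really is utf-8, the two numbers are 5 and 7.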

Set encoding for input and output

By default Perl converts the internal string representation into latin1 for input and output, so the print output above would be broken on a non-latin1 system. To switch STDIN, STDOUT and STDERR to utf-8, you can write:

binmode STDIN, ':utf8';
binmode STDOUT, ':utf8';
binmode STDERR, ':utf8';

If your system uses another encoding, e.g. "euc-jp", you can switch a filehandle to that encoding with:

binmode FH, ':encoding(euc-jp)';
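To see what an encoding layer does to the data, the same string can be written through two different layers into in-memory buffers. This is a minimal sketch, not part of Locale::Country::Multilingual; the hash %bytes_for and the choice of encodings are assumptions made for the illustration.

```perl
use strict;
use warnings;

# Three characters; \x{e4} is ä.
my $text = "b\x{e4}h";

# Write the string through each layer into an in-memory buffer
# and keep the resulting bytes.
my %bytes_for;
for my $encoding ('UTF-8', 'iso-8859-1') {
    my $buffer = '';
    open my $fh, ">:encoding($encoding)", \$buffer
        or die "Cannot open in-memory handle: $!";
    print {$fh} $text;
    close $fh or die "Cannot close in-memory handle: $!";
    $bytes_for{$encoding} = $buffer;
}

# Report how many bytes each encoding produced for the same text.
printf "%-10s %d bytes\n", $_, length $bytes_for{$_}
    for sort keys %bytes_for;
```

The utf-8 layer produces four bytes for the three characters, while the latin1 layer produces three.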

In a web application, don't forget to declare the matching charset in the output MIME type (the Content-Type header) as well!

If output goes to a terminal:

use Encode::StdIO;

This module determines your terminal's encoding - even if it is something other than utf-8 - and sets the appropriate IO layers for the three standard IO handles.

Set use_io_layer => 1

This option can be specified in two places: either in use or in new:

use Locale::Country::Multilingual {use_io_layer => 1};

my $lcm = Locale::Country::Multilingual->new(
  lang => 'de',
  use_io_layer => 1,
);
print uc $lcm->code2country('gb'), "\n";

That should print

VEREINIGTES KÖNIGREICH GROSSBRITANNIEN UND NORDIRLAND

Wow! Even the "ß" has been converted correctly into "SS".
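The "ß" to "SS" mapping is done by Perl's uc() once the string consists of decoded characters, so it can be tried without the module. The following sketch assumes the script file is saved as utf-8 and uses a hard-coded name instead of a value returned by code2country.

```perl
use strict;
use warnings;
use utf8;    # the literal below contains a non-ASCII character

# Hard-coded stand-in for a country name returned as a character string.
my $name  = "Großbritannien";
my $upper = uc $name;

# Encode on output so the terminal receives utf-8 bytes.
binmode STDOUT, ':encoding(UTF-8)';
print "$upper\n";
printf "%d characters before, %d after\n", length $name, length $upper;
```

The uppercased string is one character longer than the original, because the single "ß" becomes the two letters "SS".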

SEE ALSO

perluniintro, Encode::StdIO

AUTHOR

Bernhard Graf graf(a)cpan.org

COPYRIGHT & LICENSE

This text is in the public domain.

AUTHORS

  • Bernhard Graf <graf@cpan.org>

  • Fayland Lam <fayland@gmail.com>

  • Greg Oschwald <oschwald@cpan.org>

COPYRIGHT AND LICENSE

This software is copyright (c) 2014 by Fayland Lam.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.