NAME
Encode::Escape::Unicode - Perl extension for encoding of Unicode escape sequences
SYNOPSIS
use Encode::Escape::Unicode;
$escaped = "What is \\x{D384}? It's Perl!";
$string = decode 'unicode-escape', $escaped;
# Now, $string is equivalent to "What is \x{D384}? It's Perl!"
Encode::Escape::Unicode->demode('python');
$python_unicode_escape = "And \\u041f\\u0435\\u0440\\u043b? It's Perl, too.";
$string = decode 'unicode-escape', $python_unicode_escape;
# Now, $string eq "And \x{041F}\x{0435}\x{0440}\x{043B}? It's Perl, too."
Suppose you have a text data file 'unicode-escape.txt' that contains these lines:
What is \x{D384}? It's Perl!\n
And \x{041F}\x{0435}\x{0440}\x{043B}? It's Perl, too.\n
To treat each line as if it were a normal double-quoted string in source code, try this:
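use Encode::Escape::Unicode;
# Three-argument open with a lexical filehandle; stop if the file is missing.
open(my $fh, '<', 'unicode-escape.txt') or die "Cannot open unicode-escape.txt: $!";
while (<$fh>) {
    chomp;
    # Decode the escape sequences, then emit the result as UTF-8 bytes.
    print encode 'utf8', decode 'unicode-escape', $_;
}
close($fh);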
DESCRIPTION
The Encode::Escape::Unicode module implements encodings of escape sequences.
Simply put, it decodes escape sequences into Perl internal strings (\x{0000} -- \x{ffff}) and encodes Perl strings into escape sequences.
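The encoding direction can be illustrated without the module. The following is a minimal sketch in core Perl only, assuming that "encoding" means writing every character outside printable ASCII as a \x{XXXX} sequence. The helper name to_x_escapes is hypothetical and is not part of Encode::Escape::Unicode; the module's actual output may differ in detail.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's API): turn a Perl string into
# printable ASCII by writing other characters as \x{XXXX} sequences.
sub to_x_escapes {
    my ($string) = @_;
    # Double literal backslashes first so they stay distinguishable
    # from the escape sequences added in the next step.
    $string =~ s/\\/\\\\/g;
    # Replace every character outside printable ASCII with \x{XXXX}.
    $string =~ s/([^\x20-\x7e])/sprintf('\\x{%04X}', ord $1)/ge;
    return $string;
}

my $escaped = to_x_escapes("What is \x{D384}? It's Perl!");
print "$escaped\n";
```

Because the result is plain ASCII, it can be printed or stored without choosing an output encoding.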
MODES AND SUPPORTED ESCAPE SEQUENCES
default or perl mode
Escape Sequences       Description
---------------------  ----------------------------------------------
\a                     Alarm (beep)
\b                     Backspace
\e                     Escape
\f                     Formfeed
\n                     Newline
\r                     Carriage return
\t                     Tab
\000 - \377            Octal character value. \0, \00, and \000 are equivalent.
\x00 - \xff            Hexadecimal character value. \x0 and \x00 are equivalent.
\x{0000} - \x{ffff}    Hexadecimal Unicode code point. \x{0}, \x{00}, \x{000}, and \x{0000} are equivalent.
\\                     Backslash
\$                     Dollar sign
\@                     At sign
\"                     Double quote
\                      Escape the next character if known; otherwise print it
This is the default mode. You don't need to invoke it unless you have previously invoked another mode.
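To make the table above concrete, here is a minimal sketch of how such sequences could be decoded in a single pass using core Perl only. The helper name decode_perl_escapes and the %NAMED lookup table are hypothetical; this is an illustration of the mapping, not the module's own implementation.

```perl
use strict;
use warnings;

# Named escapes from the table, mapped to the characters they denote.
my %NAMED = (
    a => "\a", b => "\b", e => "\e", f => "\f",
    n => "\n", r => "\r", t => "\t",
);

# Hypothetical helper: decode perl-mode escapes from left to right.
sub decode_perl_escapes {
    my ($text) = @_;
    $text =~ s{
        \\ (?:
            x\{ ([0-9A-Fa-f]+) \}     # group 1: braced hexadecimal form
          | x   ([0-9A-Fa-f]{1,2})    # group 2: one or two hex digits
          |     ([0-7]{1,3})          # group 3: one to three octal digits
          |     (.)                   # group 4: any other single character
        )
    }{
        defined $1          ? chr(hex $1)
      : defined $2          ? chr(hex $2)
      : defined $3          ? chr(oct $3)
      : exists $NAMED{$4}   ? $NAMED{$4}
      :                       $4
    }gsex;
    return $text;
}

my $string = decode_perl_escapes('What is \x{D384}? It\'s Perl!');
```

Unknown sequences such as \$ or \@ fall through to the last branch, which simply drops the backslash and keeps the character.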
python or java mode
Python, Java, and C# use the \uxxxx escape sequence for Unicode characters.
Escape Sequences       Description
---------------------  ----------------------------------------------
\a                     Alarm (beep)
\b                     Backspace
\e                     Escape
\f                     Formfeed
\n                     Newline
\r                     Carriage return
\t                     Tab
\000 - \377            Octal character value. \0, \00, and \000 are equivalent.
\x00 - \xff            Hexadecimal character value. \x0 and \x00 are equivalent.
\u0000 - \uffff        Hexadecimal Unicode code point.
\\                     Backslash
\$                     Dollar sign
\@                     At sign
\"                     Double quote
\                      Escape the next character if known; otherwise print it
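The \uxxxx row is the one that distinguishes this mode. As a rough illustration in core Perl only, the sketch below decodes just that form and leaves every other backslash sequence untouched. The helper name decode_u_escapes is hypothetical, and the sketch assumes exactly four hexadecimal digits after \u.

```perl
use strict;
use warnings;

# Hypothetical helper: decode \uXXXX only. Other backslash pairs are
# consumed and kept verbatim, so an escaped backslash followed by
# "u0041" is not mistaken for a \u sequence.
sub decode_u_escapes {
    my ($text) = @_;
    $text =~ s/\\(?:u([0-9A-Fa-f]{4})|(.))/defined $1 ? chr(hex $1) : "\\$2"/gse;
    return $text;
}

my $string = decode_u_escapes('And \u041f\u0435\u0440\u043b? It\'s Perl, too.');
# $string now contains the characters U+041F U+0435 U+0440 U+043B.
```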
If you have data that contains \uxxxx escape sequences, this will translate them to UTF-8 encoded characters:
use Encode::Escape;
Encode::Escape::demode 'unicode-escape', 'python';
while(<>) {
chomp;
print encode 'utf8', decode 'unicode-escape', $_;
}
And this will translate \uxxxx to \x{xxxx}:
use Encode::Escape;
Encode::Escape::enmode 'unicode-escape', 'perl';
Encode::Escape::demode 'unicode-escape', 'python';
while(<>) {
chomp;
print encode 'unicode-escape', decode 'unicode-escape', $_;
}
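For comparison, the same text-to-text translation can be approximated without the module. This is only a sketch under stated assumptions: the helper name u_to_x_escapes is hypothetical, it handles only four-digit \uxxxx sequences, and it keeps the hexadecimal digits in their original case, which may not match what the module emits.

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite \uXXXX as \x{XXXX} textually, without
# decoding to characters. Other backslash pairs pass through unchanged.
sub u_to_x_escapes {
    my ($text) = @_;
    $text =~ s/\\(?:u([0-9A-Fa-f]{4})|(.))/defined $1 ? '\\x{' . $1 . '}' : "\\$2"/gse;
    return $text;
}

print u_to_x_escapes('And \u041f\u0435\u0440\u043b? It\'s Perl, too.'), "\n";
```

Going through decode and encode, as in the module-based example above, has the advantage that all other escape forms are normalized as well.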
SEE ALSO
See Encode::Escape.
AUTHOR
you, <you at cpan dot org>
COPYRIGHT AND LICENSE
Copyright (C) 2007 by you
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.