NAME

Encode::Escape - Perl extension for Encodings of various escape sequences

SYNOPSIS

use Encode::Escape;

$escaped_ascii = "Perl\\tPathologically Eclectic Rubbish Lister\\n";
$ascii = decode 'ascii-escape', $escaped_ascii;

# Now, $ascii is equivalent to 
# double quote string "Perl\tPathologically Eclectic Rubbish Lister\n"

$escaped_unicode = "Perl \\x{041F}\\x{0435}\\x{0440}\\x{043B} \\x{D384}" 
$string = decode 'unicode-escape', $escaped_unicode;

# Now, $string is equvialent to 
# double quote string "Perl \x{041F}\x{0435}\x{0440}\x{043B} \x{D384}" 

It may looks non-sense. Here's another case.

If you have a text data file 'ascii-escape.txt'. It contains a line:

Perl\tPathologically Eclectic Rubbish Lister\n

And you want to use it as if it were a normal double quote string in source code. Try this:

open(FILE, 'ascii-escape.txt');
while(<FILE>) {
  chomp;
  print decode 'ascii-escape', $_;
}

DESCRIPTION

Encode::Escape module is a wrapper class for encodings of escape sequences.

It is NOT for an escape-based encoding (eg. ISO-2022-JP). It is for encoding/decoding escape sequences, generally used in source codes.

Many programming languages, markup languages, and typesetting languages provide methods for encoding special (or functional) characters and non-keyboard symbols in the form of escape sequence. That's what I concern.

For example, Encode::Escape::Unicode converts (encodes) Unicode characters into escape sequences representing it: \x{AC00} to the first Hangul syllable Ka, \x{042F} to Cyrillic capital letter Ya, etc.

Yes, you're right. There already exist many modules. See String::Escape, Unicode::Escape, TeX::Encode, HTML::Mason::Escape, Template::Plugin::XML::Escape, URI::Escape, etc.

But for some reason I need to do it in a different way. There is more than one way to do it! After that, I asked myself if this module is useful. May be not except for me. At this time, Zhuangzi reminds me that "the useless has its use".

Escape Sequences

Character Escape Codes

ASCII defines 128 characters: 33 non-printing control characters (0x00 -- 0x1f, 0x7f) and 95 printable characters (0x20 -- 0x7e).

Character Escape Codes in C programming language provide a method to express control characters, using only printable ones. These are accepted by Perl and many other languages.

CEC    HEX     Description
---    ----    --------------
\0     00      Null character
\a     07      Bell
\b     08      Backspace
\t     09      Horizontal Tab
\n     0a      Line feed
\v     0b      Vertical Tab
\f     0c      Form feed
\r     0d      Carriage return

Programming languages provide escape sequences for printable characters, which have significant meaning in that language. Otherwise, it would be harder to print them literally.

ESC    HEX     Description
---    ----    ---------------
\"     22      double quote 
\\     52      backslash

Refer to ASCII, Escape character, Escape sequence at <http://en.wikipedia.org>, for more details.

Perl Escape Sequences

Perl use backslash as an escape character.

These work in normal strings and regular expressions except \b.

ESC       Description
---       --------------------------
\a        Alarm (beep)
\b        Backspace
\e        Escape
\f        Formfeed
\n        Newline
\r        Carriage return
\t        Tab
\037      Any octal ASCII value
\x7f      Any hexadecimal ASCII value
\x{263a}  A wide hexadecimal value
\cx       Control-x
\N{name}  name is a name for the Unicode character (use charnames)

The following escape sequences are available in constructs that interpolate.

\l        Lowercase next character
\u        Titlecase next character
\L        Lowercase until \E
\U        Uppercase until \E
\E        End case modification

In regular expresssions:

\b        An assetion, not backspace, except in a character class
\Q        Disable pattern metacharacters until \E

Unlike C and other languages, Perl has no \v escape sequence for the vertical tab (VT - ASCII 11).

For constructs that do interpolate, variable begining with "$" or "@" are interpolated.

\$        Dollar Sign
\@        Ampersand
\"        Print double quotes
\         Escape next character if know otherwise print

See perlerref, perlop

Python Escape Sequences

\newline   Ignored
\\         Backslash (\)
\'         Single quote (')
\"         Double quote (")
\a         ASCII Bell (BEL)
\b         ASCII Backspace (BA)
\f         ASCII Formfeed (FF)
\n         ASCII Linefeed (LF)
\N{name}   Character named 'name' in the Unicode database
\r         ASCII Carriage Return (CR)
\t         ASCII Horizontal Tab (TAB)
\uxxxx     Character with 16-bit hex value xxxx (Unicode only)
\Uxxxxxxxx Character with 32-bit hex value xxxxxxxx (Unicode only)
\v         ASCII Vertical Tab (VT)
\ooo       Character with octal value ooo
\xhh       Character with hex value hh

See http://docs.python.org/ref/strings.html

Unicode escape sequences in the form of \uxxxx are used in Java, Python, C#, JavaScript. Unicode::Escape module implements it.

LaTeX Escapes

TeX::Encode implements encodings of LaTeX escapes. It converts (encodes) utf8 string to LaTeX escapes.

SEE ALSO

The useless has its use

Hui Tzu said to Chuang Tzu, "Your words are useless!"

Chuang Tzu said, "A man has to understand the useless before you can talk to him about the useful. The earth is certainly vast and broad, though a man uses no more of it than the area he puts his feet on. If, however, you were to dig away all the earth from around his feet until you reached the Yellow Springs, then would the man still be able to make use of it?"

"No, it would be useless," said Hui Tzu.

"It is obvious, then," said Chuang Tzu, "that the useless has its use."

--- from External Things, Chuang Tzu translated by Burton Watson

AUTHOR

You Hyun Jo, <you at cpan dot org>

COPYRIGHT AND LICENSE

Copyright (C) 2007 by You Hyun Jo

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.