NAME

Postscript::TextDecode - decode special characters in postscript strings

VERSION

version 0.4

SYNOPSIS

use Postscript::TextDecode;

my $ps = Postscript::TextDecode->new;
$ps->encoding( $encoding );

my $text = $ps->ps_to_text( $postscript_string );

DESCRIPTION

Postscript::TextDecode decodes special characters in strings extracted from PostScript. It currently supports the /232 (octal), uDEAD and uniDEADBEEFFEED formats.

METHODS

encoding

Gets or sets the encoding used to look up the glyphs denoted by /xxx octal notation. The value should be a list of glyph names such as:

/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /space /exclam /quotedbl /numbersign /dollar /percent /ampersand /quotesingle /parenleft /parenright /asterisk /plus /comma /hyphen /period /slash /zero /one /two /three /four /five /six /seven /eight /nine /colon /semicolon /less /equal /greater /question /at /A /B /C /D /E /F /G /H /I /J /K /L /M /N /O /P /Q /R /S /T /U /V /W /X /Y /Z /bracketleft /backslash /bracketright /asciicircum /underscore /grave /a /b /c /d /e /f /g /h /i /j /k /l /m /n /o /p /q /r /s /t /u /v /w /x /y /z /braceleft /bar /braceright /asciitilde /.notdef /Adieresis /Aring /Ccedilla /Eacute /Ntilde /Odieresis /Udieresis /aacute /agrave /acircumflex /adieresis /atilde /aring /ccedilla /eacute /egrave /ecircumflex /edieresis /iacute /igrave /icircumflex /idieresis /ntilde /oacute /ograve /ocircumflex /odieresis /otilde /uacute /ugrave /ucircumflex /udieresis /dagger /degree /cent /sterling /section /bullet /paragraph /germandbls /registered /copyright /trademark /acute /dieresis /Euro /AE /Oslash /brokenbar /plusminus /twosuperior /threesuperior /yen /mu /.notdef /.notdef /onesuperior /onequarter /threequarters /ordfeminine /ordmasculine /onehalf /ae /oslash /questiondown /exclamdown /logicalnot /.notdef /florin /.notdef /.notdef /guillemotleft /guillemotright /ellipsis /lslash /Agrave /Atilde /Otilde /OE /oe /endash /emdash /quotedblleft /quotedblright /quoteleft /quoteright /divide /multiply /ydieresis /Ydieresis /fraction /currency /guilsinglleft /guilsinglright /fi /fl /daggerdbl /periodcentered /quotesinglbase /quotedblbase /perthousand /Acircumflex /Ecircumflex /Aacute /Edieresis /Egrave /Iacute /Icircumflex /Idieresis /Igrave /Oacute /Ocircumflex /Lslash /Ograve /Uacute /Ucircumflex /Ugrave /dotlessi 
/circumflex /tilde /macron /breve /dotaccent /ring /cedilla /hungarumlaut /ogonek /caron

Internally, the encoding is associated with an MD5 hash so that the same encoding is not parsed more than once.
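The caching idea can be sketched roughly as follows. This is a standalone illustration and not the module's actual code: the names %parsed_cache and parse_encoding are hypothetical, and it assumes the encoding is a whitespace-separated list of slash-prefixed glyph names, as in the example above.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical cache: MD5 hex digest of the encoding string => parsed table.
my %parsed_cache;

# Parse an encoding string into an array ref where the array index is the
# character code and the value is the glyph name (without the leading slash).
sub parse_encoding {
    my ($encoding) = @_;
    my $key = md5_hex($encoding);

    # Only parse when this exact string has not been seen before.
    $parsed_cache{$key} //= [
        map  { substr $_, 1 }
        grep { m{\A/} }
        split ' ', $encoding
    ];
    return $parsed_cache{$key};
}

my $table = parse_encoding('/.notdef /space /exclam /A');
print $table->[3], "\n";
```

Keying the cache on a digest rather than on the full string keeps the hash keys short, which matters little for one encoding but helps when many long encoding strings are seen.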

font_name

Gets or sets the font name. It can be set to 'ZapfDingbats'; otherwise the default font glyph-to-unicode mapping is assumed.

u_to_char

Converts uDEAF[..] notation to a Unicode character.
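A minimal sketch of how such a name could be decoded, assuming the Adobe Glyph List convention of "u" followed by four to six uppercase hexadecimal digits. The helper name u_name_to_char is hypothetical and is not part of this module:

```perl
use strict;
use warnings;

# Decode a "uXXXX".."uXXXXXX" glyph name into a single character.
# Returns undef when the name does not match the assumed pattern.
sub u_name_to_char {
    my ($name) = @_;
    return undef unless $name =~ /\Au([0-9A-F]{4,6})\z/;
    return chr hex $1;
}

# Print the code point rather than the character to avoid
# wide-character warnings on STDOUT.
printf "U+%04X\n", ord u_name_to_char('u00F6');
```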

uni_to_chars

Converts uniDEADBEEFFEED[....] notation to Unicode characters.
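As an illustration, the following standalone sketch assumes the Adobe Glyph List convention in which "uni" is followed by one or more groups of four uppercase hexadecimal digits, each group naming one character. The helper name uni_name_to_chars is hypothetical and is not part of this module:

```perl
use strict;
use warnings;

# Decode a "uniXXXX[XXXX...]" glyph name into a string of characters,
# one character per group of four hex digits.
# Returns undef when the name does not match the assumed pattern.
sub uni_name_to_chars {
    my ($name) = @_;
    return undef unless $name =~ /\Auni((?:[0-9A-F]{4})+)\z/;
    my $digits = $1;
    return join '', map { chr hex $_ } $digits =~ /(.{4})/g;
}

# Show the decoded code points for a two-character name.
print join( ' ', map { sprintf 'U+%04X', ord $_ }
                 split //, uni_name_to_chars('uni00410042') ), "\n";
```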

glyph_to_char

Converts a glyph name (e.g. odieresis) to a Unicode character.

glyph_to_dec

Returns the encoding table index number for a glyph name (e.g. odieresis).

oct_to_glyph

Returns the glyph name (e.g. odieresis) for /232 octal notation.

oct_to_char

Returns the Unicode character associated with an octal glyph index in the encoding.
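The lookup chain from octal code to glyph name to character can be sketched as below. This is a standalone illustration under stated assumptions, not the module's implementation: the names @encoding, %glyph_to_unicode, octal_to_glyph_name and octal_to_character are hypothetical, the table is filled with .notdef except for the one slot needed, and the single mapping odieresis => U+00F6 is taken from the Adobe Glyph List.

```perl
use strict;
use warnings;

# Hypothetical 256-entry encoding table; index 154 (octal 232) is
# odieresis, matching the example encoding shown under "encoding".
my @encoding = ('.notdef') x 256;
$encoding[154] = 'odieresis';

# Tiny excerpt of a glyph-name-to-code-point mapping.
my %glyph_to_unicode = ( odieresis => 0x00F6 );

# Accepts "232", "/232" or "\232" and returns the glyph name at that index.
sub octal_to_glyph_name {
    my ($octal) = @_;
    $octal =~ s{\A[/\\]}{};
    return $encoding[ oct $octal ];
}

# Resolves the glyph name further to a character, or undef if unmapped.
sub octal_to_character {
    my ($octal) = @_;
    my $glyph = octal_to_glyph_name($octal);
    return undef unless defined $glyph && exists $glyph_to_unicode{$glyph};
    return chr $glyph_to_unicode{$glyph};
}

print octal_to_glyph_name('/232'), "\n";
printf "U+%04X\n", ord octal_to_character('/232');
```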

TODO

u_to_char and uni_to_chars are currently untested, as I have not encountered these notations in the wild.

SEE ALSO

Postscript::TextExtract(TBA), perl(1)

AUTHOR

Job van Achterberg <jkva@cpan.org>

LICENSE

Copyright (C) 2010 Job van Achterberg <jkva@cpan.org>

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

See http://www.perl.com/perl/misc/Artistic.html