NAME
Postscript::TextDecode - decode special characters in postscript strings
VERSION
version 0.4
SYNOPSIS
use Postscript::TextDecode;
my $ps = Postscript::TextDecode->new;
$ps->encoding( $encoding );
my $text = $ps->ps_to_text( $postscript_string );
DESCRIPTION
Postscript::TextDecode decodes special characters in strings extracted from PostScript. It currently supports three notations: /232 (octal), uDEAD and uniDEADBEEFFEED.
METHODS
encoding
Gets or sets the encoding used to look up the glyphs denoted by /xxx octal notation. The value is a list of glyph names, each prefixed with a slash, such as:
/.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /.notdef /space /exclam /quotedbl /numbersign /dollar /percent /ampersand /quotesingle /parenleft /parenright /asterisk /plus /comma /hyphen /period /slash /zero /one /two /three /four /five /six /seven /eight /nine /colon /semicolon /less /equal /greater /question /at /A /B /C /D /E /F /G /H /I /J /K /L /M /N /O /P /Q /R /S /T /U /V /W /X /Y /Z /bracketleft /backslash /bracketright /asciicircum /underscore /grave /a /b /c /d /e /f /g /h /i /j /k /l /m /n /o /p /q /r /s /t /u /v /w /x /y /z /braceleft /bar /braceright /asciitilde /.notdef /Adieresis /Aring /Ccedilla /Eacute /Ntilde /Odieresis /Udieresis /aacute /agrave /acircumflex /adieresis /atilde /aring /ccedilla /eacute /egrave /ecircumflex /edieresis /iacute /igrave /icircumflex /idieresis /ntilde /oacute /ograve /ocircumflex /odieresis /otilde /uacute /ugrave /ucircumflex /udieresis /dagger /degree /cent /sterling /section /bullet /paragraph /germandbls /registered /copyright /trademark /acute /dieresis /Euro /AE /Oslash /brokenbar /plusminus /twosuperior /threesuperior /yen /mu /.notdef /.notdef /onesuperior /onequarter /threequarters /ordfeminine /ordmasculine /onehalf /ae /oslash /questiondown /exclamdown /logicalnot /.notdef /florin /.notdef /.notdef /guillemotleft /guillemotright /ellipsis /lslash /Agrave /Atilde /Otilde /OE /oe /endash /emdash /quotedblleft /quotedblright /quoteleft /quoteright /divide /multiply /ydieresis /Ydieresis /fraction /currency /guilsinglleft /guilsinglright /fi /fl /daggerdbl /periodcentered /quotesinglbase /quotedblbase /perthousand /Acircumflex /Ecircumflex /Aacute /Edieresis /Egrave /Iacute /Icircumflex /Idieresis /Igrave /Oacute /Ocircumflex /Lslash /Ograve /Uacute /Ucircumflex /Ugrave /dotlessi 
/circumflex /tilde /macron /breve /dotaccent /ring /cedilla /hungarumlaut /ogonek /caron
Internally, the encoding is associated with an MD5 hash so that the same encoding is not parsed more than once.
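The caching idea can be illustrated with a minimal sketch. This is not the module's actual code: the names parse_encoding, %parsed_by_digest and $parse_count are hypothetical, and the four-entry encoding string is made up for brevity. It assumes the encoding string contains only ASCII glyph names.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical cache: MD5 hex digest of the encoding string => parsed table.
my %parsed_by_digest;
my $parse_count = 0;    # counts real parses, for demonstration only

# Turn "/space /exclam ..." into an array ref of glyph names without the
# leading slash, reusing an earlier parse of the same string if there is one.
sub parse_encoding {
    my ($encoding) = @_;
    my $digest = md5_hex($encoding);
    return $parsed_by_digest{$digest} if exists $parsed_by_digest{$digest};

    $parse_count++;
    my @names = $encoding =~ m{/(\S+)}g;
    return $parsed_by_digest{$digest} = \@names;
}

my $encoding = '/.notdef /space /exclam /odieresis';
my $first    = parse_encoding($encoding);
my $second   = parse_encoding($encoding);    # served from the cache

print "glyph at index 3: $first->[3]\n";
print "parsed $parse_count time(s)\n";
```

Keying the cache on a digest rather than on the full string keeps the hash keys short even though a complete encoding is several kilobytes long.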
font_name
Gets or sets the font name. It can be set to 'ZapfDingbats'; otherwise the default font glyph_to_unicode mapping is assumed.
u_to_char
Converts uDEAD[..] notation to a Unicode character.
uni_to_chars
Converts uniDEADBEEFFEED[....] notation to the corresponding Unicode characters.
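As an illustration of the two notations, here is a minimal sketch that follows the Adobe glyph naming convention ("u" plus 4 to 6 uppercase hex digits for one code point, "uni" plus one or more groups of 4 uppercase hex digits). The function names u_name_to_char and uni_name_to_chars are hypothetical, and the module's own methods may accept different input or behave differently.

```perl
use strict;
use warnings;

# "uXXXX" to "uXXXXXX": a single code point written as 4 to 6 hex digits.
sub u_name_to_char {
    my ($name) = @_;
    my ($hex) = $name =~ /\Au([0-9A-F]{4,6})\z/ or return;
    return chr hex $hex;
}

# "uniXXXXYYYY...": one or more groups of exactly four hex digits,
# each group standing for one character.
sub uni_name_to_chars {
    my ($name) = @_;
    my ($hex) = $name =~ /\Auni((?:[0-9A-F]{4})+)\z/ or return;
    return join '', map { chr hex $_ } $hex =~ /(.{4})/g;
}

binmode STDOUT, ':encoding(UTF-8)';
print u_name_to_char('u00F6'), "\n";             # U+00F6, o with dieresis
print uni_name_to_chars('uni00660069'), "\n";    # U+0066 U+0069, "fi"
```

Names that do not match either pattern yield an empty return, so a caller can fall back to a glyph-name lookup.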
glyph_to_char
Converts a glyph name (e.g. odieresis) to a Unicode character.
glyph_to_dec
Returns the encoding table index number of a glyph name (e.g. odieresis).
oct_to_glyph
Returns the glyph name (e.g. odieresis) for /232 octal notation.
oct_to_char
Returns the Unicode character associated with an octal encoding glyph index.
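How the four lookup methods relate to each other can be sketched as follows. Everything here is hypothetical: the *_sketch functions are not the module's API, the four-entry encoding and the three-entry Unicode table stand in for the real 256-entry encoding and full glyph list, and the octal argument is assumed to be passed as bare digits.

```perl
use strict;
use warnings;

# Hypothetical four-entry encoding; a real one has 256 entries.
my @encoding = qw(.notdef space exclam odieresis);

# Hypothetical excerpt of a glyph-name-to-Unicode mapping.
my %unicode_for = (
    space     => 0x0020,
    exclam    => 0x0021,
    odieresis => 0x00F6,
);

# Reverse index: glyph name => first position in the encoding table.
# .notdef is skipped because it may occupy many positions.
my %index_for;
for my $i (0 .. $#encoding) {
    my $glyph = $encoding[$i];
    next if $glyph eq '.notdef';
    $index_for{$glyph} = $i unless exists $index_for{$glyph};
}

sub glyph_to_dec_sketch { return $index_for{ $_[0] } }
sub oct_to_glyph_sketch { return $encoding[ oct $_[0] ] }

sub glyph_to_char_sketch {
    my $code = $unicode_for{ $_[0] };
    return defined $code ? chr $code : undef;
}

# Octal index -> glyph name -> Unicode character.
sub oct_to_char_sketch {
    my $glyph = oct_to_glyph_sketch( $_[0] );
    return defined $glyph ? glyph_to_char_sketch($glyph) : undef;
}

my $glyph = oct_to_glyph_sketch('003');
printf "%s -> U+%04X\n", $glyph, ord oct_to_char_sketch('003');
```

In this sketch oct_to_char is simply the composition of oct_to_glyph and glyph_to_char, and glyph_to_dec is the inverse of the table lookup.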
TODO
Currently u_to_char and uni_to_chars haven't been tested, as I've not encountered these encodings in the wild.
SEE ALSO
Postscript::TextExtract(TBA), perl(1)
AUTHOR
Job van Achterberg <jkva@cpan.org>
LICENSE
Copyright (C) 2010 Job van Achterberg <jkva@cpan.org>
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
See http://www.perl.com/perl/misc/Artistic.html