NAME
RTF::Lexer - Rich Text Format (RTF) lexical analyzer.
SYNOPSIS
use RTF::Lexer qw(:all);
my $parser = RTF::Lexer->new(in => 'text.rtf');
my $token;
do {
    $token = $parser->get_token();
} until $parser->is_stop_token($token);
DESCRIPTION
RTF::Lexer is a low-level lexical analyzer for the RTF format. It splits the input stream into separate tokens, which can then be handled by higher-level modules.
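As a rough illustration of how a higher-level module might consume the token stream, the sketch below drains a lexer into an array. The helper name read_all_tokens is hypothetical and not part of RTF::Lexer. It relies only on get_token and is_stop_token as they appear in the SYNOPSIS, and it assumes the stop token is worth keeping so the caller can see how the stream ended.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Lexer): collect every token,
# including the final stop token, into an array reference.
sub read_all_tokens {
    my ($lexer) = @_;
    my @tokens;
    while (1) {
        my $token = $lexer->get_token();
        push @tokens, $token;
        # Stop once the lexer reports that this token ends the stream.
        last if $lexer->is_stop_token($token);
    }
    return \@tokens;
}

# Possible usage with the real module (assumes text.rtf exists):
#   use RTF::Lexer;
#   my $tokens = read_all_tokens(RTF::Lexer->new(in => 'text.rtf'));
#   printf "%d tokens read\n", scalar @$tokens;
```

Because the helper only calls methods on whatever object it is given, it can also be exercised with a stand-in lexer when no RTF file is at hand.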
METHODS
- new
-
The constructor. Accepts a single argument, in, which must be an input file handle or a file name. If a file name is given and the file cannot be opened, new throws an exception. By default the input is read from STDIN.
- get_token
-
Returns the next token from the input stream. The token is a reference to an array whose first element is the numeric id of the token type. The second element is the string representation of the token. A third element may be present only when the token is a control word; it holds the numeric parameter of that control word.
RTF::Lexer recognizes the following token types, which are declared as constants in this module:
- CWORD
-
Control word (e.g. \rtf1, \trowd).
- CSYMB
-
Control symbol mentioned in the RTF Specification, version 1.7.
- CUNDF
-
Unknown control symbol (i.e. one not mentioned in the RTF Specification).
- PTEXT
-
Plain text.
- ENTER
-
Start of group ({).
- LEAVE
-
End of group (}).
- DESTN
-
End of destination group (a } that turns off destination mode).
- ENHEX
-
Data in hexadecimal format that follows the \' control symbol.
- ENBIN
-
End of binary data block (started by the \bin control word).
- WRHEX
-
A character that is not a hexadecimal digit, found where an ENHEX token was expected.
- OKEOF
-
Normal end of input stream.
- UNEOF
-
Unexpected end of input stream.
- UNBRC
-
End of group that does not match any start of group.
These constants are not exported by default. Any of them may be exported on request, and all of them may be exported with the :all export tag.
- unget_token($token)
-
Pushes back the token $token so that the next call to get_token will return it.
- set_destination
-
Turns on destination mode, i.e. all tokens will be ignored until the end of the current group.
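The methods above can be combined into a few small helpers. The following is only a sketch: token_fields, peek_token and skip_rest_of_group are hypothetical names, not part of RTF::Lexer. The token layout follows the get_token description. The last helper assumes that, once destination mode is on, the next token returned by get_token is the DESTN token closing the group (or a stop token if the input is truncated), which the text above suggests but does not state outright.

```perl
use strict;
use warnings;

# Hypothetical helper: unpack a token array reference into named fields.
# Layout per get_token: [type id, string representation, optional parameter].
sub token_fields {
    my ($token) = @_;
    my ($type, $text, $param) = @$token;
    my %fields = (type => $type, text => $text);
    # Only control words may carry a numeric parameter.
    $fields{param} = $param if defined $param;
    return \%fields;
}

# Hypothetical helper: look at the next token without consuming it,
# by reading it and pushing it straight back with unget_token.
sub peek_token {
    my ($lexer) = @_;
    my $token = $lexer->get_token();
    $lexer->unget_token($token);
    return $token;
}

# Hypothetical helper: discard the rest of the current group.
# Assumption: after set_destination, the next token returned is the
# DESTN token that ends the group (or a stop token on truncated input).
sub skip_rest_of_group {
    my ($lexer) = @_;
    $lexer->set_destination();
    return $lexer->get_token();
}
```

A caller would typically compare the type field against the constants listed under get_token (imported with the :all tag) to decide how to handle each token.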
SEE ALSO
RTF::Tokenizer, Rich Text Format (RTF) Specification.
BUGS
It is impossible to have more than one RTF::Lexer object in a single process.
AUTHOR
Vadim O. Ustiansky <ustiansky@cpan.org>