NAME

RTF::Lexer - Rich Text Format (RTF) lexical analyzer.

SYNOPSIS

use RTF::Lexer qw(:all);

# Open text.rtf and read tokens until a stop token is returned.
my $parser = RTF::Lexer->new(in => 'text.rtf');
my $token;
do {
  $token = $parser->get_token();
} until $parser->is_stop_token($token);

DESCRIPTION

RTF::Lexer is a low-level lexical analyzer for the Rich Text Format (RTF). It splits the input stream into tokens, which can then be handled by higher-level modules.

METHODS

new

The constructor. Accepts a single named argument, in, which must be either an input file handle or a file name. If a file name is given and the file cannot be opened, new throws an exception. If in is omitted, the input is read from STDIN.
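Because new throws when a named file cannot be opened, callers may want to trap the exception. The following is a minimal sketch, not part of the module: open_lexer is a hypothetical helper, and the module is loaded with require inside the eval so that a missing installation is reported in the same way as an unreadable file.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Lexer): create a lexer for a file
# name, or return undef with a warning if construction fails.
sub open_lexer {
    my ($path) = @_;
    my $lexer = eval {
        require RTF::Lexer;              # load at run time; failure is trapped
        RTF::Lexer->new(in => $path);    # throws if $path cannot be opened
    };
    warn "cannot create lexer for $path: $@" unless defined $lexer;
    return $lexer;
}
```

A caller can then test the result, for example: my $parser = open_lexer('text.rtf') or exit 1;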

get_token

Returns the next token from the input stream. The token is a reference to an array whose first element is the numeric id of the token type. The second element is the string representation of the token. The third element may be present only if the token is a control word; it holds the numeric parameter of that control word.
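The array layout described above can be unpacked with a list assignment. This is a small sketch only: describe_token is a hypothetical helper, and it relies on nothing beyond the three-element layout documented here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Lexer): format a token as returned
# by get_token. $token is [ type id, text, optional numeric parameter ].
sub describe_token {
    my ($token) = @_;
    my ($type, $text, $param) = @$token;
    my $out = "type=$type text='$text'";
    # The third element is absent for tokens that carry no parameter.
    $out .= " param=$param" if defined $param;
    return $out;
}
```

With a lexer object $parser as in the SYNOPSIS, it could be called as print describe_token($parser->get_token()), "\n";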

RTF::Lexer recognizes the following token types, which are declared as constants in this module:

CWORD

Control word (e.g. \rtf1, \trowd).

CSYMB

Control symbol defined in the RTF Specification, version 1.7.

CUNDF

Unknown control symbol (i.e. not defined in the RTF Specification).

PTEXT

Plain text.

ENTER

Start of group ({).

LEAVE

End of group (}).

DESTN

End of destination group (} that turns off destination mode).

ENHEX

Data in hexadecimal format that follows the \' control symbol.

ENBIN

End of a binary data block (started by the \bin control word).

WRHEX

A character that is not a hexadecimal digit was found where an ENHEX token was expected.

OKEOF

Normal end of input stream.

UNEOF

Unexpected end of input stream.

UNBRC

End of group that does not match any start of group.

These constants are not exported by default. Each of them can be exported on request, and all of them can be exported at once with the :all export tag.
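Since the first element of each token is one of these numeric ids, tokens can be filtered by comparing it with a constant. The sketch below is illustrative only: collect_text is a hypothetical helper that takes the wanted type id as an argument, and it assumes, as the SYNOPSIS suggests, that is_stop_token returns true for the token that ends the stream.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Lexer): return the string
# representation of every token whose type id equals $wanted_type.
sub collect_text {
    my ($lexer, $wanted_type) = @_;
    my @texts;
    while (1) {
        my $token = $lexer->get_token();
        last if $lexer->is_stop_token($token);   # assumed end-of-stream check
        push @texts, $token->[1] if $token->[0] == $wanted_type;
    }
    return @texts;
}
```

After importing a single constant with use RTF::Lexer qw(PTEXT); the plain text of a document could be gathered with collect_text($parser, PTEXT).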

unget_token($token)

Pushes back token $token so the next call to get_token will return it.
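Pairing get_token with unget_token gives a one-token lookahead. A minimal sketch, where peek_token is a hypothetical helper and not a method of the module:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Lexer): return the next token
# without consuming it.
sub peek_token {
    my ($lexer) = @_;
    my $token = $lexer->get_token();
    $lexer->unget_token($token);   # the next get_token call returns it again
    return $token;
}
```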

set_destination

Turns on destination mode: all tokens are ignored until the end of the current group.
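One possible use is to skip the remainder of a group once an unwanted token has been seen. The sketch below rests on assumptions: tokens_skipping and the $should_skip callback are hypothetical, is_stop_token is assumed to detect the end of the stream, and the brace that closes the skipped group is expected to arrive as a DESTN token, as described under get_token.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::Lexer): return all tokens, but when
# $should_skip->($token) is true, drop that token and ignore the rest of
# the current group by switching the lexer to destination mode.
sub tokens_skipping {
    my ($lexer, $should_skip) = @_;
    my @kept;
    while (1) {
        my $token = $lexer->get_token();
        last if $lexer->is_stop_token($token);   # assumed end-of-stream check
        if ($should_skip->($token)) {
            $lexer->set_destination();           # ignore up to the group's end
            next;
        }
        push @kept, $token;                      # includes the DESTN token
    }
    return @kept;
}
```

The callback decides what to skip, so the helper makes no assumption about how control words are spelled in the token text.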

SEE ALSO

RTF::Tokenizer, Rich Text Format (RTF) Specification.

BUGS

It is impossible to have more than one RTF::Lexer object in a single process.

AUTHOR

Vadim O. Ustiansky <ustiansky@cpan.org>
