NAME

RTF::HTMLConverter - Converter from RTF format to HTML.

SYNOPSIS

# Convert a file to a file with the default DOM implementation (XML::GDOME)
use XML::GDOME;
use RTF::HTMLConverter;
my $parser = RTF::HTMLConverter->new(in  => 'test.rtf',
                                     out => 'test.html');
$parser->parse();

# Read from an already opened file handle and use XML::DOM
use XML::DOM;
use RTF::HTMLConverter;
open my $in, '<', 'test.rtf' or die "Cannot open test.rtf: $!";
my $parser = RTF::HTMLConverter->new(
  in  => $in,
  out => 'test.html',
  DOMImplementation => 'XML::DOM',
  image_uri => "http://somewhere.net/images",
  codepage => 'iso-8859-1',
);
$parser->parse();

# Collect the HTML in a scalar and skip image processing
use XML::GDOME;
use RTF::HTMLConverter;
my $html = '';
my $parser = RTF::HTMLConverter->new(
  in => 'test.rtf',
  out => \$html,
  discard_images => 1,
);
$parser->parse();

DESCRIPTION

RTF::HTMLConverter is a high-level converter from RTF to HTML. It is built on the low-level RTF parser module RTF::Lexer. It also requires an implementation of the W3C DOM and is known to work with both XML::DOM and XML::GDOME.

METHODS

new

The constructor. The following parameters are recognized:

in

An input file handle or a file name. The default value is \*STDIN. See RTF::Lexer for more information.

out

An output file handle, a file name, or a scalar reference. If this parameter is a string, it is treated as a file name and the constructor tries to open that file. If the file already exists, it is truncated. If the file cannot be opened, an exception is thrown. If this parameter is a scalar reference, the resulting HTML is stored in that scalar.

DOMImplementation

The DOM implementation module name. Supported values are XML::DOM and XML::GDOME. The default value is XML::GDOME.

codepage

The character set of the resulting HTML document. The default value is utf8. This parameter is recognized only if DOMImplementation is XML::GDOME.

formatting

The formatting of the resulting HTML document. This parameter is recognized only if DOMImplementation is XML::GDOME. Possible values are GDOME_SAVE_STANDARD and GDOME_SAVE_LIBXML_INDENT. See XML::GDOME::Document for more information. The default value is GDOME_SAVE_LIBXML_INDENT.

doctype

A reference to an array ($name, $publicId, $systemId) if DOMImplementation is XML::GDOME or ($name, $systemId, $publicId) if DOMImplementation is XML::DOM. Default values are:

$name

HTML

$publicId

-//W3C//DTD HTML 4.01 Transitional//EN

$systemId

http://www.w3.org/TR/html4/loose.dtd
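Because the element order of the doctype array depends on the DOM implementation, it is easy to swap the two identifiers by mistake. The following is a minimal sketch of one way to build the array in the order described above. The helper doctype_for is hypothetical and is not part of RTF::HTMLConverter; the HTML 4.01 Strict identifiers are used only as sample values.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::HTMLConverter): returns the doctype
# array reference in the element order documented for each DOM implementation.
sub doctype_for {
    my ($dom_implementation, $name, $public_id, $system_id) = @_;
    return $dom_implementation eq 'XML::DOM'
        ? [ $name, $system_id, $public_id ]    # order documented for XML::DOM
        : [ $name, $public_id, $system_id ];   # order documented for XML::GDOME
}

my @strict = (
    'HTML',
    '-//W3C//DTD HTML 4.01//EN',
    'http://www.w3.org/TR/html4/strict.dtd',
);

for my $impl ('XML::GDOME', 'XML::DOM') {
    my $doctype = doctype_for($impl, @strict);
    print "$impl: ", join(' | ', @$doctype), "\n";
}
```

The returned reference would then be passed to the constructor as the value of the doctype parameter.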

discard_images

When set, this parameter disables all image processing. It is unset by default.

image_uri

The string that, when concatenated with the image name, gives the image's URL. The default value is the empty string.

image_dir

The name of the directory in which the images are generated. The default value is the empty string, which means the current directory.

image_names

The pattern for generating image names from their numbers. The default value is img%d.
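To illustrate how image_names and image_uri relate, here is a small sketch that expands the pattern with sprintf and concatenates the result with the URI prefix. The helper image_url is hypothetical and only mirrors the description above; whether the converter itself adds a path separator or a file extension is not stated here, so the sample prefix ends with a slash as an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of RTF::HTMLConverter): builds an image URL
# by expanding the image_names pattern and prepending image_uri.
sub image_url {
    my ($image_uri, $image_names, $number) = @_;
    my $name = sprintf($image_names, $number);   # 'img%d' and 3 give 'img3'
    return $image_uri . $name;                   # plain concatenation
}

# Assumed sample values; the trailing slash is supplied by the caller.
print image_url('http://somewhere.net/images/', 'img%d', $_), "\n" for 1 .. 3;
```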

image_convert

The path to ImageMagick's convert utility. The default value is simply convert, which assumes that the utility is in one of the $ENV{PATH} directories.

image_mogrify

The path to ImageMagick's mogrify utility. If the value is undef or the specified file does not exist, the images extracted from the RTF will not be scaled. The default value is mogrify.

image_wmf2eps

The path to libwmf's wmf2eps utility. If the value is undef or the specified file does not exist, WMF images will not be extracted from the RTF. The default value is wmf2eps.
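Since a missing mogrify or wmf2eps silently disables scaling or WMF extraction, it can be useful to check for the utilities before constructing the converter. The following is a rough sketch that uses only core modules. The helper find_in_path is hypothetical, and the check is simplistic (it does not try executable extensions such as .exe).

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper (not part of RTF::HTMLConverter): returns the full path
# of the first executable named $program found in $ENV{PATH}, or undef.
sub find_in_path {
    my ($program) = @_;
    for my $dir (File::Spec->path) {
        my $candidate = File::Spec->catfile($dir, $program);
        return $candidate if -f $candidate && -x _;
    }
    return undef;   # explicit undef so the hash below keeps its key/value pairs
}

# Options that could be passed to the constructor; an undef value disables
# the corresponding step, as described above.
my %image_options = (
    image_mogrify => find_in_path('mogrify'),
    image_wmf2eps => find_in_path('wmf2eps'),
);

for my $option (sort keys %image_options) {
    my $path = $image_options{$option};
    print "$option: ", (defined $path ? $path : '(not found, step disabled)'), "\n";
}
```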

screen_resolution

The display resolution in dpi. Default value is 100.
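RTF expresses most dimensions in twips (1/1440 of an inch), so a resolution in dpi is what is needed to turn such values into pixels. The sketch below shows that arithmetic. It is an assumption that the converter uses screen_resolution in this way, and the helper twips_to_pixels is hypothetical.

```perl
use strict;
use warnings;

my $TWIPS_PER_INCH = 1440;   # 1 twip = 1/20 point = 1/1440 inch

# Hypothetical helper (not part of RTF::HTMLConverter): converts a length in
# twips to pixels at the given resolution, rounding to the nearest pixel.
sub twips_to_pixels {
    my ($twips, $dpi) = @_;
    $dpi = 100 unless defined $dpi;   # same default as screen_resolution
    return int($twips * $dpi / $TWIPS_PER_INCH + 0.5);
}

print twips_to_pixels(2880), "\n";       # 2 inches at 100 dpi: 200
print twips_to_pixels(2880, 72), "\n";   # 2 inches at 72 dpi: 144
```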

parse

Parses the input RTF stream until the end of file.

SEE ALSO

RTF::Lexer, Rich Text Format (RTF) Specification (version 1.7), The_RTF_Cookbook, RTF::Parser, RTF::Tokenizer.

KNOWN BUGS

- Symbols that are absent from the Unicode character set are displayed incorrectly.

- Images that are stored in the RTF file in WMF format may be scaled incorrectly.

- Text in WMF images that uses a non-ASCII charset may be displayed incorrectly.

And there are probably lots of unknown bugs ;)

AUTHOR

Vadim O. Ustiansky <ustiansky@cpan.org>