NAME
HTML::LoutParser.pm - Module to parse HTML and output Lout
SYNOPSIS
    require HTML::LoutParser ;
    my $parser = HTML::LoutParser->new() ;
    select( OUTPUT_FILEHANDLE ) ;
    $parser->start_lout ;
    while( <> ) {
        $parser->parse( $_ ) ;
    }
    $parser->eof ;
    $parser->end_lout ;
DESCRIPTION
Parses the input and writes the results to the currently selected output filehandle, which is STDOUT unless another handle has been select()ed.
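Because output goes to whichever handle is currently selected, redirecting it is a matter of calling select() before parsing. The sketch below illustrates only that mechanism, using core Perl: emit() is a hypothetical stand-in for the parser's own printing and is not part of HTML::LoutParser, and the in-memory handle is used purely to keep the example self-contained.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the parser's output routine (not part of
# HTML::LoutParser): a bare print goes to the currently selected handle.
sub emit { print @_ }

my $captured = '' ;
open( my $out, '>', \$captured ) or die "Cannot open in-memory handle: $!" ;

my $previous = select( $out ) ;   # $out becomes the default output handle
emit( "\@PP\n" ) ;                # written to $out, not to STDOUT
select( $previous ) ;             # restore the original default handle
close( $out ) or die "Cannot close in-memory handle: $!" ;
```

With a real file, open the handle on a path instead of a scalar reference. Keeping the value returned by select() makes it easy to restore the previous default once parsing is finished.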
Options
The parser object can be created with several options, e.g.
    $parser = HTML::LoutParser->new(
        -filename       => 'test.html',
        -table          => 1,
        -last_table_col => 'D',
        -comment_tag    => 0,
        -comment_attr   => 0,
        -def            => 0,
        -ignore_comment => 0,
        -no_comment     => 0,
        -verbose        => 1,
    ) ;
-filename
- If given, this string is output as part of the comment at the top of the Lout file.
-table
- If set to 1 (the default), the parser attempts to convert tables. The outcome will almost certainly require hand correction. If set to 0, all the table data is output, with the tags output as comments (unless -no_comment is 1).
-last_table_col
- The letter of the last column to use for tables; all tables are converted to a fixed number of columns. The default is F (6 columns).
-comment_tag
- If set to 1 (the default), every tag encountered, whether handled or not, is output as a comment. If set to 0, only unhandled tags are output as comments.
-comment_attr
- If set to 1 (the default), every tag output as a comment has its attributes listed in the comment too.
-def
- If set to 1 (the default), every <H1> tag is output as @HeaderA, and so on for <H2>..<H6> (up to @HeaderF), with definitions like this in the Lout output:
    def @HeaderA right x { @CD { +8p @Font { x } } }
Using this is recommended, since you can then simply change the Lout definitions to suit your situation.
-ignore_comment
- If set to 1, all HTML comments are ignored; if set to 0, they are output as Lout comments.
-no_comment
- If set to 1, no tags are output as comments at all, whether handled or unhandled.
Thus the default is to have every tag converted to a Lout comment with its attributes listed. To have only unhandled tags converted, use -comment_tag => 0. To have no tags output as comments, use -no_comment => 1.
-verbose
- If set to 1, a count of the start tags processed is output to STDOUT.
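The comment options interact, so it can help to read them as a single decision. The function below is only an interpretation of the option descriptions, not the module's own code: tag_becomes_comment() and its arguments are hypothetical names, the default of 0 for -no_comment is assumed, and giving -no_comment precedence over -comment_tag is an assumption based on the wording "no tags are output as comments at all".

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::LoutParser): decides whether a tag
# would be written as a Lout comment, going by the option descriptions.
sub tag_becomes_comment {
    my( $opt, $handled ) = @_ ;

    # Documented default: -comment_tag is 1. -no_comment is assumed to be 0.
    my %o = ( -comment_tag => 1, -no_comment => 0, %$opt ) ;

    return 0 if $o{-no_comment} ;     # assumed to override everything else
    return 1 if $o{-comment_tag} ;    # every tag, handled or not
    return $handled ? 0 : 1 ;         # otherwise only unhandled tags
}

# With the defaults, a handled tag is still written as a comment.
print tag_becomes_comment( {}, 1 ), "\n" ;
# With -comment_tag => 0, a handled tag is no longer written as a comment.
print tag_becomes_comment( { -comment_tag => 0 }, 1 ), "\n" ;
```

Whether the attributes appear inside a comment is a separate question, governed by -comment_attr, and is left out of this sketch.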
EXAMPLES
(See DESCRIPTION.)
BUGS
Not all tags are handled.
Table handling is simplistic. Tables use a fixed number of columns because we don't know how many we will need. We also ignore colspan and rowspan, again because we don't know what we need: Lout needs this info before each row, whereas HTML gives the info as it goes. Basically you'll need to fix tables (amongst other things) by hand.
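The fixed column count comes from the -last_table_col letter. As a rough illustration, assuming columns are lettered consecutively from A (which agrees with the documented default of F meaning 6 columns), a letter can be turned into a count as follows. column_count() is a hypothetical helper, not a function provided by this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::LoutParser): number of columns
# implied by a last-column letter, assuming A is column 1.
sub column_count {
    my( $letter ) = @_ ;
    die "Expected a single letter A-Z\n" unless $letter =~ /\A[A-Za-z]\z/ ;
    return ord( uc $letter ) - ord( 'A' ) + 1 ;
}

# Show the counts for the example setting ('D') and the default ('F').
printf "%s => %d columns\n", $_, column_count( $_ ) for qw( D F ) ;
```

Picking a letter at least as wide as the widest table in the input should reduce the amount of hand correction needed afterwards, though spanning cells will still need attention.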
Doesn't always match up all braces - not sure if this is a bug or invalid HTML in the test files.
If you have something like "<I>this</I>." it may become "{}@I {this} ." I don't know a solution for this one.
CHANGES
1999/07/18
First properly documented release.
1999/07/21
Added @CNP suggested by David Duffy <davidD@qimr.edu.au>.
1999/07/30
Added -verbose option.
1999/08/08
Changed licence to LGPL.
1999/09/04
Tiny fixes.
2000/01/22
Now <H1>..<H6> are output as @HeaderA..@HeaderF by default (this can be switched off for backward compatibility). This should make it a lot easier to change the header style, because you only have to change the defs rather than do a global search and replace. Also we now output @LLP when we get </P> tags, since this seems to improve things a bit.
2000/01/24
Now correctly skip (or output as comments) <A NAME="name"> tags.
AUTHOR
Mark Summerfield. I can be contacted as <summer@perlpress.com> - please include the word 'loutparser' in the subject line.
COPYRIGHT
Copyright (c) Mark Summerfield 1999-2000. All Rights Reserved.
This module may be used/distributed/modified under the LGPL.