NAME
HTML::TableExtractor - Run callbacks on the table elements of an HTML document.
SYNOPSIS
use HTML::TableExtractor;
my $p = HTML::TableExtractor->new();
$p->parse($html, table => sub { ... }, tr => sub { ... });
DESCRIPTION
Parses HTML looking for table-related elements (table, tr, td and th as of version 0.1).
Up to three callbacks can be registered for each element type. Each callback, described below, is executed whenever an element of that type is encountered.
o start_${tagname} Called whenever an opening $tagname tag is encountered.
o ${tagname} Called twice per element: immediately after start_${tagname} when the tag opens, and immediately before end_${tagname} when it closes.
o end_${tagname} Called whenever a closing $tagname tag is encountered.
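The call order above can be illustrated with a small, self-contained sketch. The sub dispatch_tags is hypothetical: it is not part of HTML::TableExtractor, it uses a naive regex tokenizer instead of a real HTML parser, and it passes callbacks only an attribute hash ref. It merely mimics the documented order (start_X then X on open; X then end_X on close).

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT HTML::TableExtractor's implementation.
# Naive tokenizer: ignores comments, CDATA and malformed markup.
sub dispatch_tags {
    my ($html, %cb) = @_;
    my %wanted;
    $wanted{$_} = 1 for qw(table tr td th);

    while ($html =~ m{<(/?)(\w+)([^>]*)>}g) {
        my ($closing, $tag, $rest) = ($1, lc $2, $3);
        next unless $wanted{$tag};

        if ($closing) {
            # Closing tag: plain callback first, then end_ callback.
            $cb{$tag}->()       if $cb{$tag};
            $cb{"end_$tag"}->() if $cb{"end_$tag"};
        }
        else {
            # Opening tag: collect name="value" attributes into a hash.
            my %attr = $rest =~ /(\w+)\s*=\s*"([^"]*)"/g;
            # start_ callback first, then plain callback.
            $cb{"start_$tag"}->(\%attr) if $cb{"start_$tag"};
            $cb{$tag}->(\%attr)         if $cb{$tag};
        }
    }
}

my @log;
my $border;
dispatch_tags('<table border="1"><tr><td>x</td></tr></table>',
    start_table => sub { my ($attr) = @_; $border = $attr->{border} },
    start_tr    => sub { push @log, 'start_tr' },
    'tr'        => sub { push @log, 'tr' },
    end_tr      => sub { push @log, 'end_tr' },
);
print "@log\n";    # start_tr tr tr end_tr
```

Because the plain tr callback fires on both the opening and the closing tag, it appears twice in the log, which matches the "Row opened or closed." behaviour in the EXAMPLE section.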
EXAMPLE
use HTML::TableExtractor;
my $p = HTML::TableExtractor->new();
# $html holds the HTML document to parse.
$p->parse($html,
    start_table => sub {
        my ($attr, $origtext) = @_;
        # $attr is a hash ref of the <table> tag's attributes.
        print "Table border is $attr->{border}\n";
    },
    tr => sub { print "Row opened or closed.\n" },
);
METHODS
- start($parser, $tag, $attr, $attrseq, $origtext);
Called whenever a start tag is encountered. This module recognises only these tags: <table>, <tr>, <td> and <th>.
This method will be called by the parser and is not intended to be called from an application.
- end($parser, $tag, $origtext);
Called whenever an end tag is encountered; only the closing tags of the recognised elements trigger callbacks.
This method will be called by the parser and is not intended to be called from an application.
- $p->parse($html, tag_type => \&coderef, ...);
This is the only method an application needs to call. Pass it the HTML text and a callback for each tag type of interest; the callbacks are executed as described above.
EXPORTS
CAVEATS, BUGS, and TODO
o parse() should accept other data sources, such as streamed input and file handles; currently it takes only a string of HTML.
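Until parse() accepts other sources, one possible workaround, assuming parse() takes the complete document as a string as in the SYNOPSIS, is to read the handle into memory first. slurp_handle is a hypothetical helper, not part of the module; the example reads from an in-memory file handle so it runs without any files.

```perl
use strict;
use warnings;

# Hypothetical helper: read everything remaining on a file handle
# into one string, suitable for passing to $p->parse($html, ...).
sub slurp_handle {
    my ($fh) = @_;
    local $/;            # undef record separator: read the whole stream
    my $text = <$fh>;
    return defined $text ? $text : '';
}

# In-memory file handle standing in for a real file or socket.
open my $fh, '<', \'<table><tr><td>1</td></tr></table>'
    or die "Cannot open in-memory handle: $!";
my $html = slurp_handle($fh);
close $fh;

print "Read ", length($html), " characters\n";
# With HTML::TableExtractor installed, $html can now be passed on:
#   $p->parse($html, tr => sub { ... });
```

This keeps the whole document in memory, so it does not address true streaming; it only adapts file-handle input to the string-based interface.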
SEE ALSO
HTML::Parser, HTML::TableContentParser
AUTHOR
Simon Drabble <simon@thebigmachine.org>
(C) 2002 Simon Drabble
This software is released under the same terms as Perl itself.