NAME
HTML::TableContentParser - Do interesting things with the contents of tables.
SYNOPSIS
use HTML::TableContentParser;
$p = HTML::TableContentParser->new();
$tables = $p->parse($html);
DESCRIPTION
This package pulls out the contents of tables from a string containing HTML. Each table encountered is stored as a hash, and the hashes are collected in an array in the order the tables appear. Each hash holds whatever was discovered about the table (id, name, border, cellspacing, etc.) and, of course, the data contained within the table.
The format of each hash will look something like

    attributes           keys from the attributes of the <table> tag
    @{$table_headers}    array of table headers, in order found
    @{$table_rows}       rows discovered, in order

then for each table row,

    @{$table_data}       td's found, in order
    other attributes     the ... in <tr ...>

then for each data cell,

    data                 what comes between <td> and </td>
    other attributes     the ... in <td ...>
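As a rough illustration of the layout described above, the structure for a single one-row table might resemble the hand-built literal below. The rows, cells, and data keys are the ones used in the EXAMPLE section. The headers key name and all sample attribute values are assumptions made for this sketch, not confirmed output of the module.

```perl
use strict;
use warnings;

# Hand-built stand-in for what parse() might return for:
#   <table id="prices" border="1">
#     <tr class="odd"><td align="left">apple</td><td>0.50</td></tr>
#   </table>
# Keys rows/cells/data follow the EXAMPLE section; 'headers' is an assumed name.
my $tables = [
    {
        id      => 'prices',    # attribute of the <table> tag
        border  => '1',         # attribute of the <table> tag
        headers => [],          # assumed key: no <th> cells in this table
        rows    => [
            {
                class => 'odd', # attribute of the <tr> tag
                cells => [
                    { align => 'left', data => 'apple' },  # <td> attribute plus content
                    { data => '0.50' },                    # <td> with content only
                ],
            },
        ],
    },
];

# Walk down from table to row to cell text.
my $table = $tables->[0];
my $row   = $table->{rows}[0];
my @texts = map { $_->{data} } @{ $row->{cells} };

print "table id: $table->{id}\n";
print "row class: $row->{class}\n";
print "cells: @texts\n";
```

Tag attributes and nested content share one hash at each level, so an attribute literally named rows, cells, or data would presumably collide with the structural keys.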
EXAMPLE
use HTML::TableContentParser;
$p = HTML::TableContentParser->new();
$html = read_html_from_somewhere();
$tables = $p->parse($html);
for $t (@$tables) {
    for $r (@{$t->{rows}}) {
        print "Row: ";
        for $c (@{$r->{cells}}) {
            print "[$c->{data}] ";
        }
        print "\n";
    }
}
METHODS
- start($parser, $tag, $attr, $attrseq, $origtext);
Called whenever a particular start tag has been recognised. This is called automatically by the parser and should not be called from the application.
- text($parser, $content);
Called whenever a piece of content is encountered. This is called automatically by the parser and should not be called from the application.
- end($parser, $tag, $origtext);
Called whenever a particular end tag is encountered. This is called automatically by the parser and should not be called from the application.
- $tables_ref = $p->parse($html);
Called with the HTML to parse. This is all the application needs to do. The return value will be an arrayref containing each table encountered, in the format detailed above.
- DEBUG
Not a method, but a class variable. Set to 1 to cause debugging output (basically the structure and content of the table) to be sent to STDERR via warn().
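The following is a minimal sketch of how an application might combine parse() with the DEBUG variable. The helper cell_grid is hypothetical and not part of the module. The fully qualified name $HTML::TableContentParser::DEBUG is an assumption based on the "class variable" description above. The module is loaded at run time so that the sketch still runs, doing nothing, where it is not installed.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): reduce the parsed structure
# to plain cell text, one array per table, one inner array per row.
# Relies only on the rows/cells/data keys used in the EXAMPLE section.
sub cell_grid {
    my ($tables) = @_;
    my @grid;
    for my $t (@{ $tables || [] }) {
        my @rows;
        for my $r (@{ $t->{rows} || [] }) {
            # Treat a cell with no recorded content as an empty string.
            push @rows, [ map { defined $_->{data} ? $_->{data} : '' }
                          @{ $r->{cells} || [] } ];
        }
        push @grid, \@rows;
    }
    return \@grid;
}

my $html = '<table id="t1"><tr><td>a</td><td>b</td></tr></table>';

my $grid = [];
if (eval { require HTML::TableContentParser; 1 }) {
    {
        no warnings 'once';
        # Assumed variable name; set to 1 to get debugging output via warn().
        $HTML::TableContentParser::DEBUG = 0;
    }
    my $p      = HTML::TableContentParser->new();
    my $tables = $p->parse($html);   # arrayref of table hashes
    $grid = cell_grid($tables);
}

# Print each row of each table as a comma-separated line.
for my $table_rows (@$grid) {
    print join(', ', @$_), "\n" for @$table_rows;
}
```

Keeping the traversal in a separate function means it can be exercised against a hand-built structure without parsing any HTML.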
EXPORTS
Nothing.
CAVEATS, BUGS, and TODO
AUTHOR
Simon Drabble <sdrabble@cpan.org>
(C) 2002 Simon Drabble
This software is released under the same terms as perl.