NAME
Fabnewsru::Utils - Useful functions for working with Mojo::DOM objects
VERSION
version 0.01
SYNOPSIS
use Mojo::DOM;
use Data::Dumper;
use Fabnewsru::Utils qw(table2hash table2array_of_hashes merge_hashes);
my $dom = Mojo::DOM->new('<div class="company-profile-table"><tr><td>key1</td><td>val1</td></tr></div>');
warn Dumper table2hash($dom, ".company-profile-table"); # { key1 => 'val1' }
# General form: my $h = table2hash($url, $table_container);
# .company-profile-table is the container holding the <table> to be parsed
my $h = table2hash("http://fabnews.ru/fablabs/item/ufo/", ".company-profile-table");
my $arr = table2array_of_hashes("http://fabnews.ru/fablabs/", "table", ["name", "fabnews_subscribers", "fabnews_rating"]);
METHODS
table2hash
Accepts a Mojo::DOM object as input.
Converts a table to a hash. Each row becomes one key-value pair.
The key is the text of the first <td> element in the row, the value is the text of the second <td>.
Example
Table
header1 | header2
-----------------
key1    | value1
key2    | value2
will be processed into a hash
{ key1 => value1, key2 => value2 }
Assumes that the strings in $dom are already in Perl's internal format, with the UTF8 flag set.
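The row-to-pair mapping described above can be sketched directly with Mojo::DOM. This is an illustration under stated assumptions, not the module's actual implementation: the helper name sketch_table2hash is hypothetical, and skipping rows with fewer than two <td> cells (for example a header row built from <th>) is assumed behaviour.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper, not part of Fabnewsru::Utils.
sub sketch_table2hash {
    my ($dom, $container) = @_;
    my %h;
    # Locate the container; return an empty hash if it is missing.
    my $root = $dom->at($container) or return \%h;
    for my $row ($root->find('tr')->each) {
        my @cells = $row->find('td')->each;
        next if @cells < 2;    # assumption: ignore header or malformed rows
        # First cell text is the key, second cell text is the value.
        $h{ $cells[0]->all_text } = $cells[1]->all_text;
    }
    return \%h;
}

my $dom = Mojo::DOM->new(<<'HTML');
<div class="company-profile-table">
  <table>
    <tr><th>header1</th><th>header2</th></tr>
    <tr><td>key1</td><td>value1</td></tr>
    <tr><td>key2</td><td>value2</td></tr>
  </table>
</div>
HTML

my $h = sketch_table2hash($dom, '.company-profile-table');
```

Here $h should contain the two pairs from the example table, with the <th> row left out.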
table2array_of_hashes
my $arr = table2array_of_hashes($dom, $container, $fields_arr);
$res = table2array_of_hashes($dom, ".company-profile-table", ["name", "fabnews_subscribers", "fabnews_rating"]);
$res = table2array_of_hashes($dom, ".company-profile-table");
Converts a table to a list of hashes, one hash per row.
Use $fields_arr to specify the names of the hash keys.
If no array is provided, the hash keys are taken from the <th> tags of <thead>.
Example
Table
header1 | header2
-----------------
key1    | value1
key2    | value2
will be processed into an array of hashes
[ { header1 => key1, header2 => value1 }, { header1 => key2, header2 => value2 } ]
If a table cell contains any URLs, the hash for that row also gets a urls key whose value is an array
E.g.
header1    | header2
--------------------
key1       | value1
key2 + url | value2
The result will look like
[ { header1 => key1, header2 => value1 }, { header1 => key2, header2 => value2, urls => [] } ]
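The behaviour described above (key names from $fields_arr or from <thead>, plus a urls key for rows with links) could look roughly like the following. This is a sketch only: the helper name sketch_table2array is hypothetical, the table is assumed to have explicit <thead> and <tbody> elements, and adding urls only to rows that actually contain links is an assumption rather than documented behaviour.

```perl
use strict;
use warnings;
use Mojo::DOM;

# Hypothetical helper, not part of Fabnewsru::Utils.
sub sketch_table2array {
    my ($dom, $container, $fields) = @_;
    my $root = $dom->at($container) or return [];
    # Fall back to the <th> texts of <thead> when no key names are given.
    my @keys = $fields
        ? @$fields
        : $root->find('thead th')->map('all_text')->each;
    my @rows;
    for my $tr ($root->find('tbody tr')->each) {
        my @cells = $tr->find('td')->map('all_text')->each;
        my %row;
        @row{@keys} = @cells;    # pair key names with cell texts by position
        my @urls = $tr->find('a[href]')->map(attr => 'href')->each;
        $row{urls} = \@urls if @urls;    # assumption: only when links exist
        push @rows, \%row;
    }
    return \@rows;
}

my $dom = Mojo::DOM->new(<<'HTML');
<div class="company-profile-table">
  <table>
    <thead><tr><th>header1</th><th>header2</th></tr></thead>
    <tbody>
      <tr><td>key1</td><td>value1</td></tr>
      <tr><td><a href="http://example.com/">key2</a></td><td>value2</td></tr>
    </tbody>
  </table>
</div>
HTML

my $rows   = sketch_table2array($dom, '.company-profile-table');
my $custom = sketch_table2array($dom, '.company-profile-table', ['name', 'val']);
```

$rows uses header1 and header2 as keys, while $custom uses the names passed in the third argument.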
merge_hashes
Smart merge of two hashes
Returns a new hash with keys from the first hash ($fields) and values from the second hash ($values)
All input hashes must be in Perl's internal encoding
Useful for replacing hash keys that contain non-ASCII characters with ASCII-only Latin keys, which are more portable
See unit tests for more examples
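One plausible reading of this merge is sketched below. It assumes that each value in $fields is the original (possibly non-ASCII) label used as a key in $values; that interpretation, the helper name sketch_merge_hashes, and the sample data are all assumptions, so the unit tests remain the authority on the real behaviour.

```perl
use strict;
use warnings;
use utf8;    # this source contains non-ASCII string literals

# Hypothetical helper; the real merge_hashes may behave differently.
sub sketch_merge_hashes {
    my ($fields, $values) = @_;
    my %merged;
    for my $key (keys %$fields) {
        my $label = $fields->{$key};    # original label, e.g. a Cyrillic header
        # Copy the value under the ASCII key when the label is present.
        $merged{$key} = $values->{$label} if exists $values->{$label};
    }
    return \%merged;
}

my $fields = { name => 'Название', city => 'Город' };
my $values = { 'Название' => 'UFO', 'Город' => 'Moscow' };
my $merged = sketch_merge_hashes($fields, $values);
# $merged now holds { name => 'UFO', city => 'Moscow' }
```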
rm_spec_symbols_from_string
A set of regular expressions that remove typical unwanted symbols from a string:
* [\$#@~!&;:] characters
* any amount of whitespace at the beginning of the string
* any amount of whitespace at the end of the string
* runs of whitespace, which are collapsed into a single space
This function is useful for post-processing HTML parsing results, since not all results look good without it
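The four clean-up steps listed above can be written as four substitutions. This is a minimal sketch, not the module's source: the helper name sketch_rm_spec_symbols is hypothetical, and the order in which the substitutions run is assumed.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the four documented steps.
sub sketch_rm_spec_symbols {
    my ($s) = @_;
    $s =~ s/[\$\#\@~!&;:]//g;    # remove the listed special characters
    $s =~ s/^\s+//;              # strip leading whitespace
    $s =~ s/\s+$//;              # strip trailing whitespace
    $s =~ s/\s+/ /g;             # collapse whitespace runs into one space
    return $s;
}

my $clean = sketch_rm_spec_symbols("  Hello,   #world!  ");
# $clean is "Hello, world"
```

Note that because & and ; are removed, an unexpanded HTML entity in the input would lose its delimiters, so such input is best decoded before this step.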
AUTHOR
Pavel Serikov <pavelsr@cpan.org>
COPYRIGHT AND LICENSE
This software is copyright (c) 2016 by Pavel Serikov.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.