NAME
WWW::Crawler::Mojo::ScraperUtil - Scraper utilities
SYNOPSIS
DESCRIPTION
This class inherits from Mojo::UserAgent and overrides the start method to store user info.
ATTRIBUTES
WWW::Crawler::Mojo::ScraperUtil implements the following attributes.
METHODS
WWW::Crawler::Mojo::ScraperUtil implements the following methods.
collect_urls_css
Collects URLs out of CSS.
@urls = collect_urls_css($dom);
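The idea of pulling URLs out of CSS can be illustrated with a minimal sketch in core Perl. The helper name extract_css_urls is hypothetical and the regular expression is an assumption about what counts as a URL token; this is not the implementation of collect_urls_css.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of WWW::Crawler::Mojo::ScraperUtil:
# collects the targets of url(...) tokens found in a CSS string.
sub extract_css_urls {
    my ($css) = @_;
    my @urls;
    # Match url( ... ) with optional single or double quotes around the target.
    while ($css =~ /url\(\s*(['"]?)(.*?)\1\s*\)/gi) {
        push @urls, $2 if length $2;
    }
    return @urls;
}

my $css = <<'CSS';
body { background: url("/img/bg.png"); }
@font-face { src: url(fonts/a.woff); }
.icon { background-image: url('icon.svg'); }
CSS

my @urls = extract_css_urls($css);
print "$_\n" for @urls;
```

The collected values are still relative to the style sheet, so a crawler would normally resolve each one against the style sheet's own URL before queuing it.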
decoded_body
Returns the decoded response body for the given Mojo::Message::Response, using guess_encoding and encoder.
encoder
Generates an Encode instance for the given encoding name. Defaults to Encode::utf8.
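The fallback behavior can be sketched with the core Encode module. The helper name encoder_for is hypothetical, and falling back to UTF-8 when the name is unknown is an assumption drawn from the description above rather than from the module's source.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper, not the module's encoder(): returns an Encode
# object for $name, or a UTF-8 one when the name is missing or unknown.
sub encoder_for {
    my ($name) = @_;
    my $enc = defined $name ? Encode::find_encoding($name) : undef;
    return $enc || Encode::find_encoding('utf-8');
}

# Bytes as they might arrive in a response body.
my $latin1_bytes = "caf\xe9";         # "cafe" with e-acute in ISO-8859-1
my $utf8_bytes   = "\xe3\x81\x82";    # HIRAGANA LETTER A in UTF-8

my $latin1 = encoder_for('iso-8859-1')->decode($latin1_bytes);
my $utf8   = encoder_for('no-such-encoding')->decode($utf8_bytes);

printf "%d %d\n", length($latin1), length($utf8);
```

Decoding a body is then a matter of passing the guessed encoding name to the helper and calling decode on the raw bytes.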
html_handlers
Returns the preset HTML element handlers used for scraping, keyed by selector. An optional argument restricts the preset selectors to the given containers.
my $handlers = html_handlers(['#header', '#footer li']);
$handlers->{img} = sub {
    my $dom = shift;
    return $dom->{src};
};
my @urls;
for my $selector (sort keys %{$handlers}) {
    $dom->find($selector)->each(sub {
        push(@urls, $handlers->{$selector}->(shift));
    })->to_array;
}
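One plausible way to picture the narrowing is to prefix every preset selector with each container. The sketch below is an assumption about the idea only; narrow_handlers is a hypothetical helper and the module may combine selectors differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module: scopes every selector in
# $handlers to each container so a handler only fires inside them.
sub narrow_handlers {
    my ($handlers, $containers) = @_;
    return {%$handlers} unless $containers && @$containers;
    my %narrowed;
    for my $selector (keys %$handlers) {
        # e.g. "img" becomes "#header img,#footer li img"
        my $scoped = join ',', map { $_ . ' ' . $selector } @$containers;
        $narrowed{$scoped} = $handlers->{$selector};
    }
    return \%narrowed;
}

# Illustrative presets only; the real preset list is not shown here.
my $presets = {
    'a'   => sub { $_[0]->{href} },
    'img' => sub { $_[0]->{src} },
};

my $narrowed = narrow_handlers($presets, ['#header', '#footer li']);
print "$_\n" for sort keys %$narrowed;
```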
resolve_href
Resolves a URL against a base URL.
WWW::Crawler::Mojo::ScraperUtil::resolve_href($base, $uri);
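Resolution against a base can be sketched with Mojo::URL, which provides to_abs. The helper name to_absolute is hypothetical; this shows the general technique and is not the body of resolve_href, which may apply further normalization.

```perl
use strict;
use warnings;
use Mojo::URL;

# Hypothetical helper, not the module's resolve_href(): turns $href into
# an absolute URL string using $base as the reference.
sub to_absolute {
    my ($base, $href) = @_;
    return Mojo::URL->new($href)->to_abs(Mojo::URL->new($base))->to_string;
}

my $base = 'http://example.com/dir/page.html';

print to_absolute($base, 'other.html'), "\n";            # relative path
print to_absolute($base, '/top.html'), "\n";             # root-relative path
print to_absolute($base, 'http://example.org/x'), "\n";  # already absolute
```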
guess_encoding
Guesses the encoding of HTML or CSS from the given Mojo::Message::Response instance.
$encode = WWW::Crawler::Mojo::ScraperUtil::guess_encoding($res) || 'utf-8';
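A charset guess of this kind is commonly built from a few fallbacks. The sketch below assumes a lookup order of Content-Type header, then HTML meta tag, then CSS @charset rule; both the order and the helper name guess_charset are assumptions, and it works on plain strings rather than a Mojo::Message::Response.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's guess_encoding(): returns a
# lower-cased charset name, or nothing when no hint is found.
sub guess_charset {
    my ($content_type, $body) = @_;
    $content_type = '' unless defined $content_type;
    $body         = '' unless defined $body;

    # 1. charset parameter of the Content-Type header
    return lc $1 if $content_type =~ /charset\s*=\s*["']?([\w\-]+)/i;
    # 2. <meta charset="..."> or <meta http-equiv=... content="...; charset=...">
    return lc $1 if $body =~ /<meta\b[^>]*charset\s*=\s*["']?([\w\-]+)/i;
    # 3. CSS @charset rule at the top of a style sheet
    return lc $1 if $body =~ /\A\s*\@charset\s+["']([\w\-]+)["']/i;
    return;
}

my $from_header = guess_charset('text/html; charset=Shift_JIS', '');
my $from_meta   = guess_charset('text/html', '<meta charset="EUC-JP">');
my $from_css    = guess_charset('text/css', '@charset "UTF-8"; body {}');
my $fallback    = guess_charset('text/html', '<p>hi</p>') || 'utf-8';

print join(' ', $from_header, $from_meta, $from_css, $fallback), "\n";
```

The final line mirrors the `|| 'utf-8'` idiom from the usage above: when no hint is found, the caller supplies the default.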
AUTHOR
Keita Sugama, <sugama@jamadam.com>
COPYRIGHT AND LICENSE
Copyright (C) Keita Sugama.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.