NAME

WWW::Crawler::Mojo::ScraperUtil - Scraper utilities

SYNOPSIS

DESCRIPTION

This class inherits from Mojo::UserAgent and overrides the start method to store user info.

ATTRIBUTES

WWW::Crawler::Mojo::ScraperUtil implements the following attributes.

METHODS

WWW::Crawler::Mojo::ScraperUtil implements the following methods.

collect_urls_css

Collects URLs out of CSS.

@urls = collect_urls_css($dom);
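As a rough illustration of what collecting URLs out of CSS involves, the sketch below pulls url(...) targets from a stylesheet string using core Perl only. The helper name extract_css_urls is hypothetical and this is not the module's own implementation; it only shows the general idea.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the module): return every url(...)
# target found in a CSS string, with surrounding quotes removed.
sub extract_css_urls {
    my ($css) = @_;
    my @urls;
    # Optional quote is captured in $1 and must be matched again by \1.
    while ($css =~ /url\(\s*(['"]?)(.*?)\1\s*\)/gi) {
        push @urls, $2;
    }
    return @urls;
}

my @urls = extract_css_urls(q{
    body { background: url("/img/bg.png"); }
    @font-face { src: url(font.woff); }
});
print "$_\n" for @urls;
```

The returned values are still relative to the stylesheet's own location, so they would normally be passed through resolve_href before being queued.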

decoded_body

Returns the decoded response body for the given Mojo::Message::Response instance, using guess_encoding and encoder.

encoder

Generates an Encode instance for the given encoding name. Defaults to Encode::utf8.
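The lookup-with-fallback behavior described above can be sketched with the core Encode module. The function name find_encoder is a hypothetical stand-in and the fallback order is an assumption, not a copy of the module's code.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical stand-in (not the module's code): look up an Encode object
# by name and fall back to UTF-8 when the name is missing or unknown.
sub find_encoder {
    my ($name) = @_;
    for my $candidate ($name // 'utf-8', 'utf-8') {
        my $enc = Encode::find_encoding($candidate);
        return $enc if $enc;
    }
    return;
}

# An unknown name falls back to UTF-8, so these bytes decode as "cafe"
# with an accented final letter (4 characters).
my $bytes = "caf\xc3\xa9";
my $text  = find_encoder('no-such-charset')->decode($bytes);
print length($text), "\n";
```

Returning an object rather than a name lets the caller invoke decode directly, which is how decoded_body is described as combining guess_encoding and encoder.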

html_handlers

Returns preset HTML element handlers for scraping, keyed by CSS selector. An optional array reference of container selectors narrows the preset selectors to elements inside those containers.

my $handlers = html_handlers(['#header', '#footer li']);

$handlers->{img} = sub {
    my $dom = shift;
    return $dom->{src};
};

my @urls;
for my $selector (sort keys %{$handlers}) {
    $dom->find($selector)->each(sub {
        push(@urls, $handlers->{$selector}->(shift));
    });
}

resolve_href

Resolves URLs with a base URL.

WWW::Crawler::Mojo::ScraperUtil::resolve_href($base, $uri);
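For readers unfamiliar with URL resolution, here is a minimal sketch of the idea using Mojo::URL from Mojolicious. The helper name to_absolute and the example URLs are assumptions made for illustration; resolve_href itself may normalize more than this.

```perl
use strict;
use warnings;
use Mojo::URL;

# Hypothetical helper (not the module's code): resolve $href against $base
# and drop the fragment, since it does not identify a separate resource.
sub to_absolute {
    my ($base, $href) = @_;
    my $abs = Mojo::URL->new($href)->to_abs(Mojo::URL->new($base));
    return $abs->fragment(undef);
}

my $abs = to_absolute('http://example.com/dir/page.html', '../img/a.png#top');
print "$abs\n";    # http://example.com/img/a.png
```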

guess_encoding

Guesses the encoding of HTML or CSS from the given Mojo::Message::Response instance.

$encode = WWW::Crawler::Mojo::ScraperUtil::guess_encoding($res) || 'utf-8';
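One plausible source for such a guess is the charset parameter of the Content-Type header. The sketch below shows only that step, in core Perl; charset_from_content_type is a hypothetical name, and the sources guess_encoding actually consults are not specified here.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's code): extract the charset
# parameter from a Content-Type header value, or return undef.
sub charset_from_content_type {
    my ($content_type) = @_;
    return undef unless defined $content_type;
    if ($content_type =~ /charset\s*=\s*["']?([^\s;"']+)/i) {
        return lc $1;
    }
    return undef;
}

# Same fallback pattern as the usage line above.
my $encoding = charset_from_content_type('text/html; charset=Shift_JIS') || 'utf-8';
print "$encoding\n";    # shift_jis
```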

AUTHOR

Keita Sugama, <sugama@jamadam.com>

COPYRIGHT AND LICENSE

Copyright (C) Keita Sugama.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.