NAME
Dezi::Bot - web crawler
SYNOPSIS
use Dezi::Bot;

my $bot = Dezi::Bot->new(

    # give your bot a name
    name => 'dezibot',

    # explicit spider object, used instead of spider_class + spider_config
    spider => $spider_object,

    # every crawled URI is passed to the $handler->handle() method
    handler_class => 'Dezi::Bot::Handler',

    # default
    spider_class => 'Dezi::Bot::Spider',

    # passed to spider_class->new()
    spider_config => {
        agent     => 'dezibot ' . $Dezi::Bot::VERSION,
        email     => 'bot@dezi.org',
        max_depth => 4,
    },

    # default
    cache_class => 'Dezi::Bot::Cache',

    # passed to cache_class->new()
    cache_config => {
        driver   => 'File',
        root_dir => '/tmp/dezibot',
    },

    # default
    queue_class => 'Dezi::Bot::Queue',

    # passed to queue_class->new()
    queue_config => {
        type     => 'DBI',
        dsn      => "DBI:mysql:database=dezibot;host=localhost;port=3306",
        username => 'myuser',
        password => 'mysecret',
    },
);

$bot->crawl('http://dezi.org');
DESCRIPTION
The Dezi::Bot module is a web crawler optimized for parallel use across multiple hosts.
METHODS
init( args )
Overrides the base method to set default options based on args. See the SYNOPSIS.
Options:
- name
- spider
- handler_class
- handler_config
- spider_class
- spider_config
- cache_class
- cache_config
- queue_class
- queue_config
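The options above pair a *_class name with a *_config hash, and the SYNOPSIS notes that an explicit spider object can be given instead of class+config. The following is a minimal sketch of that general pattern in plain Perl, not Dezi::Bot's actual implementation. The My::Spider class and the resolve_component() helper are hypothetical names invented for illustration, and passing the config to new() as a flattened hash is an assumption.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a spider class; not part of Dezi::Bot.
package My::Spider;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub max_depth { $_[0]->{max_depth} }

package main;

# Hypothetical helper: prefer an explicit object,
# otherwise build one from a class name plus a config hash.
sub resolve_component {
    my (%opt) = @_;
    return $opt{object} if defined $opt{object};
    my $class  = $opt{class};
    my $config = $opt{config} || {};
    return $class->new(%$config);    # assumption: config is flattened
}

# Built from class + config.
my $built = resolve_component(
    class  => 'My::Spider',
    config => { max_depth => 4 },
);

# An explicit object takes precedence over class + config.
my $explicit = My::Spider->new( max_depth => 2 );
my $passed   = resolve_component(
    object => $explicit,
    class  => 'My::Spider',
    config => { max_depth => 4 },
);

print $built->max_depth,  "\n";
print $passed->max_depth, "\n";
```

The first call builds a new object from the class name and config, while the second returns the explicit object untouched, so its config is ignored.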
crawl( urls )
Calls ->spider->crawl() for an array of URLs.
Returns the total number of URIs crawled.
AUTHOR
Peter Karman, <karman at cpan.org>
BUGS
Please report any bugs or feature requests to bug-dezi-bot at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-Bot. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc Dezi::Bot
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT & LICENSE
Copyright 2013 Peter Karman.
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
See http://dev.perl.org/licenses/ for more information.