NAME

Dezi::Bot - web crawler

SYNOPSIS

use Dezi::Bot;

my $bot = Dezi::Bot->new(

   # give your bot a name
   name => 'dezibot',  
   
   # explicit object, instead of class+config
   spider => $spider_object,  
    
   # class whose handle() method is called
   # with every crawled URI
   handler_class => 'Dezi::Bot::Handler',
   
   # default
   spider_class => 'Dezi::Bot::Spider',
   
   # passed to spider_class->new()
   spider_config   => {
       agent      => 'dezibot ' . $Dezi::Bot::VERSION,
       email      => 'bot@dezi.org',
       max_depth  => 4,
   },
   
   # default
   cache_class => 'Dezi::Bot::Cache',
   
   # passed to cache_class->new()
   cache_config => {
       driver      => 'File',
       root_dir    => '/tmp/dezibot',
   },
   
   # default
   queue_class => 'Dezi::Bot::Queue',
   
   # passed to queue_class->new()
   queue_config => {
       type     => 'DBI',
       dsn      => "DBI:mysql:database=dezibot;host=localhost;port=3306",
       username => 'myuser',
       password => 'mysecret',
   },
);

$bot->crawl('http://dezi.org');

DESCRIPTION

The Dezi::Bot module is a web crawler optimized for parallel use across multiple hosts.

METHODS

init( args )

Overrides the base method to fill in defaults for any options not supplied in args. See the SYNOPSIS for example values.

Options:

name
spider
handler_class
handler_config
spider_class
spider_config
cache_class
cache_config
queue_class
queue_config

crawl( urls )

Calls crawl() on the bot's spider object for an array of urls.

Returns the total number of URIs crawled.
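
The delegation described above can be sketched without installing the module. This is only an illustration under stated assumptions, not the actual Dezi::Bot source: the packages SketchBot and StubSpider are hypothetical stand-ins, and the sketch assumes the spider's crawl() is called once per start URL and returns the number of URIs it crawled.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a spider (not Dezi::Bot::Spider):
# it records each start URL and pretends it crawled one URI.
package StubSpider;

sub new { return bless { seen => [] }, shift }

sub crawl {
    my ( $self, $url ) = @_;
    push @{ $self->{seen} }, $url;
    return 1;    # assumed: count of URIs crawled for this URL
}

# Hypothetical stand-in for the bot (not Dezi::Bot): it hands
# each URL to its spider and sums the returned counts.
package SketchBot;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub spider { return $_[0]->{spider} }

sub crawl {
    my ( $self, @urls ) = @_;
    my $total = 0;
    $total += $self->spider->crawl($_) for @urls;
    return $total;
}

package main;

my $bot   = SketchBot->new( spider => StubSpider->new );
my $total = $bot->crawl( 'http://dezi.org', 'http://example.org' );
print "crawled $total URIs\n";
```

With a real Dezi::Bot object the call has the same shape: pass one or more start URLs to crawl() and use the returned count for logging or for deciding whether anything was fetched.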

AUTHOR

Peter Karman, <karman at cpan.org>

BUGS

Please report any bugs or feature requests to bug-dezi-bot at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Dezi-Bot. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Dezi::Bot

COPYRIGHT & LICENSE

Copyright 2013 Peter Karman.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.