NAME

WWW::Scraper::Sherlock - Scrapes search engines via Sherlock plugins.

SYNOPSIS

require WWW::Scraper;
$search = new WWW::Scraper('Sherlock');
$search->sherlockPlugin($pluginURI);

# Then proceed as with any normal WWW::Search module.
$result = $search->next_result();

# The result objects include additional methods specifically for Sherlock.
$result->name();
$result->url();
$result->relevance();
$result->price();
$result->avail();
$result->email();
$result->detail();
$result->banner();
$result->browserResultType();

# Attributes of the <SEARCH> and <BROWSER> blocks of the plugin
#  can be accessed via a hash in the object named 'sherlockSearchParam'.
$search->{'sherlockSearchParam'}{'name'}  # name
   . . . {...}{'description'}             # description
   . . . {...}{'method'}                  # method
   . . . {...}{'action'}                  # action
   . . . {...}{'routeType'}               # routeType
   . . . {...}{'update'}                  # update
   . . . {...}{'updateCheckDays'}         # updateCheckDays
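
As a minimal sketch of reading these attributes, the helper below walks the documented keys and reports the ones that are set. The helper name search_param_lines, the stand-in object, and its values are made up for illustration; the only things taken from this document are the 'sherlockSearchParam' hash and its attribute names.

```perl
use strict;
use warnings;

# Attribute names documented for the 'sherlockSearchParam' hash.
my @SEARCH_ATTRS = qw(name description method action routeType update updateCheckDays);

# Return one "attribute: value" line for each attribute that is set.
# Assumes $search is a hash-based object, as the SYNOPSIS implies.
sub search_param_lines {
    my ($search) = @_;
    my $params = $search->{'sherlockSearchParam'} || {};
    my @lines;
    for my $attr (@SEARCH_ATTRS) {
        next unless defined $params->{$attr};
        push @lines, "$attr: $params->{$attr}";
    }
    return @lines;
}

# Hypothetical stand-in object with made-up values, so the sketch runs
# without a network connection. In real use, pass the object returned by
# new WWW::Scraper('Sherlock') after calling sherlockPlugin().
my $search = bless {
    'sherlockSearchParam' => {
        'name'   => 'Example Engine',
        'method' => 'GET',
        'action' => 'http://search.example.com/search',
    },
}, 'Example::StandIn';

print "$_\n" for search_param_lines($search);
```

Attributes that the plugin does not define are skipped rather than printed as empty strings, which avoids "uninitialized value" warnings under 'use warnings'.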

DESCRIPTION

Performs WWW::Scraper-style searches on search engines, using a Sherlock plugin to define both the request and the interpretation of the response. The plugin format is defined in http://developer.apple.com/technotes/tn/tn1141.html and enhanced by http://www.mozilla.org/projects/search/technical.html.

The plugin is named by a URI, such as "file:yahoo.src" or "http://sherlock.mozdev.org/yahoo.src".

This version does not automatically update plugins; it ignores the 'update' and 'updateCheckDays' attributes of the <SEARCH> block.

Ready-made plugins are available from http://sherlock.mozdev.org/source/browse/sherlock/www/.

Also ignored in this version are the <INTERPRET> attributes of 'skipLocal' (partially implemented), 'charset', 'resultEncoding', 'resultTranslationEncoding' and 'resultTranslation'.

OPTIONS

$search->sherlockPlugin($pluginURI, { 'option' => $value });

You may supply any of the options available to WWW::Scraper objects (which are, in turn, WWW::Search objects). Options may also be passed to a new Sherlock object via the sherlockPlugin() method, just as they would be in WWW::Search's native_query(). Sherlock-specific options include:

noUpdate - boolean, do not fetch an updated plugin, even if that is called for by updateCheckDays.
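
A minimal sketch of passing this option follows. It assumes the options hash is given as the second argument to sherlockPlugin(), as shown in the call signature above; the run-time module check and the error trapping are additions for illustration, not documented behavior of the module.

```perl
use strict;
use warnings;

my $pluginURI = 'file:yahoo.src';       # example URI form from the DESCRIPTION
my %options   = ( 'noUpdate' => 1 );    # documented Sherlock-specific option

# Load WWW::Scraper at run time so this sketch still runs (and says so)
# on a machine where the module is not installed.
if ( eval { require WWW::Scraper; 1 } ) {
    my $search = WWW::Scraper->new('Sherlock');

    # Assumption: the call may die if the plugin cannot be read, so trap it.
    my $ok = eval { $search->sherlockPlugin( $pluginURI, \%options ); 1 };
    print $ok
        ? "sherlockPlugin() call completed with noUpdate set.\n"
        : "sherlockPlugin() failed for $pluginURI: $@\n";
}
else {
    print "WWW::Scraper is not installed; nothing to do.\n";
}
```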

EXAMPLE

This sample is a complete script that runs Sherlock against Yahoo.com with the query "Greeting Cards". It prints the main harvested fields of each result to STDOUT. Note that new WWW::Scraper('Sherlock') loads WWW::Scraper::Sherlock for you, so you don't have to load it yourself.

 use WWW::Scraper;
 
 my $scraper = new WWW::Scraper('Sherlock');
 $scraper->sherlockPlugin('http://sherlock.mozdev.org/yahoo.src'); # or 'file:Sherlock/yahoo.src'

 $scraper->native_query('Greeting Cards', {'search_debug' => 1});

 while ( my $result = $scraper->next_result() ) {
     print "NAME: '".$result->name()."'\n";
     print "URL: '".$result->url()."'\n";
     print "RELEVANCE: '".$result->relevance()."'\n";
     print "PRICE: '".$result->price()."'\n";
     print "AVAIL: '".$result->avail()."'\n";
     print "EMAIL: '".$result->email()."'\n";
     print "DETAIL: '".$result->detail()."'\n";
 }

SEE ALSO

Apple's Introduction to Sherlock plugin development

http://www.apple.com/sherlock/plugindev.html

Sherlock Specification Technote TN1141

http://developer.apple.com/technotes/tn/tn1141.html

Mozilla Enhancements

http://www.mozilla.org/projects/search/technical.html

Mozdev Plugins Library

http://sherlock.mozdev.org/source/browse/sherlock/www/

AUTHOR

WWW::Scraper::Sherlock is written and maintained by Glenn Wood, glenwood@alumni.caltech.edu.

COPYRIGHT

Copyright (c) 2001 Glenn Wood. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.