NAME
WWW::Scraper::Sherlock - Scrapes search engines via Sherlock plugins.
SYNOPSIS
require WWW::Scraper;
$search = new WWW::Scraper('Sherlock');
$search->sherlockPlugin($pluginURI);
# then proceed as with any other WWW::Search module.
$result = $search->next_result();
# The result objects include additional methods specifically for Sherlock.
$result->name();
$result->url();
$result->relevance();
$result->price();
$result->avail();
$result->email();
$result->detail();
$result->banner();
$result->browserResultType();
# Attributes of the <SEARCH> and <BROWSER> blocks of the plugin
# can be accessed via a hash in the object named 'sherlockSearchParam'.
$search->{'sherlockSearchParam'}{'name'}
$search->{'sherlockSearchParam'}{'description'}
$search->{'sherlockSearchParam'}{'method'}
$search->{'sherlockSearchParam'}{'action'}
$search->{'sherlockSearchParam'}{'routeType'}
$search->{'sherlockSearchParam'}{'update'}
$search->{'sherlockSearchParam'}{'updateCheckDays'}
DESCRIPTION
Performs WWW::Scraper-style searches on search engines, given a Sherlock plugin to define the request and response (as defined in http://developer.apple.com/technotes/tn/tn1141.html and enhanced by http://www.mozilla.org/projects/search/technical.html).
The plugin is named by a URI, such as "file:yahoo.src" or "http://sherlock.mozdev.org/yahoo.src".
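The <SEARCH> block of a plugin is a single tag whose attributes (name, description, method, action and so on) are what the 'sherlockSearchParam' hash exposes. As a rough illustration only, the hypothetical parse_search_attrs() below pulls those attributes out of plugin text with regular expressions. It assumes attributes are written as name="value" or name=value, the sample plugin text is invented, and it is not the module's own parser. Fetching an http: plugin is left out so the sketch runs offline.

```perl
use strict;
use warnings;

# Hypothetical helper, not WWW::Scraper::Sherlock's own parser: return the
# attributes of the first <SEARCH ...> tag in a plugin's text as a hash.
# Assumes attributes look like name="value" or name=value.
sub parse_search_attrs {
    my ($plugin_text) = @_;

    # Grab the inside of the <SEARCH ...> tag; quoted values may contain '>'.
    my ($tag_body) = $plugin_text =~ /<SEARCH\b((?:[^>"]|"[^"]*")*)>/i
        or return;

    # Collect each name="value" (or unquoted name=value) pair.
    my %attrs;
    while ( $tag_body =~ /(\w+)\s*=\s*(?:"([^"]*)"|(\S+))/g ) {
        $attrs{$1} = defined $2 ? $2 : $3;
    }
    return %attrs;
}

# Illustrative plugin text (invented, not a real engine's plugin).
my $plugin = <<'SRC';
<SEARCH
   name="Example"
   description="Example Search"
   method="GET"
   action="http://search.example.com/search"
   updateCheckDays=3
>
<INPUT name="q" user>
</SEARCH>
SRC

my %param = parse_search_attrs($plugin);
print "$_ = $param{$_}\n" for sort keys %param;
```

The tag-matching pattern skips over quoted values as a unit, so a '>' inside an attribute value does not end the tag early.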
This version does not automatically update plugins; it ignores the 'update' and 'updateCheckDays' attributes of the <SEARCH> block.
Getchur plugins red-hot from http://sherlock.mozdev.org/source/browse/sherlock/www/.
Also ignored in this version are the <INTERPRET> attributes of 'skipLocal' (partially implemented), 'charset', 'resultEncoding', 'resultTranslationEncoding' and 'resultTranslation'.
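What the <INTERPRET> block does supply, per TN1141, is a set of start/end delimiter strings marking where the result list and each result item sit in the returned page (resultListStart, resultListEnd, resultItemStart, resultItemEnd), alongside delimiters for individual fields. The hypothetical split_items() below sketches only the item-splitting step, using plain substring searches. The sample page and delimiter values are made up, and WWW::Scraper::Sherlock's real matching may well differ.

```perl
use strict;
use warnings;

# Hypothetical sketch: carve result items out of a page using the list and
# item delimiters an <INTERPRET> block provides. Plain substring search
# (index) is an assumption; item delimiters are assumed to be defined.
sub split_items {
    my ($html, $interp) = @_;

    # Narrow the page to the result list, if list delimiters are given.
    if ( defined $interp->{resultListStart} ) {
        my $at = index($html, $interp->{resultListStart});
        return () if $at < 0;
        $html = substr($html, $at + length $interp->{resultListStart});
    }
    if ( defined $interp->{resultListEnd} ) {
        my $at = index($html, $interp->{resultListEnd});
        $html = substr($html, 0, $at) if $at >= 0;
    }

    # Walk the list, collecting the text between each start/end pair.
    my ($start, $end) = @{$interp}{qw(resultItemStart resultItemEnd)};
    my @items;
    my $pos = 0;
    while ( (my $s = index($html, $start, $pos)) >= 0 ) {
        my $from = $s + length $start;
        my $e    = index($html, $end, $from);
        last if $e < 0;    # unterminated item: stop
        push @items, substr($html, $from, $e - $from);
        $pos = $e + length $end;
    }
    return @items;
}

my $page  = '<p>ads</p><ul><li>First hit</li><li>Second hit</li></ul><p>footer</p>';
my @items = split_items($page, {
    resultListStart => '<ul>',
    resultListEnd   => '</ul>',
    resultItemStart => '<li>',
    resultItemEnd   => '</li>',
});
print "$_\n" for @items;    # prints "First hit" then "Second hit"
```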
OPTIONS
$search->sherlockPlugin($pluginURI, { 'option' => $value });
You may supply any of the options available to WWW::Scraper objects (which are, in turn, WWW::Search objects). Options may also be passed to a new Sherlock object via the sherlockPlugin() method, just as they would be passed to WWW::Search's native_query(). New Sherlock options include:
noUpdate - boolean, do not fetch an updated plugin, even if that is called for by updateCheckDays.
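To make the intent of noUpdate concrete, here is a hypothetical decision rule: fetch a fresh plugin from the 'update' URL once 'updateCheckDays' days have passed since the last fetch, unless noUpdate is set. The function name wants_update(), its arguments and the day arithmetic are assumptions for illustration; as noted above, this version does not auto-update plugins at all.

```perl
use strict;
use warnings;

use constant SECONDS_PER_DAY => 24 * 60 * 60;

# Hypothetical rule that noUpdate overrides. $search_param holds the
# <SEARCH> attributes, $options the caller's options, and the two times
# are epoch seconds.
sub wants_update {
    my ($search_param, $options, $last_fetched, $now) = @_;
    return 0 if $options->{noUpdate};                   # caller opted out
    my $days = $search_param->{updateCheckDays};
    return 0 unless defined $days && $days > 0;         # no schedule given
    return 0 unless defined $search_param->{update};    # nowhere to fetch from
    my $elapsed = $now - $last_fetched;
    return $elapsed >= $days * SECONDS_PER_DAY ? 1 : 0;
}

my %param = (
    update          => 'http://search.example.com/example.src',  # invented URL
    updateCheckDays => 3,
);
my $now           = time;
my $four_days_ago = $now - 4 * SECONDS_PER_DAY;
print wants_update(\%param, {}, $four_days_ago, $now), "\n";                 # prints 1
print wants_update(\%param, { noUpdate => 1 }, $four_days_ago, $now), "\n";  # prints 0
```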
EXAMPLE
This sample is a complete script that runs Sherlock against Yahoo.com. The query is "Greeting Cards". For each result it prints the harvested name, URL, relevance, price, availability, email and detail fields to STDOUT. Note that creating a WWW::Scraper with the 'Sherlock' engine loads WWW::Scraper::Sherlock, so you don't have to.
use strict;
use WWW::Scraper;

# Asking for the 'Sherlock' engine loads WWW::Scraper::Sherlock for you.
my $scraper = WWW::Scraper->new('Sherlock');
$scraper->sherlockPlugin('http://sherlock.mozdev.org/yahoo.src'); # or 'file:Sherlock/yahoo.src'
$scraper->native_query('Greeting Cards', {'search_debug' => 1});

# Print the harvested fields of each result.
while ( my $result = $scraper->next_result() ) {
    print "NAME: '".$result->name()."'\n";
    print "URL: '".$result->url()."'\n";
    print "RELEVANCE: '".$result->relevance()."'\n";
    print "PRICE: '".$result->price()."'\n";
    print "AVAIL: '".$result->avail()."'\n";
    print "EMAIL: '".$result->email()."'\n";
    print "DETAIL: '".$result->detail()."'\n";
}
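The example above repeats one print statement per field. A more compact variant calls each accessor by name with Perl's dynamic method call syntax ($result->$field()) and substitutes an empty string for fields the plugin did not harvest. So that the sketch runs without network access, FakeSherlockResult is a hypothetical stand-in for the real result class; only the accessor names come from this document.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a Sherlock result object, so this runs offline.
# Only the accessor names match the result methods listed in the SYNOPSIS.
package FakeSherlockResult;
sub new { my ($class, %f) = @_; return bless {%f}, $class }
for my $field (qw(name url relevance price avail email detail)) {
    no strict 'refs';
    *{"FakeSherlockResult::$field"} = sub { $_[0]{$field} };
}

package main;

# Call each accessor by name; undefined fields print as ''.
sub format_result {
    my ($result) = @_;
    my @lines;
    for my $field (qw(name url relevance price avail email detail)) {
        my $value = $result->$field();
        push @lines, sprintf "%s: '%s'", uc $field, defined $value ? $value : '';
    }
    return join "\n", @lines;
}

my $r = FakeSherlockResult->new(name => 'Greeting Cards Inc.', url => 'http://www.example.com/');
print format_result($r), "\n";
```

With a real scraper, the same format_result() could be called inside the while loop on each object returned by next_result().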
SEE ALSO
- Apple's Introduction to Sherlock plugin development
  http://www.apple.com/sherlock/plugindev.html
- Sherlock Specification Technote TN1141
  http://developer.apple.com/technotes/tn/tn1141.html
- Mozilla Enhancements
  http://www.mozilla.org/projects/search/technical.html
- Mozdev Plugins Library
  http://sherlock.mozdev.org/source/browse/sherlock/www/
AUTHOR
WWW::Scraper::Sherlock is written and maintained by Glenn Wood, glenwood@alumni.caltech.com.
COPYRIGHT
Copyright (c) 2001 Glenn Wood. All rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.