NAME
Yahoo::Search - Perl interface to Yahoo! Search's public API.
The following search spaces are supported:
- Doc
-
Common web search for documents (html, pdf, doc, ...)
- Image
-
Image search (jpeg, png, gif, ...)
- Video
-
Video file search (avi, mpeg, realmedia, ...)
- News
-
News article search
- Local
-
Yahoo! Local area (ZIP-code-based Yellow-Page like search)
- Spell
-
A pseudo-search to fetch a "did you mean?" spelling suggestion for a search term.
- Related
-
A pseudo-search to fetch "also try" related-searches for a search term.
(Note: what this Perl API calls "Doc" Search is what Yahoo! calls "Web" Search. But gee, aren't all web searches "Web" search, including Image/News/Video/etc?)
Yahoo!'s raw API, which this package uses, is described at:
http://developer.yahoo.net/
DOCS
The full documentation for this suite of classes is spread among these packages:
Yahoo::Search
Yahoo::Search::Request
Yahoo::Search::Response
Yahoo::Search::Result
However, you need to use only Yahoo::Search, which brings in the others as needed.
SYNOPSIS
Yahoo::Search provides a rich and full-featured set of classes for accessing the various features of Yahoo! Search, and also offers a variety of shortcuts to allow simple access, such as the following Doc search:
use Yahoo::Search;

my @Results = Yahoo::Search->Results(Doc => "Britney latest marriage",
                                     AppId => "YahooDemo",
                                     # The following args are optional.
                                     # (Values shown are package defaults).
                                     Mode         => 'all',
                                     Count        => 10,
                                     Start        => 0,
                                     Type         => 'all',
                                     AllowAdult   => 0,
                                     AllowSimilar => 0,
                                     Language     => undef,
                                    );
warn $@ if $@; # report any errors

for my $Result (@Results)
{
    printf "Result: #%d\n",  $Result->I + 1;
    printf "Url:%s\n",       $Result->Url;
    printf "%s\n",           $Result->ClickUrl;
    printf "Summary: %s\n",  $Result->Summary;
    printf "Title: %s\n",    $Result->Title;
    printf "In Cache: %s\n", $Result->CacheUrl;
    print "\n";
}
The first argument to Results indicates which search space is to be queried (in this case, Doc). The second argument is the search term or phrase (described in detail in the next section). Subsequent arguments are optional key/value pairs (described in detail in the section after that) -- the ones shown in the example are those allowed for a Doc query, with the values shown being the defaults.
Results returns a list of Yahoo::Search::Result objects, one per item (in the case of a Doc search, an item is a web page, pdf document, doc document, etc.). The methods available to a Result object are dependent upon the search space of the original query -- see Yahoo::Search::Result documentation for the complete list.
Search term / phrase
Within a search phrase ("Britney latest marriage" in the example above), words that you wish to be included even if they would otherwise be eliminated as "too common" should be preceded with a "+". Words that you wish to exclude should be preceded with a "-". Words can be separated with "OR" (the default for the any Mode, described below), and can be wrapped in double quotes to identify an exact phrase (the default with the phrase Mode, also described below).
There are also a number of "Search Meta Words", as described at http://help.yahoo.com/help/us/ysearch/basics/basics-04.html and http://help.yahoo.com/help/us/ysearch/tips/tips-03.html , which can stand alone or be combined with Doc searches (and, to some extent, some of the others -- YMMV):
- site:
-
allows one to find all documents within a particular domain and all its subdomains. Example: site:yahoo.com
- hostname:
-
allows one to find all documents from a particular host only. Example: hostname:autos.yahoo.com
- link:
-
allows one to find documents that link to a particular url. Example: link:http://autos.yahoo.com/
- url:
-
allows one to find a specific document in Yahoo!'s index. Example: url:http://edit.autos.yahoo.com/repair/tree/0.html
- inurl:
-
allows one to find a specific keyword as part of indexed urls. Example: inurl:bulgarian
- intitle:
-
allows one to find a specific keyword as part of the indexed titles. Example: intitle:Bulgarian
As an example combining a number of different search styles, consider
my @Results = Yahoo::Search->Results(Doc => 'site:TheSmokingGun.com "Michael Jackson" -arrest',
AppId => "YahooDemo");
This returns data about pages at TheSmokingGun.com about Michael Jackson that don't contain the word "arrest" (yes, there are actually a few such pages).
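The phrase syntax above can also be assembled programmatically. The following is only a sketch: build_search_phrase and its meta/phrases/words/require/exclude keys are hypothetical names invented for illustration and are not part of Yahoo::Search. It simply produces a string that could be passed as the search term.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Yahoo::Search): assemble a search
# phrase from its parts using the syntax described above.
sub build_search_phrase {
    my (%part) = @_;
    my @pieces;

    # Meta words such as site: or intitle: come first, as "key:value".
    my $meta = $part{meta} || {};
    push @pieces, map { "$_:$meta->{$_}" } sort keys %$meta;

    # Exact phrases are wrapped in double quotes.
    push @pieces, map { qq{"$_"} } @{ $part{phrases} || [] };

    # Plain words, then words forced in with "+", then words excluded with "-".
    push @pieces, @{ $part{words} || [] };
    push @pieces, map { "+$_" } @{ $part{require} || [] };
    push @pieces, map { "-$_" } @{ $part{exclude} || [] };

    return join ' ', @pieces;
}

my $phrase = build_search_phrase(
    meta    => { site => 'TheSmokingGun.com' },
    phrases => ['Michael Jackson'],
    exclude => ['arrest'],
);
print "$phrase\n";
```

The resulting string is the same search term used in the example above, so it could be handed to Results as the value following Doc.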
Query arguments
As mentioned above, the arguments allowed in a Query call depend upon the search space of the query. Here is a table of the possible arguments, showing which apply to queries of which search space:
                    Doc    Image  Video  News   Local  Spell  Related
                    -----  -----  -----  -----  -----  -----  -------
AppId                [X]    [X]    [X]    [X]    [X]    [X]    [X]
Mode                 [X]    [X]    [X]    [X]    [X]     .      .
Start                [X]    [X]    [X]    [X]    [X]     .      .
Count                [X]    [X]    [X]    [X]    [X]     .      .
AllowSimilar         [X]     .      .      .      .      .      .
AllowAdult           [X]    [X]    [X]     .      .      .      .
Type                 [X]    [X]    [X]     .      .      .      .
Sort                  .      .      .     [X]     .      .      .
Language             [X]     .      .     [X]     .      .      .
Street                .      .      .      .     [X]     .      .
City                  .      .      .      .     [X]     .      .
State                 .      .      .      .     [X]     .      .
PostalCode            .      .      .      .     [X]     .      .
Location              .      .      .      .     [X]     .      .
Radius                .      .      .      .     [X]     .      .
AutoContinue         [X]    [X]    [X]    [X]    [X]    [X]    [X]
Debug                [X]    [X]    [X]    [X]    [X]    [X]    [X]
PreRequestCallback   [X]    [X]    [X]    [X]    [X]    [X]    [X]
Here are details of each:
- AppId
-
An 8-40 character string which identifies the application making use of the Yahoo! Search API. (Think of it along the lines of an HTTP User-Agent string.)
The characters allowed are space, plus
A-Za-z0-9_()[]*+-=,.:@\
This argument is required of all searches (sorry). You can make up whatever AppId you'd like, but you are encouraged to register it via the link on
http://developer.yahoo.net/
especially if you are creating something that will be widely distributed.
As mentioned below in Defaults and Default Overrides, it's particularly convenient to get the AppId out of the way by putting it on the use line, e.g.
use Yahoo::Search AppId => 'just testing';
It then applies to all queries unless explicitly overridden.
- Mode
-
Must be one of: all (the default), any, or phrase. Indicates how multiple words in the search term are used: search for documents with all words, documents with any words, or documents that contain the search term as an exact phrase.
- Start
-
Indicates the ordinal of the first result to be returned, e.g. the "30" of "showing results 30-40" (except that Start is zero-based, not one-based). The default is zero, meaning that the primary results will be returned.
- Count
-
Indicates how many items should be returned. The default is 10. The maximum allowed depends on the search space being queried: 20 for Local searches, and 50 for others which support the Count argument.
Note that
Yahoo::Search::MaxCount($SearchSpace)
and
$SearchEngine->MaxCount($SearchSpace)
return the maximum count allowed for the given $SearchSpace.
- AllowSimilar
-
If this boolean is true (the default is false), similar results which would otherwise not be returned are included in the result set.
- AllowAdult
-
If this boolean is false (the default), results considered to be "adult" (i.e. porn) are not included in the result set. Set to true to allow unfiltered results.
Standard precautions apply about how the "is adult?" determination is not perfect.
- Type
-
This argument can be used to restrict the results to only a specific file type. The default value, all, allows any type (associated with the search space) to be returned. Otherwise, the values allowed depend on the search space:
Search space   Allowed Type values
============   ========================================================
Doc            all html msword pdf ppt rss txt xls
Image          all bmp gif jpeg png
Video          all avi flash mpeg msmedia quicktime realmedia
News           N/A
Local          N/A
Spell          N/A
Related        N/A
- Sort
-
For News searches, the sort may be rank (the default) or date.
- Language
-
If provided, restricts the results to documents in the given language. The value is a language code such as en (English), ja (Japanese), etc. (mostly ISO 639-1 codes). These are the codes supported:
code  language
----  ---------
sq    Albanian
ar    Arabic
bg    Bulgarian
ca    Catalan
szh   Chinese (simplified)
tzh   Chinese (traditional)
hr    Croatian
cs    Czech
da    Danish
nl    Dutch
en    English
et    Estonian
fi    Finnish
fr    French
de    German
el    Greek
he    Hebrew
hu    Hungarian
is    Icelandic
it    Italian
ja    Japanese
ko    Korean
lv    Latvian
lt    Lithuanian
no    Norwegian
fa    Persian
pl    Polish
pt    Portuguese
ro    Romanian
ru    Russian
sk    Slovak
sl    Slovenian
es    Spanish
sv    Swedish
th    Thai
tr    Turkish
In addition, the code "default" is the same as the lack of a language specifier, and seems to mean a mix of major world languages, skewed toward English.
- Street
- City
- State
- PostalCode
- Location
-
These items are for a Local query, and specify the epicenter of the search. The epicenter must be provided in one of a variety of ways: via the free-text Location, via Street + PostalCode, via Street + City + State, via PostalCode alone, or via City + State alone.
Street is the street address, e.g. "701 First Ave".
PostalCode is a US 5-digit or 9-digit ZIP code (e.g. "94089" or "94089-1234").
If Location is provided, it supersedes the others. It should be a string along the lines of "701 First Ave, Sunnyvale CA, 94089". The following forms are recognized:
city state
city state zip
zip
street, city state
street, city state zip
street, zip
Searches that include a street address (either in the Location, or if Location is empty, in Street) provide for a more detailed epicenter specification.
- Radius
-
For Local searches, indicates how wide an area around the epicenter to search. The value is the radius of the search area, in miles. The default radius depends on the search location (urban areas tend to have a smaller default radius).
- Debug
-
Debug is a string (defaults to an empty string). If the substring "url" is found anywhere in the string, the url of the Yahoo! request is printed on stderr. If "xml", the raw xml received is printed to stderr. If "hash", the raw Perl hash, as converted from the XML, is Data::Dump'd to stderr.
Thus, to print all debugging, you'd set Debug to a value such as "url xml hash".
- AutoContinue
-
A boolean (default off). If true, turns on the potentially dangerous auto-continuation, as described in the docs for NextResult in Yahoo::Search::Response.
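Since AppId is required for every search, it can be useful to check a candidate value before making any request. The following is a minimal sketch based only on the length and character rules listed under AppId above. The function name looks_like_valid_appid is hypothetical and not part of Yahoo::Search, and the module may well perform its own, different validation.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Yahoo::Search): check a candidate AppId
# against the documented rules: 8 to 40 characters, each one a space or
# one of A-Za-z0-9_()[]*+-=,.:@\
sub looks_like_valid_appid {
    my ($appid) = @_;
    return 0 unless defined $appid;
    return $appid =~ /\A[ A-Za-z0-9_()\[\]*+\-=,.:\@\\]{8,40}\z/ ? 1 : 0;
}

for my $candidate ('YahooDemo', 'just testing', 'short', 'no/slashes/allowed') {
    printf "%-20s %s\n", $candidate,
        looks_like_valid_appid($candidate) ? 'ok' : 'rejected';
}
```

The first two candidates satisfy the rules; "short" is under eight characters and the last contains a slash, which is not in the allowed set.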
Class Hierarchy Details
The Y! Search API class system supports the following objects (all loaded as needed via Yahoo::Search):
Yahoo::Search
Yahoo::Search::Request
Yahoo::Search::Response
Yahoo::Search::Result
Here is a summary of them:
- Yahoo::Search
-
A "search engine" object which can hold user-specified default values for search-query arguments. Often not used explicitly.
- Yahoo::Search::Request
-
An object which holds the information needed to make one search-query request. Often not used explicitly.
- Yahoo::Search::Response
-
An object which holds the results of a query (including a bunch of Result objects).
- Yahoo::Search::Result
-
An object representing one query result (one image, web page, etc., as appropriate to the original search space).
"The Long Way", and Common Practice
The explicit way to perform a query and access the results is to first create a "Search Engine" object:
my $SearchEngine = Yahoo::Search->new();
Optionally, you can provide new with key/value pairs as described in the Query arguments section above. Those values will then be available as default values during subsequent request creation. (More on this later.)
You then use the search-engine object to create a request:
my $Request = $SearchEngine->Request(Doc => "Britney");
You then actually make the request, getting a response:
my $Response = $Request->Fetch();
You can then access the set of Result objects in a number of ways, either all at once:
my @Results = $Response->Results();
or iteratively:
while (my $Result = $Response->NextResult) {
:
:
}
In Practice....
In practice, one often does not need to go through all these steps explicitly. The only reason to create a search-engine object, for example, is to hold default overrides (to be made available to subsequent requests made via the search-engine object). For example:
use Yahoo::Search;
my $SearchEngine = Yahoo::Search->new(AppId => "Bobs Fish Mart",
Count => 25,
AllowAdult => 1,
PostalCode => 95014);
Now, calls to the various query functions (Query, Results) via this $SearchEngine will use these defaults (Image searches, for example, will be with AllowAdult set to true, and Local searches will be centered at ZIP code 95014.) All will return up to 25 results.
In this example:
my @Results = $SearchEngine->Results(Image => "Britney",
Count => 20);
The query is made with AppId as 'Bobs Fish Mart' and AllowAdult true (both via $SearchEngine), but Count is 20 because explicit args override the default in $SearchEngine. The PostalCode arg does not apply to an Image search, so the default provided from $SearchEngine is not needed with this particular query.
Defaults on the 'use' line
You can also provide the same defaults on the use line. The following example has the same result as the previous one:
use Yahoo::Search AppId => 'Bobs Fish Mart',
Count => 25,
AllowAdult => 1,
PostalCode => 95014;
my @Results = Yahoo::Search->Results(Image => "Britney",
Count => 20);
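To make the outcome of the two examples above concrete, here is a small pure-Perl sketch of which arguments end up mattering for the Image query. It is an illustration only, not the module's internals: effective_args and %applies_to are hypothetical names, and the table holds just a few rows copied from the Query arguments section.

```perl
use strict;
use warnings;

# Which search spaces each argument applies to, copied from the table in
# the Query arguments section (only a few rows are reproduced here).
my %applies_to = (
    AppId      => [qw(Doc Image Video News Local Spell Related)],
    Count      => [qw(Doc Image Video News Local)],
    AllowAdult => [qw(Doc Image Video)],
    PostalCode => [qw(Local)],
);

# Hypothetical helper: merge defaults with explicit args (explicit wins),
# then keep only the arguments relevant to the given search space.
sub effective_args {
    my ($space, $defaults, $explicit) = @_;
    my %merged = (%$defaults, %$explicit);
    my %kept;
    for my $key (keys %merged) {
        my $spaces = $applies_to{$key} or next;
        $kept{$key} = $merged{$key} if grep { $_ eq $space } @$spaces;
    }
    return %kept;
}

my %defaults = (AppId => 'Bobs Fish Mart', Count => 25,
                AllowAdult => 1, PostalCode => 95014);
my %args = effective_args(Image => \%defaults, { Count => 20 });
print "$_ => $args{$_}\n" for sort keys %args;
```

In this sketch the explicit Count of 20 replaces the default of 25, and PostalCode is dropped because it applies only to Local searches.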
Functions and Methods
Here, finally, are the functions and methods provided by Yahoo::Search. In all cases, "...args..." are any of the key/value pairs listed in the Query arguments section of this document (e.g. "Count => 20")
- $SearchEngine = Yahoo::Search->new(...args...)
-
Creates a search-engine object (a container for defaults). On error, sets $@ and returns nothing.
- $Request = $SearchEngine->Request($space => $query, ...args...)
- $Request = Yahoo::Search->Request($space => $query, ...args...)
-
Creates a Request object representing a search of the named search space (Doc, Image, etc.) of the given query string.
On error, sets $@ and returns nothing.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- $Response = $SearchEngine->Query($space => $query, ...args...)
- $Response = Yahoo::Search->Query($space => $query, ...args...)
-
Creates an implicit Request object, and fetches it, returning the resulting Response.
On error, sets $@ and returns nothing.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @Results = $SearchEngine->Results($space => $query, ...args...)
- @Results = Yahoo::Search->Results($space => $query, ...args...)
-
Creates an implicit Request object, then Response object, in the end returning a list of Result objects.
On error, sets $@ and returns nothing.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @links = $SearchEngine->Links($space => $query, ...args...)
- @links = Yahoo::Search->Links($space => $query, ...args...)
-
A super shortcut which goes directly from the query args to a list of
<a href=...>...</a>
links. Essentially,
map { $_->Link } Yahoo::Search->Results($space => $query, ...args...);
or, more explicitly:
map { $_->Link } Yahoo::Search->new()->Request($space => $query, ...args...)->Fetch->Results(@_);
See Link in the documentation for Yahoo::Search::Result.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @links = $SearchEngine->Terms($space => $query, ...args...)
- @links = Yahoo::Search->Terms($space => $query, ...args...)
-
A super shortcut for Spell and Related search spaces, returns the list of spelling or related-search suggestions, respectively.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @html = $SearchEngine->HtmlResults($space => $query, ...args...)
- @html = Yahoo::Search->HtmlResults($space => $query, ...args...)
-
Like Links, but returns a list of html strings (one representing each result). See as_html in the documentation for Yahoo::Search::Result.
A simple result display might look like
print join "<p>", Yahoo::Search->HtmlResults(....);
or, perhaps
if (my @HTML = Yahoo::Search->HtmlResults(....))
{
    print "<ul>";
    for my $html (@HTML) {
        print "<li>", $html;
    }
    print "</ul>";
}
As an example, here's a complete CGI which shows results from an image-search, where the search term is in the 's' query string:
#!/usr/local/bin/perl -w
use CGI;
my $cgi = CGI->new;
print $cgi->header();
use Yahoo::Search AppId => 'my-search-app';
if (my $term = $cgi->param('s')) {
    print join "<p>", Yahoo::Search->HtmlResults(Image => $term);
}
The results, however, do look better with some style-sheet attention, such as:
<style>
  .yResult { display: block; border: #CCF 3px solid ; padding:10px }
  .yLink   { }
  .yTitle  { display:none }
  .yImg    { border: solid 1px }
  .yUrl    { display:none }
  .yMeta   { font-size: 80% }
  .ySrcUrl { }
  .ySum    { font-family: arial; font-size: 90% }
</style>
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- $num = $SearchEngine->MaxCount($space)
- $num = Yahoo::Search->MaxCount($space)
-
Returns the maximum allowed Count query-argument for the given search space.
- $SearchEngine->Default($key [ => $val ]);
-
If a new value is given, update the $SearchEngine's value for the named $key.
In either case, the old value for $key in effect is returned. If the $SearchEngine had a previous value, it is returned. Otherwise, the global value in effect is returned.
As always, the key is from among those mentioned in the Query arguments section above.
- Yahoo::Search->Default($key [ => $val ]);
-
Update or, if no new value is given, check the global default value for the named argument. The key is from among those mentioned in the Query arguments section above, as well as AutoCarp (discussed below).
Defaults and Default Overrides
All key/value pairs mentioned in the Query arguments section may appear on the use line, in the call to the new constructor, or in requests that create a query explicitly or implicitly (Request, Query, Results, Links, or HtmlResults).
Each argument's value takes the first of the following which applies (listed in order of precedence):
- 4)
-
The actual arguments to a function which creates (explicitly or implicitly) a request.
- 3)
-
Search-engine default overrides, set when the Yahoo::Search new constructor is used to create a search-engine object, or when that object's Default method is called.
- 2)
-
Global default overrides, set on the use line or via Yahoo::Search->Default().
- 1)
-
Defaults hard-coded into these packages (e.g. Count defaults to 10).
It's particularly convenient to put the AppId on the use line, e.g.
use Yahoo::Search AppId => 'just testing';
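The four precedence levels can be pictured as a lookup through layered hashes, highest precedence first. This is a conceptual sketch only: resolve_arg and the four hashes are hypothetical names chosen for illustration, and nothing here claims to mirror how Yahoo::Search stores its defaults internally.

```perl
use strict;
use warnings;

# Hypothetical illustration (not the module's internals): look a key up in
# each layer from highest to lowest precedence and return the first value
# that is present, or undef if no layer has it.
sub resolve_arg {
    my ($key, @layers) = @_;
    for my $layer (@layers) {
        return $layer->{$key} if exists $layer->{$key};
    }
    return undef;
}

my %hard_coded = (Count => 10, Mode => 'all', Start => 0);  # 1) package defaults
my %global     = (AppId => 'just testing');                 # 2) from the use line
my %engine     = (Count => 25);                             # 3) from new()
my %call_args  = (Count => 20);                             # 4) explicit arguments

my @layers = (\%call_args, \%engine, \%global, \%hard_coded);
printf "%s => %s\n", $_, resolve_arg($_, @layers) for qw(AppId Count Mode);
```

Here Count comes from the explicit arguments, AppId from the global override, and Mode falls through to the hard-coded default.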
AutoCarp
By default, detected errors that would be classified as programming errors (e.g. use of incorrect args) are automatically spit out to stderr besides being returned via $@. This can be turned off via
use Yahoo::Search AutoCarp => 0;
or
Yahoo::Search->Default(AutoCarp => 0);
The default of true is somewhat obnoxious, but hopefully helps create better programs by forcing the programmer to actively think about error checking (if even long enough to turn off error reporting).
Copyright
Copyright (C) 2005 Yahoo! Inc.
Author
Jeffrey Friedl (jfriedl@yahoo.com)
$Id: Search.pm 2 2005-01-28 04:27:46Z jfriedl $