NAME
Yahoo::Search - Perl interface to the Yahoo! Search public API.
The following search spaces are supported:
- Doc
-
Common web search for documents (html, pdf, doc, ...), including Y!Q contextual search.
- Image
-
Image search (jpeg, png, gif, ...)
- Video
-
Video file search (avi, mpeg, realmedia, ...)
- News
-
News article search
- Local
-
Yahoo! Local area (ZIP-code-based Yellow-Page like search)
- Spell
-
A pseudo-search to fetch a "did you mean?" spelling suggestion for a search term.
- Related
-
A pseudo-search to fetch "also try" related-searches for a search term.
(Note: what this Perl API calls "Doc" Search is what Yahoo! calls "Web" Search. But gee, aren't all web searches "Web" search, including Image/News/Video/etc?)
Yahoo!'s raw API, which this package uses, is described at:
http://developer.yahoo.net/
DOCS
The full documentation for this suite of classes is spread among these packages:
Yahoo::Search
Yahoo::Search::Request
Yahoo::Search::Response
Yahoo::Search::Result
However, you need only use Yahoo::Search, which brings in the others as needed.
SYNOPSIS
Yahoo::Search provides a rich and full-featured set of classes for accessing the various features of Yahoo! Search, and also offers a variety of shortcuts to allow simple access, such as the following Doc search:
use Yahoo::Search;
my @Results = Yahoo::Search->Results(Doc => "Britney latest marriage",
AppId => "YahooDemo",
# The following args are optional.
# (Values shown are package defaults).
Mode => 'all', # all words
Start => 0,
Count => 10,
Type => 'any', # all types
AllowAdult => 0, # no porn, please
AllowSimilar => 0, # no dups, please
Language => undef,
);
warn $@ if $@; # report any errors
for my $Result (@Results)
{
printf "Result: #%d\n", $Result->I + 1;
printf "Url:%s\n", $Result->Url;
printf "%s\n", $Result->ClickUrl;
printf "Summary: %s\n", $Result->Summary;
printf "Title: %s\n", $Result->Title;
printf "In Cache: %s\n", $Result->CacheUrl;
print "\n";
}
The first argument to Results indicates which search space is to be queried (in this case, Doc). The second argument is the search term or phrase (described in detail in the next section). Subsequent arguments are optional key/value pairs (described in detail in the section after that) -- the ones shown in the example are those allowed for a Doc query, with the values shown being the defaults.
Results returns a list of Yahoo::Search::Result objects, one per item (in the case of a Doc search, an item is a web page, pdf document, doc document, etc.). The methods available to a Result object are dependent upon the search space of the original query -- see Yahoo::Search::Result documentation for the complete list.
Search term / phrase
Within a search phrase ("Britney latest marriage" in the example above), words that you wish to be included even if they would otherwise be eliminated as "too common" should be preceded with a "+". Words that you wish to exclude should be preceded with a "-". Words can be separated with "OR" (the default for the any Mode, described below), and can be wrapped in double quotes to identify an exact phrase (the default with the phrase Mode, also described below).
There are also a number of "Search Meta Words", as described at http://help.yahoo.com/help/us/ysearch/basics/basics-04.html and http://help.yahoo.com/help/us/ysearch/tips/tips-03.html , which can stand alone or be combined with Doc searches (and, to some extent, some of the others -- YMMV):
- site:
-
allows one to find all documents within a particular domain and all its subdomains. Example: site:yahoo.com
- hostname:
-
allows one to find all documents from a particular host only. Example: hostname:autos.yahoo.com
- link:
-
allows one to find documents that link to a particular url. Example: link:http://autos.yahoo.com/
- url:
-
allows one to find a specific document in Yahoo!'s index. Example: url:http://edit.autos.yahoo.com/repair/tree/0.html
- inurl:
-
allows one to find a specific keyword as part of indexed urls. Example: inurl:bulgarian
- intitle:
-
allows one to find a specific keyword as part of the indexed titles. Example: intitle:Bulgarian
As an example combining a number of different search styles, consider
my @Results = Yahoo::Search->Results(Doc => 'site:TheSmokingGun.com "Michael Jackson" -arrest',
AppId => "YahooDemo");
This returns data about pages at TheSmokingGun.com about Michael Jackson that don't contain the word "arrest" (yes, there are actually a few such pages).
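The phrase syntax described above can be assembled programmatically. The following is only a sketch: build_phrase is a hypothetical helper, not something provided by Yahoo::Search, and it simply joins the pieces into the string that would be passed as the search term.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Yahoo::Search): assembles a search
# phrase from its parts, using the syntax described in this section.
sub build_phrase {
    my (%part) = @_;
    my @terms;
    push @terms, "site:$part{site}" if defined $part{site};
    push @terms, qq{"$_"} for @{ $part{exact}   || [] };   # exact phrases
    push @terms, "+$_"    for @{ $part{require} || [] };   # keep even if "too common"
    push @terms, "-$_"    for @{ $part{exclude} || [] };   # must not appear
    return join ' ', @terms;
}

my $phrase = build_phrase(
    site    => 'TheSmokingGun.com',
    exact   => ['Michael Jackson'],
    exclude => ['arrest'],
);
print "$phrase\n";   # site:TheSmokingGun.com "Michael Jackson" -arrest
```

The resulting string is the same one used as the Doc search term in the example above.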
Query arguments
As mentioned above, the arguments allowed in a Query call depend upon the search space of the query. Here is a table of the possible arguments, showing which apply to queries of which search space:
Argument            Doc    Image  Video  News   Local  Spell  Related
------------------  -----  -----  -----  -----  -----  -----  -------
AppId               [X]    [X]    [X]    [X]    [X]    [X]    [X]
Mode                [X]    [X]    [X]    [X]    [X]     .      .
Start               [X]    [X]    [X]    [X]    [X]     .      .
Count               [X]    [X]    [X]    [X]    [X]     .      .
Context             [X]     .      .      .      .      .      .
Country             [X]     .      .      .      .      .      .
License             [X]     .      .      .      .      .      .
AllowSimilar        [X]     .      .      .      .      .      .
AllowAdult          [X]    [X]    [X]     .      .      .      .
Type                [X]    [X]    [X]     .      .      .      .
Language            [X]     .      .     [X]     .      .      .
Sort                 .      .      .     [X]    [X]     .      .
Color                .     [X]     .      .      .      .      .
Lat                  .      .      .      .     [X]     .      .
Long                 .      .      .      .     [X]     .      .
Street               .      .      .      .     [X]     .      .
City                 .      .      .      .     [X]     .      .
State                .      .      .      .     [X]     .      .
PostalCode           .      .      .      .     [X]     .      .
Location             .      .      .      .     [X]     .      .
Radius               .      .      .      .     [X]     .      .
AutoContinue        [X]    [X]    [X]    [X]    [X]    [X]    [X]
Debug               [X]    [X]    [X]    [X]    [X]    [X]    [X]
PreRequestCallback  [X]    [X]    [X]    [X]    [X]    [X]    [X]
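If you want to check in your own code whether an argument applies to a search space before sending it, a few rows of the table can be copied into a hash. This is a sketch only: %ArgSpaces and arg_applies are hypothetical names, not part of the package, and the hash covers just a subset of the rows.

```perl
use strict;
use warnings;

# Illustrative subset of the table above, keyed by argument name.
# %ArgSpaces and arg_applies() are hypothetical, not part of Yahoo::Search.
my %ArgSpaces = (
    AppId      => [qw(Doc Image Video News Local Spell Related)],
    Count      => [qw(Doc Image Video News Local)],
    AllowAdult => [qw(Doc Image Video)],
    Language   => [qw(Doc News)],
    Sort       => [qw(News Local)],
    Color      => [qw(Image)],
    PostalCode => [qw(Local)],
);

# Returns a true value if $arg is marked [X] for $space in the table.
sub arg_applies {
    my ($arg, $space) = @_;
    return scalar grep { $_ eq $space } @{ $ArgSpaces{$arg} || [] };
}

for my $check ([Color => 'Image'], [Color => 'Doc'], [Sort => 'Local']) {
    my ($arg, $space) = @$check;
    printf "%-10s %-6s %s\n", $arg, $space,
        arg_applies($arg, $space) ? 'applies' : 'does not apply';
}
```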
Here are details of each:
- AppId
-
An 8-40 character string which identifies the application making use of the Yahoo! Search API. (Think of it along the lines of an HTTP User-Agent string.)
The characters allowed are space, plus
A-Za-z0-9_()[]*+-=,.:@\
This argument is required of all searches (sorry). You can make up whatever AppId you'd like, but you are encouraged to register it via the link on
http://developer.yahoo.net/
especially if you are creating something that will be widely distributed.
As mentioned below in Defaults and Default Overrides, it's particularly convenient to get the AppId out of the way by putting it on the use line, e.g.
use Yahoo::Search AppId => 'just testing';
It then applies to all queries unless explicitly overridden.
- Mode
-
Must be one of: all (the default), any, or phrase. Indicates how multiple words in the search term are used: search for documents with all words, documents with any words, or documents that contain the search term as an exact phrase.
- Start
-
Indicates the ordinal of the first result to be returned, e.g. the "30" of "showing results 30-40" (except that Start is zero-based, not one-based). The default is zero, meaning that the primary results will be returned.
- Count
-
Indicates how many items should be returned. The default is 10. The maximum allowed depends on the search space being queried: 20 for Local searches, and 50 for others which support the Count argument.
Note that
Yahoo::Search::MaxCount($SearchSpace)
and
$SearchEngine->MaxCount($SearchSpace)
return the maximum count allowed for the given $SearchSpace.
- Context
-
By providing a context string, you change the request from a normal document query to a Y!Q contextual query. Y!Q is described at
http://yq.search.yahoo.com/
The Context string can be raw text, html, etc., and gives the document search more information about what kind of results are wanted.
For example, without a Context, a document search for "Madonna" returns the most popular documents (which are invariably about the famous pop singer). However, if you provide a context string even as simple as "Virgin Mary", the results skew away from the pop singer toward the Mother of God. Since it's likely that a confusion between the two would be less than optimal in pretty much every conceivable case, this is a Good Thing.
When a Context is given, the query string itself may be empty. For example, if you have the text of a blog entry in $BlogText, you can fetch "related links" via:
use Yahoo::Search AppId => 'my blog stuff';
my @Results = Yahoo::Search->Results(Doc => undef, Context => $BlogText);
- Country
-
Attempts to restrict the Doc search to web servers residing in the named country. As of this writing, the Yahoo! web services support the following codes for Country:
code  country
----  ------------------
ar    Argentina
au    Australia
at    Austria
be    Belgium
br    Brazil
ca    Canada
cn    China
cz    Czech Republic
dk    Denmark
fi    Finland
fr    France
de    Germany
it    Italy
jp    Japan
kr    Korea
nl    Netherlands
no    Norway
pl    Poland
rf    Russian Federation
es    Spain
se    Sweden
ch    Switzerland
tw    Taiwan
uk    United Kingdom
us    United States
In addition, the code "default" is the same as the lack of a country specifier: no country-related restrictions.
The above list can be found in %Yahoo::Search::KnownCountry.
Because the list of countries may be updated more often than this Perl API, this Perl API does not attempt to restrict the Country value to members of this specific list. If you provide a Country value which is not supported by Yahoo!'s web services, a "400 Bad Request" error is returned in $@.
- License
-
For Doc searches, can be:
any
-
(the default) -- results are not filtered with respect to licenses
cc_any
-
Only items with a Creative Commons license (of any type) are returned. See their (horribly designed hard to find anything substantial) site at:
http://creativecommons.org/
cc_commercial
-
Only items with a Creative Commons license which allows some kind of commercial use are returned.
cc_modifiable
-
Only items with a Creative Commons license which allows modification (e.g. derived works) of some kind are returned.
You may combine the above to create an intersection, e.g.
License => "cc_commercial+cc_modifiable"
(space, comma, or plus-separated) returns items which allow both some kind of commercial use, and their use in some kinds of derivative works.
- AllowSimilar
-
If this boolean is true (the default is false), similar results which would otherwise not be returned are included in the result set.
- AllowAdult
-
If this boolean is false (the default), results considered to be "adult" (i.e. porn) are not included in the result set. Set to true to allow unfiltered results.
Standard precautions apply about how the "is adult?" determination is not perfect.
- Type
-
This argument can be used to restrict the results to only a specific file type. The default value, any, allows any type associated with the search space to be returned (that is, provides no restriction). Otherwise, the values allowed for Type depend on the search space:
Search space  Allowed Type values
============  ==================================================
Doc           any html msword pdf ppt rss txt xls
Image         any bmp gif jpeg png
Video         any avi flash mpeg msmedia quicktime realmedia
News          N/A
Local         N/A
Spell         N/A
Related       N/A
(Deprecated: you may use all in place of any)
- Language
-
If provided, attempts to restrict the results to documents in the given language. The value is a language code such as en (English), ja (Japanese), etc (mostly ISO 639-1 codes). As of this writing, the following codes are supported:
code  language
----  ---------------------
sq    Albanian
ar    Arabic
bg    Bulgarian
ca    Catalan
szh   Chinese (simplified)
tzh   Chinese (traditional)
hr    Croatian
cs    Czech
da    Danish
nl    Dutch
en    English
et    Estonian
fi    Finnish
fr    French
de    German
el    Greek
he    Hebrew
hu    Hungarian
is    Icelandic
it    Italian
ja    Japanese
ko    Korean
lv    Latvian
lt    Lithuanian
no    Norwegian
fa    Persian
pl    Polish
pt    Portuguese
ro    Romanian
ru    Russian
sk    Slovak
sl    Slovenian
es    Spanish
sv    Swedish
th    Thai
tr    Turkish
In addition, the code "default" is the same as the lack of a language specifier, and seems to mean a mix of major world languages, skewed toward English.
The above list can be found in %Yahoo::Search::KnownLanguage.
Because the list of languages may be updated more often than this Perl API, this Perl API does not attempt to restrict the Language value to members of this specific list. If you provide a Language value which is not supported by Yahoo!'s web services, a "400 Bad Request" error is returned in $@.
- Sort
-
For News searches, Sort may be rank (the default) or date.
For Local searches, Sort may be relevance (the default; most relevant first), distance (closest first), rating (highest rating first), or title (alphabetic sort).
- Color
-
For Image searches, may be any (the default), color, or bw:
any
-
No filtering based on colorization or lack thereof
color
-
Only images with color are returned
bw
-
Only black & white / grayscale images are returned
- Lat
- Long
- Street
- City
- State
- PostalCode
- Location
-
These items are for a Local query, and specify the epicenter of the search. The epicenter must be provided in one of a variety of ways:
via Lat and Long
via the free-text Location
via Street and PostalCode
via Street and City and State
via PostalCode alone
via City and State alone.
The list above is the order of precedence for when multiple fields are sent (e.g. if a Lat and Long are sent, they are used regardless of whether, say, a PostalCode is also sent), but it's probably best to send only the fields you wish to be used.
Lat and Long are floating point numbers, such as this example:
Lat => 39.224079, # 39 deg 13 min 26.686 sec North
Long => -98.541807, # 98 deg 32 min 30.506 sec West
(which happens to be the location of the "Medes Ranch" triangulation station, upon which all country, state, etc., boundaries in North America were originally based)
Street is the street address, e.g. "701 First Ave".
PostalCode is a US 5-digit or 9-digit ZIP code (e.g. "94089" or "94089-1234").
If Location is provided, it supersedes the others. It should be a string along the lines of "701 First Ave, Sunnyvale CA, 94089". The following forms are recognized:
city state
city state zip
zip
street, city state
street, city state zip
street, zip
Searches that include a street address (either in the Location, or if Location is empty, in Street) provide for a more detailed epicenter specification.
- Radius
-
For Local searches, indicates how wide an area around the epicenter to search. The value is the radius of the search area, in miles. The default radius depends on the search location (urban areas tend to have a smaller default radius).
- AutoContinue
-
A boolean (default off). If true, turns on the potentially dangerous auto-continuation, as described in the docs for NextResult in Yahoo::Search::Response.
- Debug
-
Debug is a string (defaults to an empty string). If the substring "url" is found anywhere in the string, the url of the Yahoo! request is printed on stderr. If "xml", the raw xml received is printed to stderr. If "hash", the raw Perl hash, as converted from the XML, is Data::Dump'd to stderr.
Thus, to print all debugging, you'd set Debug to a value such as "url xml hash".
- PreRequestCallback
-
This is for debugging (I needed it for my own regression-test script). If defined, it should be a code ref which accepts a single Yahoo::Search::Request object argument. It is called just before Yahoo!'s servers are contacted, and if it returns false, the call to Yahoo! is aborted (be sure to set $@).
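Returning to the AppId rules given at the top of this list (8-40 characters, each a space or one of the listed characters), a local sanity check might look like the sketch below. appid_looks_valid is a hypothetical helper written for this illustration; it is not a function of Yahoo::Search, and the package may well do its own checking.

```perl
use strict;
use warnings;

# Hypothetical local check, not part of Yahoo::Search: 8-40 characters,
# each a space or one of A-Za-z0-9_()[]*+-=,.:@\
sub appid_looks_valid {
    my ($appid) = @_;
    return defined $appid
        && $appid =~ /\A[ A-Za-z0-9_()\[\]*+\-=,.:\@\\]{8,40}\z/ ? 1 : 0;
}

for my $id ('YahooDemo', 'just testing', 'short', 'no!bang allowed') {
    printf "%-18s %s\n", "'$id'", appid_looks_valid($id) ? 'ok' : 'rejected';
}
```

Here 'short' fails on length and 'no!bang allowed' fails because "!" is not in the allowed set.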
Class Hierarchy Details
The Y! Search API class system supports the following objects (all loaded as needed via Yahoo::Search):
Yahoo::Search
Yahoo::Search::Request
Yahoo::Search::Response
Yahoo::Search::Result
Here is a summary of them:
- Yahoo::Search
-
A "search engine" object which can hold user-specified default values for search-query arguments. Often not used explicitly.
- Yahoo::Search::Request
-
An object which holds the information needed to make one search-query request. Often not used explicitly.
- Yahoo::Search::Response
-
An object which holds the results of a query (including a bunch of Result objects).
- Yahoo::Search::Result
-
An object representing one query result (one image, web page, etc., as appropriate to the original search space).
"The Long Way", and Common Practice
The explicit way to perform a query and access the results is to first create a "Search Engine" object:
my $SearchEngine = Yahoo::Search->new();
Optionally, you can provide new with key/value pairs as described in the Query arguments section above. Those values will then be available as default values during subsequent request creation. (More on this later.)
You then use the search-engine object to create a request:
my $Request = $SearchEngine->Request(Doc => "Britney");
You then actually make the request, getting a response:
my $Response = $Request->Fetch();
You can then access the set of Result objects in a number of ways, either all at once
my @Results = $Response->Results();
or iteratively:
while (my $Result = $Response->NextResult) {
:
:
}
In Practice....
In practice, one often does not need to go through all these steps explicitly. The only reason to create a search-engine object, for example, is to hold default overrides (to be made available to subsequent requests made via the search-engine object). For example:
use Yahoo::Search;
my $SearchEngine = Yahoo::Search->new(AppId => "Bobs Fish Mart",
Count => 25,
AllowAdult => 1,
PostalCode => 95014);
Now, calls to the various query functions (Query, Results) via this $SearchEngine will use these defaults (Image searches, for example, will be with AllowAdult set to true, and Local searches will be centered at ZIP code 95014.) All will return up to 25 results.
In this example:
my @Results = $SearchEngine->Results(Image => "Britney",
Count => 20);
The query is made with AppId as 'Bobs Fish Mart' and AllowAdult true (both via $SearchEngine), but Count is 20 because explicit args override the default in $SearchEngine. The PostalCode arg does not apply to an Image search, so the default provided from $SearchEngine is not needed with this particular query.
Defaults on the 'use' line
You can also provide the same defaults on the use line. The following example has the same result as the previous one:
use Yahoo::Search AppId => 'Bobs Fish Mart',
Count => 25,
AllowAdult => 1,
PostalCode => 95014;
my @Results = Yahoo::Search->Results(Image => "Britney",
Count => 20);
Functions and Methods
Here, finally, are the functions and methods provided by Yahoo::Search. In all cases, "...args..." are any of the key/value pairs listed in the Query arguments section of this document (e.g. "Count => 20")
- $SearchEngine = Yahoo::Search->new(...args...)
-
Creates a search-engine object (a container for defaults). On error, sets $@ and returns nothing.
- $Request = $SearchEngine->Request($space => $query, ...args...)
- $Request = Yahoo::Search->Request($space => $query, ...args...)
-
Creates a Request object representing a search of the named search space (Doc, Image, etc.) of the given query string.
On error, sets $@ and returns nothing.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- $Response = $SearchEngine->Query($space => $query, ...args...)
- $Response = Yahoo::Search->Query($space => $query, ...args...)
-
Creates an implicit Request object, and fetches it, returning the resulting Response.
On error, sets $@ and returns nothing.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @Results = $SearchEngine->Results($space => $query, ...args...)
- @Results = Yahoo::Search->Results($space => $query, ...args...)
-
Creates an implicit Request object, then Response object, in the end returning a list of Result objects.
On error, sets $@ and returns nothing.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @links = $SearchEngine->Links($space => $query, ...args...)
- @links = Yahoo::Search->Links($space => $query, ...args...)
-
A super shortcut which goes directly from the query args to a list of
<a href=...>...</a>
links. Essentially,
map { $_->Link } Yahoo::Search->Results($space => $query, ...args...);
or, more explicitly:
map { $_->Link } Yahoo::Search->new()->Request($space => $query, ...args...)->Fetch->Results(@_);
See Link in the documentation for Yahoo::Search::Result.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @links = $SearchEngine->Terms($space => $query, ...args...)
- @links = Yahoo::Search->Terms($space => $query, ...args...)
-
A super shortcut for Spell and Related search spaces, returns the list of spelling or related-search suggestions, respectively.
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- @html = $SearchEngine->HtmlResults($space => $query, ...args...)
- @html = Yahoo::Search->HtmlResults($space => $query, ...args...)
-
Like Links, but returns a list of html strings (one representing each result). See as_html in the documentation for Yahoo::Search::Result.
A simple result display might look like
print join "<p>", Yahoo::Search->HtmlResults(....);
or, perhaps
if (my @HTML = Yahoo::Search->HtmlResults(....)) {
    print "<ul>";
    for my $html (@HTML) {
        print "<li>", $html;
    }
    print "</ul>";
}
As an example, here's a complete CGI which shows results from an image-search, where the search term is in the 's' query string:
#!/usr/local/bin/perl -w
use CGI;
my $cgi = new CGI;
print $cgi->header();
use Yahoo::Search AppId => 'my-search-app';
if (my $term = $cgi->param('s')) {
    print join "<p>", Yahoo::Search->HtmlResults(Image => $term);
}
The results, however, do look better with some style-sheet attention, such as:
<style>
.yResult { display: block; border: #CCF 3px solid ; padding:10px }
.yLink { }
.yTitle { display:none }
.yImg { border: solid 1px }
.yUrl { display:none }
.yMeta { font-size: 80% }
.ySrcUrl { }
.ySum { font-family: arial; font-size: 90% }
</style>
Note: all arguments are in key/value pairs, but the $space/$query pair (which is required) must appear first.
- $num = $SearchEngine->MaxCount($space)
- $num = Yahoo::Search->MaxCount($space)
-
Returns the maximum allowed Count query-argument for the given search space.
- $SearchEngine->Default($key [ => $val ]);
-
If a new value is given, update the $SearchEngine's value for the named $key.
In either case, the old value for $key in effect is returned. If the $SearchEngine had a previous value, it is returned. Otherwise, the global value in effect is returned.
As always, the key is from among those mentioned in the Query arguments section above.
- Yahoo::Search->Default($key [ => $val ]);
-
Update or, if no new value is given, check the global default value for the named argument. The key is from among those mentioned in the Query arguments section above, as well as AutoCarp (discussed below).
Defaults and Default Overrides
All key/value pairs mentioned in the Query arguments section may appear on the use line, in the call to the new constructor, or in requests that create a query explicitly or implicitly (Request, Query, Results, Links, or HtmlResults).
Each argument's value takes the first of the following which applies (listed in order of precedence):
- 4)
-
The actual arguments to a function which creates (explicitly or implicitly) a request.
- 3)
-
Search-engine default overrides, set when the Yahoo::Search new constructor is used to create a search-engine object, or when that object's Default method is called.
- 2)
-
Global default overrides, set on the use line or via Yahoo::Search->Default()
- 1)
-
Defaults hard-coded into these packages (e.g. Count defaults to 10).
It's particularly convenient to put the AppId on the use line, e.g.
use Yahoo::Search AppId => 'just testing';
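The four-level precedence described in this section behaves like a simple hash merge in which later sources win. The sketch below uses plain hashes as hypothetical stand-ins for each level; it illustrates the rule and makes no claim about how the package stores its defaults internally.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the four levels; not the package's internals.
my %hard_coded = (Count => 10, AllowAdult => 0, Mode => 'all');  # level 1
my %global     = (AppId => 'just testing');                      # level 2: use line
my %engine     = (Count => 25, AllowAdult => 1);                 # level 3: new()
my %call       = (Count => 20);                                  # level 4: call args

# In a list assignment later pairs overwrite earlier ones, so the
# hashes are listed from lowest to highest precedence.
my %effective = (%hard_coded, %global, %engine, %call);

print "$_ => $effective{$_}\n" for sort keys %effective;
```

Count ends up as 20 (from the call), AllowAdult as 1 (from the search-engine object), AppId from the use line, and Mode falls through to the hard-coded default.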
AutoCarp
By default, detected errors that would be classified as programming errors (e.g. use of incorrect args) are automatically spit out to stderr besides being returned via $@. This can be turned off via
use Yahoo::Search AutoCarp => 0;
or
Yahoo::Search->Default(AutoCarp => 0);
The default of true is somewhat obnoxious, but hopefully helps create better programs by forcing the programmer to actively think about error checking (if even long enough to turn off error reporting).
Global Variables
The following are globally available:
%Yahoo::Search::KnownCountry
-
A hash with the known (as of this writing) country codes supported by Yahoo! for the Country argument.
%Yahoo::Search::KnownLanguage
-
A hash with the known (as of this writing) language codes supported by Yahoo! for the Language argument.
$Yahoo::Search::RecentRequestUrl
-
The most recent REST url actually fetched from Yahoo! (perhaps useful for debugging). It does not reflect the fact that a request is changed to a POST when the request is sufficiently large. Thus, there are times when the url in $Yahoo::Search::RecentRequestUrl is not actually fetchable from the Yahoo! servers.
$Yahoo::Search::UseXmlSimple
-
If you set this to a true value, the XML returned by Yahoo! will be parsed with XML::Simple rather than with the simple XML parser included as part of this package (Yahoo::Search::XML). XML::Simple uses XML::Parser under the hood, and at least on the systems I've tested it on, XML::Parser suffers from a crippling memory leak that makes it very undesirable.
However, if Yahoo! changes the XML they return in a way that my simple parser can't handle, you can install XML::Simple and set $Yahoo::Search::UseXmlSimple and at least have things work (until you run out of memory).
The default value of $Yahoo::Search::UseXmlSimple is taken from the environment variable YAHOO_SEARCH_XMLSIMPLE if present, and otherwise defaults to false.
$Yahoo::Search::Version
-
A string in "X.Y.Z" format. The first number, the major version, increments with large and/or backwards-incompatible changes. The second number (minor version) updates with notable feature additions/changes. The third number updates with every new release (and is the only one updated for small bug- and typo-fix releases).
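Because the version is a three-part string, comparing two versions as plain strings can mislead ("1.10.0" sorts before "1.9.3"). A field-by-field numeric comparison avoids that. The sketch below assumes the "X.Y.Z" format described above; version_cmp is a hypothetical helper and the version strings are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: compares two "X.Y.Z" strings numerically, field by
# field. Returns -1, 0, or 1, like the <=> operator.
sub version_cmp {
    my @left  = split /\./, $_[0];
    my @right = split /\./, $_[1];
    for my $i (0 .. 2) {
        my $cmp = ($left[$i] || 0) <=> ($right[$i] || 0);
        return $cmp if $cmp;
    }
    return 0;
}

# Made-up version strings: the minor fields compare as numbers, so 10 > 9.
print version_cmp('1.10.0', '1.9.3'), "\n";
```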
Environment
If YAHOO_SEARCH_XMLSIMPLE is set to a true (nonempty, non-"0") value, $Yahoo::Search::UseXmlSimple defaults to true. See above.
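The "true (nonempty, non-"0")" wording matches Perl's ordinary notion of truth for strings. As a sketch of how such a default could be derived from the environment (flag_from_env and $use_xml_simple are hypothetical names for this illustration, not the package's own code):

```perl
use strict;
use warnings;

# Hypothetical helper: maps an environment value to 1 or 0 using Perl's
# own truth rules, under which undef, "" and "0" are all false.
sub flag_from_env {
    my ($value) = @_;
    return $value ? 1 : 0;
}

my $use_xml_simple = flag_from_env($ENV{YAHOO_SEARCH_XMLSIMPLE});
print "UseXmlSimple would default to $use_xml_simple\n";
```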
Yahoo::Search uses LWP to communicate with Yahoo!'s servers; LWP uses environment variables such as http_proxy and no_proxy. See the perldoc for LWP for more.
Copyright
Copyright (C) 2005 Yahoo! Inc.
Author
Jeffrey Friedl (jfriedl@yahoo.com)
$Id: Search.pm 2 2005-01-28 04:27:46Z jfriedl $