NAME

WWW::Search::Scraper::Request - Canonical form for Scraper requests

SYNOPSIS

    use WWW::Search::Scraper;
    use WWW::Search::Scraper::Request;

    my $request = WWW::Search::Scraper::Request->new( $query );

    my $scraper = WWW::Search::Scraper->new( $engine );
    $scraper->request($request);
    while ( my $result = $scraper->next_response() ) {
        # Consume your $result!
    }

DESCRIPTION

In version 1.00, you must set every field value yourself via $rqst->field(name, value). This base class has no fields of its own apart from the implicit "query" field, which you set either in the new() method or via the query() method.

This is the minimal condition required to pass the request to the Scraper module's prepare() method. (Later, we anticipate setting an SQL WHERE-ish query via query(); prepare() would then translate that via field() first.)
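
As a rough illustration of the get/set pattern described above, the sketch below uses a hypothetical stand-in class, My::SketchRequest, rather than the real module. Only the method names new(), query() and field(name, value) come from this document; the internal hash layout and the getter behaviour of field() are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for illustration only; NOT the real
# WWW::Search::Scraper::Request implementation.
package My::SketchRequest;

sub new {
    my ($class, $query) = @_;
    # Assumed layout: the implicit "query" plus a hash of named fields.
    return bless { query => $query, fields => {} }, $class;
}

# Get the query string, or set it when an argument is given.
sub query {
    my $self = shift;
    $self->{query} = shift if @_;
    return $self->{query};
}

# Set a named field when a value is given; return its current value.
sub field {
    my ($self, $name, @value) = @_;
    $self->{fields}{$name} = $value[0] if @value;
    return $self->{fields}{$name};
}

package main;

my $rqst = My::SketchRequest->new('Perl');      # query set in new()
$rqst->field('locations', 'CA-San Jose');       # an explicit field
$rqst->query('Perl programmer');                # query replaced via query()
```

With the real module you would construct WWW::Search::Scraper::Request (or a sub-class) instead of the stand-in.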

METHODS

query

Get/Set the query string. You may also set the query string in the new() method.

postSelect

postSelect() is a callback function that may be called by the Scraper module to help it decide if the response it has received will actually qualify against this request. postSelect() should return true if the response matches the request, false if not.

The parameters postSelect() will receive are:

$request

A reference to itself, of course.

$scraper

A reference to the Scraper module under which all of this is happening. You probably won't need this, but there it is.

$response

The Scraper::Response object that is the actual response. This is probably (or should be) an instance of a Scraper::Response sub-class that corresponds to your Scraper::Request sub-class.

$alreadyDone

The Scraper module will tell you which fields, by name, it has already handled (or will handle) on its own. This parameter may be a string holding a field name, or a reference to an array of field names.

Scraper::Request contains a method to help you test against $alreadyDone. The method

    $request->alreadyDone('fieldName', $alreadyDone)

will return true if the field 'fieldName' is in $alreadyDone.
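
To make the callback contract more concrete, here is a minimal sketch of how a postSelect-style check might be written. It is not the module's code: already_done() and post_select() are hypothetical stand-alone functions, and the request and response are plain hashes (an assumption) so that the example runs without the Scraper modules installed. The only behaviour taken from this document is that $alreadyDone may be a single field name or a reference to an array of field names, and that the callback returns true on a match and false otherwise.

```perl
use strict;
use warnings;

# Hypothetical stand-alone version of the documented check:
# $already_done may be one field name or a reference to an array of names.
sub already_done {
    my ($field_name, $already_done) = @_;
    return 0 unless defined $already_done;
    my @names = ref($already_done) eq 'ARRAY' ? @$already_done : ($already_done);
    return (grep { $_ eq $field_name } @names) ? 1 : 0;
}

# Sketch of a postSelect-style callback. The request and response are plain
# hashes here (an assumption), not real Scraper objects; $scraper is unused.
sub post_select {
    my ($request, $scraper, $response, $already_done) = @_;
    for my $name (sort keys %$request) {
        # Skip fields the Scraper reports it handles on its own.
        next if already_done($name, $already_done);
        my $got = $response->{$name};
        return 0 unless defined $got && $got eq $request->{$name};
    }
    return 1;    # every remaining field matched
}

my %request  = (locations => 'CA-San Jose', skills => 'Perl');
my %response = (locations => 'CA-San Jose', skills => 'Java');
```

In a real sub-class you would presumably override postSelect() as a method and call $request->alreadyDone('fieldName', $alreadyDone) instead of the stand-in helper.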

debug

The debug method sets the debug tracing level to the value of its first parameter.

TRANSLATIONS

The Scraper modules that do table-driven field translations (from canonical requests to native requests) will have files included in their package representing the translation table in Storable format. The names of these files are <ScraperModuleName>.<requestType>.<canonicalFieldName>. For example, Brainpower.pm owns a translation table for the 'locations' field of the canonical Request::Job module; it is named Brainpower.Job.locations.
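
The naming scheme and the Storable format can be sketched as follows. The helper translation_file_name() is hypothetical, and the table layout (a hash mapping a canonical value to a native value) is an assumption; the document only states that the file is in Storable format and how it is named. store() and retrieve() are the standard Storable functions.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);
use File::Spec;
use File::Temp qw(tempdir);

# Build a file name of the documented form
# <ScraperModuleName>.<requestType>.<canonicalFieldName>.
sub translation_file_name {
    my ($scraper_module, $request_type, $field_name) = @_;
    return join '.', $scraper_module, $request_type, $field_name;
}

my $name = translation_file_name('Brainpower', 'Job', 'locations');

# Assumed table layout: canonical value => native value.
my %table = ('CA-San Jose' => '5');

my $dir  = tempdir(CLEANUP => 1);               # scratch directory for the demo
my $path = File::Spec->catfile($dir, $name);
store(\%table, $path);                          # write in Storable format
my $loaded = retrieve($path);                   # read back as a hash reference
```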

The Scraper module will locate the translation file, when required, by searching the directories in @INC until it is found (the same search path Perl uses to locate Perl modules).
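
A search of this kind might look like the sketch below. find_in_inc() is a hypothetical helper, not part of WWW::Search::Scraper, and the exact relative path the module looks for under each @INC directory is not stated here, so the caller supplies it.

```perl
use strict;
use warnings;
use File::Spec;

# Return the first existing file named $relative_name under the given
# directories (defaults to @INC), or nothing if it is not found.
sub find_in_inc {
    my ($relative_name, @dirs) = @_;
    @dirs = @INC unless @dirs;
    for my $dir (@dirs) {
        next if ref $dir;    # skip code-ref hooks that may appear in @INC
        my $candidate = File::Spec->catfile($dir, $relative_name);
        return $candidate if -f $candidate;
    }
    return;
}

# Example call; the result depends on what is installed on your system.
my $path = find_in_inc('Brainpower.Job.locations');
print defined $path ? "Found $path\n" : "Not found in \@INC\n";
```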

set<fieldName>Translation()

The set<fieldName>Translation() methods can be used to help maintain these translation files. For instance, setLocationsTranslation('canonical', 'native') will establish a translation from 'canonical' to 'native' for the 'locations' request field.

    setLocationsTranslation('CA-San Jose', 5);       # CA-San Jose => '5'
    setLocationsTranslation('CA-San Jose', [5,6]);   # CA-San Jose => '5' + '6'
    

If you have used this method to upgrade your translations, then a new upgrade of WWW::Search::Scraper will probably overwrite your translation file(s), so watch out for that! Back up your translation files before upgrading WWW::Search::Scraper!
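
One way to take such a backup is sketched below. backup_translation_files() is a hypothetical helper, not something the module provides, and you must pass it the paths of your own translation files; it relies only on the core File::Copy, File::Basename and File::Spec modules.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Copy qw(copy);
use File::Spec;

# Copy each listed file into $backup_dir, keeping its base name.
# Returns the list of backup paths that were written.
sub backup_translation_files {
    my ($backup_dir, @files) = @_;
    my @copies;
    for my $file (@files) {
        my $dest = File::Spec->catfile($backup_dir, basename($file));
        copy($file, $dest) or die "Cannot copy $file to $dest: $!";
        push @copies, $dest;
    }
    return @copies;
}
```

Restoring after the upgrade would be the same copy in the opposite direction.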

AUTHOR

WWW::Search::Scraper::Request is written and maintained by Glenn Wood, <glenwood@alumni.caltech.edu>.

COPYRIGHT

Copyright (c) 2001 Glenn Wood. All rights reserved.

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.