SYNOPSIS

use WebService::LOC::CongRec::Crawler;
use Log::Log4perl;
Log::Log4perl->init_once('log4perl.conf');
my $crawler = WebService::LOC::CongRec::Crawler->new();
$crawler->congress(107);
$crawler->oldest(1);
$crawler->goForth();

ATTRIBUTES

congress

The number of the congress to fetch, e.g. 107. If this is not given, the current congress is fetched.
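
Assuming the usual Moose-style constructor, the attribute can also be set when the crawler is created; a minimal sketch:

my $crawler = WebService::LOC::CongRec::Crawler->new(congress => 107);
$crawler->goForth();    # crawls the 107th Congress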

issuesRoot

The root page for Daily Digest issues.

Breadcrumb path: Library of Congress > THOMAS Home > Congressional Record > Browse Daily Issues

issues

A hash of issues keyed by date and section: $issues{year}{month}{day}{section}
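
A sketch of walking this structure after a crawl; it assumes the accessor returns a hash reference with the nested keys laid out as above:

my $issues = $crawler->issues;    # assumed to be a hash reference
for my $year (sort keys %$issues) {
    for my $month (sort keys %{ $issues->{$year} }) {
        for my $day (sort keys %{ $issues->{$year}{$month} }) {
            my @sections = sort keys %{ $issues->{$year}{$month}{$day} };
            print "$year-$month-$day: @sections\n";
        }
    }
}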

mech

A WWW::Mechanize object that holds our session state and is used to grab pages from THOMAS.
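
Because the object is exposed, standard WWW::Mechanize / LWP::UserAgent methods can be applied to it before crawling; for example (the values here are illustrative):

$crawler->mech->agent('MyCongRecBot/0.1');    # identify the crawler
$crawler->mech->timeout(30);                  # give up on slow responses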

oldest

Boolean attribute; when true, issues are visited from earliest to most recent.

The default is 0; that is, the most recent issues are visited first.

METHODS

goForth()

$crawler->goForth();
$crawler->goForth(process => \&process_page);
$crawler->goForth(start => $x);
$crawler->goForth(end => $y);

Start crawling from the Daily Digest issues page, i.e. http://thomas.loc.gov/home/Browse.php?&n=Issues

To crawl a specific congress, where NUM is the congress number: http://thomas.loc.gov/home/Browse.php?&n=Issues&c=NUM

Returns the total number of pages grabbed.

Accepts an optional process callback that is invoked for each page visited.

Accepts optional start and end page counters. If neither is given, or both are given as zero, crawling starts from the beginning and continues until all pages have been visited.
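
A sketch combining these options; the arguments passed to the process callback are not specified here, so this example assumes the callback inspects the current page through the crawler's mech attribute:

my $crawler = WebService::LOC::CongRec::Crawler->new();

my $process_page = sub {
    # Hypothetical callback body: we only assume the current page can
    # be inspected through the crawler's mech attribute.
    print "Visited: ", $crawler->mech->uri, "\n";
};

my $total = $crawler->goForth(
    process => $process_page,
    start   => 1,     # begin with the first page
    end     => 10,    # stop after the tenth
);
print "Grabbed $total pages\n";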

parseRoot(Str $content)

Parse the root of an issue and fill our hash of available issues.
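
If driving the crawl by hand rather than through goForth(), a sketch of fetching the root page and parsing it, assuming the mech and issuesRoot attributes described above:

$crawler->mech->get($crawler->issuesRoot);
$crawler->parseRoot($crawler->mech->content);
# The issues attribute should now be populated.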