SYNOPSIS
The base of your reader.
DESCRIPTION
Reader:
The Reader is the component that reads (parses) the information (pages).
The Reader holds all the logic used to crawl a web page: it knows all the rules for transforming the content into objects.
After the Reader has done its job, it is supposed to pass the information to the Writer, which then writes it to a file, sends it over a network, etc.
  ____________                     __________                     ___________
 |  Internet  | <<=============== |  Reader  |                   |  Writer   |
 |____________| ===============>> |__________| ===============>> |___________|

                                  The Reader requests            The Writer:
                                  information, parses it,         - saves
                                  then sends it to the            - sends email
                                  Writer.                         - saves stats
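
The flow in the diagram can be sketched in a few lines of plain Perl. This is only an illustration under stated assumptions: the Sketch::Reader and Sketch::Writer packages, the parse and save methods, and the <h2> rule are hypothetical and are not part of this module's API. The Writer here just collects records in memory.

```perl
use strict;
use warnings;

# Hypothetical Writer: it only collects records in memory. A real one
# might save to a file, send email, or store stats.
package Sketch::Writer;
sub new   { return bless { saved => [] }, shift }
sub save  { my ( $self, $record ) = @_; push @{ $self->{saved} }, $record; return }
sub saved { return @{ $_[0]{saved} } }

# Hypothetical Reader: knows the parsing rules and passes each result on.
package Sketch::Reader;
sub new { my ( $class, %args ) = @_; return bless {%args}, $class }

sub parse {
    my ( $self, $html ) = @_;

    # Toy rule: every <h2> heading becomes one record.
    while ( $html =~ m{<h2>(.*?)</h2>}g ) {
        $self->{writer}->save( { title => $1 } );
    }
    return;
}

package main;

my $writer = Sketch::Writer->new;
my $reader = Sketch::Reader->new( writer => $writer );

# Stand-in for a page fetched from the Internet.
$reader->parse('<h2>First</h2><p>text</p><h2>Second</h2>');

print "$_->{title}\n" for $writer->saved;
```

Keeping the parsing rules in the Reader and the output logic in the Writer means either side can be swapped without touching the other.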
ATTRIBUTES
robot
passed_key_values
*** Will be renamed to request_storage or something similar.
Holds values that are passed from one page to the next during navigation.
E.g.: I am collecting data for an object, and some of it is on page #1 while the rest is on pages #2 and #3. I can then use passed_key_values to pass keys and values on to the next page.
headers
holds the current session headers
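
To illustrate the idea behind passed_key_values, here is a minimal, self-contained sketch of data travelling from page #1 to page #2. The toy @queue, the append function, its third argument, the handler names, and the example URLs are all assumptions made for this illustration; they are not guaranteed to match the real queue interface.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the robot's queue: each entry remembers the
# handler to call, the URL, and the key/values handed over by the previous page.
my @queue;

sub append {
    my ( $method, $url, $args ) = @_;
    push @queue, {
        method            => $method,
        url               => $url,
        passed_key_values => $args->{passed_key_values} || {},
    };
    return;
}

# Page #1: collect part of the object, then forward it to page #2.
sub page_one {
    my ($item) = @_;
    my %collected = ( %{ $item->{passed_key_values} }, title => 'Example title' );
    append( 'page_two', 'http://example.com/item/2', { passed_key_values => \%collected } );
    return;
}

# Page #2: the earlier values arrive with the queue item; add the rest.
sub page_two {
    my ($item) = @_;
    return { %{ $item->{passed_key_values} }, price => '9.99' };
}

my %handlers = ( page_one => \&page_one, page_two => \&page_two );

append( 'page_one', 'http://example.com/item/1', {} );

# Process the queue until it is empty; the last handler returns the object.
my $object;
while ( my $item = shift @queue ) {
    my $result = $handlers{ $item->{method} }->($item);
    $object = $result if $result;
}

printf "%s costs %s\n", $object->{title}, $object->{price};
```

The object is only complete once page #2 has been handled, which is why the partial data has to ride along with the queued request.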
METHODS
append
shortcut for $self->robot->queue->append
prepend
shortcut for $self->robot->queue->prepend
current_page
shortcut that returns the page the robot is currently processing
tree
shortcut for $self->robot->parser->tree
xml
shortcut for $self->robot->parser->xml
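
The shortcut pattern used by append and prepend amounts to simple delegation. The sketch below shows that pattern with hypothetical Sketch::* packages written in plain Perl; the real robot and queue classes are not reproduced here, and the argument shape (a handler name followed by a URL) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical queue: a plain list with append (to the end) and prepend (to the front).
package Sketch::Queue;
sub new     { return bless { items => [] }, shift }
sub append  { my ( $self, @item ) = @_; push    @{ $self->{items} }, [@item]; return }
sub prepend { my ( $self, @item ) = @_; unshift @{ $self->{items} }, [@item]; return }
sub items   { return @{ $_[0]{items} } }

# Hypothetical robot that owns the queue.
package Sketch::Robot;
sub new   { return bless { queue => Sketch::Queue->new }, shift }
sub queue { return $_[0]{queue} }

# Reader whose append/prepend only forward to the robot's queue.
package Sketch::Reader;
sub new     { my ( $class, %args ) = @_; return bless {%args}, $class }
sub robot   { return $_[0]{robot} }
sub append  { my $self = shift; return $self->robot->queue->append(@_) }
sub prepend { my $self = shift; return $self->robot->queue->prepend(@_) }

package main;

my $reader = Sketch::Reader->new( robot => Sketch::Robot->new );
$reader->append( detail => 'http://example.com/b' );
$reader->prepend( detail => 'http://example.com/a' );    # jumps ahead of /b

my @urls = map { $_->[1] } $reader->robot->queue->items;
print join( ', ', @urls ), "\n";
```

With shortcuts like these, reader code can call $self->append(...) directly instead of spelling out the full $self->robot->queue chain each time.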