NAME
Data::Pipeline::Machine - easy-to-use machine building
SYNOPSIS
Machine Definition
    package Data::Pipeline::AdapterX::GoogleScholar;

    use Data::Pipeline::Machine;
    use Data::Pipeline qw( FetchPage Regex Rename StringReplace UrlBuilder );

    pipeline(
        # Fetch the results page and split it into one record per hit.
        FetchPage(
            cut_start => '<p class=g>',
            cut_end   => '</table>',
            split     => '<p class=g>',
            url       => UrlBuilder(
                base  => 'http://scholar.google.com/scholar',
                query => {
                    q       => Option( q => ( default => 'biology' ) ),
                    hl      => 'en',
                    lr      => '',
                    scoring => 'r',
                    as_ylo  => Option( year => ( default => '2007' ) ),
                    num     => 100,
                    safe    => 'off'
                }
            ),
        ),
        Rename(
            copies => {
                content => 'description',
                content => 'title'
            },
            renames => {
                content => 'link'
            }
        ),
        # Reduce each field to plain text; the rules are applied in order.
        Regex(
            rules => [
                title       => sub { s/^<span class="w">.*?<a.+?>(.+?)/$1/gs },
                title       => sub { s{(.+?)</a.+}{$1}gs },
                title       => sub { s/…//gs },
                link        => sub { s{.+?http://(.+?)".+}{http://$1}gs },
                title       => sub { s/<.+?>//gs },
                title       => sub { s/&nbsp;//gs },
                description => sub { s{.+?<span class="a">.+?- (.+?) -.+}{$1}gs },
                description => sub { s{<.+?>}{}gs }
            ]
        )
    );    # pipeline
Machine use
    use Data::Pipeline qw( Pipeline GoogleScholar CSV );

    my $pipe = Pipeline(
        GoogleScholar,
        CSV( column_names => [qw(title link)] )
    );

    $pipe->from( q => 'physics' )->to( \*STDOUT );
DESCRIPTION
This package makes it easy to construct collections of pipelines that together act as an action or an adapter.
CONSTRUCTORS
Several constructors are exported automatically by the package.
Option( $name => %options )
This constructs an object that supplies an optional argument to the transformation. A default value can be given in the options.
The value is taken from the argument $name passed to from on the machine. In the synopsis, Option( q => ... ) in the machine definition takes its value from the q value supplied when the machine is used in a pipeline and that pipeline is run, here 'physics'. Likewise, Option( year => ... ) falls back to its default value, '2007', because no year is given.
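The lookup rule described above (use the caller's value if present, otherwise the default) can be sketched in plain Perl. This is only an illustration under stated assumptions: make_option and resolve_option are hypothetical helpers written for this example and are not part of Data::Pipeline::Machine.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Option(): records the argument name and options.
sub make_option {
    my ( $name, %options ) = @_;
    return +{ name => $name, %options };
}

# Hypothetical resolver: prefer the caller's argument, else use the default.
sub resolve_option {
    my ( $option, %args ) = @_;
    return exists $args{ $option->{name} }
        ? $args{ $option->{name} }
        : $option->{default};
}

my $q    = make_option( q    => ( default => 'biology' ) );
my $year = make_option( year => ( default => '2007' ) );

# Mirrors from( q => 'physics' ) in the synopsis: q is supplied, year is not.
my %args = ( q => 'physics' );

printf "q=%s as_ylo=%s\n",
    resolve_option( $q,    %args ),
    resolve_option( $year, %args );
```

Here q resolves to the supplied 'physics', while year resolves to its default because the caller passed no year.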
pipeline( [ $name => ] pipeline definition )
This defines a pipeline with an optional name.
If the name is not given, it is assumed to be 'finally'. Only one pipeline should be defined without an explicit name. The pipeline named 'finally' is the default pipeline to start with when constructing an unnamed pipeline using from or transform.
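The naming rule can be pictured as a small registry keyed by pipeline name. The sketch below is an assumption about how such a rule might behave, not the module's implementation: define_pipeline, run_pipeline and %pipelines are hypothetical names, and treating a leading plain string as the pipeline name is a guess made for this example.

```perl
use strict;
use warnings;

my %pipelines;    # hypothetical registry: name => list of stage callbacks

# Hypothetical stand-in for pipeline( [ $name => ] ... ): a leading plain
# string is taken as the name; otherwise the name defaults to 'finally'.
sub define_pipeline {
    my @stages = @_;
    my $name = ( @stages && !ref $stages[0] ) ? shift @stages : 'finally';
    $pipelines{$name} = [@stages];
    return $name;
}

# Run the named pipeline, or 'finally' when no name is given.
sub run_pipeline {
    my ( $value, $name ) = @_;
    $name = 'finally' unless defined $name;
    $value = $_->($value) for @{ $pipelines{$name} };
    return $value;
}

define_pipeline( trim => sub { my $s = $_[0]; $s =~ s/^\s+|\s+$//g; $s } );
define_pipeline( sub { uc $_[0] } );    # unnamed, so stored as 'finally'

my $trimmed = run_pipeline( '  physics  ', 'trim' );
my $default = run_pipeline('physics');
print "$trimmed\n";    # prints "physics"
print "$default\n";    # prints "PHYSICS"
```

Defining a second unnamed pipeline in this sketch would silently overwrite the first, which illustrates why only one pipeline should be left without an explicit name.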
Pipeline
Unlike the Pipeline constructor imported from Data::Pipeline, which defines a new pipeline, this version calls another pipeline in the machine with arguments.
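One way to picture a pipeline calling another pipeline with arguments is a stage that looks the target up by name at run time. The following is a minimal sketch under that assumption; call_pipeline, the shorten pipeline and the max_chars argument are all hypothetical and do not reflect the actual signature of Pipeline in this package.

```perl
use strict;
use warnings;

my %pipelines;    # hypothetical registry of named pipelines in one machine

# Hypothetical stand-in for the machine-level Pipeline(): returns a stage
# that runs an already-defined pipeline by name, passing extra arguments.
sub call_pipeline {
    my ( $name, %args ) = @_;
    return sub { $pipelines{$name}->( $_[0], %args ) };
}

# A named helper pipeline that truncates its input to a given length.
$pipelines{shorten} = sub {
    my ( $text, %args ) = @_;
    return substr( $text, 0, $args{max_chars} );
};

# The default pipeline delegates to 'shorten', then upper-cases the result.
my @finally = (
    call_pipeline( shorten => ( max_chars => 4 ) ),
    sub { uc $_[0] },
);
$pipelines{finally} = sub {
    my ($text) = @_;
    $text = $_->($text) for @finally;
    return $text;
};

my $result = $pipelines{finally}->('physics');
print "$result\n";    # prints "PHYS"
```

Because the lookup happens when the stage runs rather than when it is built, the target pipeline only needs to exist by the time the machine is used.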
AUTHOR
James Smith
LICENSE
Copyright (c) 2008 Texas A&M University.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.