NAME

Data::Pipeline::Machine - easy-to-use machine building

SYNOPSIS

Machine Definition

package Data::Pipeline::AdapterX::GoogleScholar;

use Data::Pipeline::Machine;

use Data::Pipeline qw( FetchPage Regex Rename StringReplace UrlBuilder );

pipeline(
    FetchPage(
        cut_start => '<p class=g>',
        cut_end => '</table>',
        split => '<p class=g>',
        url => UrlBuilder(
            base => 'http://scholar.google.com/scholar',
            query => {
                q => Option( q => ( default => 'biology' ) ),
                hl => 'en',
                lr => '',
                scoring => 'r',
                as_ylo => Option( year => ( default => '2007' ) ),
                num => 100,
                safe => 'off'
            }
        ),
    ),
    Rename(
        copies => {
            content => 'description',
            content => 'title'
        },
        renames => {
            content => 'link'
        }
    ),
    Regex(
        rules => [
            title => sub { s/^<span class="w">.*?<a.+?>(.+?)/$1/gs },
            title => sub { s{(.+?)</a.+}{$1}gs },
            title => sub { s/&hellip;//gs },
            link => sub { s{.+?http://(.+?)".+}{http://$1}gs },
            title => sub { s/<.+?>//gs },
            title => sub { s/&nbsp;//gs },
            description => sub { s{.+?<span class="a">.+?- (.+?) -.+}{$1}gs },
            description => sub { s{<.+?>}{}gs }
        ]
    )
); # pipeline

Machine use

use Data::Pipeline qw( Pipeline GoogleScholar CSV );

my $pipe = Pipeline(
    GoogleScholar,
    CSV( column_names => [qw(title link)] )
);

$pipe->from( q => 'physics' )->to( \*STDOUT );

DESCRIPTION

This package makes it easy to construct collections of pipelines that together act as an action or an adapter.

CONSTRUCTORS

Several constructors are exported automatically by the package.

Option( $name => %options )

This constructs an object that will supply an optional argument for the transformation. A default value can be supplied in the options.

The value is taken from the argument named $name that is passed to the from method of the machine. In the synopsis, the Option( q => ... ) in the machine definition takes its value from the q argument supplied when the machine is used in a pipeline and that pipeline is run with from( q => 'physics' ). The Option( year => ... ), by contrast, falls back to its default value of '2007' because no year argument is given.
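
The lookup described above can be modelled in a few lines of plain Perl. This is only an illustrative sketch, not the module's implementation: make_option and resolve_option are hypothetical helpers invented for this example, and the sketch assumes that a supplied argument always takes precedence over the default.

```perl
use strict;
use warnings;

# Hypothetical stand-in for Option( $name => %options ): records the
# argument name together with any options, such as a default.
sub make_option {
    my ( $name, %options ) = @_;
    return { name => $name, %options };
}

# Hypothetical resolver: prefer the value passed to from(), and fall
# back to the option's default when that argument is absent.
sub resolve_option {
    my ( $option, %from_args ) = @_;
    my $name = $option->{name};
    return exists $from_args{$name} ? $from_args{$name} : $option->{default};
}

my $q    = make_option( q    => ( default => 'biology' ) );
my $year = make_option( year => ( default => '2007' ) );

# Mirrors the arguments of $pipe->from( q => 'physics' ) in the synopsis.
my %from_args = ( q => 'physics' );

printf "q=%s as_ylo=%s\n",
    resolve_option( $q,    %from_args ),
    resolve_option( $year, %from_args );
```

Under these assumptions the q option resolves to the supplied 'physics', while the year option resolves to its default of '2007'.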

pipeline( [ $name => ] pipeline definition )

This defines a pipeline with an optional name.

If no name is given, the pipeline is named 'finally'. Only one pipeline should be defined without an explicit name. The pipeline named 'finally' is the default starting point when the machine is used through from or transform without naming a pipeline.
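
The naming rule can be pictured with a small plain-Perl model. It is a sketch under stated assumptions rather than the module's code: define_pipeline and %pipelines are hypothetical names, and the sketch assumes that a leading plain string is the pipeline name and that every stage in the definition is a reference.

```perl
use strict;
use warnings;

# Hypothetical registry of the pipelines defined in one machine.
my %pipelines;

# Hypothetical stand-in for pipeline( [ $name => ] definition ).
# Assumption: a leading non-reference value is the name; without one,
# the pipeline is stored under 'finally'.
sub define_pipeline {
    my @definition = @_;
    my $name = 'finally';
    $name = shift @definition if @definition && !ref $definition[0];
    $pipelines{$name} = [@definition];
    return $name;
}

# Placeholder stages; in a real machine these would be pipeline components.
my $fetch  = { stage => 'fetch' };
my $clean  = { stage => 'clean' };
my $format = { stage => 'format' };

define_pipeline( prepare => $fetch, $clean );    # explicitly named
define_pipeline( $format );                      # unnamed, stored as 'finally'

print join( ', ', sort keys %pipelines ), "\n";
```

In this model a second unnamed definition would silently replace the first, which is one way to see why only one pipeline should be left unnamed.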

Pipeline

Within a machine, Pipeline does not define a new pipeline, as the function of the same name imported from Data::Pipeline would. Instead, it calls another pipeline defined in the same machine and passes arguments to it.

AUTHOR

James Smith

LICENSE

Copyright (c) 2008 Texas A&M University.

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.