NAME
Hadoop::Streaming::Combiner - Simplify writing Hadoop Streaming jobs. Combiner follows the same interface as Reducer. Requires a combine() function, which is called once for each key with an iterator over that key's values. Combiners run on the same machine as the mapper, as a pre-reduce reduction step.
VERSION
version 0.143060
SYNOPSIS
#!/usr/bin/env perl
package WordCount::Combiner;
use Moo;
with qw/Hadoop::Streaming::Combiner/;

sub combine {
    my ($self, $key, $values) = @_;

    # Count the values seen for this key; each mapper emission
    # for the key contributes one value.
    my $count = 0;
    while ( $values->has_next ) {
        $count++;
        $values->next;
    }

    # Emit the partially reduced (key, count) pair.
    $self->emit( $key => $count );
}

package main;
WordCount::Combiner->run;
This combiner is identical to our reduce example. In this case the reducer could be used as the combiner, since it maps (k,v) -> (k',v') with the same key and value format.
This role exists as a convenience wrapper around the Hadoop::Streaming::Reducer pieces.
Your mapper class must implement map($key,$value), your optional combiner must implement combine($key,$values_iterator), and your reducer must implement reduce($key,$values_iterator). The combiner should generally emit ($key,$combined_value) pairs in the same format as its input, so the optional combiner step can be enabled or disabled as needed to tune the job.
Your classes will have emit() and run() methods added via the role.
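For context, here is a minimal sketch of a matching word-count mapper. It follows the map($key,$value) interface described above; the package name WordCount::Mapper and the whitespace-splitting logic are assumptions for illustration.

#!/usr/bin/env perl
package WordCount::Mapper;
use Moo;
with qw/Hadoop::Streaming::Mapper/;

sub map {
    my ($self, $key, $value) = @_;
    # Assumes plain text input, with each input line arriving as the value.
    # Emit each word with a count of 1; the combiner and reducer then
    # total these counts per word.
    $self->emit( $_ => 1 ) for split /\s+/, $value;
}

package main;
WordCount::Mapper->run;

With the mapper emitting (word, 1) pairs, the combiner above pre-aggregates counts on each mapper machine, shrinking the data shuffled to the reducers.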
METHODS
run
Package->run();
This method starts the Hadoop::Streaming::Combiner instance.
After creating a new object instance, it reads from STDIN and calls $object->combine() once per key, passing the key and an iterator over that key's values.
Subclasses need only implement combine() to produce a complete Hadoop Streaming compatible combiner.
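Because run() simply reads key/value lines from STDIN, a combiner script can be smoke-tested outside of Hadoop by piping sorted mapper output into it. A rough sketch, assuming the three classes are saved as mapper.pl, combiner.pl, and reducer.pl (hypothetical names); in a real job, Hadoop Streaming performs the sorting and wiring itself:

# The sort calls stand in for Hadoop's shuffle, grouping identical keys
# so each key's values arrive consecutively.
./mapper.pl < input.txt | sort | ./combiner.pl | sort | ./reducer.pl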
AUTHORS
andrew grangaard <spazm@cpan.org>
Naoya Ito <naoya@hatena.ne.jp>
COPYRIGHT AND LICENSE
This software is copyright (c) 2014 by Naoya Ito <naoya@hatena.ne.jp>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.