NAME

Hadoop::Streaming::Combiner - Simplify writing Hadoop Streaming jobs. Combiner follows the same interface as Reducer. Requires a combine() function which will be called for each line of combiner data. Combiners are run on the same machine as the mapper as a pre-reduce reduction step.

VERSION

version 0.122420

SYNOPSIS

#!/usr/bin/env perl

package WordCount::Combiner;
use Any::Moose;
with qw/Hadoop::Streaming::Combiner/;

sub combine {
    my ($self, $key, $values) = @_;

    my $count = 0;
    while ( $values->has_next ) {
        $count++;
        $values->next;
    }

    $self->emit( $key => $count );
}

package main;
WordCount::Combiner->run;

This combiner is identical to our reduce example. In this case, the reducer could be used as the combiner as it maps (k,v)->(k',v') with the same key and value format.

This method exists as a convenience factor as a wrapper around the Hadoop::Streaming::Reduce pieces.

Your mapper class must implement map($key,$value), your optional combiner must implement combine($key,$value), and your reducer must implement reduce($key,$value). The combiner should generally output ($key,$value_combine) in the same format to allow the user to engage or disengage the optional combiner step as necessary to optimize the job run.

Your classes will have emit() and run() methods added via the role.

METHODS

run

Package->run();

This method starts the Hadoop::Streaming::Combiner instance.

After creating a new object instance, it reads from STDIN and calls $object->combine( ) passing in the key and an iterator of values for that key.

Subclasses need only implement combine() to produce a complete Hadoop Streaming compatible reducer.

AUTHORS

  • andrew grangaard <spazm@cpan.org>

  • Naoya Ito <naoya@hatena.ne.jp>

COPYRIGHT AND LICENSE

This software is copyright (c) 2012 by Naoya Ito <naoya@hatena.ne.jp>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.