NAME
Hadoop::IO::TypedBytes - Hadoop/Hive compatible TypedBytes serializer/deserializer.
VERSION
version 0.002
SYNOPSIS
query.hql
add jar /usr/lib/hive/lib/hive-contrib.jar;
set hive.script.recordwriter=org.apache.hadoop.hive.contrib.util.typedbytes.TypedBytesRecordWriter;
set hivevar:serde=org.apache.hadoop.hive.contrib.serde2.TypedBytesSerDe;
select transform (foo, bar, baz) row format serde '${serde}' using 'script.pl' as ...
script.pl
use Hadoop::IO::TypedBytes qw/make_reader decode_hive/;
my $reader = make_reader(\*STDIN);
while (defined (my $row = decode_hive($reader))) {
    my ($foo, $bar, $baz) = @$row;
    ...
}
NB: these examples use TypedBytes only to pass data from Hive to the transform script. To use TypedBytes for the script's output as well, please refer to the Hive documentation.
DESCRIPTION
This package is useful if you want to pass multiline strings or binary data to a transform script in a Hive query.
Encoding routines "encode" and "encode_hive" take Perl values and return strings with their TypedBytes representations.
Decoding routines "decode" and "decode_hive" take a reader callback (see "READING") instead of binary strings, because TypedBytes streams as implemented by Hadoop and Hive are unframed: there is no way to tell in advance how long an object will be, so it cannot be read in one go and handed to the decoder as a string. For this reason the decoder consumes the binary stream directly.
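For example, a value can be round-tripped through its TypedBytes form by decoding from a filehandle. This is a minimal sketch; the in-memory handle simply stands in for a real stream such as STDIN.
use Hadoop::IO::TypedBytes qw/encode decode make_reader/;

# Encode a nested structure into its TypedBytes representation.
my $bytes = encode({ name => "abc", tags => [ "x", "y" ] });

# Any filehandle works; here an in-memory one wraps the encoded string.
open my $fh, '<', \$bytes or die "cannot open in-memory handle: $!";
binmode $fh;

my $reader = make_reader($fh);
my $copy   = decode($reader);   # hash reference equivalent to the input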
FUNCTIONS
Nothing is exported by default.
- encode
encode($val) -> $binary_string
Encode a scalar, array reference, or hash reference into its TypedBytes binary representation.
If you are interfacing with Hive, use "encode_hive" instead.
Containers are encoded recursively. Blessed references are not supported and will cause the function to die.
Scalars are always encoded as TypedBytes strings, array references as TypedBytes arrays, and hash references as TypedBytes structs. Note that, at the time of writing, the Hive TypedBytes decoder does not support arrays and structs; the Hive documentation suggests using JSON in that case.
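For instance (a sketch; the exact byte layout is omitted):
use Hadoop::IO::TypedBytes qw/encode/;

my $scalar_tb = encode("hello\nworld");           # a TypedBytes string
my $array_tb  = encode([ "a", "b", "c" ]);        # a TypedBytes array
my $struct_tb = encode({ id => 42, ok => "y" });  # a TypedBytes struct

# Blessed references are not supported:
# encode( bless {}, 'Some::Class' );              # dies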
- decode
decode($reader) -> $obj
Decode a binary TypedBytes stream into the corresponding Perl object.
See "READING" for a description of the $reader parameter. If you are interfacing with Hive, use "decode_hive" instead.
TypedBytes lists and arrays are decoded as array references.
TypedBytes structs are decoded as hash references.
The standard TypedBytes empty marker and special marker are decoded as empty lists, but you shouldn't encounter these under normal circumstances.
- encode_hive
encode_hive(@row) -> $binary_string
Encode a row of values in the format expected by Hive: a concatenation of the encoded values, terminated by a Hive-specific "end-of-row" marker.
See the notes for "encode" for a description of the encoding process.
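A sketch of the output side of a transform script, assuming the Hive query has been configured to read TypedBytes back from the script (see the NB in the SYNOPSIS):
use Hadoop::IO::TypedBytes qw/encode_hive/;

binmode STDOUT;
for my $row ( [ "a", 1 ], [ "b", 2 ] ) {
    # Each call returns one encoded row, end-of-row marker included.
    print encode_hive(@$row);
}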
- decode_hive
decode_hive($reader) -> $row
Decode a binary TypedBytes stream into an array reference representing a single row received from Hive.
See "READING" for a description of the $reader parameter. See the notes for "decode" for a general description of the decoding process.
- make_reader
make_reader($fh) -> $callback
Construct a reader callback from a filehandle.
READING
Both Hadoop and Hive streams that use TypedBytes provide no way to know the size of the next record without decoding the object itself. For this reason "decode" and "decode_hive" take a callback instead of a string. The reader callback is passed a single parameter, the number of bytes to read, and should return a string of exactly that length, or undef to indicate end-of-file. It should die, and NOT return undef, if the number of bytes left in the stream is less than requested.
Use "make_reader" to build a compatible reader callback from a filehandle.
AUTHORS
Philippe Bruhat
Sabbir Ahmed
Somesh Malviya
Vikentiy Fesunov
COPYRIGHT AND LICENSE
This software is copyright (c) 2023 by Booking.com.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.