NAME

JSON::Streaming::Reader - Read JSON strings in a streaming manner

DESCRIPTION

This module is effectively a tokenizer for JSON strings. With it you can process JSON strings in customizable ways without first creating a Perl data structure from the data. For some applications, such as those where the expected data structure is known ahead of time, this may be a more efficient way to process incoming data.

SYNOPSIS

use JSON::Streaming::Reader;

my $jsonr = JSON::Streaming::Reader->for_stream($fh);
$jsonr->process_tokens(
    start_object => sub {
        ...
    },
    end_object => sub {
        ...
    },
    start_property => sub {
        my ($name) = @_;
    },
    # ...
);

CREATING A NEW INSTANCE

This module can operate on either an IO::Handle instance or a string.

JSON::Streaming::Reader->for_stream($fh)

Create a new instance that will read from the provided IO::Handle instance. If you want to operate on a raw Perl filehandle, you must currently wrap it in an IO::Handle instance yourself.

JSON::Streaming::Reader->for_string(\$string)

Create a new instance that will read from the provided string. Uses IO::Scalar to make a stream-like wrapper around the string, and passes it into for_stream.
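
As a minimal sketch of both constructors, the following builds one reader from a string and one from a raw filehandle wrapped in an IO::Handle. The temporary file and the variable names are illustrative only. IO::Handle's new_from_fd is just one way to do the wrapping; IO::File would serve equally well.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use IO::Handle;
use JSON::Streaming::Reader;

my $json = '{"name":"example"}';

# From a string: pass a reference to the scalar holding the JSON text.
my $from_string = JSON::Streaming::Reader->for_string(\$json);

# From a raw Perl filehandle: write a throwaway file, reopen it with
# plain open(), then wrap the raw handle in an IO::Handle instance.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} $json;
close($tmp_fh) or die "Cannot close $path: $!";

open(my $raw_fh, '<', $path) or die "Cannot open $path: $!";
my $wrapped = IO::Handle->new_from_fd($raw_fh, 'r')
    or die "Cannot wrap filehandle: $!";
my $from_stream = JSON::Streaming::Reader->for_stream($wrapped);

# Pull the first token from each reader to confirm both are usable.
my $first_from_string = $from_string->get_token()->[0];
my $first_from_stream = $from_stream->get_token()->[0];
print "$first_from_string / $first_from_stream\n";
```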

CALLBACK API

The recommended way to use this library is via the callback-based API. In this API you make a single method call on the reader object and pass it a CODE ref for each token type. The reader object will then consume the entire stream and call the callback corresponding to the type of each token it encounters.

If an error is encountered during parsing, an error token is produced and passed to the error callback.

For tokens that themselves have data, the data items will be passed in as arguments to the callback.

The handlers for the start_property, start_array and start_object tokens may use the skip method from the pull API, as described below, to avoid processing the remainder of the corresponding container.

$jsonr->process_tokens(%callbacks)

Read the whole stream and call a callback corresponding to each token encountered.
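
The sketch below shows one way to use process_tokens to collect property names and string values from a small document. It registers a handler for every token type, since this section does not say what happens when a token arrives that has no handler. The sample JSON and the variable names are illustrative only.

```perl
use strict;
use warnings;
use JSON::Streaming::Reader;

my $json  = '{"id":7,"tags":["a","b"],"active":true,"note":null}';
my $jsonr = JSON::Streaming::Reader->for_string(\$json);

my (@property_names, @strings);
my $ignore = sub { };    # no-op handler for tokens this example does not need

$jsonr->process_tokens(
    start_object   => $ignore,
    end_object     => $ignore,
    start_array    => $ignore,
    end_array      => $ignore,
    start_property => sub { my ($name)  = @_; push @property_names, $name },
    end_property   => $ignore,
    add_string     => sub { my ($value) = @_; push @strings, $value },
    add_number     => $ignore,
    add_boolean    => $ignore,
    add_null       => $ignore,
    error          => sub { my ($message) = @_; die "JSON error: $message\n" },
);

print "properties: @property_names\n";
print "strings: @strings\n";
```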

PULL API

A lower-level API is provided that allows the caller to pull single tokens from the stream as necessary. The callback API is implemented in terms of the pull API.

$jsonr->get_token()

Get the next token from the stream and advance. If the end of the stream is reached, this will return undef. Otherwise it returns an ARRAY ref whose first member is the token type and whose subsequent members are the token type's data items, if any.
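
A typical pull loop, sketched under the description above, calls get_token until it returns undef and unpacks each ARRAY ref into a type and its data items. The sample input and variable names are illustrative only.

```perl
use strict;
use warnings;
use JSON::Streaming::Reader;

my $json  = '[1,"two",null]';
my $jsonr = JSON::Streaming::Reader->for_string(\$json);

my @seen;
while (defined(my $token = $jsonr->get_token())) {
    # First member is the token type; the rest are its data items.
    my ($type, @data) = @$token;
    die "JSON error: $data[0]\n" if $type eq 'error';
    push @seen, $type;
}

print join(' ', @seen), "\n";
```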

$jsonr->skip()

Quickly skip to the end of the current container. This can be used after a start_property, start_array or start_object token is retrieved to signal that the remainder of the container is not actually required. The next call to get_token will return the token that comes after the corresponding end_ token for the current container. The corresponding end_ token is never returned.

This is most useful for skipping over unrecognised properties when populating a known data structure.

It is better to use this method than to implement skipping in the caller because skipping is done using a lightweight mechanism that does not need to allocate additional memory for tokens encountered during skipping. However, since this method uses a simpler state model it may cause less-intuitive error messages to be raised if there is a JSON syntax error within the content that is skipped.

Note that errors encountered during skip are raised via die, rather than being returned as an error token as get_token does.
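
The following sketch populates a record with two known properties and calls skip on everything else, so the nested value of the unrecognised property is never tokenized for the caller. The property names and sample document are hypothetical. Because skip reports errors via die, a caller that needs to recover from malformed input would wrap the call in eval.

```perl
use strict;
use warnings;
use JSON::Streaming::Reader;

my $json  = '{"name":"widget","extra":{"a":[1,2,3]},"price":9}';
my $jsonr = JSON::Streaming::Reader->for_string(\$json);

my %wanted = map { $_ => 1 } qw(name price);
my %record;
my $current;    # name of the property currently being read, if wanted

while (defined(my $token = $jsonr->get_token())) {
    my ($type, @data) = @$token;

    if ($type eq 'error') {
        die "JSON error: $data[0]\n";
    }
    elsif ($type eq 'start_property') {
        if ($wanted{ $data[0] }) {
            $current = $data[0];
        }
        else {
            # Jump past this property; its end_property is never returned.
            $jsonr->skip();
        }
    }
    elsif ($type eq 'add_string' || $type eq 'add_number') {
        $record{$current} = $data[0] if defined $current;
    }
    elsif ($type eq 'end_property') {
        undef $current;
    }
}

print "$_ = $record{$_}\n" for sort keys %record;
```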

TOKEN TYPES

There are two major classes of token types. Bracketing tokens enclose other tokens and come in pairs, named with start_ and end_ prefixes. Leaf tokens stand alone and have add_ prefixes.

For convenience the token type names match the method names used in the "raw" API of JSON::Streaming::Writer, so it is straightforward to implement a streaming JSON normalizer by feeding the output from this module into the corresponding methods on that module. However, this module does have an additional special token type 'error' which is used to indicate tokenizing errors and does not have a corresponding method on the writer.
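
A normalizer along these lines might look like the sketch below. It assumes that JSON::Streaming::Writer provides a for_stream constructor mirroring the reader's and that its methods accept the same data items the reader emits; neither detail is confirmed by this document. Error tokens are handled separately because the writer has no corresponding method.

```perl
use strict;
use warnings;
use IO::Scalar;
use JSON::Streaming::Reader;
use JSON::Streaming::Writer;

my $input  = '{ "a" : [ 1, "two", true, null ] }';
my $output = '';

my $jsonr = JSON::Streaming::Reader->for_string(\$input);

# Assumption: the writer offers a for_stream constructor like the reader's.
my $jsonw = JSON::Streaming::Writer->for_stream(IO::Scalar->new(\$output));

while (defined(my $token = $jsonr->get_token())) {
    my ($type, @data) = @$token;

    # 'error' has no writer counterpart, so stop here instead of dispatching.
    die "JSON error: $data[0]\n" if $type eq 'error';

    # Token type names double as writer method names.
    $jsonw->$type(@data);
}

print "$output\n";
```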

start_object, end_object

These token types delimit a JSON object. In a valid JSON stream an object will contain only properties as direct children, which will result in start_property and end_property tokens.

start_array, end_array

These token types delimit a JSON array. In a valid JSON stream an array will contain only values as direct children, which will result in one of the value token types described below.

start_property($name), end_property

These token types delimit a JSON property. The name of the property is given as an argument. In a valid JSON stream a start_property token will always be followed by one of the value token types which will itself be immediately followed by an end_property token.

add_string($value)

Represents a JSON string. The value of the string is passed as an argument.

add_number($value)

Represents a JSON number. The value of the number is passed as an argument.

add_boolean($value)

Represents a JSON boolean. The argument is 1 if the value is true and 0 if it is false.

add_null

Represents a JSON null.

error($string)

Indicates a tokenization error. A human-readable description of the error is included in $string.
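
With the pull API, a caller can watch for this token and stop reading, as in the sketch below. The malformed input is an illustrative example, and the wording of the message depends on the module, so it is printed rather than matched.

```perl
use strict;
use warnings;
use JSON::Streaming::Reader;

my $json  = '[1,}';    # deliberately malformed: object close inside an array
my $jsonr = JSON::Streaming::Reader->for_string(\$json);

my ($error, @types);
while (defined(my $token = $jsonr->get_token())) {
    my ($type, @data) = @$token;
    if ($type eq 'error') {
        $error = $data[0];    # human-readable description
        last;                 # stop reading after the first error
    }
    push @types, $type;
}

print defined $error ? "Parse failed: $error\n" : "Parsed OK\n";
```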

STREAM BUFFERING

This module doesn't do any buffering. It expects the underlying stream to do appropriate read buffering if necessary.

LIMITATIONS

No Non-blocking API

Currently there is no way to make this module do non-blocking reads. In future an event-based version of the callback-based API could be added that can be used in applications that must not block while the whole object is processed, such as those using POE or Danga::Socket. This would require some considerable refactoring, however.

This module expects to be able to do blocking reads on the provided stream. It will not behave well if a read fails with EWOULDBLOCK, so passing non-blocking IO::Socket objects is not recommended.