NAME
JSON::SL::Tuba - High performance SAX-like interface for JSON
SYNOPSIS
Create a very naive JSON encoder using JSON::SL::Tuba
my $JSON = <<'EOJ';
{
"a" : "b",
"c" : { "d" : "e" },
"f" : [ "g", "h", "i", "j" ],
"a number" : 0.4444444444,
"a (false) boolean": false,
"another (true) boolean" : true,
"a null value" : null,
"exponential" : 1.3413400E4,
"an\tescaped key" : "a u-\u0065\u0073caped value", "שלום":"להראות"
}
EOJ
# Split the 'stream' into multiple chunks to demonstrate the streaming
# feature:
my @Chunks = unpack("(a8)*", $JSON);

# Make a subclass and set up the methods.
package A::Giant::Tuba;
use base qw(JSON::SL::Tuba);

sub on_any {
    my ($tuba, $info, $data) = @_;

    # Use constant comparisons...
    if ($info->{Type} == TUBA_TYPE_JSON) {
        printf STDERR ("JSON DOCUMENT: %c\n\n", $info->{Mode});
        return;
    }

    # ...or use the mnemonic ones.
    if ($info->{Key} && $info->{Mode} =~ m,[>\+],) {
        printf('"%s" : ', $info->{Key});
    }

    if ($info->{Type} == TUBA_TYPE_STRING) {
        printf('"%s",' . "\n", $data || "<NO DATA>");
    } elsif ($info->{Type} =~ m,[\[\{],) {
        if ($info->{Mode} eq '+') {
            print $info->{Type} . "\n";
        } else {
            print $JSON::SL::Tuba::CloseTokens{$info->{Type}} . ",\n";
        }
    } else {
        if (defined $data) {
            print $data . ",\n";
        } else {
            die("hrrm.. what have we here?")
                unless $info->{Type} == TUBA_TYPE_NULL;
            print "null,\n";
        }
    }
}

# Instantiate the subclass defined above and feed it the chunks.
my $o = A::Giant::Tuba->new();
$o->parse($_) for @Chunks;
Output:
JSON DOCUMENT: +
{
"a" : b,
"c" : {
"d" : e,
},
"f" : [
g,
h,
i,
j,
],
"a number" : 0.4444444444,
"a (false) boolean" : 0,
"another (true) boolean" : 1,
"a null value" : null,
"exponential" : 13413.4,
"an escaped key" : a u-escaped value,
"שלום" : להראות,
},
JSON DOCUMENT: -
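The chunking step in the synopsis can be tried on its own. The following is a minimal sketch, assuming an ASCII-only input and a chunk size of 8; the sample string is hypothetical. It uses only Perl's core unpack function with a repeated group template and does not touch JSON::SL::Tuba at all.

```perl
use strict;
use warnings;

# Hypothetical sample input; any string works.
my $json = '{"a":"b","c":[1,2,3]}';

# "(a8)*" repeats the group "a8" (8 characters, no padding or stripping)
# until the input is exhausted; the final chunk may be shorter.
my @chunks = unpack("(a8)*", $json);

# Concatenating the chunks gives back the original string, so the chunks
# together carry exactly the same data as the unsplit document.
my $rejoined = join('', @chunks);

printf "%d chunks, round-trip %s\n",
    scalar(@chunks), ($rejoined eq $json ? "ok" : "FAILED");
```

Each element of @chunks could then be handed to parse() in turn, as the synopsis does.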
DESCRIPTION
JSON::SL::Tuba provides an event-based, high-performance, SAX-like interface for parsing streaming JSON.
The design of JSON::SL::Tuba emphasizes reduced boilerplate (the author does not have favorable experiences with SAX APIs) and high performance.
It uses the same fast core JSON functionality as JSON::SL.
To use JSON::SL::Tuba, simply inherit from it and define one or more methods to be called when a parse event occurs.
In the default configuration, only a single method (on_any, described below) needs to be implemented in order to receive events.
If your application's requirements are more complex, Tuba can deliver events down to the resolution of a single character.
CALLBACK ARGUMENTS AND TERMINOLOGY
All callback methods follow a single, unified calling convention of the form
callback($tuba, $info, $data);
where $tuba is the JSON::SL::Tuba instance, $info is a hash reference containing metadata about the item for which the event was received, and $data contains the actual 'data' (if applicable).
Info Hash
This hash contains metadata for determining relevant information about the current item.
The hash and all its contents are read-only. Their contents are not valid after the callback returns (see "CAVEATS"). This is for both performance and sanity reasons.
Its keys and values are as follows:
- Type
This is the type of JSON object for which an event was received.
The following table lists the type constants and their mnemonic symbols. The value of this key is a dual-valued scalar which yields either the character or the numeric value, depending on the context.
Constant             Mnemonic Symbol   Description
=== Scalar Types ===
TUBA_TYPE_STRING     "                 "string" value
TUBA_TYPE_KEY        #                 hash key
TUBA_TYPE_BOOLEAN    ?                 JSON boolean atom ('true', 'false')
TUBA_TYPE_NUMBER     =                 number
TUBA_TYPE_NULL       ~                 JSON 'null' atom
=== Container Types ===
TUBA_TYPE_OBJECT     {                 hash (JSON 'object')
TUBA_TYPE_LIST       [                 array (JSON 'list')
=== Pseudo Types ===
TUBA_TYPE_JSON       D                 the entire stream
TUBA_TYPE_SPECIAL    ^                 non-string scalar
TUBA_TYPE_DATA       c                 any scalar data
- Mode
This is the 'mode' of the callback. Like the type, the mode is a magical mnemonic constant.
I use the term element to mean any kind of JSON variable/object, i.e. anything listed in the above type table.
Constant             Mnemonic Symbol   Description
TUBA_MODE_START      +                 the start of an element
TUBA_MODE_END        -                 completion of an element
TUBA_MODE_ON         >                 data (contents) of an element
By default, the behavior is as follows:
    - Complex type events (new hash, new list) are delivered as START events. When they complete, END events are delivered.
    - For scalar types, the START and END callbacks are not delivered; instead, their contents are accumulated internally and delivered whole via a single ON callback.
Almost every aspect of this is configurable, and these are just (what I hope are) sane defaults.
- Key
By default, keys are not delivered as their own events, but are instead attached to this field for the values which follow them.
This field, if present, will contain the JSON key.
Only valid if the parent object is a hash.
See the accum_kv option below for a way to have keys delivered as their own events.
- Index
Like Key, but instead of a string key, this is a numeric index. Indexes are never delivered as explicit events (since they are inherently implicit entities).
Only valid if the parent object is a list.
- Escaped
This is a boolean flag. Set to true if the current string needs escaping. This is never set unless string events are delivered incrementally.
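The dual nature of the Type and Mode values can be illustrated without the module itself. The following is a minimal sketch using dualvar from the core Scalar::Util module as a stand-in for $info->{Type}. It assumes that the numeric value is the character code of the mnemonic symbol, which is consistent with the synopsis printing the mode with %c but is not otherwise confirmed here.

```perl
use strict;
use warnings;
use Scalar::Util qw(dualvar);

# Stand-in for TUBA_TYPE_LIST; the character code is an assumption made
# for illustration only.
my $TYPE_LIST_NUM = ord('[');

# Stand-in for $info->{Type}: numeric value plus mnemonic character.
my $type = dualvar($TYPE_LIST_NUM, '[');

# Numeric context: compare against the constant.
my $numeric_match = ($type == $TYPE_LIST_NUM) ? 1 : 0;

# String context: compare against, or pattern-match, the mnemonic symbol.
my $string_match = ($type eq '[')         ? 1 : 0;
my $regex_match  = ($type =~ m/[\[\{]/)   ? 1 : 0;

print "numeric=$numeric_match string=$string_match regex=$regex_match\n";
```

This is why the synopsis can mix == comparisons against the TUBA_TYPE_ constants with eq and regular-expression tests against the mnemonic characters.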
Data
Nothing much to say here. This is the pure 'data' associated with the callback.
For ON-style callbacks, this will contain a complete string/number/key (the default), or a fragment thereof.
By default, strings are unescaped and numeric formats converted to their Perl equivalent when they cannot be easily stringified.
Complex (non-scalar) objects will never receive an ON-style callback.
START and STOP callbacks never have any data, either.
CALLBACKS
The names of the callbacks follow from the types and modes described in the section above.
- on_any
This is the default, catch-all callback for all events. The callbacks that follow in this list offer no more capability than this method; they are present merely for performance and convenience (the dispatching for those methods is done in pure C, rather than through several layers of Perl).
Therefore, the semantics and behavior of on_any depend on the method for which on_any is acting as a surrogate.
Determining this is quite easy. Simply combine the Type and Mode fields to yield the equivalent function name:
if ($info->{Type} == TUBA_TYPE_LIST and $info->{Mode} eq '+') {
    my $callback_name = "start_list";
}
# etc.
- start_json
- end_json
Delivered at the beginning or end of a stream.
- start_object
- end_object
Delivered at the beginning or end of a hash.
- start_list
- stop_list
Delivered at the beginning or end of an array.
- start_string
- stop_string
Delivered when a string has started or stopped. More specifically, this means when the lexer has seen an opening or closing ".
- on_string
This is where string-specific data gets delivered. This can be either an entire string or a fragment thereof. In the former case, the string is unescaped.
- on_data
This is an optional (and default) generic callback for incremental mode: fragments of numbers, booleans, strings, and keys are delivered here, with the START and STOP callbacks signalling their beginning and end.
- start_number
- stop_number
- on_number
These three methods follow the same semantics as their *_string equivalents, except that, of course, there is no unescaping.
- start_boolean
- stop_boolean
- on_boolean
Same behavior as strings and numbers, except that the object (in the default accumulator mode) is converted to a JSON::SL::Boolean.
- start_null
- stop_null
- on_null
Delivered for JSON null atoms. In accumulator mode, these are converted into undef values.
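The rule given under on_any (combine Type and Mode to get the equivalent callback name) can be sketched as a small lookup. This is an illustrative, self-contained helper only: callback_name_for and both tables are hypothetical and not part of JSON::SL::Tuba, and the names are copied from the callback list above, including its mix of end_ and stop_ prefixes.

```perl
use strict;
use warnings;

# Mnemonic type symbol => noun used in callback names. Only types for
# which this document lists callbacks are included.
my %NOUN_FOR_TYPE = (
    'D' => 'json',
    '{' => 'object',
    '[' => 'list',
    '"' => 'string',
    '=' => 'number',
    '?' => 'boolean',
    '~' => 'null',
);

# The callback list spells the END callbacks "end_*" for json and object,
# and "stop_*" for everything else; this table mirrors that spelling.
my %END_PREFIX = (json => 'end', object => 'end');

# Only scalar types are documented as receiving ON callbacks.
my %IS_SCALAR = map { $_ => 1 } qw(string number boolean null);

# Hypothetical helper: returns the callback name, or undef if the
# combination is not one listed in this document.
sub callback_name_for {
    my ($type, $mode) = @_;
    my $noun = $NOUN_FOR_TYPE{$type};
    return unless defined $noun;

    return "start_$noun" if $mode eq '+';
    return "on_$noun"    if $mode eq '>' && $IS_SCALAR{$noun};
    return ($END_PREFIX{$noun} || 'stop') . "_$noun" if $mode eq '-';
    return;
}

print callback_name_for('[', '+'), "\n";
print callback_name_for('{', '-'), "\n";
print callback_name_for('"', '>'), "\n";
```

Inside an on_any implementation, the same helper could be called with the string forms of the two fields, e.g. callback_name_for("$info->{Type}", "$info->{Mode}").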
OPTIONS
Accumulators
By default, JSON::SL::Tuba uses internal accumulators to buffer your data. This allows high-level events to be delivered efficiently, without having to call into Perl with multiple callbacks for very small units of data. It also makes things easier for you, the user, as state-handling mechanisms do not need to be as complex.
In addition, Tuba has a special kv (key-value) accumulator which buffers hash keys internally and only ever delivers them as the Key field within the informational hash passed to callbacks.
Accumulator settings control whether incremental 'data' callbacks will be invoked for a specific scalar type or not.
$tuba->accum(tuba, type => boolean, another_type => boolean, ...)
Set accumulator parameters. Each type argument is one of the TUBA_TYPE_ constants (or a mnemonic character), and each boolean argument indicates whether data for that type should be accumulated.
$tuba->accum_kv(boolean)
Gets or sets the status of the key-value accumulator. Note that enabling the key-value accumulator will also enable accumulation for the generic key type (i.e. #), but disabling the key-value accumulator will not reverse this effect.
$tuba->accum_all(boolean)
This enables or disables the accumulator settings for all scalar types (but not the key-value accumulator).
Generic Options
$tuba->cb_unified(boolean)
If only a single callback is being used, set this option to have Tuba call the on_any callback directly, instead of using it only as a fallback.
This is not enabled by default because it prevents any other methods from being called, but it should be turned on if you don't care about that.
$tuba->utf8(boolean)
Tell Tuba to set the SvUTF8 flag on strings.
$tuba->allow_unhandled(boolean)
By default, Tuba will croak if it cannot find a handler method for a given event (this effectively means the on_any method has not been implemented). This is usually what you want. To disable this behavior, set allow_unhandled to a true value.
Parsing Data
There is one method:
$tuba->parse($json_chunk)
And that's all there is to it. Tuba will parse all data fed to it.
If accumulator mode is not being used, then you are guaranteed to have processed every bit of data in $json_chunk, leaving nothing buffered.
This method will croak on error (and I have not yet implemented error handling).
Storing Data in Tuba
The tuba object is a simple hash reference. Feel free to use it and abuse it. The one exception is the _TUBA key, which contains the pointer to the internal C structure. Perl will probably croak if you try to modify this read-only variable, but if it doesn't croak, your program will crash, so don't modify it.
BUGS AND CAVEATS
It would be nice to provide an error handler.
Info Hash
The info hash passed to callbacks is read-only and volatile. This means the following:
- Trying to access a non-existent key in the hash (i.e. any key not listed in the section describing this hash) will throw an error about accessing a disallowed key.
- Trying to modify any value in the hash will throw an error.
- Keeping references to values within the hash, e.g.
my $ref = \$hash->{Type};
will not work, as the value will not be consistent after the callback has returned.
It is safe to take a reference to the Key field, though.
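The practical consequence is that a callback should copy whatever it needs out of the info hash before returning. The sketch below does not use JSON::SL::Tuba; it uses a hash locked with the core Hash::Util module as a stand-in for $info, on the assumption (suggested by the "disallowed key" wording above, but not confirmed) that the real hash behaves like a Perl restricted hash. The on_any_sketch function and the sample values are hypothetical.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_hash);

# Stand-in for the $info hash handed to a callback.
my %info = (Type => '"', Mode => '>', Key => 'a');
lock_hash(%info);    # keys and values become read-only

my @saved;

# Hypothetical callback body: copy plain values instead of keeping
# references into the hash.
sub on_any_sketch {
    my ($info, $data) = @_;
    push @saved, {
        type => "$info->{Type}",
        mode => "$info->{Mode}",
        key  => $info->{Key},
        data => $data,
    };
}

on_any_sketch(\%info, 'b');

# Reading a key that was never part of the hash dies...
my $missing_dies  = eval { my $x = $info{Bogus}; 1 } ? 0 : 1;
# ...and so does assigning to an existing value.
my $readonly_dies = eval { $info{Type} = '['; 1 } ? 0 : 1;

print "saved key=$saved[0]{key} data=$saved[0]{data}\n";
print "missing key dies: $missing_dies, modification dies: $readonly_dies\n";
```

The copies in @saved remain usable after the callback returns, which is the property the volatile info hash itself does not offer.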
Speed
Considering what Tuba does and the convenience it provides, it's blazingly fast. Nevertheless, JSON::SL is still at least twice the speed.
SEE ALSO
AUTHOR AND COPYRIGHT
Copyright (C) 2012 M. Nunberg
You may use and distribute this software under the same terms and conditions as Perl itself.