NAME

Sereal::Performance - Getting the most out of the Perl-Sereal implementation

SYNOPSIS

# This is different from the standard module synopsis in
# that it chooses performance over ease-of-use.
# Think twice before micro-optimizing your Sereal usage.
# Usually, Sereal is a lot faster than most of one's code,
# so unless you are doing bulk encoding/decoding, you are
# better off optimizing for maintainability.

use Sereal qw(sereal_encode_with_object
              sereal_decode_with_object);
my $enc = Sereal::Encoder->new();
my $dec = Sereal::Decoder->new();

my $big_data_structure = {...};

my $srldoc = sereal_encode_with_object($enc, $big_data_structure);

my $and_back = sereal_decode_with_object($dec, $srldoc);

DESCRIPTION

Using Sereal in the way that is optimally performant for your use case can make quite a significant difference in performance. Broadly speaking, there are two classes of tweaks you can do: choosing the right options during encoding (sometimes incurring trade-offs in output size) and calling the Sereal encode/decode functions in the most efficient way.

If you are not yet using re-usable Sereal::Encoder and Sereal::Decoder objects, then read no further. By switching from the encode_sereal and decode_sereal functions to either the OO interface or the advanced functional interface, you will get a noticeable speed boost as encoder and decoder structures can be reused. This is particularly significant for the encoder, which can re-use its output buffer. In some cases, such a warmed-up encoder can avoid most memory allocations.

I repeat, if you care about performance, then do not use the encode_sereal and decode_sereal interface.

The exact performance in time and space depends heavily on the data structure to be (de-)serialized. Often there is a trade-off between space and time. If in doubt, do your own testing and most importantly ALWAYS TEST WITH REAL DATA. If you care purely about speed at the expense of output size, you can use the no_shared_hashkeys option for a small speed-up, see below. If you need smaller output at the cost of higher CPU load and more memory used during encoding/decoding, try the dedupe_strings option and enable Snappy compression.

For ready-made comparison scripts, see the author_tools/bench.pl and author_tools/dbench.pl programs that are part of this distribution. Suffice to say that this library is easily competitive in both time and space efficiency with the best alternatives.

If switching to the OO interface is not enough, you may consider switching to the advanced functional interface that avoids method lookup overhead, and by inlining as custom Perl OPs, may also avoid some of the Perl function call overhead (Perl 5.14 and up). This additional speed-up is only a constant-offset, avoiding said method/function call, rather than speeding up encoding itself and so will be most significant if you are working with very small data sets.

sereal_encode_with_object and sereal_decode_with_object are optionally exported from the Sereal module (or Sereal::Encoder and Sereal::Decoder respectively). They work the same as the object-oriented interface except that they are invoked differently:

$srl_doc = $encoder->encode($data);

becomes

$srl_doc = sereal_encode_with_object($encoder, $data);

and

$data = $decoder->decode($srl_doc);

becomes

$data = sereal_decode_with_object($decoder, $srl_doc);

On Perl versions before 5.14, this will be marginally faster than the OO interface as it avoids method lookup. This should rarely matter. On Perl versions starting from 5.14, the function call to sereal_encode_with_object or sereal_decode_with_object will also be replaced with a custom Perl OP, thus avoiding most of the function call overhead as well.

Tuning the Sereal::Encoder

Several of the Sereal::Encoder options add or remove useful behaviour and some of them come at a runtime performance cost.

no_shared_hashkeys

By default, Sereal will emit a "repetition" marker for hash keys that were already previously encountered. Depending on your data structure, this can save quite a bit of space in the generated document. Consider, for example, encoding an array of many objects of the same class. But it may not save anything if you don't have a lot of repeated hash keys or don't even encode any hashes to begin with.

In those cases, you can turn this feature off with the no_shared_hashkeys option for a small but measurable speed-up.

dedupe_strings

If set, this option will apply the de-duplication logic to all strings that is only applied to hash keys by default. This can be quite expensive in both memory and performance. The same is true for aliased_dedupe_strings.

snappy and snappy_incr

Enabling Snappy compression can (but doesn't have to) make your Sereal documents significantly smaller. How effective this compression is for you depends entirely on the nature of your data. Snappy compression is designed to be very fast. The additional space savings are very often worth the small overhead.

freeze_callbacks

Using custom Perl FREEZE callbacks is very expensive. If enabled, the encoder has to do a method lookup at least once per class of an object being serialized. If a FREEZE hook actually exists, calling it will be even more expensive. If you care about ultimate performance, use with care.

sort_keys

This option forces the encoder to always sort the entries in a hash by its keys before writing them to the Sereal document. This can be somewhat expensive for large hashes.

General Considerations

Perl variables (scalars specifically) can, at the same time, hold multiple representations of the same data. If you create and integer and use it as a string, it will be cached in its string form. Sereal attempts to detect the most compact of these representations for encoding, but can not always succeed. For example, if a data structure was previously also traversed by certain other serialization modules (such as Storable), then the scalars in the structure may have been irrevocably upgraded to a more complex (and bigger) type. This is only an issue in crude benchmarks. So if you plan to benchmark serialization, take care not to re-use the test data structure between serializers for results that do not depend on the order of operations.

BUGS, CONTACT AND SUPPORT

For reporting bugs, please use the github bug tracker at http://github.com/Sereal/Sereal/issues.

For support and discussion of Sereal, there are two Google Groups:

Announcements around Sereal (extremely low volume): https://groups.google.com/forum/?fromgroups#!forum/sereal-announce

Sereal development list: https://groups.google.com/forum/?fromgroups#!forum/sereal-dev

AUTHORS AND CONTRIBUTORS

Yves Orton <demerphq@gmail.com>

Damian Gryski

Steffen Mueller <smueller@cpan.org>

Rafaël Garcia-Suarez

Ævar Arnfjörð Bjarmason <avar@cpan.org>

Tim Bunce

Daniel Dragan <bulkdd@cpan.org> (Windows support and bugfixes)

Zefram

Some inspiration and code was taken from Marc Lehmann's excellent JSON::XS module due to obvious overlap in problem domain.

ACKNOWLEDGMENT

This module was originally developed for Booking.com. With approval from Booking.com, this module was generalized and published on CPAN, for which the authors would like to express their gratitude.

COPYRIGHT AND LICENSE

Copyright (C) 2012, 2013, 2014 by Steffen Mueller Copyright (C) 2012, 2013, 2014 by Yves Orton

The license for the code in this distribution is the following, with the exceptions listed below:

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

Except portions taken from Marc Lehmann's code for the JSON::XS module, which is licensed under the same terms as this module. (Many thanks to Marc for inspiration, and code.)

Also except the code for Snappy compression library, whose license is reproduced below and which, to the best of our knowledge, is compatible with this module's license. The license for the enclosed Snappy code is:

Copyright 2011, Google Inc.
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  * Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
  * Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
  * Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.