NAME
Data::FastPack - FastPack Record format, Parsing and Serialising
DESCRIPTION
Implements an incremental parser to parse an incoming buffer into messages. Provides basic encoding and decoding functions to serialize messages.
FASTPACK FORMAT SUMMARY
FastPack format is a binary format for storing records of opaque data related to a time index into a stream, padded to a multiples of 8 bytes. Within each stream of data, an ID refers to a channel of data. All multi byte fields are in little endian byte order processing unless otherwise indicated by the description/metadata specified at higher stack levels. The fields of a message are:
time(double float) # 8 byte aligned
id(32) # uint32
len(32) # uint32
payload bytes # 8 byte aligned
padding (as required)
time
is the absolute time of the sample or the difference in time from the current message to the previous message in the same stream. The exact meaning of the time is as per the definition messages. It is a double float to allow web browsers to utilise high resolution time, as they do not support 64bit integers.
id
is is the channel id within the file/stream. It relates to a definition file. 0 indicates a meta data point which is JSON or other structured data, which alters the processing of the file.
Optionally a namespace can be used to provide dynamic ID generation. The ids are mapped from a name/label/topic/ to an integer. Additional mapping messages are inserted into the stream to facilitate this mapping at both ends.
len
is the length of the payload. If the length is larger than 2^32 then it must be fragmented at the 'application level'.
payload
is the data.
padding
Every record is padded to an 8 byte bounadary, with nulls, if nessicary
CONCEPTS
The message format is primarily intended to store a sequence of time indexed values which are to be parameterized to another channel. For example indexing a sensor to position where both the sensor and the position are sampled separately, but can be stored with the same time base.
There is only one reserved channel id, 0 , which is a meta data channel. This channel is JSON or MessagePack, and provides the meta/header data to control the stream from that point forward.
The meta data semantics are application dependant, giving great flexibility, in time base, channel relationships etc
Efficient use of memory access for ARM cpus
Multi byte data types are stored in little endian order, unless otherwise specified in the meta data. Payloads are also on a 8 byte boundary allows direct access for double precision float
Messages stored in one or more files
Multiple streams of message can be stored in a single file if they share a time base. For streams that have differing time bases, they are stored in a separate files. This give good compression ability
External defintition file(s) for message types if required.
The definitions of a file can be pointed to externally, or can be stored internally in a meta data message
Highly compresssable and suitable for self contained web applications
The message time, id, length fields will be mostly unchanging when multiple message source of same time base are recorded together. The 24 byte header will basically be reduced to 1 or 2 bytes after compression for most messages.
Timimg
Timing data is a double float field and can represent many different timing scenarios.
Direct time (seconds)
The simplest case is the storing seconds as floating point values in the field. Whether the value is a difference to the previous message or an absolute value is based on definition messages for the file.
Multiple of a time base
Similar to direct time above, however the value is multiplied by an external time base factor to generate the actual time.
Argument/index into a timing function.
The value is used as an index into a timing function stored in external JavaScript, which when called calculates a time. i.e. for processing video with fixed non integer frame rates
For a system reporting only a single message, the time field will constantly be updated for each message. However with multiple channels only the first message from the group will have a non zero time when using difference mode. Most messages in a system will have a 0 time because of this.
Padding
Padding to an 8 byte boundary is implicit to every message. Arbitrary bytes can be appended to ensure the alignment.
Payload Length
A 32 bit field indicating the length of the data in the payload. It is stored just before the payload to allow more efficient decoding in scripting languages
Payload
The payload of the message. It is 8 byte aligned for better memory access (ie a double can extracted directly out of the payload field)
META AND STRUCTURED DATA
All message ids of 0 are designated "Meta Data". This means the payload is encoded either in a JSON array or object or as a MessagePack structure
This gives the fast encoding/decoding of simple time series values and the ability to have arbitrarily complex data when required
Decoding of meta data automatically picks MessagePack or JSON as required, as long as the encoded values are are map or array types.
It is recommended that general structured data in other message ids also either JSON or MessagePack also.
NAMED IDs
Static IDs are simple for small closed systems. However for larger system, using dynamic ids allows the scoping. The mapping between names and ids is automatically applied when using a name space. Encoding a named id for the first time will in insert a entry in the local encoding side table. It will also inject a message with the dynamic id and payload of the name into the stream. This will be the first time this id is seen, and is a marker to update the decoding end of the system.
If a named message with no payload is encoded, this serves as a de registration of the id and adds it to a pool of reusable ids.
For a bidirectional link, 4 name spaces are actually use. The encoding end of each direction is the master in that direction. The ids likely don't match, but they don't need to. The decoding end of each direction is shadow/slave copy of the encoding table. Over a reliable transport, these tables will always be in sync with the master from the encoding end.
API Usage
Encoding and Decoding
encode_fastpack
encode_fastpack $buf, $inputs, $limit, $namespace
Encodes and array ref of an array refs [time, ID, payload]
in to the buffer $buf
. $buf
is aliased internally, and not copied. All need encoded messages are appended to the buffer.
If limit
is supplied, and less then the length of the $input
array, only this number of inputs will be consumed. Inputs consumed are spliced out of the input array to allow the same array to be appended to externally
If $namespace
is provided, all messages are encoded assuming the id is name and dynamically allocates an ID to the name space. Note $limit
must be provided (even if undefended) to use this argument.
Namespace if prevented from using 0
, "0"
, or undef
as names. Messages with these names issue a warning.
decode_fastpack
decode_fastpack $buf,$output, $limit, $namespace
Decodes FastPack messages from $buf
, consuming it as it progresses. $buf
is aliases so new messages can be added to it externally.
$output
is a reference to an array to store the decoded messages. The messages are decoded into [time, id, payload]
.
If $limit
is provided, this is the maximum number of messages to decode during a single call. If not provided or undef, 4096 is the default.
$namespace
if provided enables named ids. decoded messages have the id mapped to a name stored in the names space. Special namespace update messages manipulate the namespace table to ensure it is copy of the encoding end of the stream
Namspaces
create_namespace
create_namespace
Returns a name space structure for named ids. Separate name spaces structure are needed for encoding and decoding ends, even withing the same program.
id_for_name
id_for_name $namespace, $name
Returns the integer id in $namespace
for $name
. Useful for testing and optimisation when multiple of the same messages are being encoded.
name_for_id
name_for_id $namespace, $id
Returns the name id in $namespace
for $id
. Useful for testing and optimisation when multiple of the same messages are being encoded.
AUTHOR
Ruben Westerberg, <drclaw@mac.com>
REPOSITORTY and BUGS
Please report any bugs via git hub: http://github.com/drclaw1394/perl-data-fastpack
COPYRIGHT AND LICENSE
Copyright (C) 2025 by Ruben Westerberg
Licensed under MIT
DISCLAIMER OF WARRANTIES
THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE.