NAME

FU::Util - Miscellaneous utility functions that really should have been part of a core Perl installation but aren't for some reason because the Perl community doesn't believe in the concept of a "batteries included" standard library. </rant>

EXPERIMENTAL

This module is still in development and there will likely be a few breaking API changes; see the main FU module for details.

SYNOPSIS

use FU::Util qw/json_format/;

my $data = json_format [1, 2, 3];

DESCRIPTION

Boolean Stuff

Perl has had a builtin boolean type since version 5.36 and FU uses that where appropriate, but there's still a lot of older code out there using different conventions. The following function should help when interacting with older code and provide a gradual migration path to the new builtin booleans.

to_bool($val)

Returns undef if $val is not likely to be a distinct boolean type, otherwise it returns a normalized builtin::true or builtin::false.

This function recognizes the builtin booleans, \0, \1, boolean, Types::Serialiser (which is used by JSON::XS, JSON::SIMD, CBOR::XS and others), JSON::PP (also used by Cpanel::JSON::XS and others), JSON::Tiny and Mojo::JSON.

This function is ambiguous in contexts where a bare scalar reference is a valid value for $val, due to \0 and \1 being considered booleans.
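A short sketch of how to_bool() might be used to tell boolean-like values apart from ordinary scalars, based only on the behaviour described above. It assumes Perl 5.36 or later and an installed FU::Util; the variable names are illustrative.

```perl
use v5.36;
use FU::Util qw/to_bool/;

# \1 and \0 are among the older conventions that to_bool() recognizes.
my $yes = to_bool(\1);   # normalized to builtin::true
my $no  = to_bool(\0);   # normalized to builtin::false

# A plain number is not a distinct boolean type, so the result is undef.
my $plain = to_bool(1);

say defined $plain ? 'boolean' : 'not a boolean';
say $yes ? 'true' : 'false';
```

Because undef signals "not a boolean", callers need to check the result with defined() rather than by truthiness.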

JSON parsing & formatting

This module comes with a custom C-based JSON parser and formatter. These functions conform strictly to RFC-8259; non-standard extensions are not supported and never will be. They also happen to be pretty fast; refer to FU::Benchmarks for some numbers.

JSON booleans are parsed into builtin::true and builtin::false. In the other direction, the to_bool() function above is used to recognize which values to represent as JSON booleans.

JSON numbers that are too large to fit into a Perl integer are parsed into a floating point value instead. This obviously loses precision, but is consistent with JSON.parse() in JavaScript land - except Perl does support the full range of a 64bit integer. JSON numbers with a fraction or exponent are also converted into floating point, which may lose precision as well. Math::BigInt and Math::BigFloat are not currently supported. Attempting to format a floating point NaN or Inf results in an error.

json_parse($string, %options)

Parse a JSON string and return a Perl value. With the default options, this function is roughly similar to:

JSON::PP->new->allow_nonref->core_bools->decode($string);

Croaks on invalid JSON, but the error messages are not super useful. This function also throws an error on JSON objects with duplicate keys, which is consistent with the default behavior of Cpanel::JSON::XS but inconsistent with other modules.

Supported %options:

utf8

Boolean, interpret the input $string as a UTF-8 encoded byte string instead of a Perl Unicode string.

max_depth

Maximum permitted nesting depth of arrays and objects. Defaults to 512.

max_size

Throw an error if the JSON data is larger than the given size in bytes. Defaults to 1 GiB.

offset

Takes a reference to a scalar that indicates from which byte offset in $string to start parsing. On success, the offset is updated to point to the next non-whitespace character or undef if the string has been fully consumed.

This option can be used to parse a stream of JSON values:

my $data = '{"obj":1}{"obj":2}';
my $offset = 0;
my $obj1 = json_parse($data, offset => \$offset);
# $obj1 = {obj=>1};  $offset = 9;
my $obj2 = json_parse($data, offset => \$offset);
# $obj2 = {obj=>2};  $offset = undef;

json_format($scalar, %options)

Format a Perl value as JSON. With the default options, this function behaves roughly similar to:

JSON::PP->new->allow_nonref->core_bools->convert_blessed->encode($scalar);

Some modules escape the slash character in encoded strings to prevent a potential XSS vulnerability when embedding JSON inside <script> .. </script> tags. This function does not do that because it might not even be sufficient. The following is probably an improvement:

json_format($data) =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;
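To see what those two substitutions do, here is a small sketch that applies them to a hard-coded JSON string instead of real json_format() output, so it runs without FU::Util. The sample string is made up for illustration.

```perl
use strict;
use warnings;

# Stand-in for the output of json_format(); hard-coded for this sketch.
my $json = '{"html":"</script><!-- hi -->"}';

# Same two substitutions as above: break up "</" and "<!--" so an HTML parser
# does not see a closing tag or comment opener inside the script element.
my $safe = $json =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;

print "$safe\n";   # {"html":"<\/script><\u0021-- hi -->"}
```

Both \/ and \u0021 are valid JSON string escapes, so the result still decodes to the original data.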

This function generates invalid JSON if you pass it a string with invalid Unicode characters; I don't see how you'd ever accidentally end up with such a string, anyway.

The following %options are supported:

canonical

Boolean, write hash keys in deterministic (sorted) order. This option currently has no effect on tied hashes.

pretty

Boolean, format JSON with newlines and indentation for easier reading. Beauty is in the eye of the beholder; this option currently follows the convention used by JSON::XS and others: 3 space indent and one space around the : separating object keys and values. The exact format might change in later versions.

utf8

Boolean, returns a UTF-8 encoded byte string instead of a Perl Unicode string.

max_size

Maximum permitted size, in bytes, of the generated JSON string. Defaults to 1 GiB.

max_depth

Maximum permitted nesting depth of Perl values. Defaults to 512.
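The options above can be combined. The following is a minimal sketch, assuming Perl 5.36 or later and an installed FU::Util; the data structure is made up, and no exact output is shown because the pretty format may change.

```perl
use v5.36;
use FU::Util qw/json_format json_parse/;

# !!1 is a builtin boolean on Perl 5.36+.
my $data = { name => 'example', tags => ['a', 'b'], ok => !!1 };

# canonical: sorted keys, so the output is stable across runs.
my $compact = json_format($data, canonical => 1);

# pretty + utf8: indented output as a UTF-8 encoded byte string, e.g. for
# writing to a file.
my $bytes = json_format($data, canonical => 1, pretty => 1, utf8 => 1);

# Round trip: utf8 has to be set on both sides when working with bytes.
my $back = json_parse($bytes, utf8 => 1);
say $back->{name};
```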

(Why the hell yet another JSON codec when CPAN is already full of them!? Well, JSON::XS is pretty cool but isn't going to be updated to support Perl's new builtin booleans. JSON::PP is slow and while Cpanel::JSON::XS is perfectly adequate, its codebase is way too large and messy for what I need - it has too many unnecessary features and #ifdefs to support ancient perls and esoteric configurations. Still, if you need anything not provided by these functions, JSON::PP and Cpanel::JSON::XS are perfectly fine alternatives. JSON::SIMD and JSON::Tiny also look like good and maintained candidates.)

While URIs are capable of encoding arbitrary binary data, the functions below assume you're only dealing with text. This makes them more robust against weird inputs, at the cost of flexibility.

utf8_decode($bytes)

Convert a (perl-UTF-8 encoded) byte string into a sanitized perl Unicode string. The conversion is performed in-place, so the $bytes argument is turned into a Unicode string. Returns the same string for convenience.

This function throws an error if the input is not valid UTF-8 or if it contains ASCII control characters - that is, any character between 0x00 and 0x1f except for tab, newline and carriage return.

(This is a tiny wrapper around utf8::decode() with some extra checks)
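To make the checks concrete, here is a rough core-Perl approximation of the behaviour described above. It is not the module's actual code; utf8_decode_sketch is a hypothetical name and the error messages are invented.

```perl
use strict;
use warnings;

# Hypothetical stand-in for utf8_decode(); not the module's implementation.
sub utf8_decode_sketch {
    # $_[0] aliases the caller's variable, so the decode happens in place.
    utf8::decode($_[0]) or die "invalid UTF-8\n";
    # Reject 0x00-0x1f except tab (0x09), newline (0x0a) and CR (0x0d).
    die "control character in string\n"
        if $_[0] =~ /[\x00-\x08\x0b\x0c\x0e-\x1f]/;
    return $_[0];
}

my $bytes = "caf\xc3\xa9\n";   # 6 bytes: "café" in UTF-8 plus a newline
utf8_decode_sketch($bytes);
print length($bytes), "\n";    # 5 characters after decoding

my $bad = "\xff";              # not valid UTF-8
my $ok  = eval { utf8_decode_sketch($bad); 1 };
print $ok ? "accepted\n" : "rejected: $@";
```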

uri_escape($string)

Takes a Unicode string and returns a percent-encoded ASCII string, suitable for use in a query parameter.

uri_unescape($string)

Takes a Unicode string potentially containing percent-encoding and returns a decoded Unicode string. Also checks for ASCII control characters as per utf8_decode().

query_decode($string)

Decode a query string or application/x-www-form-urlencoded format (they're the same thing). Returns a hashref with decoded key/value pairs. Values for duplicated keys are collected into a single array value. Bare keys that do not have a value are decoded as builtin::true. Example:

my $hash = query_decode 'bare&a=1&a=2&something=else';
# $hash = {
#   bare => builtin::true,
#   a => [ 1, 2 ],
#   something => 'else'
# }

The input $string is assumed to be a perl Unicode string. An error is thrown if the resulting data decodes into invalid UTF-8 or contains control characters, as per utf8_decode.

query_encode($hashref)

The opposite of query_decode. Takes a hashref of similar structure and returns an ASCII-encoded query string. Keys with undef or to_bool() false values are omitted in the output.

If a given value is a blessed object with a TO_QUERY() method, that method is called and it should return either undef, a boolean or a string, which is then encoded.
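A small round-trip sketch of the two query functions, assuming Perl 5.36 or later and an installed FU::Util. The keys and values are made up. It assumes that query_encode() accepts array values in the same shape that query_decode() produces for duplicated keys.

```perl
use v5.36;
use FU::Util qw/query_encode query_decode/;

my $query = query_encode({
    q      => 'perl json',
    tag    => ['a', 'b'],   # array value, as query_decode returns for duplicated keys
    debug  => !!1,          # a true boolean
    unused => undef,        # undef values are omitted from the output
});

# Hash key order is not fixed, so inspect the decoded result rather than
# relying on the exact query string.
my $params = query_decode($query);
say join ',', $params->{tag}->@*;
say exists $params->{unused} ? 'unused present' : 'unused omitted';
```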

HTTP Date Formatting

The HTTP date format is utter garbage, but with the right tools it doesn't require too much code to work with.

httpdate_format($time)

Convert the given seconds-since-Unix-epoch $time into an HTTP date string.

httpdate_parse($str)

Converts the given HTTP date string into a seconds-since-Unix-epoch integer. This function is very strict about its input and only accepts "IMF-fixdate" as per RFC7231, which is what every sensible implementation written in the past decade uses.

This function plays fast and loose with timezone conversions: the parsed timestamp might be off by an hour or so for a few hours around a DST change. This will not happen if your local timezone is UTC.
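For readers unfamiliar with "IMF-fixdate", the following core-Perl sketch shows what the format looks like and how a strict parser could be written. It is not how FU::Util implements these functions. The names imf_fixdate_format and imf_fixdate_parse are hypothetical, and the sketch uses timegm() from Time::Local, so it is independent of the local timezone.

```perl
use strict;
use warnings;
use Time::Local qw/timegm/;

my @DAYS   = qw/Sun Mon Tue Wed Thu Fri Sat/;
my @MONTHS = qw/Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec/;
my %MONTH_NUM = map { $MONTHS[$_] => $_ } 0 .. $#MONTHS;

# Hypothetical helper: epoch seconds to IMF-fixdate, always in GMT.
sub imf_fixdate_format {
    my ($time) = @_;
    my ($s, $m, $h, $mday, $mon, $year, $wday) = gmtime $time;
    return sprintf '%s, %02d %s %04d %02d:%02d:%02d GMT',
        $DAYS[$wday], $mday, $MONTHS[$mon], $year + 1900, $h, $m, $s;
}

# Hypothetical helper: strict IMF-fixdate to epoch seconds, undef on mismatch.
# This sketch does not verify that the weekday matches the date.
sub imf_fixdate_parse {
    my ($str) = @_;
    my ($mday, $mon, $year, $h, $m, $s) = $str =~
        /\A(?:Sun|Mon|Tue|Wed|Thu|Fri|Sat), (\d\d) ([A-Z][a-z]{2}) (\d{4}) (\d\d):(\d\d):(\d\d) GMT\z/
        or return;
    return unless exists $MONTH_NUM{$mon};
    # timegm() croaks on out-of-range fields such as day 32.
    return eval { timegm($s, $m, $h, $mday, $MONTH_NUM{$mon}, $year) };
}

print imf_fixdate_format(784111777), "\n";                        # Sun, 06 Nov 1994 08:49:37 GMT
print imf_fixdate_parse('Sun, 06 Nov 1994 08:49:37 GMT'), "\n";   # 784111777
```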

File Descriptor Passing

UNIX sockets (see IO::Socket::UNIX) have the fancy property of letting you send file descriptors over them, allowing you to pass, for example, a socket from one process to another. This is a pretty low-level operation and not something you'll often need, but two functions to use that feature are provided here anyway because the FU supervisor uses them:

fdpass_send($send_fd, $pass_fd, $message)

Send a message and a file descriptor ($pass_fd) over the given socket ($send_fd). $message must not be empty, even if you don't intend to do anything with it on receipt. Both $send_fd and $pass_fd must be numeric file descriptors, as obtained by fileno().

($fd, $message) = fdpass_recv($recv_fd, $max_message_len)

Read a file descriptor and message from the given $recv_fd, which must be the numeric file descriptor of a socket. This function can be used as a replacement for sysread(): the returned $fd is undef if no file descriptor was received. The returned $message is undef on error or an empty string on EOF.

Like regular socket I/O, a single fdpass_send() message may be split across multiple fdpass_recv() calls; in that case the $fd is only received on the first call.

Don't use this function if the sender may include multiple file descriptors in a single message; weird things can happen. File descriptors received this way do not have the CLOEXEC flag and will thus survive a call to exec(). Refer to this wonderful discussion for more weirdness and edge cases.

See also IO::FDPass for a more portable solution, although that one does not support passing along regular data.
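A minimal single-process sketch of the two functions, assuming Perl 5.36 or later, an installed FU::Util and a system with UNIX sockets. In real use the two socket ends would live in different processes. The pipe is only there to have a descriptor worth passing.

```perl
use v5.36;
use Socket qw/AF_UNIX SOCK_STREAM PF_UNSPEC/;
use FU::Util qw/fdpass_send fdpass_recv/;

# A connected pair of UNIX sockets.
socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";

# The descriptor to pass along: the read end of a pipe with data waiting.
pipe(my $rd, my $wr) or die "pipe: $!";
syswrite $wr, "hello via passed fd\n";
close $wr;

# Both arguments are numeric descriptors; the message must not be empty.
fdpass_send(fileno($left), fileno($rd), 'pipe');

my ($fd, $message) = fdpass_recv(fileno($right), 64);
die "receive failed" unless defined $message && defined $fd;

# Wrap the received numeric descriptor in a Perl file handle.
open(my $in, '<&=', $fd) or die "fdopen: $!";
my $line = <$in>;
print "$message: $line";
```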

COPYRIGHT

MIT.

AUTHOR

Yorhel <projects@yorhel.nl>