NAME
FU::Util - Miscellaneous utility functions that really should have been part of a core Perl installation but aren't for some reason because the Perl community doesn't believe in the concept of a "batteries included" standard library. </rant>
EXPERIMENTAL
This module is still in development and there will likely be a few breaking API changes, see the main FU module for details.
SYNOPSIS
use FU::Util qw/json_format/;
my $data = json_format [1, 2, 3];
DESCRIPTION
Boolean Stuff
Perl has had a builtin boolean type since version 5.36 and FU uses that where appropriate, but there's still a lot of older code out there using different conventions. The following function should help when interacting with older code and provide a gradual migration path to the new builtin booleans.
- to_bool($val)
-
Returns undef if $val is not likely to be a distinct boolean type, otherwise it returns a normalized builtin::true or builtin::false.
This function recognizes the builtin booleans, \0, \1, boolean, Types::Serialiser (which is used by JSON::XS, JSON::SIMD, CBOR::XS and others), JSON::PP (also used by Cpanel::JSON::XS and others), JSON::Tiny and Mojo::JSON.
This function is ambiguous in contexts where a bare scalar reference is a valid value for $val, due to \0 and \1 being considered booleans.
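A minimal sketch of how to_bool() might be used to tell boolean-like values apart from ordinary scalars. The describe() helper is hypothetical and only exists for this illustration; the behavior shown for each input follows the description above.

```perl
use v5.36;
use FU::Util qw/to_bool/;

# Hypothetical helper: classify a value using to_bool().
sub describe($val) {
    my $bool = to_bool($val);
    return 'not a boolean' unless defined $bool;
    return $bool ? 'true' : 'false';
}

say describe(\1);      # old-style scalar-ref boolean
say describe(\0);
say describe(!!1);     # builtin boolean (Perl 5.36+)
say describe(1);       # a plain number is not a distinct boolean type
say describe('true');  # neither is a plain string
```

Because to_bool() returns undef for non-booleans, always check definedness before testing truthiness; a defined-but-false result means "boolean false", not "unknown".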
JSON parsing & formatting
This module comes with a custom C-based JSON parser and formatter. These functions conform strictly to RFC-8259; non-standard extensions are not supported and never will be. The implementation also happens to be pretty fast, refer to FU::Benchmarks for some numbers.
JSON booleans are parsed into builtin::true and builtin::false. In the other direction, the to_bool() function above is used to recognize which values to represent as JSON boolean.
JSON numbers that are too large to fit into a Perl integer are parsed into a floating point value instead. This obviously loses precision, but is consistent with JSON.parse() in JavaScript land - except Perl does support the full range of a 64bit integer. JSON numbers with a fraction or exponent are also converted into floating point, which may lose precision as well. Math::BigInt and Math::BigFloat are not currently supported. Attempting to format a floating point NaN or Inf results in an error.
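The rules above can be illustrated with a short sketch. It assumes a perl built with 64bit integers; the variable names are made up for the example.

```perl
use v5.36;
use FU::Util qw/json_parse json_format/;

# Old-style (\1) and builtin (!!0) booleans are both written as JSON booleans.
my $json = json_format({ active => \1, deleted => !!0, count => 1 });
my $back = json_parse($json);
say $back->{active}  ? 'active'  : 'inactive';
say $back->{deleted} ? 'deleted' : 'kept';

# 2**53 + 1 cannot be represented exactly as a double, but it fits in a 64bit integer.
my $exact = json_parse('9007199254740993');

# This one does not fit in any Perl integer and becomes a floating point value.
my $approx = json_parse('123456789012345678901234567890');

# Formatting Inf is documented to be an error.
my $inf = 9**9**9;
my $ok  = eval { json_format($inf); 1 };
say 'formatting Inf failed' unless $ok;
```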
- json_parse($string, %options)
-
Parse a JSON string and return a Perl value. With the default options, this function is roughly similar to:
JSON::PP->new->allow_nonref->core_bools->decode($string);
Croaks on invalid JSON, but the error messages are not super useful. This function also throws an error on JSON objects with duplicate keys, which is consistent with the default behavior of Cpanel::JSON::XS but inconsistent with other modules.
Supported %options:
- utf8
-
Boolean, interpret the input $string as a UTF-8 encoded byte string instead of a Perl Unicode string.
- max_depth
-
Maximum permitted nesting depth of arrays and objects. Defaults to 512.
- max_size
-
Throw an error if the JSON data is larger than the given size in bytes. Defaults to 1 GiB.
- offset
-
Takes a reference to a scalar that indicates from which byte offset in $string to start parsing. On success, the offset is updated to point to the next non-whitespace character or undef if the string has been fully consumed.
This option can be used to parse a stream of JSON values:
my $data = '{"obj":1}{"obj":2}';
my $offset = 0;
my $obj1 = json_parse($data, offset => \$offset);
# $obj1 = {obj=>1}; $offset = 9;
my $obj2 = json_parse($data, offset => \$offset);
# $obj2 = {obj=>2}; $offset = undef;
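When the number of values in the stream is not known in advance, the offset option lends itself to a loop. The following is a sketch under the assumption that the input holds at least one value (an empty string is not valid JSON and would croak); the sample data is made up.

```perl
use v5.36;
use FU::Util qw/json_parse/;

# Three concatenated JSON values, separated by optional whitespace.
my $stream = '{"id":1} {"id":2} [3,4]';

my $offset = 0;
my @values;
# json_parse() sets $offset to undef once the string is fully consumed.
while (defined $offset) {
    push @values, json_parse($stream, offset => \$offset);
}

say 'parsed ' . scalar(@values) . ' values';
```

Wrap the json_parse() call in eval if the stream may contain invalid JSON, since a parse error croaks and leaves the loop.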
- json_format($scalar, %options)
-
Format a Perl value as JSON. With the default options, this function behaves roughly similar to:
JSON::PP->new->allow_nonref->core_bools->convert_blessed->encode($scalar);
Some modules escape the slash character in encoded strings to prevent a potential XSS vulnerability when embedding JSON inside <script> .. </script> tags. This function does not do that because it might not even be sufficient. The following is probably an improvement:
json_format($data) =~ s{</}{<\\/}rg =~ s/<!--/<\\u0021--/rg;
This function generates invalid JSON if you pass it a string with invalid Unicode characters; I don't see how you'd ever accidentally end up with such a string, anyway.
The following %options are supported:
- canonical
-
Boolean, write hash keys in deterministic (sorted) order. This option currently has no effect on tied hashes.
- pretty
-
Boolean, format JSON with newlines and indentation for easier reading. Beauty is in the eye of the beholder, this option currently follows the convention used by JSON::XS and others: 3 space indent and one space around the : separating object keys and values. The exact format might change in later versions.
- utf8
-
Boolean, returns a UTF-8 encoded byte string instead of a Perl Unicode string.
- max_size
-
Maximum permitted size, in bytes, of the generated JSON string. Defaults to 1 GiB.
- max_depth
-
Maximum permitted nesting depth of Perl values. Defaults to 512.
(Why the hell yet another JSON codec when CPAN is already full of them!? Well, JSON::XS is pretty cool but isn't going to be updated to support Perl's new builtin booleans. JSON::PP is slow and while Cpanel::JSON::XS is perfectly adequate, its codebase is way too large and messy for what I need - it has too many unnecessary features and #ifdefs to support ancient perls and esoteric configurations. Still, if you need anything not provided by these functions, JSON::PP and Cpanel::JSON::XS are perfectly fine alternatives. JSON::SIMD and JSON::Tiny also look like good and maintained candidates.)
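A short sketch of how the json_format() options can be combined. The data structure is made up for the example, and the exact whitespace of the pretty output is deliberately not shown because it may change between versions.

```perl
use v5.36;
use FU::Util qw/json_format json_parse/;

my $data = { name => "caf\x{e9}", tags => ['a', 'b'], active => \1 };

# Sorted keys give stable output, handy for diffs, caching and tests.
my $compact = json_format($data, canonical => 1);

# Human-readable variant with newlines and indentation.
my $pretty = json_format($data, canonical => 1, pretty => 1);

# UTF-8 encoded bytes, ready to be written to a socket or file.
my $bytes = json_format($data, canonical => 1, utf8 => 1);

say $pretty;

# Bytes produced with utf8 => 1 should be parsed with utf8 => 1 as well.
my $back = json_parse($bytes, utf8 => 1);
```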
URI-Related Functions
While URIs are capable of encoding arbitrary binary data, the functions below assume you're only dealing with text. This makes them more robust against weird inputs, at the cost of flexibility.
- utf8_decode($bytes)
-
Convert a (perl-UTF-8 encoded) byte string into a sanitized perl Unicode string. The conversion is performed in-place, so the $bytes argument is turned into a Unicode string. Returns the same string for convenience.
This function throws an error if the input is not valid UTF-8 or if it contains ASCII control characters - that is, any character between 0x00 and 0x1f except for tab, newline and carriage return.
(This is a tiny wrapper around utf8::decode() with some extra checks)
- uri_escape($string)
-
Takes a Unicode string and returns a percent-encoded ASCII string, suitable for use in a query parameter.
- uri_unescape($string)
-
Takes a Unicode string potentially containing percent-encoding and returns a decoded Unicode string. Also checks for ASCII control characters as per utf8_decode().
- query_decode($string)
-
Decode a query string or application/x-www-form-urlencoded format (they're the same thing). Returns a hashref with decoded key/value pairs. Values for duplicated keys are collected into a single array value. Bare keys that do not have a value are decoded as builtin::true. Example:
my $hash = query_decode 'bare&a=1&a=2&something=else';
# $hash = {
#   bare => builtin::true,
#   a => [ 1, 2 ],
#   something => 'else'
# }
The input $string is assumed to be a perl Unicode string. An error is thrown if the resulting data decodes into invalid UTF-8 or contains control characters, as per utf8_decode.
- query_encode($hashref)
-
The opposite of query_decode. Takes a hashref of similar structure and returns an ASCII-encoded query string. Keys with undef or to_bool() false values are omitted in the output.
If a given value is a blessed object with a TO_QUERY() method, that method is called and it should return either undef, a boolean or a string, which is then encoded.
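The functions in this section can be exercised together as follows. This is only a sketch: the sample strings are made up, the exact escaped output and the key order produced by query_encode() are not shown because they are not specified above, and passing an array value to query_encode() relies on the "similar structure" wording.

```perl
use v5.36;
use FU::Util qw/utf8_decode uri_escape uri_unescape query_decode query_encode/;

# utf8_decode() works in-place: 5 bytes in, 4 characters out.
my $bytes = "caf\xc3\xa9";
utf8_decode($bytes);

# Control characters are rejected.
my $bad = "a\x01b";
my $accepted = eval { utf8_decode($bad); 1 };

# Escaping and unescaping are inverse operations for text.
my $escaped  = uri_escape("na\x{ef}ve & co");
my $restored = uri_unescape($escaped);

# undef and false booleans are dropped, everything else is encoded.
my $qs = query_encode({
    q     => "r\x{e9}sum\x{e9}",
    tag   => ['perl', 'json'],
    debug => undef,
    draft => \0,
});
my $params = query_decode($qs);
say join ', ', sort keys %$params;
```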
HTTP Date Formatting
The HTTP date format is utter garbage, but with the right tools it doesn't require too much code to work with.
- httpdate_format($time)
-
Convert the given seconds-since-Unix-epoch $time into an HTTP date string.
- httpdate_parse($str)
-
Converts the given HTTP date string into a seconds-since-Unix-epoch integer. This function is very strict about its input and only accepts "IMF-fixdate" as per RFC7231, which is what every sensible implementation written in the past decade uses.
This function plays fast and loose with timezone conversions: the parsed timestamp might be off by an hour or so for a few hours around a DST change. This will not happen if your local timezone is UTC.
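A small sketch using the example timestamp from RFC7231. How httpdate_parse() reports a rejected input (exception or undef) is not stated above, so the obsolete-format check below wraps the call in eval and treats both outcomes the same.

```perl
use v5.36;
use FU::Util qw/httpdate_format httpdate_parse/;

my $time = 784111777;
my $str  = httpdate_format($time);
say $str;    # Sun, 06 Nov 1994 08:49:37 GMT

# Round trip; see the DST caveat above for why this may not always be exact.
my $parsed = httpdate_parse($str);

# The obsolete RFC 850 format is not IMF-fixdate and should be rejected.
my $old = eval { httpdate_parse('Sunday, 06-Nov-94 08:49:37 GMT') };
say 'obsolete format rejected' unless defined $old;
```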
File Descriptor Passing
UNIX sockets (see IO::Socket::UNIX) have the fancy property of letting you send file descriptors over them, allowing you to pass, for example, a socket from one process to another. This is a pretty low-level operation and not something you'll often need, but two functions to use that feature are provided here anyway because the FU supervisor uses them:
- fdpass_send($send_fd, $pass_fd, $message)
-
Send a message and a file descriptor ($pass_fd) over the given socket ($send_fd). $message must not be empty, even if you don't intend to do anything with it on receipt. Both $send_fd and $pass_fd must be numeric file descriptors, as obtained by fileno().
- ($fd, $message) = fdpass_recv($recv_fd, $max_message_len)
-
Read a file descriptor and message from the given $recv_fd, which must be the numeric file descriptor of a socket. This function can be used as a replacement for sysread(): the returned $fd is undef if no file descriptor was received. The returned $message is undef on error or an empty string on EOF.
Like regular socket I/O, a single fdpass_send() message may be split across multiple fdpass_recv() calls; in that case the $fd is only received on the first call.
Don't use this function if the sender may include multiple file descriptors in a single message, weird things can happen. File descriptors received this way do not have the CLOEXEC flag and will thus survive a call to exec(). Refer to this wonderful discussion for more weirdness and edge cases.
See also IO::FDPass for a more portable solution, although that one does not support passing along regular data.
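To show the two functions working together, here is a sketch that passes the read end of a pipe over a socketpair within a single process. In practice the two socket ends would live in different processes; the single-process setup and the variable names are assumptions made to keep the example short.

```perl
use v5.36;
use Socket qw/AF_UNIX SOCK_STREAM PF_UNSPEC/;
use FU::Util qw/fdpass_send fdpass_recv/;

socketpair(my $left, my $right, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
pipe(my $reader, my $writer) or die "pipe: $!";

# Send the read end of the pipe, along with a (non-empty) message.
fdpass_send(fileno($left), fileno($reader), 'here is a pipe');

# Receive on the other end of the socketpair.
my ($fd, $message) = fdpass_recv(fileno($right), 1024);
die "receive failed" unless defined $message;
die "no file descriptor received" unless defined $fd;

# Wrap the numeric descriptor in a Perl filehandle.
open(my $received, '<&=', $fd) or die "fdopen: $!";

# Data written to the pipe is readable through the received descriptor.
syswrite($writer, "hello\n");
close $writer;
my $line = <$received>;
print "$message: $line";
```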
COPYRIGHT
MIT.
AUTHOR
Yorhel <projects@yorhel.nl>