NAME
Data::Tubes::Util
DESCRIPTION
Helper functions for automatic management of argument lists, and other utilities.
FUNCTIONS
args_array_with_options
my ($aref, $args) = args_array_with_options(@list, \%defaults); # OR
my ($aref, $args) = args_array_with_options(@list, \%args, \%defaults);
helper function to ease parsing of input parameters. This is mostly useful when your function usually takes a list as input, but you want to be able to provide an optional hash of arguments.
The function returns an array reference with the list of parameters, and a hash reference of arguments for less common things.
When calling this function, you are always supposed to pass a hash reference of options as the last argument, which will act as a default. If the element immediately before it is a hash reference itself, it will be considered the input for overriding arguments. Their combination (a simple overriding at the highest hash level) is then returned as $args.
The typical way to invoke this function is like this:
sub foo {
my ($list, $args) = args_array_with_options(@_, {bar => 'baz'});
...
}
so that the function foo can be called with an optional trailing hash reference containing the arguments, like this:
foo(qw< this and that >, {bar => 'galook!'});
If your list might legitimately end with a hash reference, take this into consideration: that last element would be taken as the hash of overriding arguments.
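The splitting and merging described above can be sketched as follows. This is a hypothetical re-implementation for illustration (args_array_with_options_sketch is not part of the module), assuming the defaults hash reference is always the last argument and the merge is a shallow, top-level override.

```perl
use strict;
use warnings;

# Hypothetical sketch: the last argument is the hash of defaults; an optional
# hash reference right before it holds the caller's overriding arguments.
sub args_array_with_options_sketch {
    my %defaults = %{ pop @_ };
    my %args = (@_ && ref($_[-1]) eq 'HASH') ? %{ pop @_ } : ();
    # shallow merge: caller's arguments override defaults at the top level only
    return ([@_], { %defaults, %args });
}

my ($list, $args) = args_array_with_options_sketch(
    qw< this and that >, {bar => 'galook!'}, {bar => 'baz'});
# $list is ['this', 'and', 'that'], $args is {bar => 'galook!'}
```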
assert_all_different
$bool = assert_all_different(@strings);
checks that all strings in @strings are different. Returns 1 if the check is successful, throws an exception otherwise. The exception is a hash reference with a key message set to the first string that is found repeated.
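A minimal sketch of this check follows; assert_all_different_sketch is a hypothetical name, not the module's code. It shows how the exception (a hash reference with the message key) can be caught and inspected.

```perl
use strict;
use warnings;

# Hypothetical sketch: remember each string and throw a hash reference
# carrying the first repeated string under the "message" key.
sub assert_all_different_sketch {
    my %seen;
    for my $string (@_) {
        die +{ message => $string } if $seen{$string}++;
    }
    return 1;
}

my $ok = eval { assert_all_different_sketch(qw< foo bar foo baz >) };
print 'repeated: ', $@->{message}, "\n" unless $ok;   # prints "repeated: foo"
```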
generalized_hashy
$outcome = generalized_hashy($text, %args); # OR
$outcome = generalized_hashy(%args); # OR
$outcome = generalized_hashy(\%args);
very generic parsing function that tries to figure out a hash out of an input text.
The default settings are optimized for whipuptitude and DWIMmery. This means that a lot of strings that you would hardly consider sane are parsed anyway, just to give you something fast. If you need to be precise instead, you can either customize the different %args, use a different parsing function or... roll your own.
The returned value is a hash reference with the following keys:
capture
-
the regular expression used for parsing, which can be reused in following invocations (see "capture" below);
failpos
-
in case of failure, it reports the position in the input text where the parsing was unsuccessful. It is absent when the parsing succeeds;
failure
-
in case of failure, it reports an error message. It is absent when the parsing succeeds;
hash
-
the parsed hash. It is absent when the parsing fails;
pos
-
the position at which the parsing ended, because the "close" sequence was found;
res
-
the number of characters in the input text that were not parsed.
The model is the following:
- the string is considered a sequence of chunks, optionally marked at the beginning by an open sequence and at the end by a close sequence;
- chunks are separated by a chunk separator;
- each chunk can be either a stand-alone value or a key/value pair. In the latter case, key and value are separated by a key-value separator;
- there is something that defines what valid keys and values look like.
This gives you the following options via %args:
capture
-
the regular expression that dominates all the other ones. You normally don't want to set it directly, but you can if you look at how the code uses it.
You can set this argument to a regular expression that was compiled in a previous invocation of generalized_hashy, because it is returned at every invocation. So, the typical idiom for avoiding the recompilation of this regular expression every time is:
# get the capture, set text to undef to avoid any parsing
$args{capture} = generalized_hashy(undef, %args)->{capture};
From now on, $args{capture} contains the regular expression and generalized_hashy will not need to compute it again when called with this %args list.
It has no default value.
chunks_separator
-
a regular expression for telling chunks apart. Defaults to:
chunks_separator => qr{(?mxs: \s* [\s,;\|/] \s*)}
i.e. it eats up surrounding spaces, and can be a space, comma, semicolon, pipe or slash character;
close
-
a regular expression for stating that the hash ends. Defaults to:
close => qr{(?mxs: \s*\z)}
i.e. it eats up optional trailing whitespace and expects to find the end of the string;
key
-
a regular expression for valid keys. This allows you to be quite precise as to what you admit for keys, but be sure to take a look at "key_admitted" below for a quicker way to set this parameter.
It has no default value, as it is derived from "key_admitted".
key_admitted
-
a specification for valid, unquoted keys. When specifying this parameter and not setting a "key", the key is computed according to the algorithm explained below for admitted sequences.
This parameter can be either a regular expression, or a plain string containing the admitted characters. Defaults to:
key_admitted => qr{[^\\'":=\s,;\|/]};
i.e. anything that is not a backslash, a quote, or a character used in either separator.
key_decoder
-
a decoding function for a parsed key. You might want to set it when you allow quoting and/or escape sequences in your keys.
By default, it removes quotes and escaping characters related to "key_admitted";
key_default
default_key
-
a default key to use when there is a stand-alone value. The default_key variant is provided for compatibility with "metadata" and "hashy" in Data::Tubes::Plugin::Parser. When not set and a stand-alone value is found, the parsing fails and an error is returned.
There is no default. Note that this is different from the default setting/behaviour of "ghashy" in Data::Tubes::Plugin::Parser, although that function uses generalized_hashy behind the scenes. Again, this is for similarity with hashy and backwards compatibility.
key_duplicated
-
a sub reference that will be called whenever a key is already present in the output hash. This allows you to e.g. complain loudly in case your input has a duplicated key.
By default, when a duplicate key is found for the first time the current value is transformed into an array reference whose first element is the old value and the second one is the new value. Any following value for that key is appended to the array;
key_value_separator
-
a regular expression for telling a key from a value. Defaults to:
key_value_separator => qr{(?mxs: \s* [:=] \s*)}
i.e. it eats up surrounding spaces, and can be a colon or an equal sign;
open
-
a regular expression for the hash beginning. Defaults to:
open => qr{(?mxs: \s* )}
i.e. it eats up optional leading whitespace;
pos
-
an integer value to set the initial position for parsing the input string. Defaults to 0, i.e. the start of the string;
text
-
the text to parse. This can also appear as the first unnamed parameter in the argument list;
value
-
a regular expression for valid values. This allows you to be quite precise as to what you admit for values, but be sure to take a look at "value_admitted" below for a quicker way to set this parameter.
It has no default value, as it is derived from "value_admitted".
value_admitted
-
a specification for valid, unquoted values. When specifying this parameter and not setting a "value", the value is computed according to the algorithm explained below for admitted sequences.
This parameter can be either a regular expression, or a plain string containing the admitted characters. Defaults to:
value_admitted => qr{[^\\'":=\s,;\|/]};
i.e. anything that is not a backslash, a quote, or a character used in either separator.
value_decoder
-
a decoding function for a parsed value. You might want to set it when you allow quoting and/or escape sequences in your values.
By default, it removes quotes and escaping characters related to "value_admitted";
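The default duplicate-key policy described for key_duplicated (keep the first value, switch to an array reference on the first duplicate, then append) can be sketched like this. store_with_duplicates is a hypothetical helper: the actual arguments passed to a key_duplicated callback are not documented here, and the sketch assumes parsed values are plain strings.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the default duplicate-key policy.
# Simplification: assumes parsed values are plain strings, not array refs.
sub store_with_duplicates {
    my ($hash, $key, $value) = @_;
    if (!exists $hash->{$key}) {
        $hash->{$key} = $value;                     # first occurrence
    }
    elsif (ref($hash->{$key}) eq 'ARRAY') {
        push @{ $hash->{$key} }, $value;            # third and later ones
    }
    else {
        $hash->{$key} = [ $hash->{$key}, $value ];  # first duplicate
    }
    return;
}

my %hash;
store_with_duplicates(\%hash, $_->[0], $_->[1])
  for [a => 1], [b => 2], [a => 3], [a => 4];
# %hash is now (a => [1, 3, 4], b => 2)
```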
When using either "key_admitted" or "value_admitted", the "key" and "value" regular expressions will be computed automatically, allowing for single- and double-quoted strings. This is what we refer to as admitted sequences. In this case, the admitted regular expression (we will call it $admitted) is used as follows:
allowed_sequence => qr{(?mxs:
(?mxs:
(?: "(?: [^\\"]+ | \\. )*") # double quotes
| (?: '[^']*') # single quotes
)
| (?: (?: $admitted | \\.)+? ) # unquoted sequence, with escapes
)}
In case $admitted is not a regular expression, it is transformed into one like this:
$admitted = qr{[\Q$admitted\E]}
i.e. it is considered a set of valid characters and transformed into a character class.
One admitted sequence can then be either of the following:
- double-quoted
-
in this case, it is delimited by double quote characters, and can contain any character, including double quotes themselves, by escaping them with a backslash. As a matter of fact, every sequence of a backslash and a character is accepted, whatever the second character is (including the backslash itself and the quoting character);
- single-quoted
-
in this case, it is bound by single quote characters, and can contain any character except the single quote itself. This differs from what Perl accepts in single-quoted strings, and is more in line with what happens in other languages (e.g. the shell);
- unquoted
-
in this case, no quotation character is considered, and the $admitted characters are used, with a twist: you can still escape otherwise invalid characters with the backslash.
If you don't like all this DWIMmery you can set "key" and "value" independently, of course.
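The admitted-sequence rules above can be tried out in isolation. The following is a sketch under stated assumptions: admitted_sequence_regex reproduces the regular expression shown above, while decode_sequence is a hypothetical decoder written after the description of the default key_decoder/value_decoder (strip quotes, drop escaping backslashes), not the module's own code.

```perl
use strict;
use warnings;

# Build the "admitted sequence" regex described above; $admitted can be a
# compiled regex or a plain string of admitted characters.
sub admitted_sequence_regex {
    my ($admitted) = @_;
    $admitted = qr{[\Q$admitted\E]} unless ref($admitted) eq 'Regexp';
    return qr{(?mxs:
        (?mxs:
            (?: "(?: [^\\"]+ | \\. )*")   # double quotes
          | (?: '[^']*')                  # single quotes
        )
      | (?: (?: $admitted | \\.)+? )      # unquoted sequence, with escapes
    )};
}

# Hypothetical decoder: strip the quotes, then remove escaping backslashes.
sub decode_sequence {
    my ($text) = @_;
    return $1 if $text =~ m{\A '(.*)' \z}mxs;   # single quotes: no escapes
    $text = $1 if $text =~ m{\A "(.*)" \z}mxs;  # double quotes: keep escapes
    $text =~ s{\\(.)}{$1}gmxs;                  # drop escaping backslashes
    return $text;
}

my $sequence = admitted_sequence_regex(qr{[^\\'":=\s,;\|/]});
for my $input (q{'ever "," you= do'}, q{"ever \",\" you= do"},
               q{ever\ \"\,\"\ you\=\ do}) {
    next unless $input =~ m{\A $sequence \z}mxs;
    print decode_sequence($input), "\n";       # ever "," you= do
}
```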
Some examples are due. The following inputs all produce the same output with the default settings, ranging from mostly OK to definitely weird:
input text -> q< what:ever you:do wow:yay >
input text -> q< what: ever you: do wow: yay >
input text -> q< what: ever you= do | wow: yay >
input text -> q< what: ever , you= do | wow: yay >
output hash -> {what => 'ever', you => 'do', wow => 'yay'}
The following inputs show that you can use escaping and quoting in keys and values:
input text -> q< what: ever\ \"\,\"\ you\=\ do | wow: yay >
input text -> q< what: 'ever "," you= do' | wow: yay >
input text -> q< what: "ever \",\" you= do" | wow: yay >
output hash -> {what => 'ever "," you= do', wow => 'yay'}
load_module
my $module = load_module($locator); # OR
my $module = load_module($locator, $prefix);
loads a module automatically. There are a lot of modules on CPAN that do this, probably much better, but this should do for this module's needs.
The $locator is resolved into a full module name through "resolve_module"; the resulting name is then required, and the resolved name is returned.
Example:
my $module = load_module('Reader');
loads module Data::Tubes::Plugin::Reader and returns the string Data::Tubes::Plugin::Reader, while:
my $other_module = load_module('+Foo::Bar');
loads module Foo::Bar and returns the string Foo::Bar.
You can optionally pass a $prefix that will be passed to "resolve_module"; see there for further information.
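The loading step alone (after the name has been resolved) can be sketched as below; require_module_by_name is a hypothetical name, and the sketch only shows the usual Perl idiom of turning a package name into the file path that require expects.

```perl
use strict;
use warnings;

# Hypothetical sketch of the loading step: turn an already-resolved module
# name into a file path for require, load it, and return the name.
sub require_module_by_name {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}gmxs;   # Foo::Bar -> Foo/Bar.pm
    require $file;    # dies if the module cannot be found or compiled
    return $module;
}

my $module = require_module_by_name('List::Util');   # a core module
print $module->can('sum') ? "loaded $module\n" : "no sum?\n";
```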
load_sub
my $sub = load_sub($locator); # OR
my $sub = load_sub($locator, $prefix);
loads a sub automatically. There are a lot of modules on CPAN that do this, probably much better, but this should do for this module's needs.
The $locator is split into a pair of module and subroutine name. The module is loaded through "load_module"; a reference to the subroutine in that module is then returned.
Example:
my $sub = load_sub('Reader::by_line');
loads subroutine Data::Tubes::Plugin::Reader::by_line and returns a reference to it, while:
my $other_sub = load_sub('+Foo::Bar::baz');
returns a reference to subroutine Foo::Bar::baz after loading module Foo::Bar.
You can optionally pass a $prefix that will be passed to "resolve_module"; see there for further information.
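The split of a locator into module and subroutine name, and the lookup of the code reference, can be sketched as follows. split_sub_locator and sub_reference are hypothetical helpers; the sketch assumes the split happens at the last "::" and uses the universal can method for the lookup, while resolving and loading the module part is left to load_module.

```perl
use strict;
use warnings;

# Hypothetical sketch: split at the last "::" (greedy match on the module).
sub split_sub_locator {
    my ($locator) = @_;
    my ($module, $name) = $locator =~ m{\A (.+) :: (\w+) \z}mxs
      or die "invalid sub locator '$locator'\n";
    return ($module, $name);
}

# Fetch the code reference from an already-loaded package.
sub sub_reference {
    my ($module, $name) = @_;
    my $sub = $module->can($name)
      or die "no sub '$name' in package '$module'\n";
    return $sub;
}

my ($module, $name) = split_sub_locator('Reader::by_line');
# $module is 'Reader', $name is 'by_line'
```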
metadata
my $href = metadata($input, %args); # OR
my $href = metadata($input, \%args);
parse input string $input according to the rules described below, which can be controlled through %args.
The string is split on the basis of two separators, a chunks separator and a key/value separator. The first one isolates what should be key/value pairs, the second allows separating the key from the value in each of these chunks. Whenever a chunk is not actually a key/value pair, it is considered a value and associated with a default key.
The following items can be set in %args:
chunks_separator
-
what allows separating chunks, it MUST be a single character;
default_key
-
a string used as the key when a chunk cannot be split into a pair;
key_value_separator
-
what allows separating the key from the value in a chunk, it MUST be a single character.
Examples:
# use defaults
my $input = 'foo=bar baz=galook booom!';
my $href = metadata($input);
# $href = {
# foo => 'bar',
# baz => 'galook',
# '' => 'booom!'
# }
# use defaults, except for the default key
my $input = 'foo=bar baz=galook booom!';
my $href = metadata($input, default_key => 'name');
# $href = {
# foo => 'bar',
# baz => 'galook',
# name => 'booom!'
# }
# use alternative separators
my $input = 'foo:bar & bar|baz:galook booom!|whatever';
my $href = metadata($input,
default_key => 'name',
chunks_separator => '|',
key_value_separator => ':'
);
# $href = {
# foo => 'bar & bar',
# baz => 'galook booom!',
# name => 'whatever'
# }
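The splitting rules can be sketched as below. metadata_sketch is hypothetical, not the module's code; its defaults (a space between chunks, "=" between key and value, the empty string as default key) are assumptions inferred from the first example, and no trimming of surrounding spaces is attempted.

```perl
use strict;
use warnings;

# Hypothetical sketch of the splitting rules; defaults inferred from the
# first example above.
sub metadata_sketch {
    my ($input, %args) = @_;
    my $chunks_separator    = $args{chunks_separator}    // ' ';
    my $key_value_separator = $args{key_value_separator} // '=';
    my $default_key         = $args{default_key}         // '';
    my %metadata;
    for my $chunk (split /\Q$chunks_separator\E/, $input) {
        next unless length $chunk;                  # skip empty chunks
        # split only at the first key/value separator
        my ($key, $value) = split /\Q$key_value_separator\E/, $chunk, 2;
        if (defined $value) { $metadata{$key} = $value }
        else                { $metadata{$default_key} = $key }  # bare value
    }
    return \%metadata;
}

my $href = metadata_sketch('foo=bar baz=galook booom!');
# $href = {foo => 'bar', baz => 'galook', '' => 'booom!'}
```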
normalize_args
my $args = normalize_args( %args, \%defaults); # OR
my $args = normalize_args(\%args, \%defaults); # OR
my $args = normalize_args($value, %args, [\%defaults, $key]);
helper function to handle input parameters, with some defaults. Allows accepting either a series of key/value pairs or a hash reference with these pairs, while at the same time providing default values.
A typical usage is as follows:
sub foo {
my $args = normalize_args(@_, {bar => 'baz'});
...
}
The last version allows you to accept an initial $value without a key in your functions, because you pass the default $key during the call to normalize_args. A typical usage is as follows:
sub foo {
my $args = normalize_args(@_, [{bar => 'baz'}, 'aargh']);
...
}
In this case, you can accept calling foo like this:
foo('some value', salutation => 'aloha');
and $args will be populated as follows:
$args = {
aargh => 'some value', # thanks to the default $key
salutation => 'aloha', # passed as %args
bar => 'baz', # from defaults
};
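The three documented call shapes can be handled as in the following sketch. normalize_args_sketch is hypothetical; in particular, the rule for spotting a leading key-less value (a default $key is available and the remaining list has an odd number of elements, other than a lone hash reference) is an assumption, and a ($value, \%args) call is not covered.

```perl
use strict;
use warnings;

# Hypothetical sketch; the last argument is \%defaults or [\%defaults, $key].
sub normalize_args_sketch {
    my $defaults = pop @_;
    my $key;
    ($defaults, $key) = @$defaults if ref($defaults) eq 'ARRAY';
    # Assumption: with a default $key, an odd-sized list that is not a lone
    # hash reference starts with a key-less value.
    my $has_value = defined($key) && (@_ % 2)
      && !(@_ == 1 && ref($_[0]) eq 'HASH');
    my $value = $has_value ? shift @_ : undef;
    my %args = (@_ == 1 && ref($_[0]) eq 'HASH') ? %{ $_[0] } : @_;
    $args{$key} = $value if $has_value;
    return { %$defaults, %args };    # explicit arguments win over defaults
}

my $args = normalize_args_sketch('some value', salutation => 'aloha',
    [{bar => 'baz'}, 'aargh']);
# $args is {aargh => 'some value', salutation => 'aloha', bar => 'baz'}
```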
normalize_filename
my $name_or_handle = normalize_filename($name, $default_handle);
helper function to normalize a file name according to some rules. In particular, depending on $name:
- if it is a filehandle, it is returned directly;
- if it is the string -, the $default_handle is returned. This allows you to use STDIN or STDOUT as input/output handles in case the filename is - (like many applications support);
- if it starts with the string file:, this prefix is stripped away and the rest is used as a filename. This allows you to actually use - as a real file name, avoiding the automatic handle management described in the bullet above. If your filename may start with the string file:, then you should always put this prefix, e.g.:
file:whatever -- should be passed as --> file:file:whatever
- if it starts with the string handle:, this prefix is stripped and the rest is used to get one of the standard filehandles. The allowed remaining parts are (case-insensitive): in, stdin, out, stdout, err, stderr. Any other remaining part causes an exception to be thrown. Again, if you actually need to create a file whose name is e.g. handle:whatever, you have to prefix it with file:, like this:
handle:whatever -- should be passed as --> file:handle:whatever
- otherwise, the provided $name will be returned as-is.
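The rules above translate into a short chain of checks. This is a sketch under assumptions: normalize_filename_sketch is a hypothetical name, and any reference is treated as a filehandle, which is a simplification of "if it is a filehandle".

```perl
use strict;
use warnings;

my %standard_handle = (
    in  => \*STDIN,  stdin  => \*STDIN,
    out => \*STDOUT, stdout => \*STDOUT,
    err => \*STDERR, stderr => \*STDERR,
);

# Hypothetical sketch of the normalization rules, checked in order.
sub normalize_filename_sketch {
    my ($name, $default_handle) = @_;
    return $name if ref $name;      # assumption: any reference is a handle
    return $default_handle if $name eq '-';
    return $1 if $name =~ m{\A file: (.*) \z}mxs;   # strip one file: prefix
    if ($name =~ m{\A handle: (.*) \z}mxs) {
        my $handle = $standard_handle{lc $1}
          or die "invalid standard handle '$1'\n";
        return $handle;
    }
    return $name;
}
```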
pump
pump($iterator);
my $records = pump($iterator);
my @records = pump($iterator);
pump($iterator, $sink);
exhaust an $iterator, behaving differently depending on the conditions:
- if a $sink is present, it MUST be a sub reference. For each item extracted from the iterator, this sub reference will be called with the item as argument;
- otherwise, if called in void context, the iterator is simply exhausted, without any kind of accumulation of the records generated;
- otherwise, depending on scalar context or list context, an array reference or a list of generated records is returned.
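The three behaviours map naturally onto wantarray, as in this sketch. pump_sketch and list_iterator are hypothetical names, and the iterator protocol (one record per call, empty list once exhausted) is an assumption made for the example.

```perl
use strict;
use warnings;

# Hypothetical sketch. Assumption: the iterator returns one record per call
# and an empty list once it is exhausted.
sub pump_sketch {
    my ($iterator, $sink) = @_;
    if ($sink) {
        while (my @items = $iterator->()) { $sink->($_) for @items }
        return;
    }
    my $wantarray = wantarray;    # undef in void context
    my @records;
    while (my @items = $iterator->()) {
        push @records, @items if defined $wantarray;   # void: just drain
    }
    return unless defined $wantarray;
    return $wantarray ? @records : \@records;
}

# hypothetical iterator over a fixed list
sub list_iterator {
    my @queue = @_;
    return sub { return @queue ? shift @queue : () };
}

my $records = pump_sketch(list_iterator(qw< a b c >));   # ['a', 'b', 'c']
my @records = pump_sketch(list_iterator(qw< a b c >));   # ('a', 'b', 'c')
```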
read_file
my $contents = read_file($filename, %args); # OR
my $contents = read_file(%args); # OR
my $contents = read_file(\%args);
a slurping facility. The following options are available:
binmode
-
parameter for CORE::binmode, defaults to :encoding(UTF-8);
filename
-
the filename (or reference to a string, if you really need it) to slurp data from.
You can optionally pass the filename as a standalone first argument, without the filename key. In this case, it MUST appear as the first item in the argument list.
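A slurping function with these options can be sketched as follows. read_file_sketch is hypothetical; the argument handling (lone hash reference, odd-sized list starting with the filename, or plain key/value pairs) is an assumption modelled on the three call forms above.

```perl
use strict;
use warnings;

# Hypothetical sketch accepting ($filename, %args), (%args) or (\%args).
sub read_file_sketch {
    my %args =
        (@_ == 1 && ref($_[0]) eq 'HASH') ? %{ $_[0] }
      : (@_ % 2)                          ? (filename => @_)
      :                                     @_;
    my $binmode = $args{binmode} // ':encoding(UTF-8)';
    # a reference to a string opens an in-memory filehandle
    open my $fh, '<', $args{filename}
      or die "open('$args{filename}'): $!\n";
    binmode $fh, $binmode;
    local $/;                 # slurp mode: read everything at once
    my $contents = <$fh>;
    close $fh;
    return $contents;
}

my $text = "first line\nsecond line\n";
my $copy = read_file_sketch(\$text);   # same contents as $text
```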
read_file_maybe
my $text = read_file_maybe(\@aref);
my $x = read_file_maybe($x); # where ref($x) ne 'ARRAY'
helper function that expands the input argument with "read_file" if it is an array reference, while returning the input argument unchanged otherwise.
This can be useful if you want to overload an input parameter with either a straight text or something that should be loaded from a file, like a template:
my $template = read_file_maybe($args{template});
In this case, if $args{template} is a text, it will be returned unchanged. Otherwise, if it is an array reference, it will be expanded into a list passed to "read_file", and the contents of the file will be returned.
Examples:
$text = read_file_maybe('this goes straight'); # direct text
# $text contains 'this goes straight' now
$text = read_file_maybe(['/path/to/text.txt']);
# $text has the contents of file /path/to/text.txt now
$text = read_file_maybe(['/path/to/text.txt', binmode => ':raw']);
# ditto, but read raw bytes instead of decoding them as UTF-8
resolve_module
my $full_module_name = resolve_module($module_name); # OR
my $full_module_name = resolve_module($module_name, $prefix);
possibly expand a module's name according to a prefix. These are the rules as of release 0.736:
- if $module_name starts with either a plus sign character + or a caret character ^, this initial character will be stripped away and the rest will be used as the package name. $prefix will be ignored in this case;
- otherwise, ${prefix}::${module_name} will be returned (where $prefix defaults to the string Data::Tubes::Plugin).
This change with respect to previous releases aims at simplifying the interface and better conforming to what other modules do in similar situations (principle of least surprise).
Examples:
resolve_module('^SimplePack'); # SimplePack
resolve_module('+Some::Pack'); # Some::Pack
resolve_module('SimplePack'); # Data::Tubes::Plugin::SimplePack
resolve_module('Some::Pack'); # Data::Tubes::Plugin::Some::Pack
resolve_module('Pack', 'Some::Thing'); # Some::Thing::Pack
resolve_module('Some::Pack', 'Some::Thing'); # Some::Thing::Some::Pack
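The current rules (release 0.736 and later) fit in a couple of lines, as this sketch shows; resolve_module_sketch is a hypothetical name and the older, pre-0.736 rules are not covered.

```perl
use strict;
use warnings;

# Hypothetical sketch of the 0.736 rules.
sub resolve_module_sketch {
    my ($module_name, $prefix) = @_;
    $prefix //= 'Data::Tubes::Plugin';
    return $1 if $module_name =~ m{\A [+^] (.+) \z}mxs;   # explicit full name
    return "${prefix}::${module_name}";
}

print resolve_module_sketch('+Some::Pack'), "\n";   # Some::Pack
```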
API Versioning Note: the behaviour of this function changed between versions 0.734 and 0.736. The previous behaviour, described below, is still available when $Data::Tubes::API_VERSION (see "API Versioning" in Data::Tubes) is (lexicographically) less than, or equal to, 0.734. Here's what the function does with the older interface:
- if $module_name starts with an exclamation point !, this initial character will be stripped away and the rest will be used as the package name. $prefix will be ignored in this case;
- otherwise, if $module_name starts with a plus sign +, this first character will be stripped away and the $prefix will be used (defaulting to Data::Tubes::Plugin);
- otherwise, if $module_name does not contain sub-packages (i.e. the sequence ::), then the $prefix will be used as in the previous bullet;
- otherwise, the provided name is used.
Examples (in the same order as the bullets above):
resolve_module('!SimplePack'); # SimplePack
resolve_module('+Some::Pack'); # Data::Tubes::Plugin::Some::Pack
resolve_module('SimplePack'); # Data::Tubes::Plugin::SimplePack
resolve_module('Some::Pack'); # Some::Pack
resolve_module('Pack', 'Some::Thing'); # Some::Thing::Pack
resolve_module('Some::Pack', 'Some::Thing'); # Some::Pack
shorter_sub_names
shorter_sub_names($package_name);
this helper is used in plugins to generate alternative versions of the implemented functions, with shorter names.
The basic rationale is that functions are usually named after the area they cover, e.g. the function in Data::Tubes::Plugin::Reader that reads a filehandle line-by-line is called read_by_line. In this way, when you use e.g. summon from Data::Tubes, you end up with a function read_by_line that is much clearer than simply by_line.
On the other hand, when you rely upon automatic running of factory functions like in tube or pipeline (again, in Data::Tubes), some parts are redundant. In the example, you would end up using Reader::read_by_line, where read_ is actually redundant, as you already have the last part of the plugin package name to tell you what this by_line thing is about.
shorter_sub_names comes to the rescue by generating alternative names: it analyses the namespace of a package and generates new functions by removing a prefix. In the Data::Tubes::Plugin::Reader case, for example, it is called like this at the end of the module:
shorter_sub_names(__PACKAGE__);
and it generates, among others, by_line and by_paragraph.
Consider using this if you generate new plugins.
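The aliasing idea can be sketched by walking a package's symbol table. This is a hypothetical sketch with an explicit prefix argument: how the real function derives the prefix (e.g. read_ for the Reader plugin) is not shown here, and existing subroutines are never overwritten.

```perl
use strict;
use warnings;

# Hypothetical sketch: for every sub named "$prefix$rest" in $package,
# install an alias named "$rest" in the same package.
sub shorter_sub_names_sketch {
    my ($package, $prefix) = @_;
    no strict 'refs';
    for my $name (keys %{"${package}::"}) {       # the package's symbol table
        next unless $name =~ m{\A \Q$prefix\E (\w+) \z}mxs;
        my ($long, $short) = ("${package}::$name", "${package}::$1");
        next unless defined &{$long};    # only alias actual subroutines
        next if defined &{$short};       # never clobber an existing sub
        *{$short} = \&{$long};           # install the shorter alias
    }
    return;
}

{
    package My::Plugin::Reader;    # hypothetical plugin package
    sub read_by_line      { return 'by line' }
    sub read_by_paragraph { return 'by paragraph' }
}
shorter_sub_names_sketch('My::Plugin::Reader', 'read_');
print My::Plugin::Reader::by_line(), "\n";    # by line
```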
sprintffy
my $string = sprintffy($template, \@substitutions);
expand a $template string à la sprintf, based on a list of @substitutions.
The template targets are sprintf-like, i.e. sequences that start with a percent sign followed by... something.
Each substitution is supposed to be an array reference with two items inside: a regular expression and a value specifier. The regular expression is used to match what comes after the percent sign, while the value part can be either a straight value or a subroutine reference that will be run to get the real value for the substitution.
There is always an implicit, high-priority substitution that matches a single percent sign and expands to a percent sign, so that the string %% will be unescaped to % as you would expect in something that is sprintf-like.
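The expansion loop can be sketched with \G anchoring and the /gc modifiers, trying the implicit %% rule first and then each substitution in order. sprintffy_sketch is hypothetical, and the arguments passed to a code-reference value (here, the text matched after the percent sign) are an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch. Assumption: a code-reference value is called with the
# text matched after the percent sign and returns the replacement.
sub sprintffy_sketch {
    my ($template, $substitutions) = @_;
    # implicit, highest-priority substitution: "%%" expands to "%"
    my @substitutions = ([qr{%}, '%'], @$substitutions);
    my $expanded = '';
    while ($template =~ m{\G ([^%]*) %}gcxms) {
        $expanded .= $1;                     # literal text before the "%"
        my $found;
        for my $substitution (@substitutions) {
            my ($regex, $value) = @$substitution;
            next unless $template =~ m{\G ($regex)}gcxms;
            my $matched = $1;
            $expanded .= ref($value) eq 'CODE' ? $value->($matched) : $value;
            $found = 1;
            last;
        }
        die "no substitution matches at position ", pos($template), "\n"
          unless $found;
    }
    $template =~ m{\G (.*) \z}gcxms;         # trailing literal text
    return $expanded . $1;
}

print sprintffy_sketch('100%% of %n', [[qr{n}, 'names']]), "\n";  # 100% of names
```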
test_all_equal
my $bool = test_all_equal(@list);
test whether all elements in @list are equal to one another or not, and return the test output as a boolean value (i.e. something that Perl considers true or false).
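A minimal sketch of such a test follows; test_all_equal_sketch is hypothetical and assumes string comparison (eq), while the real function may compare differently or handle undef specially.

```perl
use strict;
use warnings;

# Hypothetical sketch, assuming string comparison.
sub test_all_equal_sketch {
    my ($first, @rest) = @_;
    for my $item (@rest) {
        return 0 if $item ne $first;
    }
    return 1;    # also true for empty or single-item lists
}
```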
trim
trim(@strings);
remove leading/trailing whitespace from input @strings, in place.
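In-place modification works because the elements of @_ are aliases to the caller's variables, as this hypothetical sketch (trim_sketch) shows; passing read-only values such as string literals would make the substitution die.

```perl
use strict;
use warnings;

# Hypothetical sketch: $_ aliases each element of @_, which in turn aliases
# the caller's variables, so the substitution modifies them in place.
sub trim_sketch {
    s{\A\s+|\s+\z}{}gmxs for @_;
    return;
}

my ($x, $y) = ("  hello ", "\tworld\n");
trim_sketch($x, $y);    # $x is now 'hello', $y is 'world'
```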
traverse
my $item = traverse($data, @keys);
Assuming that $data is an array or hash reference, traverse it using the items in @keys at each step in the descent, and return the item reached at the end.
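The descent can be sketched as a loop that picks the hash or array accessor at each step. traverse_sketch is hypothetical, and giving up with undef when a non-reference is met halfway is an assumption, not documented behaviour.

```perl
use strict;
use warnings;

# Hypothetical sketch of the descent through nested hashes and arrays.
sub traverse_sketch {
    my ($data, @keys) = @_;
    for my $key (@keys) {
        my $type = ref $data;
        if    ($type eq 'HASH')  { $data = $data->{$key} }
        elsif ($type eq 'ARRAY') { $data = $data->[$key] }
        else                     { return }   # assumption: give up with undef
    }
    return $data;
}

my $data = {a => [10, {b => 'x'}]};
my $item = traverse_sketch($data, 'a', 1, 'b');   # 'x'
```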
tube
see tube in Data::Tubes; this is the same function.
unzip
my ($evens, $odds) = unzip(@list); # OR
my ($evens, $odds) = unzip(\@list);
separates the items in even positions (0, 2, 4, ...) from those in odd positions (1, 3, 5, ...) in the input @list, and returns them as two references to arrays.
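A sketch of this split by position follows; unzip_sketch is hypothetical. It also shows the ambiguity of the two call forms: a single array reference is taken as the list itself, so a one-element list holding an array reference cannot be passed directly.

```perl
use strict;
use warnings;

# Hypothetical sketch: a lone array reference is treated as the list itself.
sub unzip_sketch {
    my $items = (@_ == 1 && ref($_[0]) eq 'ARRAY') ? $_[0] : [@_];
    my (@evens, @odds);
    for my $i (0 .. $#$items) {
        if ($i % 2) { push @odds,  $items->[$i] }
        else        { push @evens, $items->[$i] }
    }
    return (\@evens, \@odds);
}

my ($evens, $odds) = unzip_sketch(qw< a 1 b 2 c >);
# $evens is ['a', 'b', 'c'], $odds is [1, 2]
```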
SEE ALSO
Data::Tubes is a good entry point for all of this.
AUTHOR
Flavio Poletti <polettix@cpan.org>
COPYRIGHT AND LICENSE
Copyright (C) 2016 by Flavio Poletti <polettix@cpan.org>
This module is free software. You can redistribute it and/or modify it under the terms of the Artistic License 2.0.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.