NAME
Data::Rlist - A lightweight data language for Perl, C and C++
SYNOPSIS
use Data::Rlist;
.
.
Data from text:
$string_ref = Data::Rlist::write_string($data);
$string = Data::Rlist::make_string($data);
$data = Data::Rlist::read_string($string_ref);
$data = Data::Rlist::read_string($string);
Data from files:
Data::Rlist::write($data, $filename);
$data = Data::Rlist::read($filename);
Perform safe deep copies of data:
$deep_copy = Data::Rlist::keelhaul($data);
The same can be achieved with the object-oriented interface:
$object = new Data::Rlist(-data => $thing, -output => \$target_string);
-data defines the data to be compiled, and -output where to write the compilation. -output is either a string reference or the name of a file:
$string_ref = $object->write; # compile $thing, return \$target_string
$object->set(-output => "$HOME/.foorc"); # now a filename
$object->write; # write "~/.foorc"
Passing an argument to write() overrides -output:
$object->write(".barrc"); # write to some other file
"write_string()" and "make_string()" make up a string out of thin air, no matter how -output is set:
$string_ref = $object->write_string; # write to new string (ignores -output)
$string = $object->make_string; # ditto, but return string value
print $object->make_string; # ...dump $thing to stdout
However, all these functions apply -data as the Perl data to be compiled. The attribute -input defines what to parse: read() compiles the text defined by -input back to Perl data:
$object->set(-input => \$rlist_language_productions);
$data = $object->read;
$data = $object->read($other); # overrides -input attribute
Analogous to -output, the -input attribute shall be either a string reference, undef or the name of a file:
use Env qw/HOME/;
$object->set(-input => "$HOME/.foorc");
$data = $object->read; # open and parse "~/.foorc"
$data = $object->read(".barrc"); # parse some other file (override -input)
$data = $object->read(\$string); # parse some string (override -input)
$data = $object->read_string($string_or_ref); # ditto
KEELHAULING DATA
Data::Rlist can also create deep-copies of Perl data, a functionality called keelhauling:
$deep_copy = $object->keelhaul; # create in-depth copy of $thing
The metaphor vividly connotes that $thing is stringified, then compiled back. See "keelhaul()" for why this only sounds useless. The little brother of "keelhaul()" is "deep_compare()":
print join("\n", Data::Rlist::deep_compare($a, $b));
DESCRIPTION
Venue
Random-Lists (Rlist) is a tag/value format for text data. It converts objects into legible, plain text. Rlist is a data format language that uses lists of (a) values and (b) tags and values to structure data; in short, it stringifies objects. The design targets the simplest (yet complete) language for constant data:
- it allows the definition of hierarchical data,
- it disallows recursively-defined data,
- it does not consider user-defined types,
- it has no keywords,
- it has no arithmetic expressions,
- it uses 7-bit-ASCII character encoding.
Rlists are not Perl syntax, and can be used also from C and C++ programs.
RLIST PERL
----- ----
5; { 5 => undef }
"5"; { "5" => undef }
5=1; { 5 => 1 }
{5=1;} { 5 => 1 }
(5) [ 5 ]
{} { }
; { }
() [ ]
- Strings and Numbers
"Hello, World!"
Symbolic names are simply strings consisting only of [a-zA-Z_0-9-/~:.@] characters. For such strings the quotes are optional:
foobar cogito.ergo.sum Memento::mori
Numbers adhere to the IEEE 754 syntax for integer- and floating-point numbers:
38 10e-6 -.7 3.141592653589793
- Array
Arrays are sequential lists:
( 1, 2, ( 3, "Audiatur et altera pars!" ) )
- Hash
Hashes map a key scalar to some value, a subsequent Rlist. Hashes are associative lists:
{ key = value; 3.14159 = Pi; "Meta-syntactic names" = (foo, bar, baz, "lorem ipsum", Acme, ___); lonely-key; }
Audience
Rlist is useful as a "glue data language" between different systems and programs, for configuration files and for persistence layers (object storage). It attempts to represent the data pure and untinged, but without breaking its structure or legibility. The format is more expressive than comma-separated values (CSV), but not as verbose as XML:
Like CSV, the format describes merely the data itself, but the data may be structured in multiple levels, not just lines.
Like XML, data can be as complex as required, but while XML is geared to marking up data within some continuous text (the document), Rlist defines the pure data structure. Yet for non-programmers the syntax is still self-evident.
Rlists are built from only four primitives: number, string, array and hash. The drawback is that data schemas remain tacit agreements between the users of the data (the programs).
Implementations already exist for Perl, C and C++, on Windows and UN*X. These implementations are stable, portable and very fast, and they do not depend on other software. The Perl implementation operates directly on primitive types, whereas C++ uses STL types. Either way data integrity is guaranteed: floats won't lose their precision, Perl strings are loaded into std::strings, and Perl hashes and arrays resurrect as std::maps and std::vectors.
Moreover, a design goal of Rlist was to scale well: a single text file can express hundreds of megabytes of data, while the data is readable in constant time and with constant memory requirements. This makes Rlist files applicable as "mini-databases" loaded into RAM at program startup. For example, http://www.sternenfall.de uses Rlist instead of a MySQL database.
Number and String
All program data is ultimately convertible into numbers and strings. In Rlist, number and string constants follow the C language lexicography. Strings that look like C identifier names need not be quoted.
By definition all input is compiled into an array or hash; hashes are the default. For example, the string "Hello, World!" is compiled into:
{ "Hello, World!" => undef }
Likewise the parser of the C++ implementation by default returns a std::map with one pair. The default scalar value is the empty string "". In Perl, undef'd list elements are compiled into "".
Strings are quoted implicitly when building Rlists; when reading them back, strings are unquoted. Quoting means to encode characters, then wrap the string in double quotes ("). You can also make use of this functionality by calling "quote()" and "unquote()" as separate functions.
Here Documents
Rlist is capable of a line-oriented form of quoting based on the UNIX shell here-document syntax and RFC 111. Multi-line quoted strings can be expressed with
<<DELIMITER
Following the sigil << an identifier specifies how to terminate the string scalar. The value of the scalar will be all lines following the current line down to the line starting with the delimiter. There must be no space between the << and the identifier. For example,
{
var = {
log = {
messages = <<LOG;
Nov 27 21:55:04 localhost kernel: TSC appears to be running slowly. Marking it as unstable
Nov 27 22:34:27 localhost kernel: Uniform CD-ROM driver Revision: 3.20
Nov 27 22:34:27 localhost kernel: Loading iSCSI transport class v2.0-724.<6>PNP: No PS/2 controller found. Probing ports directly.
Nov 27 22:34:27 localhost kernel: wifi0: Atheros 5212: mem=0x26000000, irq=11
LOG
};
};
}
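To make the termination rule concrete, the following Perl sketch collects a here-document value from the lines that follow the <<DELIMITER line. It is not Data::Rlist's scanner; read_here_doc and its array-of-lines interface are hypothetical. As in the lexer behavior documented under lex(), every body line keeps a trailing newline.

```perl
use strict;
use warnings;

# Hypothetical helper: $delim is the identifier that followed "<<", $lines
# the input lines after the "<<DELIMITER" line. Returns the string value and
# the number of lines consumed (body plus the terminating line).
sub read_here_doc {
    my ($delim, $lines) = @_;
    my $value = '';
    for my $i (0 .. $#$lines) {
        # The value ends at the first line starting with the delimiter.
        return ($value, $i + 1) if index($lines->[$i], $delim) == 0;
        $value .= "$lines->[$i]\n";     # every body line keeps its newline
    }
    die "here-doc terminator '$delim' not found\n";
}

my ($text, $used) = read_here_doc('LOG', ['first line', 'second line', 'LOG', '};']);
print $text;
print "$used lines consumed\n";   # 3 lines consumed
```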
Character Encoding
Rlist text uses 7-bit ASCII. The 95 printable character codes 32 to 126 occupy one character. Codes 0 to 31 and 127 to 255 require four characters each: the \ escape character followed by the three-digit octal code. For example, the German umlaut character ü (252) is translated into \374. Exceptions are codes 92 (backslash), 34 (double-quote) and 39 (single-quote), which are escaped as
\\ \" \'
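The escaping rules above fit in a few lines of Perl. This is a sketch only: rlist_escape and rlist_unescape are hypothetical names, they handle byte strings (codes 0 to 255) only, and Data::Rlist's own "quote()" and "unquote()" additionally add and strip the surrounding double quotes and may differ in other details.

```perl
use strict;
use warnings;

# Escaping for byte strings: printable ASCII 32..126 stays as-is except
# backslash and the two quote characters, which get a backslash; every
# other byte becomes "\" plus three octal digits.
sub rlist_escape {
    my ($s) = @_;
    $s =~ s{([\\"'])|([^\x20-\x7E])}
           {defined $1 ? "\\$1" : sprintf('\\%03o', ord $2)}ge;
    return $s;
}

# Inverse: "\ooo" becomes the byte with that octal code, "\x" becomes "x".
sub rlist_unescape {
    my ($s) = @_;
    $s =~ s{\\([0-7]{3}|.)}{length $1 == 3 ? chr oct $1 : $1}ge;
    return $s;
}

print rlist_escape("gr\x{fc}n"), "\n";    # gr\374n
print rlist_escape(q{say "hi"}), "\n";    # say \"hi\"
```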
Binary Data
Binary data can be represented as base64-encoded string or here-document.
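As an illustration of the here-document variant, the sketch below uses the core module MIME::Base64 to turn a binary string into base64 text and wraps it in a here-doc. The helper name binary_as_here_doc and the delimiter ___ are hypothetical choices. encode_base64 breaks its output into lines of at most 76 characters and terminates each of them, including the last, with "\n", which suits a here-doc body.

```perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: render binary data as "KEY = <<___;" followed by
# base64 lines and the terminator. "___" cannot occur in base64 text
# (alphabet A-Z a-z 0-9 + / =), so no body line can be mistaken for the
# delimiter.
sub binary_as_here_doc {
    my ($key, $bytes) = @_;
    return "$key = <<___;\n" . encode_base64($bytes) . "___\n";
}

my $bytes = join '', map { chr } 0 .. 255;   # every byte value once
print binary_as_here_doc('blob', $bytes);
```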
Embedded Perl Code
Rlists may define embedded programs: nanoscripts. They are defined as here-documents delimited with the special string "nanoscript". For example,
hello = (<<nanoscript);
print "Hello, World!";
nanoscript
After the Rlist has been fully parsed, such strings are eval'd in the order of their occurrence. Within the eval, %root or @root refers to the root of the current Rlist.
Comments
Rlist supports multiple forms of comments: // or # single-line-comments, and /* */ multi-line-comments.
EXAMPLES
Basic Rlist values are number and string constants, from which larger structures are built. All of the following paragraphs define valid Rlists.
Single strings and numbers:
"Hello, World!"
foo // compiles to { 'foo' => undef }
3.1415 // compiles to { 3.1415 => undef }
Array:
(1, a, 4, "b u z") // list of numbers/strings
((1, 2),
(3, 4)) // list of lists (2x2 matrix)
((1, a, 3, "foo bar"),
(7, c, 0, "")) // another list of lists
Array of strings:
warning = (
"main correlation-matrix not positive-definite",
"using pseudo-decomposed sigma-matrix",
"cannot evaluate CVaR: the no. of simulations is too low for confidence-level 0.90"
);
Configuration object as hash:
{
contribution_quantile = 0.99;
default_only_mode = Y;
importance_sampling = N;
num_runs = 10000;
num_threads = 10;
# etc.
}
A comprehensive example:
"Metaphysic-terms" =
{
Numbers =
{
3.141592653589793 = "The ratio of a circle's circumference to its diameter.";
2.718281828459045 = <<___;
The mathematical constant "e" is the unique real number such that the value of
the derivative (slope of the tangent line) of f(x) = e^x at the point x = 0 is
exactly 1.
___
42 = "The Answer to Life, the Universe, and Everything.";
};
Words =
{
ACME = <<Value;
A Company [that] Makes Everything: Wile E. Coyote's supplier of equipment and gadgets.
Value
<<Key = <<Value;
foo bar foobar
Key
[JARGON] A widely used meta-syntactic variable; see foo for etymology. Probably
originally propagated through DECsystem manuals [...] in 1960s and early 1970s;
confirmed sightings go back to 1972. [...]
Value
};
};
PACKAGE DETAILS
Compile Options
The format of the compiled text and the behavior of "compile()" can be controlled by the OPTIONS parameter of "write()", "write_string()" etc. The argument is a hash defining how the Rlist text shall be formatted. The following pairs are recognized:
- 'precision' => NUMBER
Unless NUMBER is undef, round all numbers to NUMBER decimal places by calling "round()". By default NUMBER is undef, so "compile()" does not round floats.
- 'scientific' => FLAG
Causes compile() to masquerade $Data::Rlist::RoundScientific; see "round()" for the implications. Alternatively, the -RoundScientific object attribute can be set; see "new()".
- 'code_refs' => FLAG
If enabled and "write()" encounters a CODE reference, it calls the code, then compiles the return value. Disabled by default.
- 'threads' => COUNT
If enabled, "compile()" internally uses multiple threads. Note that this only makes sense on machines with at least COUNT CPUs.
- 'here_docs' => FLAG
If enabled, strings with at least two newlines in them are written in the here-doc format. Note that the string has to be terminated with a "\n" to qualify as a here-document.
- 'outline_data' => NUMBER
Use "eol" (linefeed) to distribute data on many lines: insert a linefeed after every NUMBERth array value; 0 disables outlining.
- 'outline_hashes' => FLAG
If enabled, and "outline_data" is also enabled, prints { and } on separate lines when compiling Perl hashes with at least one pair.
- 'comma' => STRING
The comma-separator string to be used by "write_csv()". The default is ','.
- 'delimiter' => STRING-OR-REGEX
Field-delimiter for "read_csv()". The default is '\s*,\s*'.
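The 'here_docs' qualification rule (at least two newlines, and a trailing "\n") can be expressed as a small predicate. The sketch restates only the rule given above; qualifies_as_here_doc is a hypothetical name, and Data::Rlist may apply further checks.

```perl
use strict;
use warnings;

# A string qualifies for here-doc output when it contains at least two
# newlines and ends with one; "foo\nbar" therefore never qualifies.
sub qualifies_as_here_doc {
    my ($s) = @_;
    my $newlines = ($s =~ tr/\n//);
    return $newlines >= 2 && substr($s, -1) eq "\n";
}

for my $s ("one\ntwo\n", "foo\nbar", "single\n") {
    my $shown = $s =~ s/\n/\\n/gr;          # make newlines visible
    print "$shown: ", (qualifies_as_here_doc($s) ? 'here-doc' : 'quoted'), "\n";
}
```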
The following options format the generated Rlist; normally you don't want to modify them:
- 'bol_tabs' => COUNT
Number of horizontal TAB characters to use at the beginning of a line, per indentation level. Defaults to 1. Blanks are not used, because they would inflate the size of the generated text considerably.
- 'eol_space' => STRING
End-of-line string to use (the linefeed). Legal values are, for example, "", " " and "\r\n". The default is "\n".
- 'paren_space' => STRING
String to write after ( and {, and before } and ) when compiling arrays and hashes.
- 'comma_punct' => STRING
- 'semicolon_punct' => STRING
Comma and semicolon strings, which shall be at least "," and ";". No matter what, "compile()" will always print the "eol" string after the "semicolon" string.
- 'assign_punct' => STRING
String to combine key/value pairs. Defaults to " = ". Shall be at least "=" so that the compiled Rlist remains valid.
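To see how the formatting options of the second list combine, the sketch below renders a flat hash of scalars using bol_tabs, eol_space, paren_space, semicolon_punct and assign_punct. It is a toy, not "compile()": render_flat_hash is hypothetical, it handles only one level of plain scalars, quotes anything not made of the symbolic-name characters listed earlier, performs no character escaping, and sorts keys for stable output.

```perl
use strict;
use warnings;

# Toy renderer for a one-level hash of scalars, driven by option names from
# the list above. Values that are not plain names or numbers are wrapped in
# double quotes (no character escaping is done here).
sub render_flat_hash {
    my ($hash, $opts) = @_;
    my %o = (bol_tabs => 1, eol_space => "\n", paren_space => '',
             semicolon_punct => ';', assign_punct => ' = ', %{ $opts || {} });
    my $indent = "\t" x $o{bol_tabs};
    my $scalar = sub {
        my ($v) = @_;
        return $v =~ m{\A[A-Za-z0-9_\-/~:.@]+\z} ? $v : qq{"$v"};
    };
    my $out = '{' . $o{paren_space} . $o{eol_space};
    for my $k (sort keys %$hash) {
        $out .= $indent . $scalar->($k) . $o{assign_punct} . $scalar->($hash->{$k})
              . $o{semicolon_punct} . $o{eol_space};   # eol always follows ';'
    }
    return $out . $o{paren_space} . '}' . $o{eol_space};
}

my %cfg = (num_runs => 10000, default_only_mode => 'Y', title => 'a b');
print render_flat_hash(\%cfg);                                    # outlined
print render_flat_hash(\%cfg, { bol_tabs => 0, eol_space => '',
                                assign_punct => '=' }), "\n";     # compact
```

With the defaults the output resembles the outlined style; with no tabs and an empty eol_space it approaches the compact 'squeezed' form.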
Predefined Options
The OPTIONS parameter accepted by some package functions is either a hash-ref or the name of a predefined set:
- 'default'
Default if writing to a file.
- 'string'
Compact, no newlines/here-docs. Renders a "string of data".
- 'outlined'
Optimize the compiled Rlist for maximum readability.
- 'squeezed'
Very compact, no whitespace at all. For very large Rlists.
- 'perl'
Compile data in Perl syntax, using "compile_Perl()", not "compile()".
- 'fast' or undef
Compile data as fast as possible, using compile_fast(), not compile().
All functions that take an OPTIONS parameter implicitly call "complete_options()" to complete it from one of the predefined sets, and from "default". Therefore you may pass just a "lazy subset of options" to these functions. For example,
my $obj = new Data::Rlist(-data => $thing);
$obj->write('thing.rls', { scientific => 1, precision => 8 });
See also "complete_options()", "predefined_options()" and :options.
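Completing a lazy subset amounts to layering hashes: the caller's options win over the named predefined set, which in turn wins over "default". The sketch below shows that layering with Perl's list-to-hash assignment. The contents of %predef are placeholders, not Data::Rlist's actual predefined values, and complete is a hypothetical stand-in for "complete_options()".

```perl
use strict;
use warnings;

# Placeholder option sets; the real predefined values live in Data::Rlist
# and may differ.
my %predef = (
    default  => { precision => undef, here_docs => 1, eol_space => "\n" },
    squeezed => { precision => 6,     here_docs => 0, eol_space => '' },
);

# Later pairs win: "default" first, then the named set, then the caller's
# own (possibly partial) options. Returns a new hash reference.
sub complete {
    my ($opts, $name) = @_;
    $name //= 'default';
    return +{ %{ $predef{default} }, %{ $predef{$name} }, %{ $opts || {} } };
}

my $o = complete({ precision => 0 }, 'squeezed');
print join(', ', map { "$_=" . ($o->{$_} // 'undef') } sort keys %$o), "\n";
```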
Debugging Data (Finding Self-References)
Debugging (hierarchical) data means breaking recursively-defined data.
Set $Data::Rlist::MaxDepth to an integer above 0 to define the depth beyond which "compile()" shall not descend. 0 disables debugging. When positive, compilation breaks on deep recursions caused by circular references, and a message like the following is printed on stderr:
ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)
The message will also be repeated as a comment when the compiled Rlist is written to a file. Furthermore $Data::Rlist::Broken is incremented by one, and compilation continues! So any attempt to venture deeper into the data than $Data::Rlist::MaxDepth allows is blocked, but compilation continues above that depth. After "write()" or "write_string()" returns, the caller can check whether $Data::Rlist::Broken is nonzero; if so, not all of the data was compiled into text.
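The effect of $Data::Rlist::MaxDepth can be reproduced in miniature: a recursive walk that refuses to descend past a maximum depth, counts each cut-off, and carries on with the rest of the data. The sketch mirrors the described behavior only; walk, $max_depth and $broken are hypothetical stand-ins and do not touch Data::Rlist's globals.

```perl
use strict;
use warnings;

my $max_depth = 100;  # plays the role of $Data::Rlist::MaxDepth (0 = no limit)
my $broken    = 0;    # plays the role of $Data::Rlist::Broken

# Count the plain scalars reachable from $node without descending into
# containers deeper than $max_depth. A circular reference thus ends in a
# counted cut-off instead of infinite recursion, and the walk continues.
sub walk {
    my ($node, $depth) = @_;
    no warnings 'recursion';            # ~100 nested calls are expected here
    my $type = ref $node;
    return 1 unless $type eq 'ARRAY' || $type eq 'HASH';
    if ($max_depth > 0 && $depth > $max_depth) {
        $broken++;
        warn "walk() broken in deep $node (depth = $depth, max-depth = $max_depth)\n";
        return 0;
    }
    my $n = 0;
    $n += walk($_, $depth + 1) for ($type eq 'ARRAY' ? @$node : values %$node);
    return $n;
}

my $loop = [1, 2];
push @$loop, $loop;                     # the array now contains itself
my $count = walk({ data => $loop, ok => 'x' }, 0);
print "scalars: $count, broken: $broken\n";   # scalars: 201, broken: 1
```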
Quoting strings that look like numbers
Normally you don't have to care about strings, since un/quoting happens as required when reading/compiling Rlists from Perl data. A common problem, however, occurs when some text fragment (string) uses the same lexicography as numbers do.
Printed text uses well-defined glyphs and typographic conventions, and ultimately the competence of the reader to recognize numbers. But computers need to know the exact number type and format to recognize numbers. Integer? Float? Hexadecimal? Scientific? Klingon? The Perl Cookbook, in recipe 2.1, recommends the use of a regular expression to distinguish number from string scalars. The advice illustrates how hard the problem actually is. Perl is not alone in having to overcome this; any program that interprets text has to.
Since Perl scripts are texts that process text into more text, Perl's artful answer was to define typeless scalars. Scalars hold a number, a string or a reference. Thereby Perl addresses the problem that digits, like letters and punctuation, are regular ASCII codes. So Perl defines the string as the basic building block for all program data. Venturesomely, it then lets the program decide what strings mean. Analogously, in a printed book the reader has to decipher the glyphs and decide what evidence they hide.
In Rlist, string scalars that look like numbers need to be quoted explicitly. Otherwise, for example, the scalar $s="-3.14" appears as -3.14 in the output. Likewise "007324" is compiled into 7324: the text quality is lost and the scalar is read back as a number. Of course, this behavior is intended, and in most cases it is just what you want. For hash keys, however, it might be a problem. One solution is to prefix the string with an artificial "_":
my $s = '-9'; $s = "_$s";
Since the scalar begins with a "_", it does not qualify as a number anymore, and hence is compiled as a string and read back as a string. In the C++ implementation it will then become a std::string, not a double. But the leading "_" has to be removed by the reading program, which exposes this technique as a rather poor hack. Perhaps a better solution is to call Data::Rlist::quote explicitly:
$k = -9;
$k = Data::Rlist::quote($k); # returns qq'"-9"'
use Data::Rlist qw/:strings/;
$k = 3.14_15_92;
$k = quote($k); # returns qq'"3.141592"'
Again, the need to quote strings that look like numbers is a problem evident only in the Perl implementation of Rlist, since Perl is a weakly typed language. As a strongly typed language, C++ is practically the antipode of Perl; with the C++ implementation of Rlist there is no need to quote strings that look like numbers.
See also "write()", "is_numeric()", "is_name()", "is_random_text()" and http://en.wikipedia.org/wiki/American_Standard_Code_for_Information_Interchange.
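The core module Scalar::Util offers looks_like_number, which applies Perl's own notion of what a number is. The sketch below uses it to decide which string values would need explicit quoting to survive a round trip as strings. It approximates the problem only; it is not Data::Rlist's "is_numeric()" (whose grammar may differ), and quote_if_numeric is a hypothetical name.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Apply only to values meant as strings: wrap them in double quotes when
# they would otherwise be mistaken for numbers (e.g. "007324").
sub quote_if_numeric {
    my ($s) = @_;
    return looks_like_number($s) ? qq{"$s"} : $s;
}

for my $s ('-3.14', '007324', '_-9', 'foobar') {
    print quote_if_numeric($s), "\n";
}
```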
Speed-up Compilation
Much work has been spent on optimizing Data::Rlist for speed. Still, it is implemented in pure Perl (no XS). A very rough estimate for Perl 5.8 is "each MB takes one second per GHz". For example, when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz-PC requires about 5-7 seconds. Compiling the same data under Solaris, on a sparcv9 processor operating at 750 MHz, takes about 18-22 seconds.
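Read as a formula, the rule of thumb gives seconds as roughly megabytes divided by GHz. The few lines below merely evaluate it for the two machines mentioned; the measured timings quoted there are somewhat higher than the formula yields, so it is only a ballpark figure. estimated_seconds is a hypothetical helper.

```perl
use strict;
use warnings;

# Rule of thumb from the text: about one second per MB per GHz.
sub estimated_seconds {
    my ($megabytes, $ghz) = @_;
    return $megabytes / $ghz;
}

printf "3 GHz PC:        %.1f s\n", estimated_seconds(13, 3);     # 4.3 s
printf "750 MHz sparcv9: %.1f s\n", estimated_seconds(13, 0.75);  # 17.3 s
```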
Explicit Quoting
The process of compiling can be sped up by calling "quote()" explicitly on scalars, that is, before calling "write()" or "write_string()". Large data sets may compile faster when "quote()" is called in advance on scalars that certainly do not qualify as symbolic names:
use Data::Rlist qw/:strings/;
$data{quote($key)} = $value;
.
.
Data::Rlist::write(\%data, "data.rlist");
instead of
$data{$key} = $value;
.
.
Data::Rlist::write(\%data, "data.rlist");
It depends on the case whether the first variant is faster: "compile()" and "compile_fast()" both have to call "is_random_text()" on each scalar. When the scalar is already quoted, i.e. its first character is ", this test ought to run faster.
Note that internally "is_random_text()" applies the precompiled regex $g_re_value. But for a given scalar $s the expression
($s !~ $Data::Rlist::g_re_value)
can be up to 20% faster than the equivalent is_random_text($s).
PACKAGE FUNCTIONS
Construct Objects
new(), get() and set()
These are the core functions to cultivate package objects.
The following functions may be called also as methods: "read()", "read_csv()", "read_string()", "write()", "write_string()" and "keelhaul()".
- new(ATTRIBUTES)
ATTRIBUTES is a hash-table defining object attributes. Example:
$self = Data::Rlist->new(-input => "foo.rlist", -data => $thing);
REGULAR ATTRIBUTES
-input => INPUT
Defines what to parse. INPUT defines a filename or string reference. Applied by "read()", "read_csv()" and "read_string()".
-data => DATA
Defines the data to be compiled. DATA is some Perl data. Applied by "write()", "write_string()" and "keelhaul()".
-output => OUTPUT (optional)
Defines where to put the compilation: either a filename, string-reference or undef.
-filter => FILTER (optional) -filter-args => FILTER-ARGS (optional)
Used by "read()" to preprocess the input file before parsing. FILTER can be 1 to select the standard C preprocessor cpp. Applied by "open_input()".
-delimiter => DELIMITER (optional)
See "read_csv()".
-options => OPTIONS (optional)
Defines the compile options.
-header => STRINGS (optional) -columns => STRINGS (optional)
Defines the header text (the comments) for data written to files, and the column names of CSV files. Used in place of the HEADER parameter of "write()" and COLUMNS of "write_csv()".
ATTRIBUTES THAT MASQUERADE PACKAGE GLOBALS
These attributes temporarily override package globals while object methods are executed. The new values are provided by an object that thereby locks the package (in which case $Data::Rlist::Locked is true). When the method returns, the previous globals are restored.
-SafeCppMode => FLAG (optional)
Used by "read()" to masquerade $Data::Rlist::SafeCppMode.
-MaxDepth => INTEGER (optional)
Used by "write()" to masquerade $Data::Rlist::MaxDepth.
-RoundScientific => FLAG (optional)
Used by "round()" during compilation. Masquerades $Data::Rlist::RoundScientific. Note that round() is only called when the "precision" option is defined.
- set(SELF[, ATTRIBUTES])
Reset or initialize object attributes (see "new()"). Returns SELF. Example:
$obj->set(-input => \$str, -output => 'temp.rls', -options => 'squeezed');
- get(SELF, NAME[, DEFAULT])
Get some object attribute. For NAME the leading hyphen is optional. Unless NAME exists as an attribute, returns DEFAULT, or undef.
EXAMPLES
$self->get('foo'); # returns $self->{-foo} or undef
$self->get(-foo=>); # ditto
$self->get('foo', 42); # returns $self->{-foo} or, unless it exists, 42
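The attributes that masquerade package globals (see "new()") replace a global only while a method runs. In Perl this is idiomatically done with local, which restores the previous value when the enclosing scope exits, even through die. The following sketch shows the idiom with a made-up package; My::Rlistish and its $MaxDepth and $Locked are hypothetical and are not Data::Rlist's code.

```perl
use strict;
use warnings;

package My::Rlistish;

our $MaxDepth = 0;   # package global with its default value
our $Locked   = 0;

sub new { my ($class, %attr) = @_; return bless {%attr}, $class }

# While the method runs, the object's -MaxDepth (if given) masquerades the
# package global; local restores the previous values on return.
sub write {
    my ($self) = @_;
    local $MaxDepth = $self->{-MaxDepth} // $MaxDepth;
    local $Locked   = 1;
    return "writing with MaxDepth=$MaxDepth";
}

package main;

my $obj = My::Rlistish->new(-MaxDepth => 100);
print $obj->write, "\n";                    # writing with MaxDepth=100
print "after: $My::Rlistish::MaxDepth\n";   # after: 0
```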
Interface
Public functions to be called by users of the package.
read(), read_csv() and read_string()
- read(INPUT[, FILTER, FILTER-ARGS])
Parse data structure from INPUT.
PARAMETERS
INPUT shall be either
- some Rlist object created by "new()",
- a string reference, in which case read() and "read_string()" parse Rlist text from it,
- a string scalar, in which case read() assumes a file to open and to parse.
See "open_input()" for details on the FILTER and FILTER-ARGS parameters, which are used to preprocess input files before actually reading them. When specified, and INPUT is an object, they override the -filter and -filter-args attributes.
When the input file cannot be open'd and flock'd, this function dies. Note that die is Perl's mechanism to raise exceptions; they can be caught with eval. For example,
my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';
This code fragment traps the die exception; when it was raised eval returns undef, otherwise the result of calling hostname. For read this means
$data = eval { Data::Rlist::read($tempfile) };
print STDERR "$tempfile not found, is locked or is empty" unless defined $data;
RESULT
"read()" returns the parsed data (a reference), or undef if there was no data (if the physical file is nonempty, it contained only comments/whitespace).
See also "parse()", "write()", "write_string()".
- read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])
PARAMETERS
See "read()" for INPUT and "open_input()" for FILTER and FILTER-ARGS (open_input() is called internally). For example, simply pass 1 as FILTER argument to read INPUT through the standard C preprocessor, instead of reading it directly.
The comma-delimiter is read from OPTIONS, as "delimiter". It defaults to '\s*,\s*'.
RESULT
Returns a list of lists. In list context a list of array references, in scalar context a reference to such a list. Each embedded array defines the fields in a line, and may be of variable length.
- read_string(INPUT)
Calls "read()" to read Rlist language productions from the string or string-reference INPUT.
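The list-of-lists result described for "read_csv()" can be pictured with a few lines of plain Perl: split each line on the delimiter regex and collect one array reference per line. The sketch ignores quoted fields and preprocessing; parse_csv_lines is a hypothetical name, not part of Data::Rlist.

```perl
use strict;
use warnings;

# Split lines on a delimiter regex (default '\s*,\s*') into a list of
# array references; lines may have different numbers of fields.
sub parse_csv_lines {
    my ($text, $delimiter) = @_;
    $delimiter //= '\s*,\s*';
    my @rows;
    for my $line (split /\n/, $text) {
        next if $line =~ /\A\s*\z/;                      # skip blank lines
        push @rows, [ split /$delimiter/, $line, -1 ];   # -1 keeps trailing empty fields
    }
    return wantarray ? @rows : \@rows;   # list in list context, ref otherwise
}

my @rows = parse_csv_lines("a, b ,c\nd,e,f,g\n");
print scalar(@rows), " rows; second has ", scalar(@{ $rows[1] }), " fields\n";
```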
write(), write_csv() and write_string()
- write(DATA[, OUTPUT, OPTIONS, HEADER])
Translates Perl data into some Rlist, i.e. into printable text. DATA is either an object generated by "new()", or some Perl data, or undef. write() is auto-exported as "WriteData()".
PARAMETERS
When DATA is an object the Perl data to be compiled is defined by the -data attribute. (When -data refers to another Rlist object, this other object is invoked.) Otherwise DATA defines the data to be compiled.
Optional OUTPUT defines where to compile to. Defaults to the -output attribute when DATA is a Data::Rlist object. OUTPUT is either a filename to create or a string reference; when undef, write() writes to an anonymous string.
Optional OPTIONS argument defines how to compile text from DATA. Defaults to the -options attribute when DATA is an object. When undef, uses "compile_fast()", otherwise "compile()".
Optional HEADER is a reference to an array of strings that shall be printed literally at the top of an output file. Defaults to the -header attribute when DATA is an object.
RESULT
When write() creates a file it returns 0 for failure or 1 for success. Otherwise it returns a string reference.
EXAMPLES
$self = new Data::Rlist(-data => $thing, -output => $output);
$self->write; # Write into some file (if $output is a filename) or string (if $output is a
# string reference).
Data::Rlist::write($thing, $output); # ditto, applying the functional interface
new Data::Rlist(-data => $self)->write; # Another way to do it.
print $self->make_string; # Print $thing to stdout.
print Data::Rlist::make_string($thing); # ditto, applying the functional interface
- write_csv(DATA[, OUTPUT, OPTIONS, COLUMNS])
Write DATA as CSV to file or string OUTPUT.
This function automatically quotes all fields that do not look like numbers (see "is_numeric()"). Numbers are rounded to the specified precision.
write_csv() is auto-exported as "WriteCSV()".
PARAMETERS
See "write()" for the DATA and OUTPUT parameters, which are semantically equal. The comma-separator ("comma", default is ","), the linefeed ("eol_space", default is "\n") and the numeric precision ("precision") are read from OPTIONS. COLUMNS, if specified, shall be an array-ref defining the column names to be written as the first line.
Like with "write()", unless DATA refers to some Data::Rlist object, it shall define the data to be compiled. But because of the limitations of CSV files the data may not be just any Perl data. It must be a reference to an array of array references, where each contained array defines the fields, e.g.
[ [ a, b, c ], # line 1
[ d, e, f, g ], # line 2
.
.
]
RESULT
When write_csv() creates a file it returns 0 for failure or 1 for success. Otherwise it returns a string reference.
EXAMPLES
Functional interface:
use Data::Rlist; # imports WriteCSV
WriteCSV($thing, "foo.dat");
WriteCSV($thing, "foo.dat", { comma => '; ' }, [qw/GBKNR VBKNR EL LaD LaD_V/]);
WriteCSV($thing, \$target_string);
$target_string_ref = WriteCSV($thing);
Object-oriented interface:
$object = new Data::Rlist(-data => $thing, -output => "foo.dat",
-options => { comma => '; ' },
-columns => [qw/GBKNR VBKNR EL LaD LaD_V/]);
$object->write_csv; # Write $thing as CSV to foo.dat
$object->write; # Write $thing as Rlist to foo.dat
$object->set(-output => \$target_string);
$object->write_csv; # Write $thing as CSV to $target_string
- write_string(DATA[, OPTIONS])
Like "write()" but always compiles to a new string to which it returns a reference. In an object this function does not use -output, even when this attribute defines a string reference. It also won't use -options. Instead it uses the predefined options set "string" to render a very compact Rlist without newlines and here-docs.
make_string() and keelhaul()
- make_string(DATA[, OPTIONS])
Print Perl DATA to a string and return its value. This function actually is an alias for
${Data::Rlist::write_string(DATA, OPTIONS)}
OPTIONS defaults to "default", which means that in an object context make_string() will never use the -options attribute.
EXAMPLES
print "\n\$data: ", Data::Rlist::make_string($data);
$self = new Data::Rlist(-data => $thing);
print "\n\$thing: ", $self->make_string;
- keelhaul(DATA[, OPTIONS])
Do a deep copy of DATA according to OPTIONS. DATA is some Perl data, or some Data::Rlist object.
keelhaul() works by first compiling DATA to text, then restoring the data from the text. The text is built according to the given "Compile Options". Hence, by "keelhauling data", one can adjust the accuracy of numbers, break circular references and drop \*foo{THING}s.
EXAMPLES
When keelhaul() is called in list context, it also returns the text from which the copy was built:
$deep_copy = Data::Rlist::keelhaul($thing);
($deep_copy, $rlist_text) = Data::Rlist::keelhaul($thing);
$deep_copy = new Data::Rlist(-data => $thing)->keelhaul;
Bring all numbers in DATA to a certain accuracy:
$thing = { foo => [.00057260, -1.6804e-4] };
$deep_copy = Data::Rlist::keelhaul($thing, { precision => 4 });
which copies $thing into
{ foo => [0.0006, -0.0002] }
All number scalars were rounded to 4 decimal places, so they are ultimately comparable as floating-point numbers (see "equal()" for a discussion). One can also convert all floats to integers:
$self = Data::Rlist->new(-data => $thing);
$deep_copy = $self->keelhaul({precision => 0});
NOTES
It was said before that keelhauling is a working method to create a deep copy of Perl data. keelhaul() neither dies nor returns an error, but be prepared for the following effects:
ARRAY, HASH, SCALAR and REF references are compiled, whether blessed or not. Depending on the compile options, CODE references are called, deparsed back into their function bodies, or dropped.
IO, GLOB and FORMAT references are converted into their plain type names (see "compile()").
undef'd array elements are converted into the default scalar value "".
Compile options are considered, such as implicit rounding of floats.
Anything deeper than $Data::Rlist::MaxDepth is thrown away (again, see "compile()").
Since compiling does not store type information, keelhaul() turns blessed references back into barbarians, i.e. unblessed references. No special methods to "freeze" and "thaw" an object are called before compiling or after parsing it. Instead the copy is made from what any object in a computer ultimately consists of: strings and numbers.
- predefined_options([PREDEF-NAME])
Get $Data::Rlist::PredefinedOptions{PREDEF-NAME}. PREDEF-NAME defaults to "default", the options for writing files.
- complete_options([OPTIONS[, PREDEF-NAME]])
Completes OPTIONS (hash or name) using the predefined set PREDEF-NAME (defaults to "default"). For example,
complete_options({ precision => 0 }, 'squeezed')
combines the predefined options for "squeezed" text (no whitespace at all, no here-docs, numbers are rounded to a precision of 6) with a numeric precision of 0. This converts all floats to integers.
Returns a reference to a new hash of "Compile Options".
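Several of the effects listed in the keelhaul() notes can be imitated without any text round trip. The following sketch is a hypothetical recursive copier, not keelhaul() itself: it copies arrays and hashes, drops blessings, turns undef array elements into "", optionally rounds numbers with sprintf (only an approximation of what "round()" does), and replaces every other reference by its type name. It ignores MaxDepth and the CODE-reference options.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype looks_like_number);

# Hypothetical imitation of some keelhaul() effects, without a text round
# trip: arrays and hashes are copied, blessings are dropped, undef array
# elements become "", numbers are optionally rounded, and any other kind of
# reference is replaced by its type name.
sub rough_copy {
    my ($x, $precision) = @_;
    my $type = reftype($x);
    if (!defined $type) {                          # not a reference
        return $x if !defined $x || !defined $precision || !looks_like_number($x);
        return 0 + sprintf('%.*f', $precision, $x);
    }
    if ($type eq 'ARRAY') {
        return [ map { defined $_ ? rough_copy($_, $precision) : '' } @$x ];
    }
    if ($type eq 'HASH') {
        return +{ map { ($_, rough_copy($x->{$_}, $precision)) } keys %$x };
    }
    return $type;                                  # e.g. 'CODE', 'GLOB'
}

my $thing = bless { foo => [.00057260, -1.6804e-4, undef], cb => sub { 1 } }, 'Some::Class';
my $copy  = rough_copy($thing, 4);
# $copy is the unblessed hash { foo => [0.0006, -0.0002, ''], cb => 'CODE' }
```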
Implementation
open_input() and close_input()
- open_input(INPUT[, FILTER, FILTER-ARGS])
- close_input()
Open/close Rlist text file or string INPUT for parsing. Used internally by "read()" and "read_csv()".
PREPROCESSING
If specified the function preprocesses the INPUT file using FILTER, before actually reading the file. Use the special value 1 for FILTER to select the default C preprocessor (precisely, gcc -E -Wp,-C). FILTER-ARGS is an optional string of additional command-line arguments appended to FILTER. For example,
my $foo = Data::Rlist::read("foo", 1, "-DEXTRA");
does not parse foo directly, but the output of the command
gcc -E -Wp,-C -DEXTRA foo
Hence within foo C-preprocessor-statements are allowed:
{
#ifdef EXTRA
#include "extra.rlist"
#endif
123 = (1, 2, 3);
foobar = {
.
.
SAFE CPP MODE
This slightly esoteric mode involves sed and a temporary file. It is enabled by setting $Data::Rlist::SafeCppMode to 1 (the default). It protects single-line #-comments when FILTER begins with either gcc, g++ or cpp. "open_input()" then additionally runs sed to convert all input lines beginning with whitespace plus the # character. Only the following cpp-commands are excluded, and only when they appear in column 1:
- #include and #pragma
- #define and #undef
- #if, #ifdef, #else and #endif.
For all other lines sed converts # into ##. This prevents the C preprocessor from evaluating them. But because of Perl's limited open() function, which isn't able to open arbitrary pipes, the invocation of sed requires a temporary file. The file is simply created by appending ".tmp" to the pathname passed in INPUT. "lexln()", the function that feeds the lexical scanner with lines, then converts ## back into comment lines.
Alternatively, use // and /* */ comments and set $Data::Rlist::SafeCppMode to 0.
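The line rewriting performed in safe cpp mode can be pictured as a pair of substitutions. The Perl sketch below mimics the described sed step and its reversal. The exact sed script used by "open_input()" is not given here, so the regexes (and the names protect_comments and restore_comments) are assumptions derived from the rules above.

```perl
use strict;
use warnings;

# Directives cpp must still see; they are only honored in column 1.
my $cpp_directive = qr/\A#(?:include|pragma|define|undef|if|ifdef|else|endif)\b/;

# Before cpp: double the '#' of every other #-line, so cpp leaves it alone.
sub protect_comments {
    my ($line) = @_;
    return $line if $line =~ $cpp_directive;
    $line =~ s/\A(\s*)#/$1##/;
    return $line;
}

# After cpp: turn the doubled '##' back into an ordinary '#' comment.
sub restore_comments {
    my ($line) = @_;
    $line =~ s/\A(\s*)##/$1#/;
    return $line;
}

for my $l ('#include "extra.rlist"', '  # a comment', '#endif', ' #define X') {
    print protect_comments($l), "\n";
}
```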
lex() and parse()
- lex()
-
Lexical scanner. Called by "parse()" to split the current line into tokens. lex() treats # and // single-line comments and /* */ multi-line comments as whitespace. Otherwise it returns tokens according to the following table:
RESULT     MEANING
------     -------
'{' '}'    Punctuation
'(' ')'    Punctuation
','        Operator
';'        Punctuation
'='        Operator
'v'        Constant value as number, string, list or hash
'??'       Error
undef      EOF
lex() appends a newline character to every here-doc line, including the last one. For example,
<<test1
a
b
test1
is effectively read as
"a\nb\n"
which is the same value the equivalent here-doc has in Perl. Hence the last newline is not just a separator between the last line and the delimiter; it is part of the value. As a consequence, not all strings can be encoded as a here-doc. For example, it may not be obvious to many programmers that
"foo\nbar"
has no here-doc equivalent.
- lexln()
-
Read the next line of text from the input. Return 0 if "at_eof()", 1 otherwise.
- at_eof()
-
Return true if current input file / string array is exhausted, false otherwise.
- parse()
-
Read Rlist language productions from current input, defined by package variables. This is a fast, non-recursive parser driven by the parser map %Data::Rlist::Rules. See also "lex()".
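The here-doc rule described under lex() is the one Perl applies to its own here-docs, which a few lines of plain Perl (no Data::Rlist involved) can confirm:

```perl
use strict;
use warnings;

# In Perl, every here-doc line, including the last one, ends with "\n".
my $heredoc = <<'test1';
a
b
test1

# So "a\nb\n" round-trips through a here-doc ...
my $same = ($heredoc eq "a\nb\n");

# ... but "foo\nbar" (no trailing newline) cannot: the closest here-doc
# always carries one extra "\n" at the end.
my $closest = <<'END';
foo
bar
END
my $differs = ($closest ne "foo\nbar");
```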
errors(), broken() and missing_input()
- errors([SELF])
-
Returns the number of syntax errors that occurred in the last call to "parse()". When called as a method (i.e. SELF is defined), returns the number of syntax errors that occurred the last time the object called "read()".
- broken([SELF])
-
Return the number of times the last "compile()" exceeded $Data::Rlist::MaxDepth. When called as a method, returns the information for the last time the object called "read()".
- missing_input([SELF])
-
Return true when the last call to "parse()" yielded undef because there was nothing to parse. Otherwise, when parse() returned undef, there was a syntax error. parse() is called internally by "read()". When called as a method, returns the information for the last time the object called "read()".
compile()
- compile(DATA[, OPTIONS, FH])
-
Build Rlist from DATA. DATA is a Perl scalar as number, string or reference. When FH is defined compile directly to this file and return 1. Otherwise (FH is undef) build a string and return a reference to it.
- Reference-types SCALAR, HASH, ARRAY and REF.
-
Compiled into text, whether blessed or not.
- Reference-types CODE
-
How CODE references are compiled depends on the "code_refs" flag defined by OPTIONS. Legal values are undef, "call" (the default) and "deparse". When the value of "code_refs" is undef, the reference is compiled into "?CODE?". A value of "call" calls the sub and compiles its result. "deparse" serializes the code using B::Deparse, which reproduces the Perl source of the sub. Note that it then makes sense to enable "here_docs", because otherwise the deparsed code will be in one string with LFs quoted as "\012".
- Reference-types GLOB, IO and FORMAT
-
Reference-types that cannot be compiled are GLOB (typeglob-refs), IO (file- and directory handles) and FORMAT. These are converted into "?GLOB?", "?IO?" and "?FORMAT?".
- Background: A Short Story of "Typeglobs"
-
Typeglobs are an idiosyncrasy of Perl. Perl uses a symbol table per package (namespace) to map identifier names (like "foo" without sigil) to values. The symbol table is stored in a hash named like the package with two colons appended. The main symbol table's name is thus %main::, or %::.
For example, in %main:: the name "foo" is mapped to the typeglob value *foo. The typeglob object implements $foo (the scalar value), @foo (the list value), %foo (the hash value), &foo (the code value) and foo (the file handle or the format specifier). All types may coexist, so modifying $foo won't change %foo. But *baz = *foo overwrites, or creates, the symbol table entry "baz". (The value of "baz" will be another typeglob object.)
Typeglobs are variants that can store multiple concrete values. The sigil * serves as a wildcard for the other sigils %, @, $ and &. (Note: a sigil is a symbol created for a specific magical purpose; the name derives from the Latin sigillum = seal.) So, the fancy-free Perl primitives are \*foo, a typeglob-ref, and \*::, a typeglob-table-ref.
\*foo;                            # yields 'GLOB(0xNNN)'
\*::;                             # yields 'GLOB(0xNNN)'
die unless \*foo == *foo{GLOB};   # never fires
\*foo is Perl's way to prove the existence of foo, the symbol. *foo is the internal "proxy" that tells perl what you really mean, at this moment, when you say "foo". In core this proxy is a hash table, hence another way to say \*foo is *foo{GLOB}, which refers to "foo"'s incarnation as the typeglob *foo.
In other words: with typeglobs you reach the bedrock of perl, where the spade bends back. Note, however, that after calling "compile()" typeglob-refs have gone up in smoke.
- undef
-
undef'd values in arrays are compiled into the default Rlist "".
- compile_fast(DATA)
-
Assemble Rlist from Perl data DATA as fast as possible with pure Perl. Reference-types SCALAR, HASH, ARRAY and REF are compiled into text, whether blessed or not. CODE, GLOB, IO and FORMAT are compiled as "?CODE?", "?GLOB?", "?IO?" and "?FORMAT?". undef values in arrays are compiled into the default Rlist "".
The main difference to "compile()" is that compile_fast() considers no compile options. Thus it cannot call code, implicitly round numbers etc., and cannot detect recursively-defined data.
compile_fast() returns a reference to the compiled string, which is a reference to a unique package variable. Subsequent calls to compile_fast() therefore reassign this variable.
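The reference-type rules of compile() and compile_fast() can be illustrated with Scalar::Util's reftype(), which reports the underlying type of blessed and unblessed references alike. describe_ref() is a hypothetical helper, not Data::Rlist code; its "deparse" branch only shows what B::Deparse produces, since the BUGS section lists that mode as not yet implemented.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);
use B::Deparse;

# Hypothetical helper: reports how a value would be handled under the
# rules described for compile(). $code_refs is undef, 'call' or 'deparse'.
sub describe_ref {
    my ($value, $code_refs) = @_;
    my $type = reftype($value);          # undef for non-references
    return 'scalar' unless defined $type;
    if ($type eq 'CODE') {
        return '"?CODE?"' unless defined $code_refs;
        return 'call: ' . $value->() if $code_refs eq 'call';
        return 'deparse: ' . B::Deparse->new->coderef2text($value);
    }
    # Typeglob-refs, file/directory handles and formats cannot be compiled.
    return qq("?$type?") if $type =~ /^(?:GLOB|IO|FORMAT)$/;
    return "text ($type)";               # SCALAR, REF, ARRAY, HASH
}

my $greet = sub { 'Greetings, earthlings!' };
my $msg   = describe_ref($greet, 'call');
print "$msg\n";                          # prints "call: Greetings, earthlings!"
```

In this sketch \*STDOUT reports GLOB while *STDOUT{IO} reports IO, which matches the typeglob background given under compile().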
AUXILIARY FUNCTIONS
The utility functions in this section are generally useful when handling stringified data. These functions are either very fast, or smart, or both. For example, "quote()", "unquote()", "escape()" and "unescape()" internally use precompiled regexes and precomputed ASCII tables; so employing these functions is probably faster than using your own variants.
is_integer(), is_numeric(), is_name() and is_random_text()
- is_integer(SCALAR-REF)
-
Returns true when a scalar looks like an integer constant, optionally signed with + or -. The function applies the compiled regex $Data::Rlist::g_re_integer.
- is_numeric(SCALAR-REF)
-
Test for strings that look like numbers. is_numeric() can be used to test whether a scalar looks like an integer or float constant (numeric literal). The function applies the compiled regex $Data::Rlist::g_re_float. Note that it doesn't match
- the IEEE 754 notations of Infinite and NaN,
- leading or trailing whitespace,
- lexical conventions such as the "0b" (binary), "0" (octal) and "0x" (hex) prefixes that denote a number base other than decimal, and
- Perl's "legible numbers", e.g. 3.14_15_92.
See also
perldoc -q "whether a scalar is a number"
- is_name(SCALAR-REF)
-
Test for symbolic names. is_name() can be used to test whether a scalar looks like a symbolic name. Such strings need not be quoted. Rlist defines symbolic names as a superset of C identifier names:
[a-zA-Z_0-9]                              # C/C++ character set for identifiers
[a-zA-Z_0-9\-/\~:\.@]                     # Rlist character set for symbolic names
[a-zA-Z_][a-zA-Z_0-9]*                    # match C/C++ identifier
[a-zA-Z_\-/\~:@][a-zA-Z_0-9\-/\~:\.@]*    # match Rlist symbolic name
For example, scoped/structured names such as std::foo, msg.warnings, --verbose and calculation-info need not be quoted. (If they are quoted, their value is exactly the same.) Note that is_name() does not match names with leading or trailing whitespace. Another restriction is that "." cannot be used as the first character, since it could also begin a number.
- is_random_text(SCALAR-REF)
-
is_random_text() returns true if the scalar is neither a symbolic name nor a number, nor double-quoted. When this function returns true, "compile()" and "compile_fast()" would call "quote()" on the scalar. In Rlists, all scalars need to be quoted, except those that
- are already quoted,
- look like C identifiers or symbolic names,
- look like C number constants.
Warning: is_random_text() makes no further test whether a string consists of characters that actually require escaping. That is, it also returns true for strings that are not 7-bit ASCII, i.e. strings containing characters below 32 or above 127.
See also "is_numeric()" and "is_name()".
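A minimal sketch of these tests, assuming plain scalars rather than SCALAR-REFs. The symbolic-name pattern is the Rlist one quoted under is_name(); the integer and float patterns are assumptions standing in for $Data::Rlist::g_re_integer and $Data::Rlist::g_re_float, whose exact definitions are not shown here. All sub names are hypothetical.

```perl
use strict;
use warnings;

# Rlist symbolic name, as quoted in the is_name() entry.
my $re_name    = qr{^[a-zA-Z_\-/~:@][a-zA-Z_0-9\-/~:.@]*\z};
# Assumed patterns; the module's own regexes may differ.
my $re_integer = qr{^[+-]?\d+\z};
my $re_float   = qr{^[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?\z};

sub sketch_is_integer { $_[0] =~ $re_integer }
sub sketch_is_numeric { $_[0] =~ $re_float }
sub sketch_is_name    { $_[0] =~ $re_name }

# True when the text would have to be quoted: it is not already
# double-quoted, not a symbolic name and not a number.
sub sketch_is_random_text {
    my ($s) = @_;
    return !($s =~ /^"/ || sketch_is_name($s) || sketch_is_numeric($s));
}
```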
quote(), escape() and unhere()
- quote(TEXT)
- escape(TEXT)
-
Converts TEXT into 7-bit-ASCII. All characters not in the set of the 95 printable ASCII characters are escaped. The difference between the two functions is that quote() additionally places TEXT into double-quotes.
The following ASCII codes are converted to escaped octal numbers, i.e. 3 digits prefixed by a backslash:
- 0x00 to 0x1F
- 0x80 to 0xFF
- the characters ", ' and \
For example, quote(qq'"Früher Mittag\n"') returns
"\"Fr\374her Mittag\012\""
while escape() returns
\"Fr\374her Mittag\012\"
- maybe_quote(TEXT)
-
Return quote(TEXT) if "is_random_text(TEXT)"; otherwise (TEXT defines a symbolic name or number) return TEXT.
- unquote(TEXT)
- unescape(TEXT)
-
Reverses "quote()" and "escape()".
- unhere(HERE-DOC-STRING[, COLUMNS, FIRSTTAB, DEFAULTTAB])
-
HERE-DOC-STRING shall be a here-document. The function checks whether each line begins with a common prefix, and if so, strips that off. If there is no common prefix, it takes the amount of leading whitespace found in the first line and removes that much from each subsequent line.
Unless COLUMNS is defined, returns the new here-doc string. Otherwise, reformats the string into a paragraph in which no line is longer than COLUMNS characters. FIRSTTAB will be the indent for the first line, DEFAULTTAB the indent for every subsequent line. Unless passed, FIRSTTAB and DEFAULTTAB default to the empty string "".
This function combines recipes 1.11 and 1.12 from the Perl Cookbook.
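The escaping rules and the indentation stripping of unhere() can be sketched as follows. This is an illustration, not the module's code: the helper names are hypothetical, the input is assumed to be a byte string, and giving ", ' and \ a backslash prefix (rather than an octal code) is an assumption based on the \" in the quote() example. sketch_unhere() implements only the whitespace fallback, i.e. removing the first line's indentation from every line, and does no paragraph wrapping.

```perl
use strict;
use warnings;

# Escape a byte string into 7-bit ASCII (assumed rules, see above).
sub sketch_escape {
    my ($text) = @_;
    $text =~ s/(["'\\])/\\$1/g;                                      # \" \' \\
    $text =~ s/([\x00-\x1F\x80-\xFF])/sprintf('\\%03o', ord $1)/ge;  # \NNN
    return $text;
}
sub sketch_quote { '"' . sketch_escape($_[0]) . '"' }

# Reverse sketch_escape(): octal codes, or any single backslashed character.
sub sketch_unescape {
    my ($text) = @_;
    $text =~ s/\\([0-7]{3}|.)/length($1) == 3 ? chr(oct $1) : $1/ges;
    return $text;
}

# Remove the first line's leading whitespace from every line.
sub sketch_unhere {
    my ($text) = @_;
    my ($indent) = $text =~ /\A([ \t]*)/;
    $text =~ s/^\Q$indent\E//gm;
    return $text;
}
```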
split_quoted() and parse_quoted()
- split_quoted(INPUT[, DELIMITER])
- parse_quoted(INPUT[, DELIMITER])
-
Divide the string INPUT into a list of strings. DELIMITER is a regular expression specifying where to split (default: '\s+'). The function won't split at DELIMITERs inside quotes, or at backslashed ones. For example, to split INPUT at commas use '\s*,\s*'.
parse_quoted() works like split_quoted() but additionally removes all quotes and backslashes from the split fields. Both functions effectively simplify the interface of Text::ParseWords. In list context both return a list of substrings, otherwise the number of substrings. An empty list is returned in case of unbalanced " quotes, e.g. split_quoted('foo,"bar') returns ().
EXAMPLES
split_quoted():
sub split_and_list($) {
    my $i = 0;
    print $i++, " '$_'\n" foreach split_quoted(shift);
}
split_and_list(q("fee foo" bar))
    0 '"fee foo"'
    1 'bar'
split_and_list(q("fee foo"\ bar))
    0 '"fee foo"\ bar'
The default DELIMITER '\s+' handles newlines. split_quoted("foo\nbar\n") returns ('foo', 'bar', '') and hence can be used to split a large string of uncho(m)p'd input lines into words:
split_and_list("foo \r\n bar\n")
    0 'foo'
    1 'bar'
    2 ''
The DELIMITER matches everywhere outside of quoted constructs, so in case of the default '\s+' you may want to remove leading/trailing whitespace. Consider
split_and_list("\nfoo")
split_and_list("\tfoo")
    0 ''
    1 'foo'
and
split_and_list(" foo ")
    0 ''
    1 'foo'
    2 ''
parse_quoted():
sub parse_and_list($) {
    my $i = 0;
    print $i++, " '$_'\n" foreach parse_quoted(shift);
}
parse_and_list(q("fee foo" bar))
    0 'fee foo'
    1 'bar'
parse_and_list(q("fee foo"\ bar))
    0 'fee foo bar'
MORE EXAMPLES
String 'field\ one "field\ two"':
('field\ one', '"field\ two"')   # split_quoted
('field one', 'field two')       # parse_quoted
String 'field\,one, field", two"' with a DELIMITER of '\s*,\s*':
('field\,one', 'field", two"')   # split_quoted
('field,one', 'field, two')      # parse_quoted
Split a large string $soup (mnemonic: possibly "slurped" from a file) into lines, at LF or CR+LF:
@lines = split_quoted($soup, '\r*\n');
Then transform all @lines by correctly splitting each line into "naked" values:
@table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines;
Here is some more complete code to parse a .csv file with quoted fields and escaped commas:
open my $fh, '<', 'foo.csv' or die $!;
my $content = do { local $/; <$fh> };   # slurp the whole file at once
close $fh;
my @lines = split_quoted($content, '\r*\n');
die q(unbalanced " in input) unless @lines;
my @table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines;
Note, however, that the "read_csv()" function already reads .csv files perfectly well.
A nice way to verify what split_quoted() and parse_quoted() return is deep_compare(). For example, the following code shall never die:
croak if deep_compare([split_quoted("fee fie foo")], ['fee', 'fie', 'foo']);
croak if deep_compare(scalar(parse_quoted('"fee fie foo"')), 1);
The 2nd call to "parse_quoted()" happens in scalar context, hence shall return 1 because there's one string to parse.
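Because both functions wrap Text::ParseWords, their core can be sketched with that module's parse_line(DELIMITER, KEEP, LINE): a true KEEP retains quotes and backslashes (split_quoted() behaviour), a false KEEP removes them (parse_quoted() behaviour), and unbalanced quotes make parse_line() return an empty list. The names below are hypothetical, and the sketch does not reproduce details such as how the real functions report a trailing empty field.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

# Hypothetical split_quoted(): keep quotes and backslashes in the fields.
sub sketch_split_quoted {
    my ($input, $delim) = @_;
    my @fields = parse_line(defined $delim ? $delim : '\s+', 1, $input);
    return wantarray ? @fields : scalar @fields;
}

# Hypothetical parse_quoted(): same split, quotes and backslashes removed.
sub sketch_parse_quoted {
    my ($input, $delim) = @_;
    my @fields = parse_line(defined $delim ? $delim : '\s+', 0, $input);
    return wantarray ? @fields : scalar @fields;
}
```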
equal() and round()
- equal(NUM1, NUM2[, PRECISION])
- round(NUM1[, PRECISION])
-
Compare and round floating-point numbers. "equal()" returns true if NUM1 and NUM2 are equal to PRECISION (default: 6) number of decimal places. NUM1 and NUM2 are string- or number scalars.
Normally round() will return a number in fixed-point notation. When the package-global $Data::Rlist::RoundScientific is true round() formats the number in either normal or exponential (scientific) notation, whichever is more appropriate for its magnitude. This differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers. For example, "round(42)" does not return 42.000000, and round(0.12) returns 0.12, not 0.120000. This behavior is especially welcome when scientific notation was selected. For example, note that
sprintf("%.6g\n", 2006073104)
yields 2.00607e+09, which loses digits.
MACHINE ACCURACY
One needs "equal()" to compare floats because IEEE 754 single- and double-precision numbers are not exact, in contrast to the numbers they represent: on all machines most non-integer numbers are only approximations of the numeric truth. One consequence is that floating-point addition is not associative. For example, given three floats a, b and c, the result of (a+b)+c might be different from that of a+(b+c).
Each machine has its own accuracy, called the machine epsilon, which is the difference between 1 and the smallest exactly representable number greater than one. In practice, floats can only be compared reliably up to a certain number of decimal places. This is generally the case when two floats that result from numeric operations are compared, but not two constants. (Identical constants always compare equal, because the same literal is always converted to the same binary approximation.)
See also recipes 2.2 and 2.3 in the Perl Cookbook.
EXAMPLES
CALL                   RETURNS NUMBER
----                   --------------
round('0.9957', 3)     0.996
round(42, 2)           42
round(0.12)            0.120000
round(0.99, 2)         0.99
round(0.991, 2)        0.99
round(0.99, 1)         1.0
round(1.096, 2)        1.10
round(+.99950678)      0.999510
round(-.00057260)      -0.000573
round(-1.6804e-6)      -0.000002
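The machine-accuracy problem and the idea behind equal() fit in a few lines of plain Perl. sketch_equal() is a hypothetical stand-in that compares two numbers after formatting both to PRECISION decimal places with sprintf; it is not necessarily how equal() is implemented.

```perl
use strict;
use warnings;

# Hypothetical equal(): compare after rounding both numbers to $precision
# decimal places (default 6), using sprintf's "%.*f" format.
sub sketch_equal {
    my ($x, $y, $precision) = @_;
    $precision = 6 unless defined $precision;
    return sprintf('%.*f', $precision, $x) eq sprintf('%.*f', $precision, $y);
}

# Floating-point addition is not associative:
my $left  = (0.1 + 0.2) + 0.3;                   # slightly more than 0.6
my $right = 0.1 + (0.2 + 0.3);                   # slightly less than 0.6
my $exactly_equal = ($left == $right);           # false
my $equal_to_6    = sketch_equal($left, $right); # true
```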
deep_compare()
- deep_compare(A, B[, PRECISION, PRINT])
-
Compare and analyze two numbers, strings or references. Generates a log (stack of messages) describing exactly all unequal data. Hence, for some perl data $a and $b one can assert:
croak "$a differs from $b" if deep_compare($a, $b);
When PRINT is true traces progress on stdout.
RESULT
Returns an array of messages, each describing unequal data, or data that cannot be compared because of type or value mismatches. The array is empty when the deep comparison of A and B found no unequal numbers or strings and no mismatching types.
EXAMPLES
The result is line-oriented, and for each mismatch it returns a single message:
Data::Rlist::deep_compare(undef, 1)
yields
<<undef>> cmp <<1>> stop! 1st undefined, 2nd defined (1)
A more complex example: deep-comparing two multi-level data structures A and B returned two messages:
'String literal' == REF(0x7f224)             stop! type-mismatch (scalar versus REF)
'Greetings, earthlings!' == CODE(0x7f2fc)    stop! type-mismatch (scalar versus CODE)
Somewhere in A a string "String literal" could not be compared, because the corresponding element in B is a reference to a reference. Next it says that "Greetings, earthlings!" could not be compared because the corresponding element in B is a code reference. (One could assert, however, that the actual opacity here is that they speak ASCII.)
Actually, A and B were identical. B was written to disk (by "write()") and then read back as A (by "read()"). So why don't they compare anymore? Because in B the refs REF(0x7f224) and CODE(0x7f2fc) hide
\"String literal"
and
sub { 'Greetings, earthlings!' }
When writing B to disk, write() dissolved the scalar- and the code-reference into "String literal" and "Greetings, earthlings!". Of course, deep_compare() will not do that, so A no longer compares equal to B. Note that despite these two mismatches, deep_compare() continued the comparison for all other elements in A and B. Hence the structures are identical in all other elements.
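The recursive idea behind deep_compare() can be sketched in plain Perl. deep_diff() is a hypothetical, much-reduced stand-in, not the module's implementation: it compares scalars as strings (no PRECISION), does not descend into SCALAR, REF or CODE references, and its messages only imitate the style shown above.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical deep comparison: returns one message per mismatch;
# an empty list means A and B compare equal.
sub deep_diff {
    my ($x, $y, $path) = @_;
    $path //= '';
    return () if !defined $x && !defined $y;
    return ("$path: one side undefined") if !defined $x || !defined $y;
    my $tx = reftype($x) // 'scalar';
    my $ty = reftype($y) // 'scalar';
    return ("$path: type-mismatch ($tx versus $ty)") if $tx ne $ty;
    if ($tx eq 'ARRAY') {
        return ("$path: array sizes differ") if @$x != @$y;
        return map { deep_diff($x->[$_], $y->[$_], "$path\[$_]") } 0 .. $#$x;
    }
    if ($tx eq 'HASH') {
        my %keys = map { ($_ => 1) } keys %$x, keys %$y;
        return map { deep_diff($x->{$_}, $y->{$_}, "$path\{$_}") } sort keys %keys;
    }
    return ("$path: '$x' ne '$y'") if $tx eq 'scalar' && $x ne $y;
    return ();    # other reference types are not inspected in this sketch
}
```

Like deep_compare(), the sketch keeps going after a mismatch, so one call reports every differing element.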
fork_and_wait(), synthesize_pathname()
- fork_and_wait(PROGRAM[, ARGS...])
-
Forks a process and waits for its completion. The function extracts the exit code, tests whether the process died and prints status messages on stderr. fork_and_wait() hence is a handy wrapper around the built-in system() and exec() functions. Returns an array of three values:
($exit_code, $failed, $coredump)
$exit_code is -1 when the program failed to execute (e.g. it wasn't found or the current user has insufficient rights). Otherwise $exit_code is between 0 and 255. When the program died on receipt of a signal (like SIGINT or SIGQUIT) then $signal stores it. When $coredump is true the program died and a core file was written. Note that some systems store cores somewhere other than in the program's working directory.
- synthesize_pathname(TEXT...)
-
Concatenates and forms all TEXT strings into a symbolic name that can be used as a pathname. synthesize_pathname() is useful for reusing a string, assembled from multiple strings, simultaneously as hash key, database name, and file or URL name. Note, however, that some characters are mapped to only "_" and "-".
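fork_and_wait()'s results correspond to the standard decoding of Perl's $? after a child process finishes, as documented for the built-in system(). The sketch below shows that decoding; run_and_decode() is a hypothetical name, and it calls system() directly instead of forking by hand, so it is not fork_and_wait()'s implementation and prints no status messages on stderr.

```perl
use strict;
use warnings;

# Hypothetical helper: run a program and decode $? into
# (exit code, signal number, core-dump flag).
sub run_and_decode {
    my @command = @_;
    system(@command);                       # list form: no shell involved
    return (-1, 0, 0) if $? == -1;          # failed to execute at all
    my $signal   = $? & 127;                # signal that killed it, 0 if none
    my $coredump = ($? & 128) ? 1 : 0;      # was a core file written?
    my $exit     = $? >> 8;                 # exit status 0..255
    return ($exit, $signal, $coredump);
}

my ($exit) = run_and_decode($^X, '-e', 'exit 3');   # $^X: the running perl
print "exit code: $exit\n";                          # prints "exit code: 3"
```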
IMPORTED FUNCTIONS
Explicit Imports
Four tags are available that import function sets. These are utility functions that are also usable separately from Data::Rlist.
- :floats
-
Imports "equal()", "round()" and "is_numeric()".
- :strings
-
Imports "maybe_quote()", "quote()", "escape()", "unquote()", "unescape()", "unhere()", "is_random_text()", "is_numeric()", "is_name()", "split_quoted()", and "parse_quoted()".
- :options
-
Imports "predefined_options()" and "complete_options()".
- :aux
-
Imports "deep_compare()", "fork_and_wait()" and "synthesize_pathname()".
EXAMPLES
use Data::Rlist qw/:floats :strings/;
Automatic Imports
These functions are implicitly imported into the caller's symbol table by the package: ReadCSV(), ReadData(), WriteData(), PrintData(), OutlineData(), StringizeData(), SqueezeData(), KeelhaulData() and CompareData().
You may say require Data::Rlist (instead of use Data::Rlist) to prohibit auto-import. See also perlmod.
Importing when Rlist.pm is installed locally
Installing CPAN packages usually requires administrator privileges. In case you don't have them, another way is to copy the Rlist.pm file, e.g., into . or ~/bin:
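BEGIN {
    # Directory of the running script (with trailing slash), or "." when
    # $0 has no directory part; avoids $` (see DEPENDENCIES).
    my $dir = $0 =~ m{^(.*/)} ? $1 : '.';
    push @INC, $dir, "$ENV{HOME}/bin";
    require Rlist;
    Data::Rlist->import();                      # automatic imports
    Data::Rlist->import(qw/:floats :strings/);  # optional tag imports
}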
This code also finds Rlist.pm in the directory of the running script (or in ., if the script was invoked without a path) and in ~/bin. It then calls the Exporter manually.
ReadCSV() and ReadData()
- ReadCSV(INPUT[, DELIMITER, FILTER, FILTER-ARGS])
-
Calls "read_csv()".
- ReadData(INPUT[, FILTER, FILTER-ARGS])
-
Calls "read()".
WriteCSV() and WriteData()
- WriteCSV(DATA[, OUTPUT, OPTIONS, COLUMNS])
-
Calls "write_csv()".
- WriteData(DATA[, OUTPUT, OPTIONS, HEADER])
-
Calls "write()".
OutlineData(), StringizeData() and SqueezeData()
- OutlineData(DATA[, OPTIONS])
- StringizeData(DATA[, OPTIONS])
- SqueezeData(DATA[, OPTIONS])
-
Calls "make_string()".
OutlineData() applies the predefined "outlined" options set, while StringizeData() applies "string" and SqueezeData() "squeezed". When specified, OPTIONS are merged into the predefined set. For example,
print "\n\$thing: ", OutlineData($thing, { precision => 12 });
rounds all numbers in $thing to 12 digits.
KeelhaulData() and CompareData()
- KeelhaulData(DATA[, OPTIONS])
-
Calls "keelhaul()". For example,
use Data::Rlist;
.
.
my($copy, $as_text) = KeelhaulData($thing);
- CompareData(A, B[, PRECISION, PRINT_TO_STDOUT])
-
Calls "deep_compare()".
HISTORY / NOTES
The Random Lists (Rlist) syntax is inspired by NeXTSTEP's Property Lists. Rlist is simpler, more readable and more portable. The Perl, C and C++ implementations are fast, stable and free. Markus Felten, with whom I worked for a few months on a project at Deutsche Bank, Frankfurt, in summer 1998, drew my attention to Property Lists. He had implemented a Perl variant of it ("http://search.cpan.org/search?dist=Data-PropertyList").
The term "Random" underlines the fact that the language
has only four primitive data types;
the basic building block is a list (sequential or associative), and this list can be combined at random with other lists.
Hence the term "Random" does not mean aimless or accidental. Random Lists are arbitrary lists. Application data can be made portable (due to 7-bit ASCII) and persistent by dealing arbitrarily with lists of numbers and strings. As with CSV, the lexical overhead Rlist imposes is minimal: files are merely data. Also, files are viewable/editable by text editors. Users then shall not be dazzled by language gizmos.
SEE ALSO
Data::Dumper
In contrast to Data::Dumper, Data::Rlist scalars will be properly typed as number or string. Data::Dumper often writes numbers as quoted strings, for example
$VAR1 = {
'configuration' => {
'verbose' => 'Y',
'importance_sampling_loss_quantile' => '0.04',
'distribution_loss_unit' => '100',
'default_only' => 'Y',
'num_threads' => '5',
.
.
}
};
where Data::Rlist writes
{
configuration = {
verbose = Y;
importance_sampling_loss_quantile = 0.04;
distribution_loss_unit = 100;
default_only = Y;
num_threads = 5;
.
.
}
}
As one can see, Data::Dumper writes the data in Perl syntax, which means the dumped text can simply be eval'd. Rlists are not Perl syntax and need to be parsed carefully. But Rlist text is portable (7-bit ASCII with non-printables escaped) and implementations exist for other programming languages, namely C++, which uses a fast flex/bison parser.
Reading Data::Dumper-generated files back is generally faster than "read()". On the other hand, with $Data::Dumper::Useqq enabled, Data::Dumper was observed to render output three to four times slower than "compile()".
Consider also that Data::Rlist tests for any scalar whether it is numeric or not (see "is_random_text()"), where Data::Dumper simply quotes any number and string. So Data::Rlist is able to implicitly round floats to a certain precision, making them finally comparable (see "round()" for more information).
Data::Rlist generates much smaller files: with the default $Data::Dumper::Indent of 2, Rlist output is just 15-20% of the size the Data::Dumper package prints (for the same data). The simple reason: Data::Dumper recklessly uses many blanks instead of horizontal tabs, which unnecessarily blows up file sizes.
DEPENDENCIES
Data::Rlist depends only on a few other packages:
Exporter
Carp
strict
integer
Sys::Hostname
Scalar::Util # deep_compare() only
Text::Wrap # unhere() only
Text::ParseWords # split_quoted(), parse_quoted() only
Data::Rlist is free of $&, $` or $'. Reason: once Perl sees that you need one of these meta-variables anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program (see also perlre).
BUGS AND DEFICIENCIES
There are no known bugs; this package is stable.
Deficiencies of this version:
- nanoscripts are not yet implemented.
- The "deparse" functionality for the "code_refs" compile option has not yet been implemented.
- The "threads" compile option has not yet been implemented.
- IEEE 754 notations of Infinite and NaN are not yet implemented.
To increase compilation speed, a string $s is only "quote()"d when $s !~ $Data::Rlist::g_re_value. (Note that this regex is also applied by "is_random_text()".) The regex checks whether $s begins with ", or defines a symbolic name or a number. But when the 1st character of $s is ", no further tests are made whether the characters in $s actually require escaping. It is then assumed that the string adheres to 7-bit ASCII. If this isn't the case, it might not be read back correctly. See also "is_name()", "is_integer()" and "is_numeric()".
AUTHOR
Andreas Spindler, rlist@visualco.de
COPYRIGHT AND LICENSE
Copyright 1998-2007 Andreas Spindler
Maintained at CPAN and "http://www.visualco.de"
See http://search.cpan.org/~aspindler.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
Thank you for your attention.