NAME

Data::Rlist - A lightweight data language for Perl, C and C++

SYNOPSIS

use Data::Rlist;
    .
    .

Data from text:

$string_ref = Data::Rlist::write_string($data);

$string     = Data::Rlist::make_string($data);

$data       = Data::Rlist::read_string($string_ref);

$data       = Data::Rlist::read_string($string);

Data from files:

        Data::Rlist::write($data, $filename);

$data = Data::Rlist::read($filename);

Perform safe deep copies of data:

$deep_copy  = Data::Rlist::keelhaul($data);

The same can be achieved with the object-oriented interface:

$object = new Data::Rlist(-data => $thing, -output => \$target_string);

-data defines the data to be compiled, and -output where to write the compilation. -output either defines a string reference or the name of a file:

$string_ref = $object->write;       # compile $thing, return \$target_string

$object->set(-output => "$HOME/.foorc"); # refine as filename

$object->write;                     # write "~/.foorc"

Passing an argument to write() overrides -output:

$object->write(".barrc");           # write to some other file

write_string() and make_string() make up a string out of thin air, no matter how -output is set:

$string_ref = $object->write_string; # write to new string (ignores -output)

$string     = $object->make_string; # dto. but return string value

print $object->make_string;         # ...dump $thing to stdout

However, all these functions apply -data as the Perl data to be compiled. The attribute -input defines what to parse: read() compiles the text defined by -input back to Perl data:

$object->set(-input => \$rlist_language_productions);

$data = $object->read;

$data = $object->read($other); # overrides -input attribute

Analogous to -data, the -input attribute shall be either a string-reference, undef or the name of a file:

use Env qw/HOME/;

$object->set(-input => "$HOME/.foorc");

$data = $object->read;		# open and parse "~/.foorc"

$data = $object->read(".barrc"); # parse some other file (override -input)

$data = $object->read(\$string); # parse some string (override -input)

$data = $object->read_string($string_or_ref); # dto.

KEELHAULING DATA

Data::Rlist can also create deep-copies of Perl data, a functionality called keelhauling:

$deep_copy = $object->keelhaul;	# create in-depth copy of $thing

The metaphor vividly connotes that $thing is stringified, then compiled back. See keelhaul() for why this only sounds useless. The little brother of keelhaul() is deep_compare():

print join("\n", Data::Rlist::deep_compare($a, $b));

DESCRIPTION

Venue

Random-Lists (Rlist) is a tag/value format for text data. It converts objects into legible, plain text. Rlist is a data format language that uses lists of (a) values and (b) tags and values to structure data. In short, it stringifies objects. The design targets the simplest (yet complete) language for constant data:

- it allows the definition of hierarchical data,

- it disallows recursively-defined data,

- it does not consider user-defined types,

- it has no keywords,

- it has no arithmetic expressions,

- it uses 7-bit-ASCII character encoding.

Rlists are not Perl syntax, and can be used also from C and C++ programs.

RLIST    PERL
-----    ----
 5;       { 5 => undef }
 "5";     { "5" => undef }
 5=1;     { 5 => 1 }
 {5=1;}   { 5 => 1 }
 (5)      [ 5 ]
 {}       { }
 ;        { }
 ()       [ ]

Strings and Numbers

"Hello, World!"

Symbolic names are simply strings consisting only of [a-zA-Z_0-9-/~:.@] characters. For such strings the quotes are optional:

foobar   cogito.ergo.sum   Memento::mori

Numbers adhere to the IEEE 754 syntax for integer- and floating-point numbers:

38   10e-6   -.7   3.141592653589793

Array

Arrays are sequential lists:

( 1, 2, ( 3, "Audiatur et altera pars!" ) )

Hash

Hashes are associative lists: they map a key scalar to some value, which is a subsequent Rlist.

    {
        key = value;
        3.14159 = Pi;
        "Meta-syntactic names" = (foo, bar, baz, "lorem ipsum", Acme, ___);
        lonely-key;
    }

Audience

Rlist is useful as a "glue data language" between different systems and programs, for configuration files and for persistence layers (object storage). It attempts to represent the data pure and untinged, but without breaking its structure or legibility. The format excels over comma-separated values (CSV), but isn't as excessive as XML:

  • Like CSV the format describes merely the data itself, but the data may be structured in multiple levels, not just lines.

  • Like XML data can be as complex as required, but while XML is geared to markup data within some continuous text (the document), Rlist defines the pure data structure. However, for non-programmers the syntax is still self-evident.

Rlists are built from only four primitives: number, string, array and hash. The penalty with Rlist hence is that data schemes are tacit agreements between the users of the data (the programs).

Implementations exist for Perl, C and C++, on Windows and UN*X. These implementations are stable, portable and very fast, and they do not depend on other software. The Perl implementation operates directly on primitive types, whereas C++ uses STL types. Either way data integrity is guaranteed: floats won't lose their precision, Perl strings are loaded into std::strings, and Perl hashes and arrays are resurrected as std::maps and std::vectors.

Moreover, a design goal of Rlist was to scale well: a single text file can express hundreds of megabytes of data, while the data is readable in constant time and with constant memory requirements. This makes Rlist files applicable as "mini-databases" loaded into RAM at program startup. For example, http://www.sternenfall.de uses Rlist instead of a MySQL database.

Number and String

All program data is ultimately convertible into numbers and strings. Therefore any data file also consists of numbers (integers and floats) and strings, aggregated into more complex structures. In Rlist number and string constants follow the C language lexicography. Strings that look like C identifier names need not be quoted.

By definition all input is compiled into an array or hash; hashes are the default. For example, the string "Hello, World!" is compiled into:

{ "Hello, World!" => undef }

Likewise the parser of the C++ implementation by default returns a std::map with one pair. Another default is "", the empty string, which is the default scalar value. In Perl, undef'd list elements are compiled into "".

Strings are quoted implicitly when building Rlists; when reading them back strings are unquoted. Quoting means to encode characters, then wrap the string into ". You can also make use of this functionality by calling quote() and unquote() as separate functions.

Here Documents

Rlist is capable of a line-oriented form of quoting based on the UNIX shell here-document syntax and RFC 111. Multi-line quoted strings can be expressed with

<<DELIMITER

Following the sigil << an identifier specifies how to terminate the string scalar. The value of the scalar will be all lines following the current line down to the line starting with the delimiter. There must be no space between the << and the identifier. For example,

{
    var = {
        log = {
            messages = <<LOG;
Nov 27 21:55:04 localhost kernel: TSC appears to be running slowly. Marking it as unstable
Nov 27 22:34:27 localhost kernel: Uniform CD-ROM driver Revision: 3.20
Nov 27 22:34:27 localhost kernel: Loading iSCSI transport class v2.0-724.<6>PNP: No PS/2 controller found. Probing ports directly.
Nov 27 22:34:27 localhost kernel: wifi0: Atheros 5212: mem=0x26000000, irq=11
LOG
        };
    };
}

Character Encoding

Rlist text uses 7-bit-ASCII. The 95 printable character codes 32 to 126 occupy one character. Codes 0 to 31 and 127 to 255 require four characters each: the \ escape character followed by the octal code number. For example, the German Umlaut character ü (252) is translated into \374. The exceptions are codes 92 (backslash), 34 (double-quote) and 39 (single-quote), which are escaped as

\\   \"   \'
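
The escaping rules above can be sketched in a few lines of Perl. This is only an illustration: escape_7bit() is a hypothetical helper, not a function of Data::Rlist, and the module's own quote() may treat individual characters differently.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Rlist): encode a byte string
# following the escaping rules described above.
sub escape_7bit {
    my ($s) = @_;
    # Backslash, double-quote and single-quote get a backslash prefix.
    $s =~ s/([\\"'])/\\$1/g;
    # Codes 0-31 and 127-255 become a backslash plus three octal digits.
    $s =~ s/([\x00-\x1f\x7f-\xff])/sprintf("\\%03o", ord($1))/ge;
    return $s;
}

print escape_7bit("M\xfcller"), "\n";   # prints M\374ller
```

Every escaped code occupies exactly four characters in the output, which matches the size estimate given above.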

Binary Data

Binary data can be represented as a base64-encoded string or as a here-document.

Embedded Perl Code

Rlists may define embedded programs: nanoscripts. They are defined as here-documents delimited with the special string "nanoscript". For example,

hello = (<<nanoscript);
print "Hello, World!";
nanoscript

After the Rlist has been fully parsed such strings are eval'd in the order of their occurrence. Within the eval %root or @root defines the root of the current Rlist.

Comments

Rlist supports multiple forms of comments: // or # single-line-comments, and /* */ multi-line-comments.

EXAMPLES

Basic Rlist values are number and string constants, from which larger structures are built. All of the following paragraphs define valid Rlists.

Single strings and numbers:

"Hello, World!"

foo                     // compiles to { 'foo' => undef }

3.1415                  // compiles to { 3.1415 => undef }

Array:

(1, a, 4, "b u z")      // list of numbers/strings

((1, 2),
 (3, 4))                // list of lists (2x2 matrix)

((1, a, 3, "foo bar"),
 (7, c, 0, ""))         // another list of lists

Array of strings:

warning = (
    "main correlation-matrix not positive-definite",
    "using pseudo-decomposed sigma-matrix",
    "cannot evaluate CVaR: the no. of simulations is too low for confidence-level 0.90"
);

Configuration object as hash:

{
    contribution_quantile = 0.99;
    default_only_mode = Y;
    importance_sampling = N;
    num_runs = 10000;
    num_threads = 10;
    # etc.
}

A comprehensive example:

"Metaphysic-terms" =
{
    Numbers =
    {
        3.141592653589793 = "The ratio of a circle's circumference to its diameter.";
        2.718281828459045 = <<___;
The mathematical constant "e" is the unique real number such that the value of
the derivative (slope of the tangent line) of f(x) = e^x at the point x = 0 is
exactly 1.
___
        42 = "The Answer to Life, the Universe, and Everything.";
    };

    Words =
    {
        ACME = <<Value;
A Company [that] Makes Everything: Wile E. Coyote's supplier of equipment and gadgets.
Value
        <<Key = <<Value;
foo bar foobar
Key
[JARGON] A widely used meta-syntactic variable; see foo for etymology.  Probably
originally propagated through DECsystem manuals [...] in 1960s and early 1970s;
confirmed sightings go back to 1972. [...]
Value
    };
};

PACKAGE DETAILS

Compile Options

The format of the compiled text and the behavior of compile() can be controlled by the OPTIONS parameter of write(), write_string() etc. The argument is a hash defining how the Rlist text shall be formatted. The following pairs are recognized:

'precision' => NUMBER

Unless NUMBER is undef, round all numbers to NUMBER decimal places by calling round(). By default NUMBER is undef, so compile() does not round floats.

'scientific' => FLAG

Causes compile() to masquerade $Data::Rlist::RoundScientific; see round() for the implications. Alternately the -RoundScientific object attribute can be set; see new().

'code_refs' => FLAG

If enabled and write() encounters a CODE reference, calls the code, then compiles the return value. Disabled by default.

'threads' => COUNT

If enabled, compile() internally uses multiple threads. Note that this only makes sense on machines with at least COUNT CPUs.

'here_docs' => FLAG

If enabled strings with at least two newlines in them are written in the here-doc-format. Note that the string has to be terminated with a "\n" to qualify as here-document.

'outline_data' => NUMBER

Use "eol" (linefeed) to "distribute data on many lines." Insert a linefeed after every NUMBERth array value; 0 disables outlining.

'outline_hashes' => FLAG

If enabled, and "outline_data" is also enabled, prints { and } on distinct lines when compiling Perl hashes with at least one pair.

'comma' => STRING

The comma-separator string to be used by write_csv(). The default is ','.

'delimiter' => STRING-OR-REGEX

Field-delimiter for read_csv(). The default is '\s*,\s*'.

The following options format the generated Rlist; normally you don't want to modify them:

'bol_tabs' => COUNT

Count of physical, horizontal TAB characters to use at the begin-of-line per indentation level. Defaults to 1. Note that we don't use blanks, because they blow up the size of generated text without measure.

'eol_space' => STRING

End-of-line string to use (the linefeed). For example, legal values are "", " ", "\r\n" etc. The default is "\n".

'paren_space' => STRING

String to write after ( and {, and before } and ) when compiling arrays and hashes.

'comma_punct' => STRING
'semicolon_punct' => STRING

Comma and semicolon strings, which shall be at least "," and ";". No matter what, compile() will always print the "eol" string after the "semicolon" string.

'assign_punct' => STRING

String to combine key/value-pairs. Defaults to " = ". Shall be at least "=" to not violate the compiled Rlist.

The OPTIONS parameter accepted by some package functions is either a hash-ref or the name of a predefined set:

OPTIONS VALUE   PREDEFINED FORMAT
-------------   -----------------
'default'       Default if writing to a file - the write() function.
'string'        Compact, no newlines/here-docs. Renders a "string of data".
'outlined'      Optimize the compiled Rlist for maximum readability.
'squeezed'      Very compact, no whitespace at all. For very large Rlists.
'perl'          Compile data in Perl syntax, using compile_Perl(), not compile().
'fast'          Compile data as fast as possible, using compile_fast(), not compile().
undef           dto.

All functions that define an OPTIONS parameter implicitly call complete_options() to complete it from one of the predefined sets. Therefore you may just define a "lazy subset of options" to these functions. For example,

Data::Rlist->new->write($thing, { scientific => 1, precision => 8 });

Debugging Data (Finding Self-References)

Debugging (hierarchical) data means breaking recursively-defined data.

Set $Data::Rlist::MaxDepth to an integer above 0 to define the depth below which compile() shall not venture. 0 disables debugging. When the value is positive, compilation breaks on deep recursions caused by circular references, and a message like the following is printed on stderr:

ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)

The message will also be repeated as a comment when the compiled Rlist is written to a file. Furthermore $Data::Rlist::Broken is incremented by one - and compilation continues! So, any attempt to venture deeper than allowed by $Data::Rlist::MaxDepth will be blocked, but compilation continues above that depth. After write() or write_string() has returned, the caller can check whether $Data::Rlist::Broken is non-zero. If so, not all of the data was compiled into text.
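
The idea of blocking descent at a maximum depth while the rest of the traversal continues can be sketched as follows. This is a minimal illustration, not the module's compile2(): count_broken() is a hypothetical function, and its return value merely plays the role of $Data::Rlist::Broken.

```perl
use strict;
use warnings;

# Hypothetical sketch (not the module's code): visit nested arrays and
# hashes, refuse to descend below $max_depth, and count each refusal.
sub count_broken {
    my ($data, $max_depth, $depth) = @_;
    $depth ||= 1;
    return 0 unless ref($data);                 # plain scalars are leaves
    return 1 if $depth > $max_depth;            # too deep: one broken branch
    my @children = ref($data) eq 'ARRAY' ? @$data
                 : ref($data) eq 'HASH'  ? values %$data
                 :                         ();
    my $broken = 0;
    $broken += count_broken($_, $max_depth, $depth + 1) for @children;
    return $broken;
}

my @loop = (1, 2);
push @loop, \@loop;                             # circular reference

print "circular: ", count_broken(\@loop, 10), "\n";            # prints circular: 1
print "acyclic:  ", count_broken([1, [2, [3]]], 10), "\n";     # prints acyclic:  0
```

A non-zero count signals that some branch was cut off, so the output does not represent all of the data.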

Quoting strings that look like numbers

Normally you don't have to care about strings, since un/quoting happens as required when reading/compiling Rlists from Perl data. A common problem, however, occurs when some text fragment (string) uses the same lexicography as numbers do.

Printed text uses well-defined glyphs and typographic conventions, and finally the competence of the reader to recognize numbers. But computers need to know the exact number type and format to recognize numbers. Integer? Float? Hexadecimal? Scientific? Klingon?

The Perl Cookbook in recipe 2.1 recommends the use of a regular expression to distinguish number from string scalars. The advice illustrates how hard the problem actually is. Not only Perl has to overcome this; any program that interprets text has to.

Since Perl scripts are texts that process text into more text, Perl's artful answer was to define typeless scalars. Scalars hold a number, a string or a reference. Perl thereby solves the problem that digits, like letters and punctuation, are regular ASCII codes. So Perl defines the string as the basic building block for all program data and, venturesomely, lets the program decide what strings mean. Analogously, in a printed book the reader has to decipher the glyphs and decide what evidence they hide.

In Rlist, string scalars that look like numbers need to be quoted explicitly. Otherwise, for example, the scalar $s="-3.14" appears as -3.14 in the output. Likewise "007324" is compiled into 7324: the text quality is lost and the scalar is read back as a number. Of course, this behavior is by intent, and in most cases this is just what you want. For hash keys, however, it might be a problem. One solution is to prefix the string with an artificial "_":

my $s = '-9'; $s = "_$s";

Since the scalar begins with a "_" it does not qualify as a number anymore, and hence is compiled as string, and read back as string. In the C++ implementation it will then become std::string, not a double. The leading "_" then must be removed by the reading program, which debunks this technique as a rather poor hack. Perhaps a better solution is to explicitly call Data::Rlist::quote:

$k = Data::Rlist::quote($k);  # returns qq'"-9"'

Again, the need to quote strings that look like numbers is a problem evident only in the Perl implementation of Rlist, since Perl is a language with weak types. As a language with very strong typing, C++ is quasi the antipode to Perl. With the C++ implementation of Rlist then there's no need to quote strings that look like numbers.
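
The effect can be reproduced without Data::Rlist. The following sketch uses looks_like_number() from the core module Scalar::Util as a stand-in; it is an assumption that this test resembles what is_numeric() does, and the two may disagree on edge cases.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Classify a few scalars the way a number test would see them.
for my $s ('-3.14', '007324', '_-9', 'foobar') {
    printf "%-8s %s\n", $s, looks_like_number($s) ? 'number-like' : 'string';
}

# Numeric context drops the textual form: leading zeros disappear.
print '007324' + 0, "\n";   # prints 7324
```

The first two scalars are reported as number-like, while the "_" prefix is enough to make '_-9' a plain string.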

See also write(), is_numeric(), is_name(), is_random_text() and http://en.wikipedia.org/wiki/American_Standard_Code_for_Information_Interchange.

Speed-up Compilation

Much work has been spent to optimize Data::Rlist for speed. Still it is implemented in pure Perl (no XS). A very rough estimate for Perl 5.8 is "each MB takes one second per GHz". For example, when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz-PC requires about 5-7 seconds. Compiling the same data under Solaris, on a sparcv9 processor operating at 750 MHz, takes about 18-22 seconds.

Explicit Quoting

The process of compiling can be sped up by calling quote() explicitly on scalars, that is, before calling write() or write_string(). Large data sets may compile faster when quote() is called in advance for scalars that certainly do not qualify as symbolic names:

use Data::Rlist qw/:strings/;

$data{quote($key)} = $value;
    .
    .
Data::Rlist::write(\%data, "data.rlist");

instead of

$data{$key} = $value;
    .
    .
Data::Rlist::write(\%data, "data.rlist");

It depends on the case whether the first variant is faster: compile() and compile_fast() both have to call is_random_text() on each scalar. When the scalar is already quoted, i.e. its first character is ", this test ought to run faster.

Note that internally is_random_text() applies the precompiled regex $g_re_value. But for a given scalar $s the expression

($s !~ $Data::Rlist::g_re_value)

can be up to 20% faster than the equivalent is_random_text($s).

PACKAGE FUNCTIONS

Construct Objects

new(), get() and set()

These are the core functions to cultivate package objects.

The following functions may be called also as methods: read(), read_csv(), read_string(), write(), write_string() and keelhaul().

new(ATTRIBUTES)

ATTRIBUTES is a hash-table defining object attributes. Example:

$self = Data::Rlist->new(-input => "foo.rlist", -data => $thing);

REGULAR ATTRIBUTES

-input => INPUT

Defines what to parse. INPUT defines a filename or string reference. Applied by read(), read_csv() and read_string().

-data => DATA

Defines the data to be compiled. DATA is some Perl data. Applied by write(), write_string() and keelhaul().

-output => OUTPUT (optional)

Defines where to put the compilation: either a filename, string-reference or undef.

-filter => FILTER (optional)
-filter-args => FILTER-ARGS (optional)

Defines the preprocessor that read() applies to the input file before parsing. FILTER can be 1 to select the standard C preprocessor cpp. Applied by open_input().

-delimiter => DELIMITER (optional)

See read_csv().

-options => OPTIONS (optional)

Defines the compile options.

-header => STRINGS (optional)
-columns => STRINGS (optional)

Defines the header text (the comments) for data written to files, and the column names of CSV files. Used in place of the HEADER parameter of write() and COLUMNS of write_csv().

ATTRIBUTES THAT MASQUERADE PACKAGE GLOBALS

These attributes raise new values for package globals while object methods are executed. The new values are provided by an object that therewith locks the package (in which case $Data::Rlist::Locked is true.) When the method returns the previous globals are restored.

-SafeCppMode => FLAG (optional)

Used by read() to masquerade $Data::Rlist::SafeCppMode.

-MaxDepth => INTEGER (optional)

Used by write() to masquerade $Data::Rlist::MaxDepth.

-RoundScientific => FLAG (optional)

Used by round() during compilation. Masquerades $Data::Rlist::RoundScientific. Note that round() is only called when the "precision" option is defined.

set(SELF[, ATTRIBUTES])

Reset or initialize object attributes (see new()). Returns SELF. Example:

$obj->set(-input => \$str, -output => 'temp.rls', -options => 'squeezed');

get(SELF, NAME[, DEFAULT])

Get some object attribute. For NAME the leading hyphen is optional. Unless NAME exists as an attribute returns DEFAULT, or undef.

EXAMPLES

$self->get('foo');			# returns $self->{-foo} or undef
$self->get(-foo=>);			# dto.
$self->get('foo', 42);		# returns $self->{-foo} or, unless exists, 42
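
The lookup rule (optional leading hyphen, fallback to DEFAULT) can be sketched as below. get_attribute() is a hypothetical stand-in written for this illustration; it only assumes that attributes are stored under hyphenated hash keys, as the comments above suggest.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get(): attributes live in a hash whose keys
# carry a leading hyphen, as in $self->{-foo}.
sub get_attribute {
    my ($self, $name, $default) = @_;
    $name = "-$name" unless $name =~ /^-/;      # leading hyphen is optional
    return exists $self->{$name} ? $self->{$name} : $default;
}

my $self = { '-foo' => 'bar' };
print get_attribute($self, 'foo'), "\n";        # prints bar
print get_attribute($self, '-foo'), "\n";       # prints bar
print get_attribute($self, 'baz', 42), "\n";    # prints 42
```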

Interface

Public functions to be called by users of the package.

read(), read_csv() and read_string()

read(INPUT[, FILTER, FILTER-ARGS])

Parse data structure from INPUT.

PARAMETERS

INPUT shall be either

- some Rlist object created by new(),

- a string reference, in which case read() and read_string() parse Rlist text from it,

- a string scalar, in which case read() assumes a file to open and to parse.

See open_input() for details on the FILTER and FILTER-ARGS parameters, which are used to preprocess input files before actually reading them. When specified, and INPUT is an object, they overload the -filter and -filter-args attributes.

When the input file cannot be open'd and flock'd this function dies. Note that die is Perl's mechanism to raise exceptions; they can be caught with eval. For example,

my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';

This code fragment traps the die exception; when it was raised eval returns undef, otherwise the result of calling hostname. For read this means

$data = eval { Data::Rlist::read($tempfile) };
print STDERR "$tempfile not found, is locked or is empty" unless defined $data;

RESULT

read() returns parsed data (reference) or undef if there was no data (when the length of the physical file is greater than zero it had only comments/whitespace).

See also parse(), write(), write_string().

read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])

PARAMETERS

See read() for INPUT and open_input() for FILTER and FILTER-ARGS (open_input() is called internally). For example, simply pass 1 as FILTER argument to read INPUT through the standard C preprocessor, instead of reading it directly.

The comma-delimiter is read from OPTIONS, as "delimiter". It defaults to '\s*,\s*'.

RESULT

Returns a list of lists. In list context a list of array references, in scalar context a reference to such a list. Each embedded array defines the fields in a line, and may be of variable length.
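
The shape of the result can be illustrated with plain Perl. This sketch only applies the documented default delimiter with split; it is not the implementation of read_csv(), and it ignores quoted fields that contain the delimiter.

```perl
use strict;
use warnings;

my $delimiter = qr/\s*,\s*/;                    # the documented default
my $text = "a , b,c\n1,2 ,3, 4\n";

# One array reference per non-empty line; lines may differ in length.
my @rows = map { [ split $delimiter, $_ ] } grep { /\S/ } split /\n/, $text;

print scalar(@rows), " rows\n";                 # prints 2 rows
print join('|', @{ $rows[0] }), "\n";           # prints a|b|c
print join('|', @{ $rows[1] }), "\n";           # prints 1|2|3|4
```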

read_string(INPUT)

Calls read() to read Rlist language productions from the string or string-reference INPUT.

write(), write_csv() and write_string()

write(DATA[, OUTPUT, OPTIONS, HEADER])

Translates Perl data into some Rlist, i.e. into printable text. DATA is either an object generated by new(), or some Perl data, or undef. write() is auto-exported as WriteData().

PARAMETERS

When DATA is an object the Perl data to be compiled is defined by the -data attribute. (When -data refers to another Rlist object, this other object is invoked.) Otherwise DATA defines the data to be compiled.

Optional OUTPUT defines where to compile to. Defaults to the -output attribute when DATA defines some Data::Rlist object. Defines a filename to create, or some string-reference. When undef writes to some anonymous string.

Optional OPTIONS argument defines how to compile the Rlist text. Defaults to the -options attribute when DATA is an object. When undef, uses compile_fast(); otherwise compile().

Optional HEADER is a reference to an array of strings that shall be printed literally at the top of an output file. Defaults to the -header attribute when DATA is an object.

RESULT

When write() creates a file it returns 0 for failure or 1 for success. Otherwise it returns a string reference.

EXAMPLES

$self = new Data::Rlist(-data => $thing, -output => $output);

$self->write;   # Write into some file (if $output is a filename) or string (if $output is a
                # string reference).

Data::Rlist::write($thing, $output);    # dto. applying the functional interface

new Data::Rlist(-data => $self)->write; # Another way to do it.

print $self->make_string;               # Print $thing to stdout.
print Data::Rlist::make_string($thing); # dto. applying the functional interface

write_csv(DATA[, OUTPUT, OPTIONS, COLUMNS])

Write DATA as CSV to file or string OUTPUT.

This function automatically quotes all fields that do not look like numbers (see is_numeric()). Numbers are rounded to the specified precision.

write_csv() is auto-exported as WriteCSV().

PARAMETERS

See write() for the DATA and OUTPUT parameters, which are semantically equal. From OPTIONS the function reads the comma-separator ("comma", default is ","), the linefeed ("eol_space", default is "\n") and the numeric precision ("precision").

COLUMNS, if specified, shall be an array-ref defining the column names to be written as the first line.

As with write(), unless DATA refers to some Data::Rlist object, it shall define the data to be compiled. But because of the limitations of CSV files the data may not be just any Perl data. It must be a reference to an array of array references, where each contained array defines the fields of one line, e.g.

[ [ a, b, c ],      # line 1
  [ d, e, f, g ],   # line 2
    .
    .
]

RESULT

When write_csv() creates a file it returns 0 for failure or 1 for success. Otherwise it returns a string reference.

EXAMPLES

Functional interface:

WriteCSV($thing, "bar.csv");

WriteCSV($thing, "foo.csv", { comma => '; ' }, 
         [qw/GBKNR VBKNR EL LaD LaD_V/]);

WriteCSV($thing, \$target_string);

$target_string_ref = WriteCSV($thing);

Object-oriented interface:

$object = new Data::Rlist(-data => $thing, -output => "foo.dat");
$object->write_csv;     # Write $thing as CSV to foo.dat
$object->write;         # Write $thing as Rlist to foo.dat

$object->set(-output => \$target_string);
$object->write_csv;     # Write $thing as CSV to $target_string

write_string(DATA[, OPTIONS])

Like write() but always compiles to a new string, to which it returns a reference. In an object this function does not use -output, even when this attribute defines a string reference. It also won't use -options. Instead it uses the predefined options set "string" to render a very compact Rlist without newlines and here-docs.

make_string() and keelhaul()

make_string(DATA[, OPTIONS])

Print Perl DATA to a string and return its value. This function actually is an alias for

${Data::Rlist::write_string(DATA, OPTIONS)}

OPTIONS default to "default", which means that in an object context make_string will never use the -options attribute.

EXAMPLES

print "\n\$data: ", Data::Rlist::make_string($data);

$self = new Data::Rlist(-data => $thing);

print "\n\$thing: ", $self->make_string;

keelhaul(DATA[, OPTIONS])

Do a deep copy of DATA according to OPTIONS. DATA is some Perl data, or some Data::Rlist object.

keelhaul() works by first compiling DATA to text, then restoring the data from the text. The text is carefully built according to certain "Compile Options". Hence, by "keelhauling data", one can adjust the accuracy of numbers, break circular references and remove \*foo{THING}s.

keelhaul() is auto-exported as KeelhaulData(). After Data::Rlist has been used you may thus simply call KeelhaulData(); if it has been required you call Data::Rlist::keelhaul().

EXAMPLES

When keelhaul() is called in an array context it also returns the text from which the copy had been built:

$deep_copy = Data::Rlist::keelhaul($thing);

($deep_copy, $rlist_text) = Data::Rlist::keelhaul($thing);

$deep_copy = new Data::Rlist(-data => $thing)->keelhaul;

Bring all numbers in DATA to a certain accuracy:

$thing = { foo => [.00057260, -1.6804e-4] };

$deep_copy = Data::Rlist::keelhaul($thing, { precision => 4 });

which copies $thing into

{ foo => [0.0006, -0.0002] }

All number scalars were rounded to 4 decimal places, so they're finally comparable as floating-point numbers (see equal() for a discussion). One can also convert all floats to integers:

$self = Data::Rlist->new(-data => $thing);

$deep_copy = $self->keelhaul({precision => 0});
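
The rounding effect on the copy can be imitated in plain Perl. The sketch below is an assumption-laden stand-in: rounded_copy() is a hypothetical helper that formats numbers with sprintf, whereas keelhaul() reaches the same values by compiling to text with round() and parsing the text back.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper: copy nested data, formatting every number-like
# scalar with a fixed count of decimal places.
sub rounded_copy {
    my ($data, $precision) = @_;
    if (ref($data) eq 'ARRAY') {
        return [ map { rounded_copy($_, $precision) } @$data ];
    }
    if (ref($data) eq 'HASH') {
        my %copy;
        $copy{$_} = rounded_copy($data->{$_}, $precision) for keys %$data;
        return \%copy;
    }
    return looks_like_number($data) ? sprintf("%.${precision}f", $data) : $data;
}

my $thing = { foo => [.00057260, -1.6804e-4] };
my $copy  = rounded_copy($thing, 4);
print join(', ', @{ $copy->{foo} }), "\n";      # prints 0.0006, -0.0002
```

The original $thing is left untouched; only the copy carries the reduced precision.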

NOTES

It was said before that keelhauling is a working method to create a deep copy of Perl data. keelhaul() shall not die nor return an error. But be prepared for the following effects:

  • ARRAY, HASH, SCALAR and REF references are compiled, whether blessed or not. Depending on the compile options CODE references are called, deparsed back into their function bodies, or dropped.

  • IO, GLOB and FORMAT references are converted into their plain typenames (see compile()).

  • undef'd array elements are converted into the default scalar value "".

  • Compile options are considered, such as implicit rounding of floats.

  • Anything deeper than $Data::Rlist::MaxDepth is thrown away (again, see compile()).

  • Since compiling does not store type information, keelhaul() will turn blessed references into barbarians again. No special methods to "freeze" and "thaw" an object are called before compiling or after parsing it. Instead the copy is a copy made from what any object in a computer ultimately consists of: strings and numbers.

predefined_options([PREDEF-NAME])

Get %Data::Rlist::PredefinedOptions{PREDEF-NAME}. PREDEF-NAME defaults to "default", the options for writing files.

complete_options([OPTIONS[, PREDEF-NAME]])

Completes OPTIONS (hash or name) using the predefined set PREDEF-NAME (defaults to "default"). For example,

complete_options({ precision => 0 }, 'squeezed')

combines the predefined options for "squeezed" text (no whitespace at all, no here-docs, numbers are rounded to a precision of 6) with a numeric precision of 0. This converts all floats to integers.

Returns a reference to a new hash of "Compile Options".
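
Completing a partial option set amounts to a hash merge in which the caller's pairs win. The sketch below shows that idea only; complete() and the %predefined table are hypothetical, the table lists just three keys, and the "squeezed" value for assign_punct is an assumption.

```perl
use strict;
use warnings;

# Hypothetical option tables; only a few keys are shown. The values for
# assign_punct in 'squeezed' are assumed for the sake of the example.
my %predefined = (
    default  => { precision => undef, eol_space => "\n", assign_punct => ' = ' },
    squeezed => { precision => 6,     eol_space => '',   assign_punct => '='   },
);

sub complete {
    my ($options, $predef_name) = @_;
    # A plain string names one of the predefined sets.
    $options = $predefined{$options} if defined $options && !ref $options;
    my $base   = $predefined{ $predef_name || 'default' };
    my %merged = (%$base, %{ $options || {} });     # caller's pairs win
    return \%merged;
}

my $opts = complete({ precision => 0 }, 'squeezed');
print "precision=$opts->{precision} assign='$opts->{assign_punct}'\n";
# prints precision=0 assign='='
```

Because a new hash is returned, the predefined sets themselves are never modified by the caller's options.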

Implementation

open_input() and close_input()

open_input(INPUT[, FILTER, FILTER-ARGS])
close_input()

Open/close Rlist text file or string INPUT for parsing. Used internally by read() and read_csv().

PREPROCESSING

If specified the function preprocesses the INPUT file using FILTER, before actually reading the file. Use the special value 1 for FILTER to select the default C preprocessor (precisely, gcc -E -Wp,-C). FILTER-ARGS is an optional string of additional command-line arguments appended to FILTER. For example,

my $foo = read("foo", 1, "-DEXTRA")

does not parse foo itself, but the output of the command

gcc -E -Wp,-C -DEXTRA foo

Hence within foo C-preprocessor-statements are allowed:

{
#ifdef EXTRA
#include "extra.rlist"
#endif

    123 = (1, 2, 3);
    foobar = {
        .
        .

SAFE CPP MODE

This slightly esoteric mode involves sed and a temporary file. It is enabled by setting $Data::Rlist::SafeCppMode to 1 (the default). It protects single-line #-comments when FILTER begins with either gcc, g++ or cpp. open_input() then additionally runs sed to convert all input lines beginning with whitespace plus the # character. Only the following cpp-commands are excluded, and only when they appear in column 1:

- #include and #pragma

- #define and #undef

- #if, #ifdef, #else and #endif.

For all other lines sed converts # into ##. This prevents the C preprocessor from evaluating them. But because of Perl's limited open() function, which isn't able to open arbitrary pipes, the invocation of sed requires a temporary file. The file is simply created by appending ".tmp" to the pathname passed in INPUT. lexln(), the function that feeds the lexical scanner with lines, then converts ## back into comment lines.

Alternately, use // and /* */ comments and set $Data::Rlist::SafeCppMode to 0.
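
The protection step can be sketched in pure Perl. This is only an illustration of the idea under stated assumptions: the module itself is described as using sed and a temporary file, and the subs protect_comments() and restore_comments() are hypothetical names. The list of exempt directives is the one given above.

```perl
use strict;
use warnings;

# cpp directives that stay untouched, but only when they start in column 1.
my $directive = qr/^#(?:include|pragma|define|undef|ifdef|if|else|endif)\b/;

# Double the leading '#' of all other comment lines.
sub protect_comments {
    return map {
        my $line = $_;
        $line =~ s/^(\s*)#/$1##/ unless $line =~ $directive;
        $line;
    } @_;
}

# Undo the doubling, as lexln() is described to do.
sub restore_comments {
    return map { (my $line = $_) =~ s/^(\s*)##/$1#/; $line } @_;
}

my @in  = ('#include "extra.rlist"', '# a comment', '  # indented', 'x = 1;');
my @out = protect_comments(@in);
print "$_\n" for @out;
```

Note that this sketch does not round-trip lines that already begin with ## in the input.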

lex() and parse()

lex()

Lexical scanner. Called by parse() to split the current line into tokens. lex() treats # or // single-line comments and /* */ multi-line comments as regular whitespace. Otherwise it returns tokens according to the following table:

RESULT      MEANING
------      -------
'{' '}'     Punctuation
'(' ')'     Punctuation
','         Operator
';'         Punctuation
'='         Operator
'v'         Constant value as number, string, list or hash
'??'        Error
undef       EOF

lex() appends a newline character to every here-doc line. For example,

<<test1
a
b
test1

is effectively read as "a\nb\n", which is the same value the equivalent here-doc has in Perl. Hence the purpose of the last character (the newline in the last line) is not just to separate the last line from the delimiter. As a consequence, not all strings can be encoded as a here-doc. For example, it might not be obvious to many programmers that "foo\nbar" has no here-doc equivalent.
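
The Perl side of this comparison is easy to check. The snippet below uses only core Perl here-docs (it does not call the Rlist parser) and shows that the value always ends in a newline, and that chomp() is needed to obtain a string like "foo\nbar":

```perl
use strict;
use warnings;

my $text = <<"test1";
a
b
test1

print length($text), " characters\n";
print $text eq "a\nb\n" ? "same\n" : "differs\n";

# "foo\nbar" lacks the final newline, so a here-doc cannot yield it
# directly; chomp() removes the trailing newline afterwards.
chomp(my $no_trailing_newline = <<"EOT");
foo
bar
EOT
print $no_trailing_newline eq "foo\nbar" ? "chomped\n" : "unexpected\n";
```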

lexln()

Read the next line of text from the input. Return 0 if at_eof(), 1 otherwise.

at_eof()

Return true if current input file / string array is exhausted, false otherwise.

parse()

Read Rlist language productions from current input, defined by package variables. This is a fast, non-recursive parser driven by the parser map %Data::Rlist::Rules. See also lex().

errors(), broken() and missing_input()

errors([SELF])

Returns the number of syntax errors that occurred in the last call to parse().

When called as a method (SELF defined), returns the number of syntax errors that occurred the last time the object called read().

broken([SELF])

Return the number of times the last compile() exceeded $Data::Rlist::MaxDepth.

When called as a method (SELF defined), returns the information for the last time the object called read().

missing_input([SELF])

Return true when the last call to parse() yielded undef, because there was nothing to parse. Otherwise, when parse() returned undef, this means there was some syntax error. parse() is called internally by read().

When called as a method (SELF defined), returns the information for the last time the object called read().

compile()

compile(DATA[, OPTIONS, FH])

Build Rlist from DATA. DATA is a Perl scalar as number, string or reference. When FH is defined compile directly to this file and return 1. Otherwise (FH is undef) build a string and return a reference to it.

Reference-types SCALAR, HASH, ARRAY and REF.

Compiled into text, whether blessed or not.

Reference-types CODE

How CODE references are compiled depends on the "code_refs" flag defined by OPTIONS. Legal values are undef, "call" (the default) and "deparse".

When the value of "code_refs" is undef, compile() writes "?CODE?". A value of "call" calls the sub and compiles its result. "deparse" serializes the code using B::Deparse, which reproduces the Perl source of the sub. Note that it then makes sense to enable "here_docs", because otherwise the deparsed code will be in one string with LFs quoted as "\012".

Reference-types GLOB, IO and FORMAT

Reference-types that cannot be compiled are GLOB (typeglob-refs), IO (file- and directory handles) and FORMAT. These are then converted into "?GLOB?", "?IO?" and "?FORMAT?".
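
The reference types named in this section are the strings reported by reftype() from the core module Scalar::Util, which looks through any blessing. The following snippet is independent of Data::Rlist and merely lists them for a few sample values; the class name Some::Class is hypothetical.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype blessed);

my %samples = (
    scalar_ref => \"text",
    array_ref  => [1, 2, 3],
    hash_ref   => { a => 1 },
    ref_ref    => \\"text",
    code_ref   => sub { 42 },
    glob_ref   => \*STDOUT,
    object     => bless({}, 'Some::Class'),   # hypothetical class name
);

for my $name (sort keys %samples) {
    my $ref = $samples{$name};
    printf "%-10s reftype=%-6s blessed=%s\n",
        $name, reftype($ref), blessed($ref) // 'no';
}
```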

Background: A Short Story of "Typeglobs"

Typeglobs are an idiosyncrasy of Perl. Perl uses a symbol table per package (namespace) to map identifier names (like "foo" without sigil) to values. The symbol table is stored in a hash named like the package with two colons appended. The main symbol table's name is thus %main::, or %::.

For example, the name "foo" in a symbol table is mapped to the typeglob value *foo. The typeglob object implements $foo (the scalar value), @foo (the array value), %foo (the hash value), &foo (the code value) and foo (the file handle or the format specifier). All types may coexist, so modifying $foo won't change %foo. But *baz = *foo overwrites, or creates, the symbol table entry "baz". (The value of "baz" will be another typeglob object.)

Typeglobs are variants that can store multiple concrete values. The sigil * serves as wildcard for the other sigils %, @, $ and &. (Note: a sigil is a symbol created for a specific magical purpose; the name derives from the Latin sigillum = seal.) Perhaps the weirdest Perl primitives are \*foo, a typeglob-ref, and \*::, a typeglob-table-ref:

\*foo;              # yields 'GLOB(0xNNN)'
\*::;               # yields 'GLOB(0xNNN)'
die unless \*foo == *foo{GLOB}; # never fires

\*foo is, in effect, Perl's way to prove the existence of foo, the symbol. *foo is the internal "proxy" that tells perl what you really mean, at this moment, when you say "foo". In core this proxy is a hash-table, hence another way to say \*foo is *foo{GLOB}, which refers to "foo"'s incarnation as typeglob *foo.

In other words: with typeglobs you reach the bedrock of perl, where the spade bends back.

Note, however, that after calling compile() typeglob-refs have gone up in smoke.
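
The behavior of the typeglob slots can be tried out with a few lines of plain Perl. This sketch does not use Data::Rlist; the names foo and baz are the ones from the discussion above.

```perl
use strict;
use warnings;

our $foo = "scalar";
our @foo = (1, 2, 3);
our %foo = (key => "value");
sub foo { "code" }

# Each slot of the typeglob *foo is independent of the others.
$foo = "changed";
print scalar(@foo), " elements, key => $foo{key}\n";

# Aliasing the whole glob makes every slot of "baz" refer to "foo".
our ($baz, @baz);
*baz = *foo;
print "\$baz is '$baz', \@baz has ", scalar(@baz), " elements\n";

# A typeglob-ref is a reference like any other.
print "ref type: ", ref(\*foo), "\n";
```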

undef

undef'd values in arrays are compiled into the default Rlist "".

compile_fast(DATA)

Assemble Rlist from Perl data DATA as fast as actually possible with pure Perl. The main difference to compile() is that compile_fast() considers no format options. Thus it will not call code-references, cannot implicitly round numbers etc. It will also not detect recursively-defined data.

Reference-types SCALAR, HASH, ARRAY and REF: compiled into text, whether blessed or not.

Reference-types CODE, GLOB, IO and FORMAT: compiled as "?CODE?", "?IO?", "?GLOB?" and "?FORMAT?".

undef: undefined values in arrays are compiled into the default Rlist "".

compile_fast() returns a reference to the compiled string, which is a reference to a unique package variable. Subsequent calls to compile_fast() therefore reassign this variable.
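
The consequence of returning a reference to one shared package variable is easiest to see in a small stand-alone example. The sub render_fast() and the variable $Buffer below are hypothetical stand-ins, not part of Data::Rlist; they only imitate the described behavior.

```perl
use strict;
use warnings;

our $Buffer;   # hypothetical stand-in for the module's package variable

# Like compile_fast() as described above, return a reference to one
# shared package variable that every call reassigns.
sub render_fast {
    $Buffer = join ', ', @_;
    return \$Buffer;
}

my $first      = render_fast(1, 2, 3);
my $first_copy = $$first;              # copy the text before the next call
my $second     = render_fast('a', 'b');

print "first now:  $$first\n";         # text of the second call
print "first copy: $first_copy\n";     # text of the first call
```

So a caller who needs the text beyond the next call should dereference and copy it first.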

AUXILIARY FUNCTIONS

In Perl the basic building block is a string. The utility functions in this section are generally useful when handling stringified data.

These functions are either very fast, or smart, or both. For example, quote(), unquote(), escape() and unescape() internally use precompiled regexes and precomputed ASCII tables; so employing these functions is probably faster than using your own variants.

is_integer(), is_numeric(), is_name() and is_random_text()

is_integer(SCALAR-REF)

Returns true when a scalar looks like an +/- integer constant. The function applies the compiled regex $Data::Rlist::g_re_integer.

is_numeric(SCALAR-REF)

Test for strings that look like numbers. is_numeric() can be used to test whether a scalar looks like an integer/float constant (numeric literal). The function applies the compiled regex $Data::Rlist::g_re_float. Note that it doesn't match

- the IEEE 754 notations of Infinity and NaN,

- leading or trailing whitespace,

- lexical conventions such as the "0b" (binary), "0" (octal), "0x" (hex) prefix to denote a number-base other than decimal, and

- Perl's "legible numbers", e.g. 3.14_15_92

See also

perldoc -q "whether a scalar is a number"
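
To make the restrictions listed above concrete, here is a hypothetical pattern for plain decimal literals. It is an assumption made for illustration and not the regex stored in $Data::Rlist::g_re_float, which this document does not show.

```perl
use strict;
use warnings;

# Hypothetical pattern for decimal literals; NOT the module's g_re_float.
my $re_decimal = qr/\A[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?\z/;

for my $candidate ('42', '-3.14', '1e-6', '.5', ' 42', '0x1F', 'NaN', '3.14_15_92') {
    printf "%-14s %s\n", "'$candidate'",
        $candidate =~ $re_decimal ? 'numeric' : 'not numeric';
}
```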

is_name(SCALAR-REF)

Test for symbolic names. is_name() can be used to test whether a scalar looks like a symbolic name. Such strings need not be quoted. Rlist defines symbolic names as a superset of C identifier names:

[a-zA-Z_0-9]                    # C/C++ character set for identifiers
[a-zA-Z_0-9\-/\~:\.@]           # Rlist character set for symbolic names

[a-zA-Z_][a-zA-Z_0-9]*                  # match C/C++ identifier
[a-zA-Z_\-/\~:@][a-zA-Z_0-9\-/\~:\.@]*  # match Rlist symbolic name

Scoped/structured names such as "std::foo", "msg.warnings", "--verbose", "calculation-info" need not be quoted. (But if they're quoted their value is exactly the same.) Note that is_name() does not catch leading or trailing whitespace. Another restriction is that "." cannot be used as first character, since it could also begin a number.
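
The character classes above can be tried directly. In the following sketch only the anchors \A and \z (and the escaping of @) are additions; whether the module anchors its regex in the same way is an assumption.

```perl
use strict;
use warnings;

# Character classes as quoted above; the anchors are an assumption.
my $re_name = qr{\A[a-zA-Z_\-/\~:\@][a-zA-Z_0-9\-/\~:\.\@]*\z};

for my $candidate ('std::foo', 'msg.warnings', '--verbose', 'calculation-info',
                   '.hidden', ' padded', 'two words') {
    printf "%-20s %s\n", "'$candidate'",
        $candidate =~ $re_name ? 'symbolic name' : 'needs quoting';
}
```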

is_random_text(SCALAR-REF)

is_random_text() returns true if the scalar is neither a symbolic name nor a number, nor is double-quoted. When this function returns true, then compile() and compile_fast() would call quote() on the scalar; otherwise it is written "as is". In Rlists, all scalars need to be quoted, except those that

- are already quoted,

- look like C identifiers or symbolic names (see is_name()),

- look like C number constants.

Warning: is_random_text() makes no further test whether a string consists of characters that actually require escaping. That is, it also returns true on strings that do not adhere to 7-bit-ASCII because they contain characters <32 or >127.

See also is_numeric() and is_name().

quote(), escape() and unhere()

quote(TEXT)
escape(TEXT)

Converts TEXT into 7-bit-ASCII. All characters not in the set of the 95 printable ASCII characters are escaped. The difference between the two functions is that quote() additionally places TEXT into double-quotes.

The following ASCII codes will be converted to escaped octal numbers, i.e. 3 digits prefixed by a backslash:

0x00 to 0x1F
0x80 to 0xFF
" ' \

For example, quote(qq'"Früher Mittag\n"') returns "\"Fr\374her Mittag\012\"", while escape() returns \"Fr\374her Mittag\012\"
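
The escaping can be approximated in a few lines. This is a sketch under assumptions, not the module's implementation: it works on byte strings, follows the example above in writing the double quote as \" rather than as an octal number, and leaves the single quote untouched. The names escape_sketch() and quote_sketch() are hypothetical.

```perl
use strict;
use warnings;

# Sketch only: control and 8-bit characters become 3-digit octal escapes;
# double quotes and backslashes get a backslash, as in the example above.
sub escape_sketch {
    my ($text) = @_;
    $text =~ s/(["\\])/\\$1/g;
    $text =~ s/([\x00-\x1F\x80-\xFF])/sprintf('\\%03o', ord $1)/ge;
    return $text;
}

sub quote_sketch { return '"' . escape_sketch($_[0]) . '"' }

# \xFC is the Latin-1 code of the u-umlaut in "Frueher".
print quote_sketch(qq("Fr\xFCher Mittag\n")), "\n";
```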

maybe_quote(TEXT)

Return quote(TEXT) if is_random_text(TEXT); otherwise (TEXT defines a symbolic name or number) return TEXT.

unquote(TEXT)
unescape(TEXT)

Reverses quote() and escape().

unhere(HERE-DOC-STRING[, COLUMNS, FIRSTTAB, DEFAULTTAB])

HERE-DOC-STRING shall be a here-document. The function checks whether each line begins with a common prefix, and if so, strips that off. If there is no such prefix it takes the amount of leading whitespace found in the first line and removes that much from each subsequent line.

Unless COLUMNS is defined returns the new here-doc-string. Otherwise, takes the string and reformats it into a paragraph having no line more than COLUMNS characters long. FIRSTTAB will be the indent for the first line, DEFAULTTAB the indent for every subsequent line. Unless passed, FIRSTTAB and DEFAULTTAB default to the empty string "".

This function combines recipes 1.11 and 1.12 from the Perl Cookbook.
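
The whitespace case can be sketched as follows. The sub strip_indent() is a hypothetical, simplified stand-in that covers neither the common-prefix detection nor the COLUMNS reformatting:

```perl
use strict;
use warnings;

# Sketch of the whitespace case only: take the indent of the first line
# and strip it from the start of every line.
sub strip_indent {
    my ($text) = @_;
    my ($indent) = $text =~ /\A([ \t]*)/;
    $text =~ s/^\Q$indent\E//mg;
    return $text;
}

my $doc = "    first\n      second\n    third\n";
print strip_indent($doc);
```

Lines that are indented deeper than the first line keep their extra indentation.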

split_quoted()

split_quoted(INPUT[, DELIMITER])
parse_quoted(INPUT[, DELIMITER])

Divide the string, to which INPUT is a reference, into a list of strings. DELIMITER is a regular expression specifying where to split (default: '\s+'). The functions won't split at DELIMITERs inside quotes, or which are backslashed.

For example, to split INPUT at commas pass '\s*,\s*' as the DELIMITER.

parse_quoted() works like split_quoted() but additionally removes all quotes and backslashes from the split fields. Both functions effectively simplify the interface of Text::ParseWords.
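
For comparison, this is how the same distinction looks with Text::ParseWords directly. parse_line() is a documented function of that core module; that split_quoted() and parse_quoted() correspond to its "keep" argument being true or false is an assumption made for this sketch.

```perl
use strict;
use warnings;
use Text::ParseWords qw(parse_line);

my $input = q("fee foo" bar);

# keep = 1 retains quotes and backslashes; keep = 0 removes them.
my @kept     = parse_line('\s+', 1, $input);
my @stripped = parse_line('\s+', 0, $input);

print join('|', @kept), "\n";
print join('|', @stripped), "\n";
```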

RESULT

In an array context both return a list of substrings, otherwise the count of substrings. An empty array is returned in case of unbalanced " quotes, as for the string foo,"bar.

EXAMPLES

split_quoted():

sub split_and_list($) {
    my $i = 0;                  # restart the numbering for each call
    print $i++, " '$_'\n" foreach split_quoted(shift);
}

split_and_list(q("fee foo" bar))

    0 '"fee foo"'
    1 'bar'

split_and_list(q("fee foo"\ bar))

    0 '"fee foo"\ bar'

The default DELIMITER '\s+' handles newlines. split_quoted("foo\nbar\n") returns ('foo','bar','') and hence can be used to split a large string of unchomped input lines into words:

split_and_list("foo  \r\n bar\n")

    0 'foo'
    1 'bar'
    2 ''

The DELIMITER matches everywhere outside of quoted constructs, so in case of the default '\s+' you may want to remove leading/trailing whitespace. Consider

split_and_list("\nfoo")
split_and_list("\tfoo")

    0 ''
    1 'foo'

and

split_and_list(" foo ")

    0 ''
    1 'foo'
    2 ''

parse_quoted():

sub parse_and_list($) {
    my $i = 0;                  # restart the numbering for each call
    print $i++, " '$_'\n" foreach parse_quoted(shift);
}

parse_and_list(q("fee foo" bar))

    0 'fee foo'
    1 'bar'

parse_and_list(q("fee foo"\ bar))

    0 'fee foo bar'

MORE EXAMPLES

String 'field\ one "field\ two"':

('field\ one', '"field\ two"')  # split_quoted
('field one', 'field two')      # parse_quoted

String 'field\,one, field", two"' with a DELIMITER of '\s*,\s*':

('field\,one', 'field", two"')  # split_quoted
('field,one', 'field, two')     # parse_quoted

Split a large string $soup (mnemonic: possibly "slurped" from a file) into lines, at LF or CR+LF:

@lines = split_quoted($soup, '\r*\n');

Then transform all @lines by correctly splitting each line into "naked" values:

@table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines

Here is some more complete code to parse a .csv file with quoted fields and escaped commas:

open my $fh, '<', "foo.csv" or die $!;
local $/;                   # enable localized slurp mode
my $content = <$fh>;        # slurp whole file at once
close $fh;
my @lines = split_quoted($content, '\r*\n');
die q(unbalanced " in input) unless @lines;
# one array-ref of unquoted fields per line
my @table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines;

Note, however, that the read_csv() function already reads .csv files perfectly well.

A nice way to make sure what split_quoted() and parse_quoted() return is using deep_compare(). For example, the following code shall never die:

croak if deep_compare([split_quoted("fee fie foo")], ['fee', 'fie', 'foo']);
croak if deep_compare( parse_quoted('"fee fie foo"'), 1);

The second statement calls parse_quoted() in scalar context, hence it shall return 1 because there's one string to parse.

equal() and round()

equal(NUM1, NUM2[, PRECISION])
round(NUM1[, PRECISION])

Compare and round floating-point numbers. equal() returns true if NUM1 and NUM2 are equal to PRECISION (default: 6) number of decimal places. NUM1 and NUM2 are string- or number scalars.

Normally round() will return a number in fixed-point notation. When the package-global $Data::Rlist::RoundScientific is true round() formats the number in either normal or exponential (scientific) notation, whichever is more appropriate for its magnitude. This differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers. For example, round(42) does not return 42.000000, and round(0.12) returns 0.12, not 0.120000. This behavior is especially welcome when scientific notation was selected. For example, note that

sprintf("%.6g\n", 2006073104)

yields 2.00607e+09, which loses digits.
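
The difference between the two notations can be reproduced with Perl's built-in sprintf alone. This only demonstrates the format specifiers; which of them round() uses internally is not stated here and is not assumed.

```perl
use strict;
use warnings;

my $big = 2006073104;

printf "%%.6f of 0.12: %s\n", sprintf('%.6f', 0.12);   # fixed-point, padded
printf "%%.6g of 0.12: %s\n", sprintf('%.6g', 0.12);   # no trailing zeroes
printf "%%.6g of big:  %s\n", sprintf('%.6g', $big);   # 6 significant digits
printf "%%d   of big:  %s\n", sprintf('%d',   $big);   # all digits kept
```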

MACHINE ACCURACY

One needs equal() to compare floats because IEEE 754 single- and double precision implementations are not absolute - in contrast to the numbers they represent. In all machines non-integer numbers are only an approximation to the numeric truth. In other words, floating-point arithmetic is not associative! For example, given three floats a, b and c, the result of (a+b)+c might be different from that of a+(b+c).

Each machine has its own (in)accuracy, called the machine epsilon, which is the difference between 1 and the smallest exactly representable number greater than one. Therefore, most of the time only floats can be compared that have been carried out to a certain number of decimal places. In general this is the case when two floats that result from a numeric operation are compared - but not two constants. (Constants are accurate owing to the lexical conventions of the language. The Perl and C syntaxes for numbers simply won't allow you to write down inaccurate numbers in code.)

See also recipes 2.2 and 2.3 in the Perl Cookbook.
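
A short experiment illustrates both points. The sub roughly_equal() is a hypothetical sketch of comparing to a number of decimal places via sprintf; it is not the module's equal(). The loop estimates the machine epsilon of the Perl build it runs on.

```perl
use strict;
use warnings;

# Compare two numbers after rounding both to $precision decimal places.
# Sketch of the idea only; not the module's equal().
sub roughly_equal {
    my ($x, $y, $precision) = @_;
    $precision = 6 unless defined $precision;
    return sprintf("%.${precision}f", $x) eq sprintf("%.${precision}f", $y);
}

my $sum = 0.1 + 0.2;
print "exact:   ", ($sum == 0.3 ? "equal" : "not equal"), "\n";
print "rounded: ", (roughly_equal($sum, 0.3) ? "equal" : "not equal"), "\n";

# Machine epsilon: halve until 1 + eps/2 is indistinguishable from 1.
my $eps = 1;
$eps /= 2 while 1 + $eps / 2 > 1;
print "epsilon: $eps\n";
```

On builds that use IEEE 754 doubles the exact comparison typically reports "not equal", while the rounded one reports "equal".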

EXAMPLES

CALL                    RETURNS NUMBER
----                    --------------
round('0.9957', 3)       0.996
round(42, 2)             42
round(0.12)              0.120000
round(0.99, 2)           0.99
round(0.991, 2)          0.99
round(0.99, 1)           1.0
round(1.096, 2)          1.10
round(+.99950678)        0.999507
round(-.00057260)       -0.000573
round(-1.6804e-6)       -0.000002

deep_compare()

deep_compare(A, B[, PRECISION, PRINT])

Compare and analyze two numbers, strings or references. Generates a log (stack of messages) describing exactly all unequal data. Hence, for some perl data $a and $b one can assert:

croak "$a differs from $b" if deep_compare($a, $b);

When PRINT is true traces progress on stdout.

RESULT

Returns an array of messages, each describing unequal data, or data that cannot be compared because of type- or value-mismatching. The array is empty when deep comparison of A and B found no unequal numbers or strings, and no differing types.

EXAMPLES

The result is line-oriented, and for each mismatch it returns a single message:

Data::Rlist::deep_compare(undef, 1)

yields

<<undef>> cmp <<1>>   stop! 1st undefined, 2nd defined (1)

A more complex example: deep-comparing two multi-level data structures A and B returned two messages:

'String literal' == REF(0x7f224)   stop! type-mismatch (scalar versus REF)
'Greetings, earthlings!' == CODE(0x7f2fc)   stop! type-mismatch (scalar versus CODE)

Somewhere in A a string "String literal" could not be compared, because the corresponding element in B is a reference to a reference. Next it says that "Greetings, earthlings!" could not be compared because the corresponding element in B is a code reference. (One could assert, however, that the actual opacity here is that they speak ASCII.)

Actually, A and B are identical. B was written to disk (by write()) and then read back as A (by read()). So, why don't they compare anymore? Because in B the refs REF(0x7f224) and CODE(0x7f2fc) hide

\"String literal"

and

sub { 'Greetings, earthlings!' }

When writing B to disk write() has dissolved the scalar- and the code-reference into "String literal" and "Greetings, earthlings!". Of course, deep_compare() will not do that, so A does not compare to B anymore. Note that despite these two mismatches, deep_compare() continued the comparison for all other elements in A and B. Hence the structures are identical in all other elements.

deep_compare() has no object-oriented interface (it is never called as a method); hence A and B may even be Data::Rlist objects themselves.
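
The principle of collecting one message per mismatch while continuing with the remaining elements can be sketched in a few lines. The subs kind_of() and compare_sketch() are hypothetical; they compare strings only, ignore PRECISION, and their message format merely resembles the one shown above.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Classify a value as 'undef', 'scalar' or its underlying reference type.
sub kind_of {
    my ($value) = @_;
    return 'undef' unless defined $value;
    return reftype($value) || 'scalar';
}

# Hypothetical helper: returns one message per mismatch and keeps going.
sub compare_sketch {
    my ($x, $y, $path) = @_;
    $path = '$data' unless defined $path;
    my ($kx, $ky) = (kind_of($x), kind_of($y));
    return ("$path: type-mismatch ($kx versus $ky)") if $kx ne $ky;
    return () if $kx eq 'undef';
    return ($x eq $y ? () : ("$path: '$x' ne '$y'")) if $kx eq 'scalar';

    my @messages;
    if ($kx eq 'ARRAY') {
        push @messages, "$path: array lengths differ" if @$x != @$y;
        my $last = (@$x < @$y ? $#$x : $#$y);
        for my $i (0 .. $last) {
            push @messages, compare_sketch($x->[$i], $y->[$i], $path . "->[$i]");
        }
    }
    elsif ($kx eq 'HASH') {
        my %seen = map { ($_ => 1) } keys %$x, keys %$y;
        for my $key (sort keys %seen) {
            if (exists $x->{$key} && exists $y->{$key}) {
                push @messages, compare_sketch($x->{$key}, $y->{$key}, $path . "->{$key}");
            } else {
                push @messages, $path . "->{$key}: present on one side only";
            }
        }
    }
    elsif ($x != $y) {
        push @messages, "$path: different $kx references";
    }
    return @messages;
}

# The situation discussed above: references were dissolved on writing.
my $written = { text => \"String literal", greet => sub { 'Greetings, earthlings!' } };
my $reread  = { text => "String literal",  greet => 'Greetings, earthlings!' };
print "$_\n" for compare_sketch($reread, $written);
```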

fork_and_wait(), synthesize_pathname()

fork_and_wait(PROGRAM[, ARGS...])

Forks a process and waits for completion. The function will extract the exit-code, test whether the process died and print status messages on stderr. fork_and_wait() hence is a handy wrapper around the built-in system and exec functions. Returns an array of three values:

($exit_code, $failed, $coredump)

$exit_code is -1 when the program failed to execute (e.g. it wasn't found or the current user has insufficient rights). Otherwise $exit_code is between 0 and 255. When the program died on receipt of a signal (like SIGINT or SIGQUIT) then the second value (named $failed above) stores it. When $coredump is true the program died and a core file was written. Note that some systems store cores somewhere else than in the program's working directory.
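
How such a triple can be derived from Perl's $? variable after system() is documented in perlvar. The following sketch shows that decoding; the sub run_and_decode() is a hypothetical name, it does not print status messages, and it is not the module's fork_and_wait().

```perl
use strict;
use warnings;

# Run a command and decode $? the way perlvar documents it.
sub run_and_decode {
    my @command = @_;
    my $rc = system { $command[0] } @command;
    return (-1, 0, 0) if $rc == -1;           # could not be started
    my $exit_code = $? >> 8;                  # exit status, 0 to 255
    my $signal    = $? & 127;                 # signal that ended it, if any
    my $coredump  = ($? & 128) ? 1 : 0;       # true if a core was dumped
    return ($exit_code, $signal, $coredump);
}

# $^X is the path of the running perl, so the example needs no other program.
my ($exit_code, $signal, $coredump) = run_and_decode($^X, '-e', 'exit 3');
print "exit=$exit_code signal=$signal core=$coredump\n";
```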

synthesize_pathname(TEXT...)

Concatenates and forms all TEXT strings into a symbolic name that can be used as a pathname. synthesize_pathname() is a useful function to reuse a string, assembled from multiple strings, simultaneously as hash key, database name, and file- or URL name. Note, however, that some characters are mapped to just "_" and "-".

IMPORTED FUNCTIONS

Explicit Imports

Four tags are available that import function sets. These are utility functions usable also separately from Data::Rlist.

:floats

Imports equal(), round() and is_numeric().

:strings

Imports maybe_quote(), quote(), escape(), unquote(), unescape(), unhere(), is_random_text(), is_numeric(), is_name(), split_quoted(), and parse_quoted().

:options

Imports predefined_options() and complete_options().

:aux

Imports deep_compare(), fork_and_wait() and synthesize_pathname().

EXAMPLES

use Data::Rlist qw/:floats :strings/;

Automatic Imports

These functions are implicitly imported into the caller's symbol table by the package: ReadCSV(), ReadData(), WriteData(), PrintData(), OutlineData(), StringizeData(), SqueezeData(), KeelhaulData() and CompareData().

You may say require Data::Rlist (instead of use Data::Rlist) to prohibit auto-import. See also perlmod.

Importing when Rlist.pm is installed locally

When the package is installed locally, e.g. as ~/bin/Rlist.pm, you may use:

BEGIN {
    $0 =~ /[^\/]+$/;
    push @INC, $`||'.', "$ENV{HOME}/bin";
    require Rlist;
    Data::Rlist->import();
    Data::Rlist->import(qw/:floats :strings/);
}

This code finds Rlist.pm also in . and ~/bin. It then calls the Exporter manually. Note that installing CPAN packages usually requires administrator privileges. In case you don't have them, another way is to copy the Rlist.pm file e.g. into . or ~/bin.

ReadCSV() and ReadData()

ReadCSV(INPUT[, DELIMITER, FILTER, FILTER-ARGS])

Calls read_csv().

ReadData(INPUT[, FILTER, FILTER-ARGS])

Calls read().

WriteCSV() and WriteData()

WriteCSV(DATA[, OUTPUT, OPTIONS, COLUMNS])

Calls write_csv().

WriteData(DATA[, OUTPUT, OPTIONS, HEADER])

Calls write().

OutlineData(), StringizeData() and SqueezeData()

OutlineData(DATA[, OPTIONS])
StringizeData(DATA[, OPTIONS])
SqueezeData(DATA[, OPTIONS])

Calls make_string().

OutlineData() applies the predefined "outlined" options set, while StringizeData() applies "string" and SqueezeData() "squeezed". When specified, OPTIONS are merged into the predefined set. For example,

print "\n\$thing: ", OutlineData($thing, { precision => 12 });

rounds all numbers in $thing to 12 digits.

KeelhaulData() and CompareData()

KeelhaulData(DATA[, OPTIONS])

Calls keelhaul(). For example,

use Data::Rlist;
    .
    .
my($copy, $as_text) = KeelhaulData($thing);

CompareData(A, B[, PRECISION, PRINT_TO_STDOUT])

Calls deep_compare().

HISTORY / NOTES

The Random Lists (Rlist) syntax is inspired by NeXTSTEP's Property Lists. Rlist is simpler, more readable and more portable. The Perl, C and C++ implementations are fast, stable and free. Markus Felten, with whom I worked for a few months on a project at Deutsche Bank, Frankfurt in summer 1998, drew my attention to Property Lists. He had implemented a Perl variant of it (http://search.cpan.org/search?dist=Data-PropertyList).

The term "Random" underlines the fact that the language

  • has only four primitive data types;

  • the basic building block is a list (sequential or associative), and this list can be combined at random with other lists.

Hence the term "Random" does not mean aimless or accidental. Random Lists are arbitrary lists. Application data can be made portable (due to 7-bit-ASCII) and persistent by dealing arbitrarily with lists of numbers and strings. As with CSV, the lexical overhead Rlist imposes is minimal: files are merely data. Also, files are viewable/editable by text editors. Users then shall not be dazzled by language gizmos.

SEE ALSO

Data::Dumper

In contrast to Data::Dumper, Data::Rlist scalars will be typed as number or string; Data::Dumper writes numbers also as quoted strings. For example, it writes

$VAR1 = {
            'configuration' => {
                                'verbose' => 'Y',
                                'importance_sampling_loss_quantile' => '0.04',
                                'distribution_loss_unit' => '100',
                                'default_only' => 'Y',
                                'num_threads' => '5',
                                        .
                                        .
                               }
        };

where Data::Rlist writes

{
    configuration = {
        verbose = Y;
        importance_sampling_loss_quantile = 0.04;
        distribution_loss_unit = 100;
        default_only = Y;
        num_threads = 5;
            .
            .
    }
}

As one can see Data::Dumper writes the data right in Perl syntax, which means the dumped text can be simply eval'd, which is very fast. Rlists are not Perl-syntax and need to be parsed carefully. But Rlist text is portable (7-bit-ASCII with non-printables escaped) and implementations exist for other programming languages, namely C++ which uses a fast flex/bison-parser.

Reading Data::Dumper-generated files back is generally faster than read(). Writing, however, is not: for example, with $Data::Dumper::Useqq enabled, it was observed that Data::Dumper renders output three to four times slower than compile().

Consider also that Data::Rlist tests for any scalar whether it is numeric or not (see is_random_text()), where Data::Dumper simply quotes any number and string. So Data::Rlist is able to implicitly round floats to a certain precision, making them finally comparable (see round() for more information).

Data::Rlist generates much smaller files: with the default $Data::Dumper::Indent of 2 Rlist output is just 15-20% of the size the Data::Dumper package prints (for the same data). The simple reason: Data::Dumper recklessly uses many whitespaces (blanks) instead of horizontal tabulators; this unnecessarily blows up file sizes.

DEPENDENCIES

Data::Rlist depends only on a few other packages:

Exporter
Carp
strict
integer
Sys::Hostname
Scalar::Util        # deep_compare() only
Text::Wrap          # unhere() only
Text::ParseWords    # split_quoted(), parse_quoted() only

Data::Rlist is free of $&, $` or $'. Reason: once Perl sees that you need one of these meta-variables anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program (see also perlre).

BUGS AND DEFICIENCIES

There are no known bugs; this package is stable.

Deficiencies:

  • This version has not yet implemented nanoscripts.

  • The "deparse" functionality for the "code_refs" compile option has not yet been implemented.

  • The read() and write() functions shall not be called concurrently from threads. The code has been optimized for speed and relies on certain package globals.

  • IEEE 754 notations of Infinity and NaN not yet implemented.

  • To increase compilation speed, a string $s is only quote()d when $s!~$Data::Rlist::g_re_value. (Note that this regex is applied also by is_random_text().) The regex checks whether $s begins with ", or defines a symbolic name or a number. But when the 1st character of $s is ", no further tests are made whether characters in the string actually require escaping. It is then believed that the string adheres to 7-bit-ASCII. If this isn't the case it might not be read back correctly.

    See also is_name(), is_integer() and is_numeric().

AUTHOR

Andreas Spindler, rlist@visualco.de

COPYRIGHT AND LICENSE

Copyright 1998-2007 Andreas Spindler

Maintained at CPAN and http://www.visualco.de

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.

Thank you for your attention.