NAME
Data::Rlist - A lightweight data language for Perl, C and C++
SYNOPSIS
use Data::Rlist;
Data from files:
Data::Rlist::write($data, $filename);
$data = Data::Rlist::read($filename);
$data = ReadData($filename);
Data from text:
$string_ref = Data::Rlist::write_string($data);
$string = Data::Rlist::make_string($data);
$data = Data::Rlist::read_string($string);
$data = ReadData(\$string);
Object-oriented interface:
$object = new Data::Rlist(-data => $thing, -output => \$target_string);
$string_ref = $object->write; # compile $thing, return \$target_string
use Env qw/HOME/;
$object->set(-output => "$HOME/.foorc");
$object->write(".barrc"); # the argument overrides -output
$object->write; # write "~/.foorc", return 1
WriteData($object); # dto.
The -input attribute defines the text to be compiled into Perl data:
$object->set(-input => \$input_string);
$data = $object->read;
$data = $object->read($other); # overrides -input
$object->set(-input => "$HOME/.foorc");
$data = $object->read; # parse "~/.foorc"
$data = $object->read("$HOME/.barrc"); # override -input
$data = $object->read(\$string); # parse $string
$data = $object->read_string($string_or_ref);
$data = ReadData($string_or_ref);
Make up a string out of thin air, no matter how -output is set:
$string_ref = $object->write_string; # write to new string (ignores -output)
$string = $object->make_string; # dto. but return string value, not ref
print $object->make_string; # dumps $thing
PrintData($object); # dto.
PrintData($thing); # dto.
Using Data::Rlist one can also create deep-copies of Perl data:
$reloaded = Data::Rlist::keelhaul($thing);
$object = new Data::Rlist(-data => $thing);
$reloaded = $object->keelhaul;
$reloaded = KeelhaulData($object);
The functionality is called keelhauling. The metaphor vividly connotes that $thing is stringified, then compiled back. See "keelhaul"() for why this only sounds useless.
The little brother of "keelhaul"() is "deep_compare"():
print join("\n", Data::Rlist::deep_compare($a, $b));
VENUE
Random-Lists (Rlist) is a tag/value format to describe data structures as plain text. Therefore it defines lists of values (arrays) and tags/values (hashes). Basic values are constant strings and numbers. The format attempts to represent the data pure and untinged, but without breaking its structure or legibility. The language
- allows the definition of hierarchical data,
- disallows recursively-defined data,
- does not consider user-defined types,
- defines no keywords, no variables and no arithmetic expressions,
- defines only constant data,
- uses 7-bit-ASCII character encoding.
Rlists are built from only four primitives: number, string, array and hash. As with CSV, the lexical overhead Rlist imposes is minimal: files are merely data. They're processable by scripts, and in text editors users see the pure data in a structured form, rather than getting dazzled by language gizmos.
With Rlist data is not typified, and hence data schemes are tacit consents between the users of the data (the programs). But schemes can be implemented by storing the meta information together with the data itself.
- Numbers and Strings (Scalars)
-
Strings:
"Hello, World!"
<<hamlet
"This above all: to thine own self be true". - (Act I, Scene III).
hamlet
Symbols:
foobar cogito.ergo.sum Memento::mori
Numbers:
38 10e-6 -.7 3.141592653589793
Strings are wrapped by double-quotes. Identifiers (or: symbolic names) are strings consisting only of [a-zA-Z_0-9-/~:.@] characters; for them the quotes are optional. Numbers adhere to the IEEE 754 syntax for integer- and floating-point numbers. For details see is_symbol() and is_number().
- Arrays and Hashes (Lists)
-
Arrays are sequential lists:
( 1, 2, ( 3, "Audiatur et altera pars!" ) )
Hashes are associative lists:
{
    key = value;
    lonely-key;
    3.14159 = Pi;
    "Meta-syntactic names" = (foo, bar, "lorem ipsum", Acme, ___);
}
EXAMPLES
Single strings and numbers:
"Hello, World!"
foo # compiles to { 'foo' => undef }
3.1415 # compiles to { 3.1415 => undef }
Array:
(1, a, 4, "b u z") # list of numbers/strings
((1, 2),
(3, 4)) # list of lists (2x2 matrix)
((1, a, 3, "foo bar"),
(7, c, 0, "")) # another list of lists
Array of strings:
warning = (
"main correlation-matrix not positive-definite",
"using pseudo-decomposed sigma-matrix",
"cannot evaluate CVaR: the no. of simulations is too low for confidence-level 0.90"
);
Configuration object as hash:
{
contribution_quantile = 0.99;
default_only_mode = Y;
importance_sampling = N;
num_runs = 10000;
num_threads = 10;
# etc.
}
A comprehensive example:
Metaphysic-terms =
{
Numbers =
{
3.141592653589793 = "The ratio of a circle's circumference to its diameter.";
2.718281828459045 = <<___;
The mathematical constant "e" is the unique real number such that the value of
the derivative (slope of the tangent line) of f(x) = e^x at the point x = 0 is
exactly 1.
___
42 = "The Answer to Life, the Universe, and Everything.";
};
Words =
{
ACME = <<Value;
A fancy-free Company [that] Makes Everything: Wile E. Coyote's supplier of equipment and gadgets.
Value
<<Key = <<Value;
foo bar foobar
Key
[JARGON] A widely used meta-syntactic variable; see foo for etymology. Probably
originally propagated through DECsystem manuals [...] in 1960s and early 1970s;
confirmed sightings go back to 1972. [...]
Value
};
};
DESCRIPTION
Audience
Rlist is useful as a "glue data language" between different systems and programs, for configuration files and object serialization. The format excels over comma-separated values (CSV), but isn't as excessive as XML:
Like CSV the format describes merely the data itself, but the data may be structured in multiple levels, not just lines.
Like XML data can be as complex as required, but while XML is geared to markup data within some continuous text (the document), Rlist defines the pure data structure.
Portable implementations already exist for Perl, C and C++. They're stable, efficient and do not depend on other software. The Perl implementation operates directly on builtin types, whereas C++ uses STL types. Either way data integrity is guaranteed: floats won't lose their precision, Perl strings are loaded into std::strings, and Perl hashes and arrays resurrect as std::maps and std::vectors.
The implementations scale well: a single text file can express hundreds of megabytes of data, while the data is readable in constant time and with constant memory requirements. This makes files applicable as "mini-databases" loaded into RAM at program startup. For example, http://www.sternenfall.de uses Rlist instead of a MySQL database.
Character Encoding
Rlist text uses 7-bit-ASCII. Each of the 95 printable character codes 32 to 126 occupies one character. Codes 0 to 31 and 127 to 255 require four characters each: the \ escape character followed by the octal code number. For example, the German Umlaut character ü (252) is translated into \374. The following codes are exceptions:
ASCII              ESCAPED AS
-----              ----------
 9 tab             \t
10 linefeed        \n
13 return          \r
34 quote     "     \"
39 quote     '     \'
92 backslash \     \\
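The encoding rules above can be sketched in a few lines of Perl. This is only an illustration using core Perl; the helper name escape_7bit and the %short_escape table are made up for this sketch and are not part of the Data::Rlist API.

```perl
use strict;
use warnings;

# Codes with a dedicated short escape, as listed in the table above.
my %short_escape = (
    9  => '\t',
    10 => '\n',
    13 => '\r',
    34 => '\"',
    39 => "\\'",
    92 => '\\\\',
);

# Hypothetical helper: encode a byte string as 7-bit ASCII text.
sub escape_7bit {
    my ($bytes) = @_;
    my $text = '';
    for my $char (split //, $bytes) {
        my $code = ord $char;
        if (exists $short_escape{$code}) {
            $text .= $short_escape{$code};
        } elsif ($code < 32 || $code > 126) {
            $text .= sprintf '\\%03o', $code;    # backslash + 3 octal digits
        } else {
            $text .= $char;                      # printable ASCII, kept as-is
        }
    }
    return $text;
}

print escape_7bit("Gr\374n\n"), "\n";            # prints Gr\374n\n
```

Every character outside the printable range costs four output characters, which is why the six short escapes exist for the most common cases.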
Values
Rlist values are either scalars, array elements or the value of a pair. They're always constant.
Scalar Values
All program data is finally convertible into numbers and strings. In Rlist number and string constants follow the C language lexicography. Strings that look like C identifier names need not be quoted.
Strings are quoted implicitly when building Rlists; when reading them back strings are unquoted. Quoting means to encode characters according to the input character set (see above), then to double-quote the result.
Default Values
By definition all input is compiled into an array or hash; hashes are the default. For example, the string "Hello, World!" is compiled into:
{ "Hello, World!" => undef }
Likewise the parser of the C++ implementation by default returns a std::map with one pair. The default scalar value is the empty string "". In Perl, undef'd list elements are compiled into "".
Here-Documents
Rlist is capable of a line-oriented form of quoting based on the UNIX shell here-document syntax and RFC 111. Multi-line quoted strings can be expressed with
<<DELIMITER
Following the sigil << an identifier specifies how to terminate the string scalar. The value of the scalar will be all lines following the current line down to the line starting with the delimiter. There must be no space between the << and the identifier. For example,
{
var = {
log = {
messages = <<LOG;
Nov 27 21:55:04 localhost kernel: TSC appears to be running slowly. Marking it as unstable
Nov 27 22:34:27 localhost kernel: Uniform CD-ROM driver Revision: 3.20
Nov 27 22:34:27 localhost kernel: Loading iSCSI transport class v2.0-724.<6>PNP: No PS/2 controller found. Probing ports directly.
Nov 27 22:34:27 localhost kernel: wifi0: Atheros 5212: mem=0x26000000, irq=11
LOG
};
};
}
Binary Data
Binary data shall be represented as a base64-encoded string, or as a here-document string. For example,
use MIME::Base64;
$str = encode_base64($binary_buf);
The returned encoded string $str is broken into lines of no more than 76 characters each and it will end with "\n" unless it is empty. Since $str ends with "\n" it qualifies as here-document. See also Encode, MIME::Base64.
EXAMPLE
use Data::Rlist;
use MIME::Base64;
$binary_data = join('', map { chr(int rand 256) } 1..300);
$sample = { random_string => encode_base64($binary_data) };
WriteData $sample, 'random.rls', 'default';
Writes a file random.rls that looks like:
{
random_string = <<___
w5BFJIB3UxX/NVQkpKkCxEulDJ0ZR3ku1dBw9iPu2UVNIr71Y0qsL4WxvR/rN8VgswNDygI0xelb
aK3FytOrFg6c1EgaOtEudmUdCfGamjsRNHE2s5RiY0ZiaC5E5XCm9H087dAjUHPtOiZEpZVt3wAc
KfoV97kETH3BU8/bFGOqscCIVLUwD9NIIBWtAw6m4evm42kNhDdQKA3dNXvhbI260pUzwXiLYg8q
MDO8rSdcpL4Lm+tYikKrgCih9UxpWbfus+yHWIoKo/6tW4KFoufGFf3zcgnurYSSG2KRLKkmyEa+
s19vvUNmjOH0j1Ph0ZTi2pFucIhok4krJi0B5yNbQStQaq23v7sTqNom/xdRgAITROUIoel5sQIn
CqxenNM/M4uiUBV9OhyP
___
;
}
Each line except the last line in the here-doc has 76 characters, plus the newline. Note that from the predefined compile options only "default" and "outlined" enable here-docs.
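The properties that make a base64 string qualify as a here-document can be checked with core modules alone. The following sketch uses deterministic bytes instead of rand so that it is repeatable; the variable names are chosen for this illustration only.

```perl
use strict;
use warnings;
use List::Util qw(max);
use MIME::Base64 qw(encode_base64 decode_base64);

# Deterministic stand-in for binary data: 300 bytes, as in the example above.
my $binary_buf = join '', map { chr($_ % 256) } 0 .. 299;

my $encoded = encode_base64($binary_buf);
my @lines   = split /\n/, $encoded;
my $longest = max(map { length($_) } @lines);

printf "%d lines, longest %d characters, ends with newline: %s\n",
    scalar @lines, $longest, ($encoded =~ /\n\z/ ? 'yes' : 'no');

# Decoding restores the original bytes.
print "round trip ok\n" if decode_base64($encoded) eq $binary_buf;
```

Because the encoded text ends with a newline and contains several lines, it meets the here-document condition described under the "here_docs" compile option.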
Embedded Perl Code
Rlists may define embedded programs: nanoscripts. They're defined as here-document that is delimited with the special delimiter "perl". For example,
hello = (<<perl);
print "Hello, World!";
perl
After the text has been fully parsed such strings are eval'd in the order of their occurrence. Within the eval %root or @root defines the root of the current Rlist.
Comments
Rlist supports multiple forms of comments: // or # single-line-comments, and /* */ multi-line-comments.
Compile Options
The format of the compiled text and the behavior of "compile"() can be controlled by the OPTIONS parameter of "write"(), "write_string"() etc. The argument is a hash defining how the Rlist text shall be formatted. The following pairs are recognized:
- 'precision' => PLACES
-
Make "compile"() round all numbers to PLACES decimal places, by calling "round"() on each scalar that looks like a number. By default PLACES is undef, which means floats are not rounded.
- 'scientific' => FLAG
-
Causes "compile"() to masquerade $Data::Rlist::RoundScientific. See "round"().
- 'code_refs' => TOKEN
-
Specify how "compile"() shall treat CODE references. Legal values for TOKEN are 0 (the default), "call" and "deparse". 0 compiles the reference into the string "?CODE?". "call" calls the code, then compiles the return value. "deparse" serializes the code using B::Deparse, which reproduces the Perl source. Note that it then makes sense to enable "here_docs" (see below), because otherwise the deparsed code will be in one string with LFs quoted as "\n".
- 'threads' => COUNT
-
If enabled, "compile"() internally uses multiple threads. Note that this only makes sense on machines with at least COUNT CPUs.
- 'here_docs' => FLAG
-
If enabled, strings with at least two newlines in them are written as here-document, when possible. Note that the string has to be terminated with a "\n" to qualify as here-document.
- 'auto_quote' => FLAG
-
When true do not quote strings that look like identifiers (by means of is_symbol()), otherwise quote all strings. Note that hash keys are not affected by this flag. The default is true, but not for write_csv() and write_conf(), where the default is false (quote all non-numbers).
- 'outline_data' => NUMBER
-
Use "eol_space" (linefeed) to "distribute data on many lines." Insert a linefeed after every NUMBERth array value; 0 disables outlining.
- 'outline_hashes' => FLAG
-
If enabled, and "outline_data" is also enabled, prints { and } on distinct lines when compiling Perl hashes with at least one pair.
- 'separator' => STRING
-
The comma-separator string to be used by "write_csv"(). The default is ','.
- 'delimiter' => REGEX
-
Field-delimiter for "read_csv"(). There is no default value. To read configuration files, for example, you may use '\s*=\s*' or '\s+'; and to read CSV-files you may use '\s*[,;]\s*'.
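To give an idea of what the "deparse" token of 'code_refs' produces, the following sketch calls B::Deparse directly. It does not go through Data::Rlist; the variables $double, $source and $revived are illustrative only, and the exact text printed depends on the Perl version and the pragmas in effect.

```perl
use strict;
use warnings;
use B::Deparse;

my $double = sub { return $_[0] * 2 };

# coderef2text() returns the body of the sub, including the enclosing braces.
my $source = B::Deparse->new->coderef2text($double);
print $source, "\n";

# The text can be turned back into a code reference by prefixing "sub".
my $revived = eval "sub $source" or die $@;
print $revived->(21), "\n";
```

The deparsed source spans several lines, which is why combining "deparse" with "here_docs" keeps the compiled text readable.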
The following options format the generated Rlist; normally you don't want to modify them:
- 'bol_tabs' => COUNT
-
Count of physical, horizontal TAB characters to use at the begin-of-line per indentation level. Defaults to 1. Note that we don't use blanks, because they blow up the size of generated text without measure.
- 'eol_space' => STRING
-
End-of-line string to use (the linefeed). For example, legal values are "", " ", "\r\n" etc. The default is "\n".
- 'paren_space' => STRING
-
String to write after ( and {, and before } and ) when compiling arrays and hashes.
- 'comma_punct' => STRING
- 'semicolon_punct' => STRING
-
Comma and semicolon strings, which shall be at least "," and ";". No matter what, "compile"() will always print the "eol_space" string after the "semicolon_punct" string.
- 'assign_punct' => STRING
-
String to make up key/value-pairs. Defaults to " = ". Note that this is a compile option: the parser always expects some "=" to designate a pair.
Predefined Options
The OPTIONS parameter accepted by some package functions is either a hash-ref or the name of a predefined set:
- 'default'
-
Default if writing to a file.
- 'string'
-
Compact, no newlines/here-docs. Renders a "string of data".
- 'outlined'
-
Optimize the compiled Rlist for maximum readability.
- 'squeezed'
-
Very compact, no whitespace at all. For very large Rlists.
- 'perl'
-
Compile data in Perl syntax, using "compile_Perl"(), not "compile"(). The output then can be eval'd, but it cannot be "read"() back.
- 'fast' or undef
-
Compile data as fast as possible, using compile_fast(), not "compile"().
All functions that define an OPTIONS parameter implicitly call "complete_options"() to complete the argument from one of the predefined sets, and "default". Therefore you may just define a "lazy subset of options" to these functions. For example,
my $obj = new Data::Rlist(-data => $thing);
$obj->write('thing.rls', { scientific => 1, precision => 8 });
Debugging Data (Finding Self-References)
Debugging (hierarchical) data means breaking recursively-defined data.
Set $Data::Rlist::MaxDepth to an integer above 0 to define the depth below which "compile"() shall not venture. 0 disables debugging. When positive, compilation breaks on deep recursions caused by circular references, and a message like the following is printed on stderr:
ERROR: compile2() broken in deep ARRAY(0x101aaeec) (depth = 101, max-depth = 100)
The message will also be repeated as comment when the compiled Rlist is written to a file. Furthermore $Data::Rlist::Broken is incremented by one - and compilation continues! So, any attempt to venture deeper than $Data::Rlist::MaxDepth allows will be blocked, but compilation continues above that depth. Please see "broken"().
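The idea of a depth limit that stops at circular references, counts the incident and carries on can be sketched independently of the module. The function count_broken below is hypothetical and only mimics the counting behaviour described for $Data::Rlist::Broken; it is not how "compile"() is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Rlist: count how often a walk
# through $data would have to descend below $max_depth.
sub count_broken {
    my ($data, $max_depth, $depth) = @_;
    $depth //= 1;
    return 0 unless ref $data;                # plain scalars never break
    return 1 if $depth > $max_depth;          # too deep: stop here, count once
    my @children = ref($data) eq 'ARRAY' ? @$data
                 : ref($data) eq 'HASH'  ? values %$data
                 :                         ();
    my $broken = 0;
    $broken += count_broken($_, $max_depth, $depth + 1) for @children;
    return $broken;
}

my $tree = { a => [1, 2, { b => 3 }] };       # finite, three levels deep
my @loop = (1);
push @loop, \@loop;                           # circular: refers to itself

print count_broken($tree,  10), "\n";         # prints 0
print count_broken(\@loop, 10), "\n";         # prints 1
```

A small limit is used here only to keep the example short; the traversal of everything above the limit is unaffected, which corresponds to "compilation continues above that depth".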
PACKAGE FUNCTIONS
Construction
new() and dock()
The core functions to cultivate package objects are new(), "set"() and "get"().
new() allocates a Data::Rlist object, accepting values peculiar to the new object. These attributes will be implicitly used in place of arguments that are normally passed to package functions - when these functions are called in the context of an object. The following functions may be called also as instance methods:
read() write()
read_string() write_string()
read_csv() write_csv()
read_conf() write_conf()
keelhaul()
When ennobled to methods these functions load their arguments from synonymous attributes of the object. As usual the object is defined by the first argument. Other arguments are optional, but when specified they have precedence over attributes. Note, however, that unless these functions are called as methods their first argument has a different meaning. For example, "read"() expects an input file or string as the first argument, "write"() the data to compile etc.
"dock"() is used to exclusively link some object to the package, which means that some package globals are temporarily set from its attributes. Each function that is called as method uses dock() to localize globals and hence to lock the package.
- new(ATTRIBUTES)
-
Create a Data::Rlist object from ATTRIBUTES, a hash-table. For example,
$self = Data::Rlist->new(-input => 'this.dat', -data => $thing, -output => 'that.dat');
creates an object for which the call $self->read() reads from this.dat, and $self->write() writes $thing to that.dat.
PARAMETERS
- -input => INPUT
- -filter => FILTER
- -filter_args => FILTER-ARGS
-
Defines what to parse. INPUT shall be a filename or string reference. FILTER and FILTER-ARGS define how to preprocess an input file. FILTER can be 1 to select the standard C preprocessor cpp. These attributes are applied by "read"(), "read_string"(), read_conf() and "read_csv"().
- -data => DATA
- -output => OUTPUT
- -options => OPTIONS
- -header => HEADER
-
DATA defines the Perl data to be compiled into text. OPTIONS defines how the text shall be compiled, and OUTPUT where to put it. HEADER defines the comments: an array of text lines, each of which will be prefixed by a # and then written at the top of the output file. These attributes are applied by "write"(), "write_string"(), "write_conf"(), "write_csv"() and "keelhaul"().
- -delimiter => DELIMITER
-
Defines the field delimiter for .csv-files. Applied by "read_csv"() and "read_conf"().
- -columns => STRINGS
-
Defines the column names for .csv-files that, when available, are written into the first line. Applied by "write_csv"() and "write_conf"().
ATTRIBUTES THAT MASQUERADE PACKAGE GLOBALS
These attributes raise new values for package globals while instance methods are executed. You will notice that some globals can also be set by the compile options. But while these options are anonymous hash-tables, possibly shared by many objects, the below attributes define such options per object. This means they're charged each time a function is called as an instance method. (To afford this the method internally calls "dock"().)
For example, when $Data::Rlist::RoundScientific is true Data::Rlist::"round"() formats the number in either normal or exponential (scientific) notation, whichever is more appropriate for its magnitude. round() is called during compilation when the "precision" option is defined, in order to round all numbers to a certain count of decimal places. By setting -RoundScientific this sort of formatting can be enabled per object, not per package.
- -MaxDepth => INTEGER
- -SafeCppMode => FLAG
- -RoundScientific => FLAG
-
Masquerades $Data::Rlist::MaxDepth, $Data::Rlist::SafeCppMode and $Data::Rlist::RoundScientific.
- -DefaultCsvDelimiter => REGEX
- -DefaultConfDelimiter => REGEX
-
Masquerades $Data::Rlist::DefaultCsvDelimiter (for "read_csv"()) and $Data::Rlist::DefaultConfDelimiter (for "read_conf"()). These globals define the default regexes to use when the -options attribute does not specify the "delimiter" regex.
- -DefaultConfSeparator => STRING
-
"write_conf"() uses this attribute to masquerade $Data::Rlist::DefaultConfSeparator, the default string to use when the -options attribute does not specify the "separator" string.
- dock(SELF, SUB)
-
Wire some flittering object SELF back to the package that incubated it (this one).
dock() saves some package globals and sets their new values based on SELF's attributes. Then it calls SUB (a code-reference) in the realm of the new globals. After SUB returned it restores the globals and returns what SUB had returned.
While SUB runs, the package is dedicated to SELF and hence locked ($Data::Rlist::Locked is true).
The saved globals are:
$Data::Rlist::MaxDepth
$Data::Rlist::SafeCppMode
$Data::Rlist::RoundScientific
$Data::Rlist::DefaultCsvDelimiter
$Data::Rlist::DefaultConfDelimiter
$Data::Rlist::DefaultConfSeparator
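The save/set/call/restore cycle described for dock() maps naturally onto Perl's local. The sketch below is an assumption about the general technique, not the module's source: the globals $MaxDepth and $Locked live in the example's own package, and the object is a plain hash.

```perl
use strict;
use warnings;

# Stand-ins for package globals (hypothetical, declared in this script).
our $MaxDepth = 0;
our $Locked   = 0;

# Temporarily set globals from the object's attributes, call $sub, then
# restore the old values. local undoes its changes on scope exit, even
# when $sub dies.
sub dock {
    my ($self, $sub) = @_;
    local $MaxDepth = $self->{-MaxDepth} // $MaxDepth;
    local $Locked   = 1;
    return $sub->();
}

my $object = { -MaxDepth => 100 };
my $seen   = dock($object, sub { return "depth=$MaxDepth locked=$Locked" });

print "$seen\n";                              # prints depth=100 locked=1
print "depth=$MaxDepth locked=$Locked\n";     # prints depth=0 locked=0
```

Using local rather than manual save and restore means the globals are reset on every exit path, so an exception inside SUB cannot leave the package locked.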
set() and get()
- set(SELF[, ATTRIBUTE]...)
-
Reset or initialize object attributes, then return SELF. Each ATTRIBUTE is a name/value-pair. See "new"() for a list of valid names. For example,
$obj->set(-input => \$str, -output => 'temp.rls', -options => 'squeezed');
- get(SELF, NAME[, DEFAULT])
- require(SELF[, NAME])
- has(SELF[, NAME])
-
Get some attribute NAME from object SELF. Unless NAME exists returns DEFAULT. The require() method has no default value, hence it dies unless NAME exists. has() returns true when NAME exists, false otherwise. For NAME the leading hyphen is optional. For example,
$self->get('foo');      # returns $self->{-foo} or undef
$self->get(-foo=>);     # dto.
$self->get('foo', 42);  # returns $self->{-foo} or, unless exists, 42
Interface
This section lists the public functions to be called by users of the package. These can be either called as package functions or instance methods.
read(), read_string(), read_csv() and read_conf()
- read(INPUT[, FILTER, FILTER-ARGS])
-
Parse data from INPUT, which specifies some Rlist-text. See also "errors"(), "write"().
PARAMETERS
INPUT shall be either
- some Rlist object created by "new"(),
- a string reference, in which case read() and "read_string"() parse Rlist text from it,
- a string scalar, in which case read() assumes a file to parse.
See "open_input"() for the FILTER and FILTER-ARGS parameters, which are used to preprocess an input file. When an input file cannot be open'd and flock'd this function dies. When INPUT is an object you specify FILTER and FILTER-ARGS to overload the -filter and -filter_args attributes.
RESULT
"read"() returns the parsed data as array- or hash-reference, or undef if there was no data. The latter may also be the case when the file consists only of comments/whitespace.
NOTES
This function may die. Dying is Perl's mechanism to raise exceptions, which can be caught with eval. For example,
my $host = eval { use Sys::Hostname; hostname; } || 'some unknown machine';
This code fragment traps the die exception: when it is raised eval returns undef, otherwise the result of calling hostname. For read() this means
$data = eval { Data::Rlist::read($tempfile) };
unless (defined $data) {
    print STDERR "$tempfile not found, is locked or is empty"
} else {
    # use $data
    .
    .
}
- read_csv(INPUT[, OPTIONS, FILTER, FILTER-ARGS])
- read_conf(INPUT[, OPTIONS, FILTER, FILTER-ARGS])
-
Parse data from INPUT, which specifies some comma-separated-values (CSV) text. Both functions
- read data from strings or files,
- use an optional delimiter,
- ignore delimiters in quoted strings,
- ignore empty lines,
- ignore lines begun with # as comments.
read_conf() is a variant of read_csv() dedicated to configuration files. Such files consist of lines of the form
key = value
That is, read_conf() simply uses a default delimiter of '\s*=\s*', while read_csv() uses '\s*,\s*'. Hence read_csv() can be used as well for configuration files. For example, a delimiter of '\s+' splits the line at horizontal whitespace into multiple values (but, of course, not within quoted strings).
See also "ReadCSV"(), "ReadConf"(), "write_csv"() and "write_conf"().
PARAMETERS
- INPUT
-
Please see "read"().
- FILTER, FILTER-ARGS
-
Please see "open_input"().
- OPTIONS
-
The actual difference between read_conf() and read_csv() is the default value for the "delimiter" regex in OPTIONS:
FUNCTION       DELIMITER
--------       ---------
read_csv()     '\s*,\s*'
read_conf()    '\s*=\s*'
Note that the above defaults are actually defined by the package-globals $Data::Rlist::DefaultCsvDelimiter and $Data::Rlist::DefaultConfDelimiter.
RESULT
Both functions return a list of lists. Each embedded array defines the fields in a line, and may be of variable length.
EXAMPLES
Un/quoting of values happens implicitly. Given a file db.conf
# Comment
SERVER = hostname
DATABASE = database_name
LOGIN = "user,password"
the call
$opts = Data::Rlist::read_conf('db.conf');
returns (as $opts)
[
    [ 'SERVER', 'hostname' ],
    [ 'DATABASE', 'database_name' ],
    [ 'LOGIN', 'user,password' ]
]
To convert such an array into a hash %conf, use
%conf = map { @$_ } @{ReadConf 'db.conf'};
The write_conf() function can be used to update db.conf from $opts, so that
push @$opts, [ 'MAGIC VALUE' => 3.14_15 ];
Data::Rlist::write_conf($opts, 'db.conf', { precision => 2 });
yields
SERVER = hostname
DATABASE = database_name
LOGIN = "user,password"
"MAGIC VALUE" = 3.14
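How the '\s*=\s*' delimiter turns such lines into a list of lists can be sketched with split. This is a deliberately simplified stand-in: parse_conf_lines is a hypothetical helper, it splits each line into at most two fields, and it does not implement the module's rule of ignoring delimiters inside quoted strings.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's parser: split "key = value" lines.
sub parse_conf_lines {
    my ($text, $delimiter) = @_;
    $delimiter //= '\s*=\s*';
    my @rows;
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*#/ or $line !~ /\S/;   # skip comments, empty lines
        my @fields = split /$delimiter/, $line, 2;   # at most two fields
        s/^"(.*)"$/$1/ for @fields;                  # strip surrounding quotes
        push @rows, \@fields;
    }
    return \@rows;
}

my $conf_text = <<'CONF';
# Comment
SERVER = hostname
DATABASE = database_name
LOGIN = "user,password"
CONF

my $opts = parse_conf_lines($conf_text);
my %conf = map { @$_ } @$opts;
print "$conf{LOGIN}\n";    # prints user,password
```

The final map is the same list-of-lists to hash conversion shown above for ReadConf.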
- read_string(INPUT)
-
Calls "read"() to parse Rlist language productions from the string or string-reference INPUT. INPUT may be an object-reference, in which case read_string() attempts to parse the string-reference defined by the -input attribute.
errors(), broken() and missing_input()
- errors([SELF])
-
Returns the number of syntax errors that occurred in the last call to "parse"(). When called as method (under SELF) returns the number of syntax errors that occurred the last time SELF had called "read"().
- broken([SELF])
-
Returns the number of times the last "compile"() crossed the zenith of $Data::Rlist::MaxDepth. When called as method returns the information for the last time SELF had called "read"().
- missing_input([SELF])
-
Returns true when the last call to "parse"() yielded undef because there was nothing to parse. (This means parse() hadn't returned undef because of syntax errors.) When called as method returns the information for the last time SELF had called "read"().
write(), write_csv() and write_string()
- write(DATA[, OUTPUT, OPTIONS, HEADER])
-
Transliterates Perl data into Rlist text. write() is auto-exported as "WriteData"().
PARAMETERS
- DATA
-
Either an object generated by "new"(), or any Perl data including undef. When DATA is some Data::Rlist object the Perl data to be compiled is defined by its -data attribute. (When -data refers to another Rlist object, this other object is invoked.)
- OUTPUT
-
Where to compile to. Defaults to the -output attribute when DATA defines an object. Defines a filename to create, or some string-reference. When undef, writes to some new string to which it returns a reference.
- OPTIONS
-
How to compile the text from DATA. Defaults to the -options attribute when DATA is an object. When undef or "fast" uses "compile_fast"(), when "perl" uses "compile_Perl"(), otherwise "compile"().
- HEADER
-
Reference to an array of strings that shall be printed literally at the top of an output file. Defaults to the -header attribute when DATA is an object.
RESULT
When write() creates a file it returns 0 for failure or 1 for success. Otherwise it returns a string reference.
EXAMPLES
$self = new Data::Rlist(-data => $thing, -output => $output);
$self->write;   # Write into some file (if $output is a filename)
                # or string (if $output is a string reference).
new Data::Rlist(-data => $self)->write;   # Another way to do it :-)
Data::Rlist::write($thing, $output);      # dto., applying the functional interface
print $self->make_string;                 # Print $thing to stdout.
print Data::Rlist::make_string($thing);   # dto.
PrintData($thing);                        # dto.
- write_csv(DATA[, OUTPUT, OPTIONS, COLUMNS, HEADER])
- write_conf(DATA[, OUTPUT, OPTIONS, HEADER])
-
Write DATA as comma-separated-values (CSV) to file or string OUTPUT. write_conf() writes configuration files where each line contains a tagname, a separator and a value. The main difference between write_conf() and write_csv() are the default values for "separator" and "auto_quote" (see OPTIONS below).
PARAMETERS
- DATA, OUTPUT
-
Please see "write"(). Like with write() DATA defines the data to be compiled. But because of the limitations of CSV-files this may not be just any Perl data. It must be a reference to an array of array references, where each contained array defines the fields. For example,
[
    [ a, b, c ],        # line 1
    [ d, e, f, g ],     # line 2
    .
    .
]
Likewise, write_conf() expects
[
    [ tag, value ],     # line 1
    .
    .
]
- OPTIONS
-
From OPTIONS is read the comma-separator ("separator"), how to quote ("auto_quote"), the linefeed ("eol_space") and the numeric precision ("precision"). The defaults are:
FUNCTION        SEPARATOR    AUTO-QUOTING
--------        ---------    ------------
write_csv()     ','          no
write_conf()    ' = '        yes
When OPTIONS is omitted, in an object context this argument is read from the -options attribute.
- COLUMNS
-
If specified this shall be an array-ref defining the column names to be written as the first line. When this parameter is omitted, in an object context this argument is read from the -columns attribute.
- HEADER
-
If specified all strings in this array are written as #-comments before the actual data. When this parameter is omitted, in an object context this argument is read from the -header attribute.
RESULT
When a file was created both functions return 0 for failure or 1 for success. Otherwise they return a string reference.
EXAMPLES
Functional interface:
use Data::Rlist; # imports WriteCSV
WriteCSV($thing, "foo.dat");
WriteCSV($thing, "foo.dat", { separator => '; ' }, [qw/GBKNR VBKNR EL LaD LaD_V/]);
WriteCSV($thing, \$target_string);
$string_ref = WriteCSV($thing);
Object-oriented interface:
$object = new Data::Rlist(-data => $thing, -output => "foo.dat",
                          -options => { separator => '; ' },
                          -columns => [qw/GBKNR VBKNR EL LaD LaD_V/]);
$object->write_csv; # Write $thing as CSV to foo.dat
$object->write; # Write $thing as Rlist to foo.dat
$object->set(-output => \$target_string);
$object->write_csv; # Write $thing as CSV to $target_string
Please see read_csv() for more examples.
- write_string(DATA[, OPTIONS])
-
Stringify any Perl DATA and return a reference to the string.
Like "write"() but always compiles to a new string to which it returns a reference. This means, when called as method and unlike "write"() this function does not use the -output attribute. Also it does not use -options; when OPTIONS are omitted they default to "string".
make_string() and keelhaul()
- make_string(DATA[, OPTIONS])
-
Stringify any Perl DATA and return the string. This function actually is an alias for ${Data::Rlist::write_string(DATA, OPTIONS)}. Note, however, that OPTIONS default to "default", not "string". For example,
print "\n\$thing dumped: ", Data::Rlist::make_string($thing);
$self = new Data::Rlist(-data => $thing);
print "\n\$thing dumped (again): ", $self->make_string;
- keelhaul(DATA[, OPTIONS])
-
Do a deep copy of DATA according to OPTIONS. DATA is any Perl data, or some Data::Rlist object. keelhaul() first compiles DATA to Rlist text, then restores the data from this text. Hence by "keelhauling data" one can adjust the accuracy of numbers, break circular-references and drop \*foo{THING}s.
This is especially useful when DATA had been hatched by some other code, and you don't know whether it is hierarchical, or if typeglob-refs nest inside. You may then simply keelhaul it to clean it from its past. Also multiple data sets can be brought to the same, common basis.
For example, to bring all numbers in
$thing = { foo => [[.00057260], -1.6804e-4] };
to a certain accuracy, use
$deep_copy = Data::Rlist::keelhaul($thing, { precision => 4 });
to get a $deep_copy (of $thing) as
{ foo => [[0.0006], -0.0002] }
All number scalars were rounded to 4 decimal places, so they're finally comparable as floating-point numbers. Likewise one can convert all floats to integers:
$make_integers = new Data::Rlist(-data => $thing, -options => { precision => 0 });
$thing_without_floats = $make_integers->keelhaul;
When keelhaul() is called in an array context it also returns the text from which the copy had been built. For example,
$deep_copy = Data::Rlist::keelhaul($thing);
($deep_copy, $rlist_text) = Data::Rlist::keelhaul($thing);
$deep_copy = new Data::Rlist(-data => $thing)->keelhaul;
You may then bet that
die if deep_compare($deep_copy, ReadData(\$rlist_text));
will never die. (It shouldn't.)
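The rounding effect described above can be approximated without Data::Rlist. What follows is only a minimal sketch, not the module's implementation: round_deep() is a hypothetical helper that copies nested arrays and hashes and rounds every number-like scalar with sprintf. Unlike keelhaul() it never goes through Rlist text, so it neither breaks circular references nor dissolves other reference types.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical helper (not part of Data::Rlist): deep-copy arrays and
# hashes, rounding number-like scalars to $precision decimal places.
sub round_deep {
    my ($data, $precision) = @_;
    my $type = ref $data;
    return [ map { round_deep($_, $precision) } @$data ] if $type eq 'ARRAY';
    return +{ map { $_ => round_deep($data->{$_}, $precision) } keys %$data }
        if $type eq 'HASH';
    # Other references, undef and non-numeric strings are passed through.
    return $data if $type || !defined $data || !looks_like_number($data);
    return 0 + sprintf("%.${precision}f", $data);
}

my $thing     = { foo => [[.00057260], -1.6804e-4] };
my $deep_copy = round_deep($thing, 4);
print "$deep_copy->{foo}[0][0] $deep_copy->{foo}[1]\n";
```

The original $thing is left untouched, which mirrors the deep-copy property of keelhaul().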
NOTES
keelhaul() won't die nor return an error, but be prepared for the following effects:
ARRAY, HASH, SCALAR and REF references were compiled, whether blessed or not. (Since compiling does not store type information, keelhaul() will turn blessed references into barbarians again.)
IO, GLOB and FORMAT references have been converted into strings.
Depending on the compile options CODE references were called, deparsed back into their function bodies, or dropped.
Depending on the compile options floats had been rounded.
undef'd array elements had been converted into the default scalar value "".
Anything deeper than $Data::Rlist::MaxDepth had been thrown away.
No special methods are called to "freeze" an object before compiling it into text, or to "thaw" it after parsing it from text.
See also "compile"(), "equal"() and "deep_compare"().
Static Interface
predefined_options() and complete_options()
- predefined_options([PREDEF-NAME])
-
Get the hash-ref $Data::Rlist::PredefinedOptions{PREDEF-NAME}. PREDEF-NAME defaults to "default", the options for writing files.
- complete_options([OPTIONS[, BASIC-OPTIONS]])
-
Completes OPTIONS with BASIC-OPTIONS: all pairs not already in OPTIONS are copied from BASIC-OPTIONS. Both arguments define hashes or some predefined options name, and default to "default", the options for writing files.
This function returns a new hash of compile options. (Even when OPTIONS defines a hash it is copied into a new one.) For example,
$options = complete_options({ precision => 0 }, 'squeezed')
merges the predefined options for "squeezed" text (no whitespace at all, no here-docs, numbers rounded) with a numeric precision of 0. This converts all floats to integers.
$options = complete_options($them, { delimiter => '\s+' })
completes $them by some other hash (that is, copies "delimiter" unless such a key exists in $them). However, $them is not touched.
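The completion rule (pairs in OPTIONS win, missing pairs come from BASIC-OPTIONS, and a new hash is returned) can be pictured with plain hash merging. This is a sketch under assumptions: merge_options() is a hypothetical stand-in that handles hash-refs only, and it does not resolve predefined names such as "squeezed".

```perl
use strict;
use warnings;

# Hypothetical stand-in for the merge step: keys in $options win,
# missing keys are filled in from $basic. Both inputs stay untouched.
sub merge_options {
    my ($options, $basic) = @_;
    return +{ %$basic, %$options };
}

my $them   = { precision => 0 };
my $merged = merge_options($them, { delimiter => '\s+', precision => 6 });

print join(", ", map { "$_ => $merged->{$_}" } sort keys %$merged), "\n";
```

Because the result is a fresh hash, later changes to $merged do not leak back into $them.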
Implementation
open_input() and close_input()
- open_input(INPUT[, FILTER, FILTER-ARGS])
- close_input()
-
Open/close Rlist text file or string INPUT for parsing. Used internally by "read"() and "read_csv"().
PREPROCESSING
If specified the function preprocesses the INPUT file using FILTER. Use the special value 1 to select the default C preprocessor (precisely, gcc -E -Wp,-C). FILTER-ARGS is an optional string of additional command-line arguments appended to FILTER. For example,
my $foo = Data::Rlist::read("foo", 1, "-DEXTRA")
eventually does not parse foo, but the output of the command
gcc -E -Wp,-C -DEXTRA foo
Hence within foo C-preprocessor-statements are allowed:
{
#ifdef EXTRA
#include "extra.rlist"
#endif
    123 = (1, 2, 3);
    foobar = {
        .
        .
SAFE CPP MODE
This slightly esoteric mode involves sed and a temporary file. It is enabled by setting $Data::Rlist::SafeCppMode to 1 (the default). It protects single-line #-comments when FILTER begins with either gcc, g++ or cpp. "open_input"() then additionally runs sed to convert all input lines beginning with whitespace plus the # character. Only the following cpp-commands are excluded, and only when they appear in column 1:
- #include and #pragma
- #define and #undef
- #if, #ifdef, #else and #endif.
For all other lines sed converts # into ##. This prevents the C preprocessor from evaluating them. But because of Perl's limited open() function, which isn't able to dissolve arbitrary pipes, the invocation of sed requires a temporary file (created in the same directory as the input file). "lexln"(), the function that feeds the lexical scanner with lines, then converts ## back into comment lines.
Alternately, use // and /* */ comments and set $Data::Rlist::SafeCppMode to 0.
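The comment protection described for safe cpp mode can be illustrated in pure Perl. This is a sketch of the idea only: the module itself uses sed and a temporary file, and its exact patterns are not shown here. protect_comments() and restore_comments() are hypothetical names, and the directive list is taken from the bullet list above.

```perl
use strict;
use warnings;

# Hypothetical helper: double the '#' of comment lines, but leave the
# listed cpp directives alone when they start in column 1.
sub protect_comments {
    my @out;
    for my $line (@_) {
        my $copy = $line;
        unless ($copy =~ /^#(?:include|pragma|define|undef|if|ifdef|else|endif)\b/) {
            $copy =~ s/^(\s*)#/$1##/;
        }
        push @out, $copy;
    }
    return @out;
}

# Hypothetical inverse: turn '##' back into an ordinary '#' comment.
sub restore_comments {
    return map { (my $copy = $_) =~ s/^(\s*)##/$1#/; $copy } @_;
}

my @input     = ('# a comment', '#include "extra.rlist"', '  # indented comment');
my @protected = protect_comments(@input);
my @restored  = restore_comments(@protected);
print "$_\n" for @protected;
```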
lex() and parse()
- lex()
-
Lexical scanner. Called by "parse"() to split the current line into tokens. lex() reads # or // single-line-comment and /* */ multi-line-comment as regular white-spaces. Otherwise it returns tokens according to the following table:
RESULT      MEANING
------      -------
'{' '}'     Punctuation
'(' ')'     Punctuation
','         Operator
';'         Punctuation
'='         Operator
'v'         Constant value as number, string, list or hash
'??'        Error
undef       EOF
lex() terminates each here-doc line with a newline character. For example,
<<test1
a
b
test1
is effectively read as "a\nb\n", which is the same value the equivalent here-doc has in Perl. Hence the purpose of the last character (the newline in the last line) is not just to separate the last line from the delimiter. As a consequence, not all strings can be encoded as a here-doc. For example, it might not be quite obvious to many programmers that "foo\nbar" has no here-doc-equivalent.
- lexln()
-
Read the next line of text from the input. Return 0 if "at_eof"(), 1 otherwise.
- at_eof()
-
Return true if current input file / string array is exhausted, false otherwise.
- parse()
-
Read Rlist language productions from current input, defined by package variables. This is a fast, non-recursive parser driven by the parser map %Data::Rlist::Rules, and fed by "lex"().
parse() is called internally by "read"().
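The here-doc rule stated for lex() matches Perl's own here-doc semantics, which can be checked directly. The snippet below uses Perl here-docs, not Rlist text; the chomp() call is a Perl-side workaround and is not something the Rlist syntax offers.

```perl
use strict;
use warnings;

# Every here-doc line, including the last one, ends with a newline.
my $text = <<"test1";
a
b
test1

# A string without a trailing newline needs post-processing in Perl.
chomp(my $chomped = <<"test2");
foo
bar
test2

print length($text), " ", length($chomped), "\n";
```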
compile(), compile_fast() and compile_Perl()
- compile(DATA[, OPTIONS, FH])
-
Build Rlist text from any Perl data DATA. When FH is defined compile directly to this file and return 1. Otherwise (FH is undef) build a string and return a reference to it.
HOW DATA IS COMPILED
Reference-types SCALAR, HASH, ARRAY and REF are compiled into text, whether blessed or not.
Reference-types CODE are compiled depending on the "code_refs" setting in OPTIONS.
Reference-types GLOB (typeglob-refs), IO and FORMAT (file- and directory handles) cannot be dissolved. These are compiled into the strings "?GLOB?", "?IO?" and "?FORMAT?".
undef'd values in arrays are compiled into the default Rlist "".
- compile_fast(DATA)
-
Build Rlist text from any Perl data DATA. Do this as fast as actually possible with pure Perl.
HOW DATA IS COMPILED
Reference-types SCALAR, HASH, ARRAY and REF are compiled into text, whether blessed or not.
CODE, GLOB, IO and FORMAT are compiled into the strings "?CODE?", "?IO?", "?GLOB?" and "?FORMAT?".
undef'd values in arrays are compiled into the default Rlist "".
The main difference to "compile"() is that compile_fast() considers no compile options. Thus it cannot call code, implicitly round numbers, and cannot detect recursively-defined data.
compile_fast() returns a reference to the compiled string, which is a reference to a unique package variable. Subsequent calls to compile_fast() therefore reassign this variable.
- compile_Perl(DATA)
-
Like compile_fast(), but do not compile Rlist text - compile DATA into Perl. It can then be eval'd. This renders more compact and more exact output than Data::Dumper. For example, only strings are quoted.
Use the compile-option "perl" to trigger this function from "write"() and "write_string"().
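How compile_fast() treats references that cannot be dissolved can be pictured with Scalar::Util::reftype(), which reports the underlying type whether or not the reference is blessed. This is a sketch only; placeholder_for() is a hypothetical name and not a function of Data::Rlist.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype);

# Hypothetical helper: return the placeholder string for reference types
# that cannot be turned into text, or undef for everything else.
sub placeholder_for {
    my $type = reftype($_[0]);
    return undef unless defined $type;
    return $type =~ /^(?:CODE|GLOB|IO|FORMAT)$/ ? "?$type?" : undef;
}

for my $ref (sub { 1 }, \*STDOUT, *STDOUT{IO}, [1], bless({}, 'Foo')) {
    printf "%-8s %s\n", reftype($ref), placeholder_for($ref) // 'compiled into text';
}
```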
Auxiliary Functions
The utility functions in this section are generally useful when handling stringified data. These functions are either very fast, or smart, or both. For example, "quote"(), "unquote"(), "escape"() and "unescape"() internally use precompiled regexes and precomputed ASCII tables; so employing these functions is probably faster than using your own variants.
is_number(), is_symbol() and is_random_text()
- is_integer(SCALAR-REF)
-
Returns true when a scalar looks like an +/- integer constant. The function applies the compiled regex $Data::Rlist::g_re_integer.
- is_number(SCALAR-REF)
-
Test for strings that look like numbers. is_number() can be used to test whether a scalar looks like an integer/float constant (numeric literal). The function applies the compiled regex $Data::Rlist::g_re_float. Note that it doesn't match
- the IEEE 754 notations of Infinite and NaN,
- leading or trailing whitespace,
- lexical conventions such as the "0b" (binary), "0" (octal), "0x" (hex) prefix to denote a number-base other than decimal, and
- Perl's legible numbers, e.g. 3.14_15_92.
See also
perldoc -q "whether a scalar is a number"
- is_symbol(SCALAR-REF)
-
Test for symbolic names. is_symbol() can be used to test whether a scalar looks like a symbolic name. Such strings need not be quoted. Rlist defines symbolic names as a superset of C identifier names:
[a-zA-Z_0-9]                              # C/C++ character set for identifiers
[a-zA-Z_0-9\-/\~:\.@]                     # Rlist character set for symbolic names
[a-zA-Z_][a-zA-Z_0-9]*                    # match C/C++ identifier
[a-zA-Z_\-/\~:@][a-zA-Z_0-9\-/\~:\.@]*    # match Rlist symbolic name
For example, scoped/structured names such as std::foo, msg.warnings, --verbose, calculation-info need not be quoted. (But if they're quoted their value is exactly the same.) Note that is_symbol() does not catch leading or trailing whitespace. Another restriction is that "." cannot be used as first character, since it could also begin a number.
- is_value(SCALAR-REF)
-
Returns true when the scalar is an integer, a number, a symbolic name or some string returned by "quote"().
- is_random_text(SCALAR-REF)
-
The opposite of is_value(). On such text "compile"() and "compile_fast"() call "quote"().
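The classification performed by these tests can be sketched with two regexes. The symbol pattern is transcribed from the character sets listed under is_symbol(); the number pattern is an assumption for illustration, since the actual $Data::Rlist::g_re_float is not reproduced in this document. classify() is a hypothetical helper that takes a plain scalar, not a scalar-ref.

```perl
use strict;
use warnings;

# Assumed pattern for decimal integer/float literals; the module's own
# regex may differ in detail.
my $re_number = qr/\A[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?\z/;

# Pattern for symbolic names, following the character sets given above.
my $re_symbol = qr/\A[a-zA-Z_\-\/~:@][a-zA-Z_0-9\-\/~:.@]*\z/;

# Hypothetical helper: numbers first, then symbols, else random text.
sub classify {
    my $s = shift;
    return 'number' if $s =~ $re_number;
    return 'symbol' if $s =~ $re_symbol;
    return 'random text';
}

printf "%-18s %s\n", "'$_'", classify($_)
    for '-1.6804e-4', 'std::foo', '--verbose', '.hidden', '0x1F', ' 42';
```

Anything classified as random text is what would need quoting before it is written out.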
quote(), escape() and unhere()
- quote(TEXT)
- escape(TEXT)
-
Converts TEXT into 7-bit-ASCII. All characters not in the set of the 95 printable ASCII characters are escaped (see below). The following ASCII codes will be converted to escaped octal numbers, i.e. 3 digits prefixed by a backslash:
0x00 to 0x1F
0x80 to 0xFF
" ' \
The difference between the two functions is that quote() additionally places TEXT into double-quotes. For example, quote(qq'"Früher Mittag\n"') returns "\"Fr\374her Mittag\n\"", while escape() returns \"Fr\374her Mittag\n\"
- maybe_quote(TEXT)
-
Return quote(TEXT) if "is_random_text"(TEXT); otherwise (TEXT defines a symbolic name or number) return TEXT.
- maybe_unquote(TEXT)
-
Return unquote(TEXT) when the first character of TEXT is "; otherwise returns TEXT.
- unquote(TEXT)
- unescape(TEXT)
- unhere(HERE-DOC-STRING[, COLUMNS, FIRSTTAB, DEFAULTTAB])
-
HERE-DOC-STRING shall be a here-document. The function checks whether each line begins with a common prefix, and if so, strips that off. If there is no common prefix it takes the amount of leading whitespace found on the first line and removes that much from each subsequent line.
Unless COLUMNS is defined returns the new here-doc-string. Otherwise, takes the string and reformats it into a paragraph having no line more than COLUMNS characters long. FIRSTTAB will be the indent for the first line, DEFAULTTAB the indent for every subsequent line. Unless passed, FIRSTTAB and DEFAULTTAB default to the empty string "".
This function combines recipes 1.11 and 1.12 from the Perl Cookbook.
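The octal escaping described for escape() can be sketched with a single substitution. This is a simplified illustration, not the module's code: escape_octal() is a hypothetical helper that turns every affected character into a 3-digit octal escape, whereas the example output shown for quote() and escape() keeps shorthand forms such as \" and \n. It also assumes byte-sized characters (code points up to 0xFF).

```perl
use strict;
use warnings;

# Hypothetical helper: escape control characters, 8-bit characters and
# the characters " ' \ as a backslash followed by three octal digits.
sub escape_octal {
    my $text = shift;
    $text =~ s/([^\x20-\x7E]|["'\\])/sprintf("\\%03o", ord $1)/ge;
    return $text;
}

print escape_octal("Fr\xFCher Mittag"), "\n";
print escape_octal("tab\there"), "\n";
```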
split_quoted()
- split_quoted(INPUT[, DELIMITER])
- parse_quoted(INPUT[, DELIMITER])
-
Divide the string INPUT into a list of strings. DELIMITER is a regular expression specifying where to split (default: '\s+'). The function won't split at DELIMITERs inside quotes, or which are backslashed. For example, to split INPUT at commas use '\s*,\s*'.
parse_quoted() works like split_quoted() but additionally removes all quotes and backslashes from the split fields. Both functions effectively simplify the interface of Text::ParseWords. In an array context they return a list of substrings, otherwise the count of substrings. An empty array is returned in case of unbalanced " quotes, e.g. split_quoted('foo,"bar').
EXAMPLES
split_quoted():
sub split_and_list($) {
    print ($i++, " '$_'\n") foreach split_quoted(shift)
}
split_and_list(q("fee foo" bar))
    0 '"fee foo"'
    1 'bar'
split_and_list(q("fee foo"\ bar))
    0 '"fee foo"\ bar'
The default DELIMITER '\s+' handles newlines. split_quoted("foo\nbar\n") returns ('foo', 'bar', '') and hence can be used to split a large string of uncho(m)p'd input lines into words:
split_and_list("foo \r\n bar\n")
    0 'foo'
    1 'bar'
    2 ''
The DELIMITER matches everywhere outside of quoted constructs, so in case of the default '\s+' you may want to remove leading/trailing whitespace. Consider
split_and_list("\nfoo")
split_and_list("\tfoo")
    0 ''
    1 'foo'
and
split_and_list(" foo ")
    0 ''
    1 'foo'
    2 ''
parse_quoted() additionally removes all quotes and backslashes from the split fields:
sub parse_and_list($) {
    print ($i++, " '$_'\n") foreach parse_quoted(shift)
}
parse_and_list(q("fee foo" bar))
    0 'fee foo'
    1 'bar'
parse_and_list(q("fee foo"\ bar))
    0 'fee foo bar'
MORE EXAMPLES
String 'field\ one "field\ two"':
('field\ one', '"field\ two"')    # split_quoted
('field one', 'field two')        # parse_quoted
String 'field\,one, field", two"' with a DELIMITER of '\s*,\s*':
('field\,one', 'field", two"')    # split_quoted
('field,one', 'field, two')       # parse_quoted
Split a large string $soup (mnemonic: slurped from a file) into lines, at LF or CR+LF:
@lines = split_quoted($soup, '\r*\n');
Then transform all @lines by correctly splitting each line into "naked" values:
@table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines
Here is some more complete code to parse a .csv-file with quoted fields, escaped commas:
open my $fh, '<', "foo.csv" or die $!;
local $/;                       # enable localized slurp mode
my $content = <$fh>;            # slurp whole file at once
close $fh;
my @lines = split_quoted($content, '\r*\n');
die q(unbalanced " in input) unless @lines;
my @table = map { [ parse_quoted($_, '\s*,\s*') ] } @lines;
You may also use "read_csv"(). A nice way to make sure what split_quoted() and parse_quoted() return is using deep_compare(). For example, the following code shall never die:
croak if deep_compare([split_quoted("fee fie foo")], ['fee', 'fie', 'foo']);
croak if deep_compare( parse_quoted('"fee fie foo"'), 1);
The 2nd call to "parse_quoted"() happens in scalar context, hence shall return 1 because there's one string to parse.
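Since both functions are described as simplifying the interface of Text::ParseWords, the underlying behaviour can be tried with that core module directly. The sketch below calls Text::ParseWords::quotewords() itself; it is not the code of split_quoted() or parse_quoted(), and their handling of empty or unbalanced input may differ.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my $input = q("fee foo" bar);

# A true second argument keeps quotes and backslashes in the fields,
# a false one removes them.
my @kept     = quotewords('\s+', 1, $input);
my @stripped = quotewords('\s+', 0, $input);

print join('|', @kept), "\n";
print join('|', @stripped), "\n";
```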
equal() and round()
- equal(NUM1, NUM2[, PRECISION])
- round(NUM1[, PRECISION])
-
Compare and round floating-point numbers. "equal"() returns true if NUM1 and NUM2 are equal to PRECISION (default: 6) number of decimal places. NUM1 and NUM2 are string- or number scalars.
Normally round() will return a number in fixed-point notation. When the package-global $Data::Rlist::RoundScientific is true round() formats the number in either normal or exponential (scientific) notation, whichever is more appropriate for its magnitude. This differs slightly from fixed-point notation in that insignificant zeroes to the right of the decimal point are not included. Also, the decimal point is not included on whole numbers. For example, "round"(42) does not return 42.000000, and round(0.12) returns 0.12, not 0.120000.
MACHINE ACCURACY
One needs a function like "equal"() to compare floats, because IEEE 754 single- and double precision implementations are not absolute - in contrast to the numbers they actually represent. In all machines non-integer numbers are only an approximation to the numeric truth. In other words, floating-point arithmetic is not associative! For example, given three floats a, b and c, the result of (a+b)+c might be different from that of a+(b+c).
Each machine has its own accuracy, called the machine epsilon, which is the difference between 1 and the smallest exactly representable number greater than one. Most of the time only floats can be compared that have been carried out to a certain number of decimal places. In general this is the case when two floats that result from a numeric operation are compared - but not two constants. (Constants are accurate through to lexical conventions of the language. The Perl and C syntaxes for numbers simply won't allow you to write down inaccurate numbers in code.)
See also recipes 2.2 and 2.3 in the Perl Cookbook.
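Comparing floats to a fixed number of decimal places can be sketched with sprintf, in the spirit of the Perl Cookbook recipes. This is an illustration, not the code of "equal"() or "round"(): round_to() and equal_to() are hypothetical names, and unlike the table below round_to() always pads with zeroes (it returns 42.00 for 42 with a precision of 2).

```perl
use strict;
use warnings;

# Hypothetical helpers: format to $precision decimal places (default 6)
# and compare the formatted results as strings.
sub round_to {
    my ($num, $precision) = @_;
    $precision //= 6;
    return sprintf("%.${precision}f", $num);
}

sub equal_to {
    my ($x, $y, $precision) = @_;
    return round_to($x, $precision) eq round_to($y, $precision);
}

my $sum = 0.1 + 0.2;
print "exactly equal to 0.3: ", ($sum == 0.3 ? "yes" : "no"), "\n";
print "equal to 6 places:    ", (equal_to($sum, 0.3) ? "yes" : "no"), "\n";
```

The first comparison depends on the machine's float representation, which is exactly why the second form is the safer one.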
EXAMPLES
CALL                  RETURNS NUMBER
----                  --------------
round('0.9957', 3)    0.996
round(42, 2)          42
round(0.12)           0.120000
round(0.99, 2)        0.99
round(0.991, 2)       0.99
round(0.99, 1)        1.0
round(1.096, 2)       1.10
round(+.99950678)     0.999510
round(-.00057260)     -0.000573
round(-1.6804e-6)     -0.000002
deep_compare()
- deep_compare(A, B[, PRECISION, PRINT])
-
Compare and analyze two numbers, strings or references. Generates a log (stack of messages) describing exactly all unequal data. Hence, for any Perl data $a and $b one can assert:
croak "$a differs from $b" if deep_compare($a, $b);
When PRECISION is defined all numbers in A and B are "round"()ed before actually comparing them. When PRINT is true traces progress on stdout.
RESULT
Returns an array of messages, each describing unequal data, or data that cannot be compared because of type- or value-mismatching. The array is empty when deep comparison of A and B found no unequal numbers or strings, and only indifferent types.
EXAMPLES
The result is line-oriented, and for each mismatch it returns a single message:
Data::Rlist::deep_compare(undef, 1)
yields
<<undef>> cmp <<1>> stop! 1st undefined, 2nd defined (1)
Some more complex example. Deep-comparing two multi-level data structures A and B returned two messages:
'String literal' == REF(0x7f224) stop! type-mismatch (scalar versus REF)
'Greetings, earthlings!' == CODE(0x7f2fc) stop! type-mismatch (scalar versus CODE)
Somewhere in A a string "String literal" could not be compared, because the corresponding element in B is a reference to a reference. Next it says that "Greetings, earthlings!" could not be compared because the corresponding element in B is a code reference. (One could assert, however, that the actual opacity here is that they speak ASCII.)
Actually, A and B are identical. B was written to disk (by "write"()) and then read back as A (by "read"()). So, why don't they compare anymore? Because in B the refs REF(0x7f224) and CODE(0x7f2fc) hide
\"String literal"
and
sub { 'Greetings, earthlings!' }
When writing B to disk write() has dissolved the scalar- and the code-reference into "String literal" and "Greetings, earthlings!". Of course, deep_compare() will not do that, so A does not compare to B anymore. Note that despite these two mismatches, deep_compare() had continued the comparison for all other elements in A and B. Hence the structures are otherwise identical.
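The idea of collecting one message per mismatch while continuing with the remaining elements can be sketched as a small recursive walk. This is a simplified illustration, not the implementation of deep_compare(): diff_data() is a hypothetical helper, its message format is invented for the example, and it neither rounds numbers nor looks inside scalar or code references.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of messages, one per mismatch.
sub diff_data {
    my ($x, $y, $path) = @_;
    $path //= 'data';
    my ($rx, $ry) = (ref $x, ref $y);
    return (sprintf("%s: type-mismatch (%s versus %s)",
                    $path, $rx || 'scalar', $ry || 'scalar')) if $rx ne $ry;
    if ($rx eq 'ARRAY') {
        return ("$path: array sizes differ") if @$x != @$y;
        return map { diff_data($x->[$_], $y->[$_], $path . "[$_]") } 0 .. $#$x;
    }
    if ($rx eq 'HASH') {
        my %keys = map { $_ => 1 } keys %$x, keys %$y;
        return map {
            exists $x->{$_} && exists $y->{$_}
                ? diff_data($x->{$_}, $y->{$_}, $path . "{$_}")
                : ($path . "{$_}: key exists on one side only")
        } sort keys %keys;
    }
    return () if !defined $x && !defined $y;
    return ("$path: one value is undefined") if !defined $x || !defined $y;
    return $x eq $y ? () : (sprintf("%s: '%s' differs from '%s'", $path, $x, $y));
}

my $written = { text => \"String literal", list => [1, 2, 3] };
my $reread  = { text => "String literal",  list => [1, 2, 4] };
print "$_\n" for diff_data($reread, $written);
```

As in the example above, the type mismatch under one key does not stop the walk; the differing list element is still reported.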
fork_and_wait() and synthesize_pathname()
- fork_and_wait(PROGRAM[, ARGS...])
-
Forks a process and waits for completion. The function will extract the exit-code, test whether the process died and print status messages on stderr. fork_and_wait() hence is a handy wrapper around the built-in system() and exec() functions. Returns an array of three values:
($exit_code, $failed, $coredump)
$exit_code is -1 when the program failed to execute (e.g. it wasn't found or the current user has insufficient rights). Otherwise $exit_code is between 0 and 255. When the program died on receipt of a signal (like SIGINT or SIGQUIT) then $signal stores it. When $coredump is true the program died and a core file was written. (Note that some systems store cores somewhere else than in the programs' working directory.)
- synthesize_pathname(TEXT...)
-
Concatenates and forms all TEXT strings into a symbolic name that can be used as a pathname. synthesize_pathname() is a useful function to reuse a string, assembled from multiple strings, simultaneously as hash key, database name, and file- or URL name. Note, however, that a few characters are mapped to only "_" and "-".
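The three status values that fork_and_wait() is said to return can be derived from $? after system(), following the decoding documented in perlfunc. This is a sketch, not the module's code: run_and_wait() is a hypothetical name, it prints nothing on stderr, and placing the signal number in the second slot is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: run a command and decode the wait status.
sub run_and_wait {
    my @command = @_;
    system(@command);
    return (-1, 0, 0) if $? == -1;          # program could not be started
    my $signal   = $? & 127;                # non-zero if killed by a signal
    my $coredump = ($? & 128) ? 1 : 0;      # true if a core file was written
    return ($? >> 8, $signal, $coredump);   # exit code is in the high byte
}

# $^X is the path of the running perl, used here as a portable test program.
my ($exit_code, $signal, $coredump) = run_and_wait($^X, '-e', 'exit 3');
print "exit=$exit_code signal=$signal coredump=$coredump\n";
```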
Exported Functions
Exporter Tags
Four tags are available that import function sets. These are utility functions usable also separately from Data::Rlist.
- :floats
-
Imports "equal"(), "round"() and "is_number"().
- :strings
-
Imports "maybe_quote"(), "quote"(), "escape"(), "unquote"(), "unescape"(), "unhere"(), "is_random_text"(), "is_number"(), "is_symbol"(), "split_quoted"(), and "parse_quoted"().
- :options
-
Imports "predefined_options"() and "complete_options"().
- :aux
-
Imports "deep_compare"(), "fork_and_wait"() and "synthesize_pathname"().
For example,
use Data::Rlist qw/:floats :strings/;
Auto-Exported Functions
The following functions are implicitly imported into the caller's symbol table. (But you may say require Data::Rlist instead of use Data::Rlist to prohibit auto-import. See also perlmod.)
- ReadData(INPUT[, FILTER, FILTER-ARGS])
- ReadCSV(INPUT[, OPTIONS, FILTER, FILTER-ARGS])
- ReadConf(INPUT[, OPTIONS, FILTER, FILTER-ARGS])
-
Another way to call Data::Rlist::"read"(), Data::Rlist::"read_csv"() and Data::Rlist::"read_conf"().
- WriteData(DATA[, OUTPUT, OPTIONS, HEADER])
- WriteCSV(DATA[, OUTPUT, OPTIONS, COLUMNS, HEADER])
- WriteConf(DATA[, OUTPUT, OPTIONS, HEADER])
-
Another way to call Data::Rlist::"write"(), Data::Rlist::"write_csv"() and Data::Rlist::"write_conf"().
- OutlineData(DATA[, OPTIONS])
- StringizeData(DATA[, OPTIONS])
- SqueezeData(DATA[, OPTIONS])
-
Another way to call Data::Rlist::"make_string"(). OutlineData() applies the predefined "outlined" options, while StringizeData() applies "string" and SqueezeData() "squeezed". When specified, OPTIONS are merged into the predefined set by means of complete_options(). For example,
print "\n\$thing: ", OutlineData($thing, { precision => 12 });
rounds all numbers in $thing to 12 decimal places.
- PrintData(DATA[, OPTIONS])
-
Another way to say
print OutlineData(DATA, OPTIONS);
For example,
print OutlineData($thing);
- KeelhaulData(DATA[, OPTIONS])
- CompareData(A, B[, PRECISION, PRINT_TO_STDOUT])
-
Another way to call "keelhaul"() and "deep_compare"(). For example,
use Data::Rlist;
    .
    .
my($copy, $as_text) = KeelhaulData($thing);
NOTES
The Random Lists (Rlist) syntax is inspired by NeXTSTEP's Property Lists. Rlist is simpler, more readable and more portable. The Perl, C and C++ implementations are fast, stable and free. Markus Felten, with whom I worked for a few months on a project at Deutsche Bank, Frankfurt in summer 1998, drew my attention to Property Lists. He had implemented a Perl variant of it (http://search.cpan.org/search?dist=Data-PropertyList).
The term "Random" underlines the fact that the language
has four primitive/anonymous types;
the basic building block is a list, which is combined at random with other lists.
Hence the term Random does not mean aimless or accidental. Random Lists are arbitrary lists.
Rlist vs. Perl Syntax
Rlists are not Perl syntax:
RLIST      PERL
-----      ----
5;         { 5 => undef }
"5";       { "5" => undef }
5=1;       { 5 => 1 }
{5=1;}     { 5 => 1 }
(5)        [ 5 ]
{}         { }
;          { }
()         [ ]
Speeding up Compilation (Explicit Quoting)
Much work has been spent to optimize Data::Rlist for speed. Still it is implemented in pure Perl (no XS). A very rough estimate for Perl 5.8 is "each MB takes one second per GHz". For example, when the resulting Rlist file has a size of 13 MB, compiling it from a Perl script on a 3-GHz-PC requires about 5-7 seconds. Compiling the same data under Solaris, on a sparcv9 processor operating at 750 MHz, takes about 18-22 seconds.
The process of compiling can be sped up by calling "quote"() explicitly on scalars, that is, before calling "write"() or "write_string"(). Large data sets may compile faster when "quote"() is called in advance for scalars that certainly do not qualify as symbolic names:
use Data::Rlist qw/:strings/;
$data{quote($key)} = $value;
.
.
Data::Rlist::write(\%data, "data.rlist");
instead of
$data{$key} = $value;
.
.
Data::Rlist::write(\%data, "data.rlist");
It depends on the case whether the first variant is faster: "compile"() and "compile_fast"() both have to call "is_random_text"() on each scalar. When the scalar is already quoted, i.e. its first character is ", this test ought to run faster.
Note that internally "is_random_text"() applies the precompiled regex $g_re_value. But for a given scalar $s the expression ($s !~ $Data::Rlist::g_re_value) can be up to 20% faster than the equivalent is_random_text($s).
Quoting strings that look like numbers
Normally you don't have to care about strings, since un/quoting happens as required when reading/compiling Rlists from Perl data. A common problem, however, occurs when some text fragment (string) uses the same lexicography as numbers do.
Printed text uses well-defined glyphs and typographic conventions, and finally the competence of the reader to recognize numbers. But computers need to know the exact number type and format. Integer? Float? Hexadecimal? Scientific? Klingon? The Perl Cookbook in recipe 2.1 recommends the use of a regular expression to distinguish number from string scalars. The advice illustrates how hard the problem actually is. Not only Perl has to overcome this; any program that interprets text has to.
Since Perl scripts are texts that process text into more text, Perl's artful answer was to define typeless scalars. Scalars hold a number, a string or a reference. Thereby Perl solves the problem that digits, like alphabetics and punctuation, are regular ASCII codes. So Perl defines the string as the basic building block for all program data. Venturesomely, it then lets the program decide what strings mean. Analogously, in a printed book the reader has to decipher the glyphs and decide what evidence they hide.
In Rlist, string scalars that look like numbers need to be quoted explicitly. Otherwise, for example, the scalar $s="-3.14" appears as -3.14 in the output. Likewise "007324" is compiled into 7324 - the text quality is lost and the scalar is read back as a number. Of course, this behavior is by design, and in most cases this is just what you want. For hash keys, however, it might be a problem. One solution is to prefix the string by an artificial "_":
my $s = '-9'; $s = "_$s";
Since the scalar begins with a "_" it does not qualify as a number anymore, and hence is compiled as string, and read back as string. In the C++ implementation it will then become some std::string, not a double. But the leading "_" has to be removed by the reading program. Perhaps a better solution is to explicitly call Data::Rlist::quote:
$k = -9;
$k = Data::Rlist::quote($k); # returns qq'"-9"'
use Data::Rlist qw/:strings/;
$k = 3.14_15_92;
$k = quote($k); # returns qq'"3.141592"'
Again, the need to quote strings that look like numbers is a problem evident only in the Perl implementation of Rlist, since Perl is a language with weak types. As a language with very strong typing C++ is quasi the antipode to Perl. With the C++ implementation of Rlist then there's no need to quote strings that look like numbers.
See also "write"(), "is_number"(), "is_symbol"(), "is_random_text"() and http://en.wikipedia.org/wiki/American_Standard_Code_for_Information_Interchange.
Installing Rlist.pm locally
Installing CPAN packages usually requires administrator privileges. In case you don't have them, another way is to copy the Rlist.pm file into a directory of your choice, e.g. into . or ~/bin. Instead of use Data::Rlist;, however, you then use the following code:
BEGIN {
$0 =~ /[^\/]+$/;
push @INC, $`||'.', "$ENV{HOME}/bin";
require Rlist;
Data::Rlist->import();
Data::Rlist->import(qw/:floats :strings/);
}
This code finds Rlist.pm also in . and ~/bin, and then calls the Exporter manually.
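An alternative way to extend the module search path is the core modules FindBin and lib. This is a sketch of a different technique than the BEGIN block above, under the same assumption that Rlist.pm was copied next to the script or into ~/bin; the fallback to "." when HOME is unset is an addition made for this example.

```perl
use strict;
use warnings;
use FindBin qw($Bin);                         # directory of the running script
use lib $Bin, ($ENV{HOME} // '.') . '/bin';   # prepend both to @INC

# With the search path extended, the module could then be loaded as above:
#   require Rlist;
#   Data::Rlist->import(qw/:floats :strings/);

print "searching: $_\n" for @INC[0 .. 1];
```

Unlike the BEGIN block, this does not rely on the $` match variable.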
Package Dependencies
Data::Rlist depends on only a few other packages:
Exporter
Carp
strict
integer
Sys::Hostname
Scalar::Util # deep_compare() only
Text::Wrap # unhere() only
Text::ParseWords # split_quoted(), parse_quoted() only
Data::Rlist is free of $&, $` or $'. Reason: once Perl sees that you need one of these meta-variables anywhere in the program, it has to provide them for every pattern match. This may substantially slow your program (see also perlre).
Background: A Short Story of Typeglobs
This is supplement information for "compile"().
Typeglobs are an idiosyncrasy of Perl. Perl uses a symbol table per package (namespace) to map symbolic names like foo to Perl values. Humans use abstract symbols to name things, because we can remember symbols better than numbers, or formulas that hide numbers.
Typeglob objects are symbol table entries.
The idiosyncrasy is that different types need only one entry - one symbol can name all types of Perl data (scalars, arrays, hashes) and nondata (functions, formats, I/O handles). For example, the symbol foo is mapped to the typeglob *foo. Therein coexist $foo (the scalar value), @foo (the list value), %foo (the hash value), &foo (the code value) and foo (the I/O handle or the format specifier). There's no key "$foo" or "@foo" in the symbol table, only "foo".
The symbol table is an ordinary hash, named like the package with two colons appended. The main symbol table's name is thus %main::, or %::. Internally this is called a stash (for symbol table hash). In the C code that implements Perl, %:: is the global variable defstash (default stash). It holds items in the main package. But, as if it were a symbol in a stash, perl arranges it as typeglob-ref:
$ perl -e 'print \*::'
GLOB(0x10010f08)
But the root-stash defstash lists stashes from all other packages. For example, the symbol Data:: in stash %:: addresses the stash of package Data, and the symbol Rlist:: in the stash %Data:: addresses the stash of package Data::Rlist.
Stashes are symbol tables. perl has one stash per package.
All \*names:: are actually stash-refs, but Perl calls them globs.
Like all hashes stashes contain string keys, which name symbols, and values which are typeglobs. In the C implementation of Perl typeglobs have the struct type GV, for Glob value. In the stashes, typeglobs are GV pointers.
The typeglob is interposed between the stash and the program's actual values for $foo, @foo etc.
The sigil * serves as wildcard for the other sigils %, @, $ and &. A sigil is a symbol created for a specific magical purpose; the name derives from the Latin sigillum = seal.
Modifying $foo in a Perl program won't change %foo. Each typeglob is merely a set of pointers to separate objects describing scalars, arrays, hashes, functions, formats and I/O handles. Normally only one pointer in *foo is non-null. Because typeglobs host pointers, *foo{ARRAY} is a way to say \@foo. To get a reference to the typeglob for symbol foo you say *foo{GLOB}, or \*foo. But on the other hand it is not quite clear why
$ perl -e 'exists *foo{GLOB}'
exists argument is not a HASH or ARRAY element at -e line 1.
To define the scalar pointer in the typeglob *foo you simply say $foo = 42. But you may also assign a reference to the typeglob:
$ perl -e '$x = 42; *foo = \$x; print $foo'
42
Assigning a scalar alters the symbol, not the typeglob:
$ perl -e '$x = 42; *foo = $x; print *foo'
*main::42
$ perl -e '$x = 42; *foo = $x; print *42'
*main::42
Hmm.
$ perl -e 'print 1*9'
9
$ perl -e 'print *9'
*main::9
I wish it wouldn't do that.
$ perl -e '*foo = 42; print $::{42}, *foo'
*main::42*main::42
Enough, this is very strange.
Maybe the best use of typeglobs are typeglob aliases. For example, *bar = *foo aliases the symbol bar in the stash. Then the symbols foo and bar point to the same typeglob! This means that when you declare sub foo {} after casting the alias, bar() is foo(). The penalty, however, is that the bar symbol cannot be easily removed from the stash. One way is to say local *bar, which temporarily assigns a new typeglob to bar with all pointers zeroized.
What is this good for? This is not quite clear. Obviously it is just an artefact from Perl 4. In fact, local typeglob aliases seem to be faster than references, because no dereferencing is required. For example,
sub f1 { my $bar = shift; ++$$bar }
sub f2 { local *bar = shift; ++$bar }
f1(\$foo);                      # increments $foo
f2(*foo);                       # dto., but faster
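The two calling styles can be tried in a complete script. This is a minimal sketch that runs under strict, which is why the package variables are declared with our; it only demonstrates that both functions increment the same scalar and makes no claim about their relative speed.

```perl
use strict;
use warnings;

our ($foo, $bar);
$foo = 1;

sub f1 { my $ref = shift; ++$$ref }        # explicit dereference
sub f2 { local *bar = shift; ++$bar }      # $bar aliases the argument while f2 runs

f1(\$foo);      # pass a reference
f2(*foo);       # pass the typeglob
print "$foo\n";
```

When f2() returns, local restores the previous typeglob of bar, so the alias does not outlive the call.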
Note, however, that my variables (lexical variables) are not stored in stashes, and do not use typeglobs. These variables are stored in a special array, the scratchpad, assigned to each block, subroutine, and thread. These are real private variables, and they cannot be localized. Each lexical variable occupies a slot in the scratchpad; hence is addressed by an integer index, not a symbol. my variables are like auto variables in C. They're also faster than locals, because they're allocated at compile time, not runtime. Therefore you cannot declare *foo lexically:
$ perl -e 'my(*foo);'
Can't declare ref-to-glob cast in "my" at -e line 1, near ");"
Execution of -e aborted due to compilation errors.
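That lexicals bypass the stash can be checked by looking into %main:: directly. A small sketch, with variable names made up for the example:

```perl
use strict;
use warnings;

our $global  = 'in stash';  # package variable: has an entry in %main::
my  $lexical = 'in pad';    # lexical variable: lives in the scratchpad only

# exists() on the stash hash does not create entries, so this is a pure lookup.
my $global_seen  = exists $main::{global}  ? 1 : 0;
my $lexical_seen = exists $main::{lexical} ? 1 : 0;

print "global: $global_seen, lexical: $lexical_seen\n";
```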
Also it is somewhat confusing that $foo and @foo etc. have concrete values, while *foo is said to be *main::foo:
$ perl -e 'print *foo'
*main::foo
$ perl -e 'package nirvana; use strict; print *foo;'
*nirvana::foo
Hence the value of a typeglob is a full path into the perl stashes, down from the defstash. The stash entry is arranged by perl on the fly, even with the use strict pragma in effect. One needs to get used to the fact that *foo returns a symbol path, not something like
(SCALAR => \$foo, ARRAY => \@foo)
for all its non-null pointers (in this example, the symbol foo would have been incarnated as $foo and @foo).
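Such a list of slots can still be assembled by hand with the *foo{THING} syntax described in perlref. The helper glob_slots below is hypothetical and only a sketch: it reports the ARRAY, HASH, CODE and IO slots that are in use, and treats the SCALAR slot specially because *foo{SCALAR} always returns a reference.

```perl
use strict;
use warnings;

our $foo = 42;
our @foo = (1, 2);

# Hypothetical helper: collect the slots of a typeglob that are in use.
sub glob_slots {
    my $glob = shift;                   # a glob reference, e.g. \*foo
    my %slots;
    for my $type (qw(ARRAY HASH CODE IO)) {
        my $ref = *{$glob}{$type};      # undef if the slot was never used
        $slots{$type} = $ref if defined $ref;
    }
    # *glob{SCALAR} always yields a reference, so test the value instead.
    my $sref = *{$glob}{SCALAR};
    $slots{SCALAR} = $sref if defined $$sref;
    return \%slots;
}

my $slots = glob_slots(\*foo);
print join(', ', sort keys %$slots), "\n";
```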
Conclusion: with typeglobs you reach the bedrock of Perl, where the spade bends back.
See also perlguts, perlref, perldsc and perllol.
BUGS
There are no known bugs; this package is stable.
Deficiencies of this version:
nanoscripts not yet implemented.
The "deparse" functionality for the "code_refs" compile option has not yet been implemented.
The "threads" compile option has not yet been implemented.
IEEE 754 notations of Infinite and NaN not yet implemented.
compile_Perl() is experimental.
SEE ALSO
Data::Dumper
In contrast to Data::Dumper, Data::Rlist types scalars properly as numbers or strings. Data::Dumper often writes numbers as quoted strings, for example
$VAR1 = {
'configuration' => {
'verbose' => 'Y',
'importance_sampling_loss_quantile' => '0.04',
'distribution_loss_unit' => '100',
'default_only' => 'Y',
'num_threads' => '5',
.
.
}
};
where Data::Rlist writes
{
configuration = {
verbose = Y;
importance_sampling_loss_quantile = 0.04;
distribution_loss_unit = 100;
default_only = Y;
num_threads = 5;
.
.
}
}
As one can see, Data::Dumper writes the data right in Perl syntax, which means the dumped text can simply be eval'd, so data can be restored very fast. Rlists are not quite Perl syntax: a dedicated parser is required. In return, Rlist text is portable and can be read from other programming languages, namely C++, where a fast flex/bison parser in conjunction with smart heap management is implemented. So C++ programs, like Perl programs, are able to handle Rlist files of several hundred MB.
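Restoring a Data::Dumper dump with eval can be sketched as follows. The sample data is made up; the only assumption about Data::Dumper is its default output form, an assignment to $VAR1.

```perl
use strict;
use warnings;
use Data::Dumper;

my $data = { configuration => { verbose => 'Y', num_threads => 5 } };

my $text = Dumper($data);   # Perl source of the form "$VAR1 = {...};"

my $restored = do {
    my $VAR1;               # the variable the dumped text assigns to
    eval $text;             # the dump is Perl code, so eval rebuilds the data
    die "eval failed: $@" if $@;
    $VAR1;
};

print $restored->{configuration}{num_threads}, "\n";    # prints "5"
```

Because eval executes whatever the text contains, this shortcut is only appropriate for dumps from a trusted source.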
With $Data::Dumper::Useqq enabled it was observed that Data::Dumper renders output significantly slower than "compile"(). This is actually surprising, since Data::Rlist tests for each scalar whether it is numeric, and truly quotes/escapes strings. By default, Data::Dumper quotes all scalars (including numbers), and it does not escape strings. This may also result in some odd behaviors. For example,
use Data::Dumper;
print Dumper "foo\n";
yields
$VAR1 = 'foo
';
while
use Data::Rlist;
PrintData "foo\n"
yields
{ "foo\n"; }
Recall that "parse"() always returns a list, as array- or hash-reference.
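For comparison, the effect of $Data::Dumper::Useqq on the same string can be seen in a short sketch that uses nothing beyond Data::Dumper itself:

```perl
use strict;
use warnings;
use Data::Dumper;

my $text = "foo\n";

# Default: single quotes, with the newline emitted literally.
my $plain = Dumper($text);

# Useqq: double quotes, with the newline emitted as the escape \n.
my $escaped = do {
    local $Data::Dumper::Useqq = 1;
    Dumper($text);
};

print $escaped;             # prints: $VAR1 = "foo\n";
```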
Finally, Data::Rlist generates smaller files. With the default $Data::Dumper::Indent of 2, Data::Dumper's output is 4-5 times the size of Data::Rlist's, because Data::Dumper indents with many blanks instead of horizontal tabs. This inflates file sizes considerably.
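How much of a dump is indentation can be estimated by dumping the same structure under each $Data::Dumper::Indent setting. The data below is synthetic and the sketch prints whatever sizes it measures; it does not reproduce the 4-5 times figure, which refers to a comparison with Data::Rlist output.

```perl
use strict;
use warnings;
use Data::Dumper;

# Synthetic sample: one nested hash with twenty numeric options.
my $data = { configuration => { map { ("option_$_" => $_) } 1 .. 20 } };

my %size;
for my $indent (0, 1, 2) {
    local $Data::Dumper::Indent = $indent;  # 2 is the default
    $size{$indent} = length Dumper($data);
}

printf "Indent=%d: %d bytes\n", $_, $size{$_} for sort keys %size;
```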
COPYRIGHT/LICENSE
Copyright 1998-2007 Andreas Spindler
Maintained at CPAN (http://search.cpan.org/~aspindler) and the author's site (http://www.visualco.de). Please send mail to rlist@visualco.de.
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.
Thank you for your attention.