NAME

GraphViz::Data::Structure - Visualise data structures

SYNOPSIS

use GraphViz::Data::Structure;

my $gvds = GraphViz::Data::Structure->new($data_structure);
print $gvds->graph()->as_png;

DESCRIPTION

This module makes it easy to visualise data structures, even recursive or circular ones.

It is provided as an alternative to GraphViz::Data::Grapher. Differences:

GraphViz::Data::Grapher creates graphics of individual substructures (arrays, scalars, hashes) which keep the substructure type and data together; GraphViz::Data::Structure does this by shape alone.
GraphViz::Data::Structure encapsulates object info (if any) directly into the node being used to represent the class.
GraphViz::Data::Grapher colors its graphs; GraphViz::Data::Structure doesn't by default.
GraphViz::Data::Structure can parse out globs and CODE references (almost as well as the debugger does).

REPRESENTING DATA STRUCTURES AS GRAPHS

GraphViz::Data::Structure tries to draw data structure diagrams with a minimum of complexity and a maximum of elegance. To this end, the following design choices were made:

Strings, scalars, filehandles, and code references are represented as plain text.
Empty hashes and arrays are represented as Perl represents them in code: hashes as {}, and arrays as [], except if they are blessed (see below).
Arrays are laid out as sets of boxes, in the order in which they were found in the existing data structure (left-to-right or top-to-bottom, depending on overall graph layout).
Hashes are laid out as pairs of sets of boxes, with the keys in alphabetically-sorted order top-to-bottom or left-to-right.
Blessed items have a box added to them in parallel, containing the name of the class and its type (scalar/array/hash).
Code references are decoded to determine their fully-qualified package name and are output as plaintext nodes.
Globs pointed to by references are disassembled and their individual parts dumped.

ALGORITHM

The algorithm is a standard recursive depth-first treewalk; we determine how the current node should be added to the current graph, add it, and then call ourselves recursively to determine how all nodes below this one should be visualized. Edges are added after the subnodes are added to the graph.

Items "within" the current subnode (array and hash elements which are not references) are rendered inside a cell in the aggregate corresponding to their position. References are represented by an edge linking the appropriate position in the aggregate to the appropriate subnode.
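
The treewalk described above can be sketched in a few lines of plain Perl. This is only an illustration, not the module's actual code: the walk() function, the "node<address>" naming scheme, and the edge strings are all hypothetical. It shows the two points made above: non-reference values stay inside their parent, and an edge is recorded only after the subnode has been visited.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical depth-first walker. Collects "parent -> child" strings
# in @$edges and returns the (made-up) node name for $item.
sub walk {
    my ($item, $seen, $edges) = @_;
    return unless ref $item;                  # plain values are not nodes
    my $name = 'node' . refaddr($item);       # hypothetical naming scheme
    return $name if $seen->{$name}++;         # already visited: stop here
    my $type = reftype($item);
    my @children =
          $type eq 'ARRAY'                    ? @$item
        : $type eq 'HASH'                     ? map { $item->{$_} } sort keys %$item
        : ($type eq 'SCALAR' || $type eq 'REF') ? ($$item)
        :                                       ();
    for my $child (@children) {
        next unless ref $child;               # value lives inside this node
        my $child_name = walk($child, $seen, $edges);
        push @$edges, "$name -> $child_name"; # edge added after the subnode
    }
    return $name;
}
```

The %$seen hash is what keeps recursive or circular structures from looping forever; a structure that points at itself simply yields an edge from a node to itself.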

This code does its data-structure unwrapping in a manner very similar to that used by dumpvar.pl, the code used by the debugger to display data structures as text. The initial structure treewalk was written in isolation; the dumpvar.pl code was integrated only after it was recognized that there was more to life than hashes, arrays, and scalars. The dumpvar.pl code to decode globs and code references was used almost as-is.

Code was added to attempt to spot references to array or hash elements, but this code still does not work as desired. Array and hash element references still appear to be scalars to the current algorithm.
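
A short demonstration of why element references are hard to spot: Perl reports the same reference type for a reference to an array element and for a reference to an ordinary scalar. The address comparison at the end is just one possible way to locate the containing array, shown as an assumption for illustration; it is not what the module does.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr);

my @array     = (10, 20, 30);
my $plain     = 20;
my $elem_ref  = \$array[1];   # reference to an array element
my $plain_ref = \$plain;      # reference to an ordinary scalar

# ref() reports the same type for both, so the type alone cannot
# reveal that $elem_ref points into @array.
print ref($elem_ref), ' ', ref($plain_ref), "\n";   # prints "SCALAR SCALAR"

# One possible approach: compare addresses against every element.
my ($index) = grep { refaddr(\$array[$_]) == refaddr($elem_ref) } 0 .. $#array;
print "element index: $index\n";                    # prints "element index: 1"
```

The comparison only works if every candidate aggregate is scanned, which is part of what makes a general solution awkward.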

GLOBAL SETTINGS

GraphViz::Data::Structure::Debug

Set this to a true value to turn on some debugging messages output to STDERR. Defaults to false, and should probably be left that way unless you're reworking init().

# Turn on GraphViz::Data::Structure debugging.
$GraphViz::Data::Structure::Debug = 1;

CLASS METHODS

new()

This is the constructor. It takes one mandatory argument, the data structure to be visualised. It returns a GraphViz::Data::Structure object, the name of the top node, and a list defining the 'to' port for this top node (an empty list if there is no 'to' port).

# Graph a data structure, creating a GraphViz object.
# The new GraphViz::Data::Structure object, the name of
# the top node in the structure, and the "in" port are returned.
my ($gvds, $top_name, @port) =
  GraphViz::Data::Structure->new($structure);
$gvds->graph()->as_png("my.png");

If you so desire, you can use the returned information to join other graphs up to the top of the graph contained in this object by calling graph() to extract the GraphViz object and calling other GraphViz primitives on that object. Most of the time you'll only care about the GraphViz::Data::Structure object and not the additional info.

Optional parameters

You can specify any, none, or all of the following optional keyword parameters:

GraphViz

You can specify your own GraphViz object, in which the graph will be built. GraphViz::Data::Structure nodes all start with the string gvds; if you avoid using nodes with similar names, you should not have any nodename collisions.

# Create a graph of a data structure, using your own GraphViz object
my ($gvds, $top_name, @port) =
  GraphViz::Data::Structure->new($structure,
                                 GraphViz => GraphViz->new());
$gvds->graph()->as_png("my.png");

Depth

If the Depth parameter is supplied, GraphViz::Data::Structure stops at the designated level. If any references are found at this level, plaintext "..." nodes are constructed for them. By default there is no depth limit.

# Stop after reaching level 7.
my ($gvds, $top_name, @port) =
  GraphViz::Data::Structure->new($structure,
                                 Depth => 7);
$gvds->graph()->as_png("my.png");

This can be useful if you have a very large data structure, but showing just the upper levels is sufficient for your purposes.
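
To make the idea of a depth cutoff concrete, here is a text-only sketch that prints a nested structure and substitutes "..." for any reference found at the cutoff level. The outline() function is hypothetical, and so is its level counting (the top-level structure is treated as level 0); the module's own counting may differ.

```perl
use strict;
use warnings;

# Hypothetical text outline with a depth cutoff. References at or
# beyond $limit are replaced by "...", mirroring the plaintext nodes
# described above. With no limit, a circular structure would recurse
# forever; this sketch does not guard against that.
sub outline {
    my ($item, $limit, $level) = @_;
    $level = 0 unless defined $level;
    return defined $item ? "$item" : 'undef' unless ref $item;
    return '...' if defined $limit && $level >= $limit;
    if (ref $item eq 'ARRAY') {
        return '[' . join(',', map { outline($_, $limit, $level + 1) } @$item) . ']';
    }
    if (ref $item eq 'HASH') {
        return '{' . join(',',
            map { "$_=>" . outline($item->{$_}, $limit, $level + 1) }
            sort keys %$item) . '}';
    }
    return ref $item;    # other reference types: show the type name only
}

print outline([1, [2, [3]]], 2), "\n";   # prints "[1,[2,...]]"
```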

Fuzz

If your data structure has large pieces of text in it, you will probably want to limit the size of the text displayed to keep GraphViz from creating huge unwieldy nodes. Fuzz allows you to specify the maximum length of any text to be inserted into blocks; the default value is 40 characters.

# Trim any text to 20 characters or less.
my ($gvds, $top_name, @port) =
  GraphViz::Data::Structure->new($structure,
                                 Fuzz => 20);
$gvds->graph()->as_png("my.png");

Be aware: large values for Fuzz will result in long character strings being passed to dot, which will eventually segfault if the strings are long enough.

Orientation

You can choose to lay out arrays and hashes either horizontally, with class labels at the top, or vertically, with class labels on the left. The default is horizontal.

# Stack items vertically.
my ($gvds, $top_name, @port) =
  GraphViz::Data::Structure->new($structure,
                                 Orientation => 'vertical');
$gvds->graph()->as_png("my.png");

You cannot mix horizontal and vertical layouts in the same graph.

Other parameters

GraphViz supports a number of other parameters at the graph level; any parameters that GraphViz::Data::Structure doesn't understand itself will be passed on to GraphViz.

# Add a title and change the default font:
my ($gvds, $top_name, @port) =
  GraphViz::Data::Structure->new($structure,
                                 graph => {label    => 'My graph',
                                           fontname => 'Helvetica'}
                                );

add()

add(), called as a class method, simply calls new(), supporting all of the new() parameters as usual.

# Create a graph (replicates the new() call). Parameters default.
my ($gvds, $top_name, @ports) =
  GraphViz::Data::Structure->add($structure);

INSTANCE METHODS

graph()

graph() returns a GraphViz object, loaded with the nodes and edges corresponding to any data structure passed in via new() and/or add(). You can make any of the standard GraphViz calls to this object.

Methods include as_ps, as_hpgl, as_pcl, as_mif, as_pic, as_gd, as_gd2, as_gif, as_jpeg, as_png, as_wbmp, as_ismap, as_imap, as_vrml, as_vtx, as_mp, as_fig, as_svg. See the GraphViz documentation for more information. The most common methods are:

# Print out a PNG-format file
print $gvds->graph->as_png();

# Print out a PostScript-format file
print $gvds->graph->as_ps();

# Print out a dot file, in "canonical" form:
print $gvds->graph->as_canon();

was_null()

was_null() checks to ensure that your data structure didn't generate a graph that was too complex for dot to handle. Directly self-referential structures (e.g., @a = (1,\@a,3)) seem to be the only offenders in this area; if your structure isn't directly self-referential -- by far the most likely situation -- you won't need to use was_null() at all.

was_null forces a dot run to get the "canonical" form of the graph back, which can be computationally expensive; avoid it if possible.

add()

add(), called as an instance method, simply adds new nodes and edges (corresponding to a new data structure) to an existing GraphViz::Data::Structure object.

You can specify the Fuzz, Label, and Depth arguments, just as you would for new(). You cannot specify GraphViz, Orientation, or any of the GraphViz parameters that are used to create a GraphViz object; add() uses the pre-existing GraphViz object in the GraphViz::Data::Structure object to add new nodes.

# Create a graph (replicates the new() call).
my ($gvds, $top_name, @ports) =
  GraphViz::Data::Structure->add($structure);

# Add a second structure; nodes will be merged as necessary.
# (No second "my": the variables were declared above.)
($gvds, $top_name, @ports) = $gvds->add($structure);

DOT INPUT - LAYOUT DETAILS

Port strings and shape=record nodes are the key to visualizing the data structures in a readable way. The examples in the dot documentation are some help, but a certain amount of experimentation was needed to determine exactly how the port strings needed to be set up so that the desired layout was achieved.

Port strings do two things: they determine where edges come in and where they go out, and they allow you to position items relative to one another inside a node. This conflation of function makes creating port strings that Do What You Want a little more difficult.

A little study of port strings seems to indicate that just alternating items will cause them to be laid out horizontally, while putting them in braces and alternating seems to yield a vertical layout:

# Horizontal port string for 1 2 3:
$ports = "1|2|3";
# Vertical port string for 1 2 3:
$ports = "{1}|{2}|{3}";

This works fine for very simple sets of boxes in a line (which, from studying the examples, seems to be the principal thing that the original GraphViz implementors used). Anything more complicated (such as getting paired sets of boxes to all line up smartly) takes a bit of extra work.
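
The two simple forms above can be generated mechanically. The helper below is a hypothetical illustration (simple_ports() is not part of the module); it only reproduces the alternating and braced-alternating patterns shown in the example.

```perl
use strict;
use warnings;

# Hypothetical helper: build a simple port string for a list of labels.
# 'horizontal' just alternates the items; 'vertical' wraps each item
# in braces before alternating.
sub simple_ports {
    my ($orientation, @items) = @_;
    my @cells = map { $orientation eq 'vertical' ? '{' . $_ . '}' : $_ } @items;
    return join '|', @cells;
}

print simple_ports('horizontal', 1, 2, 3), "\n";   # prints "1|2|3"
print simple_ports('vertical',   1, 2, 3), "\n";   # prints "{1}|{2}|{3}"
```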

SCALARS

Scalars are represented either by plaintext nodes (for non-reference values) or record nodes (for references); they don't need ports, because we'll be linking at most one edge out, and there's only one "thingy" to link to in a scalar. However, we do have to deal with blessed scalars as well, which need to have both their class name and value in the node, but need to look different than arrays.

If a scalar's value is a reference, we add a record-style node and link it to the value. If the scalar is blessed, we put the class name and the scalar's value both in the same node by constructing a multi-line string with the class name on top, tagged appropriately, and the value on the bottom.

ARRAYS

Arrays have to be handled four different ways:

Unblessed, laid out vertically
Unblessed, laid out horizontally
Blessed, laid out vertically
Blessed, laid out horizontally

Unblessed arrays should (ideally) simply be rows of boxes, with either values or edges in each box. We can set up port strings for this fairly easily:

# Array assumed to contain (1,\$x,"s").
$ports = "<port1>1|<port2>|<port3>s";

This gives a nice row of boxes, with all the cells lined up nicely in either horizontal or vertical orientations. We don't need extra fiddling with the port string to get them to look right.
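
As a sketch of how such a string might be assembled from a real array, the hypothetical array_ports() function below numbers the ports from 1 and leaves the cell text empty wherever the element is a reference (the edge would start from that port). It does no escaping or truncation of the values, which real code would need.

```perl
use strict;
use warnings;

# Hypothetical builder for an unblessed array's port string.
# Each element gets its own port; references get an empty cell.
sub array_ports {
    my @elements = @_;
    my $n = 0;
    my @cells;
    for my $element (@elements) {
        my $port = '<port' . ++$n . '>';
        push @cells, ref($element) ? $port : $port . $element;
    }
    return join '|', @cells;
}

my $x = 'target';
print array_ports(1, \$x, 's'), "\n";   # prints "<port1>1|<port2>|<port3>s"
```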

Things become a bit more complex for blessed arrays, though, because we want to include the class name as well in the record. We want to make sure that the class name itself isn't confused with any of the data items, so it needs to be off in a box by itself, parallel to the boxes defining the array. This means laying out a box the length of the whole array above the boxes defining the array in a horizontal layout, and a box the height of the whole array to the left of the boxes defining the array in a vertical layout.

Fortunately (again), the same basic port string works in both orientations.

# Object is an array blessed into class "Foo", containing (1,\$x,"s").
# Horizontal:
$ports = "{<port0>Foo|{{<port1>1}|{<port2>}|{<port3>s}}}";

Note that we use the alternating braced items to get the array to lay out at 90 degrees from the box containing the class name. This particular string was arrived at after a fair amount of twiddling in dotty and seems to be the simplest port layout that works.

Empty arrays, if they're unblessed, are just shown as a "[]" plaintext node. If they're blessed, we set up a record that looks sort of like a two-element array, but contains the classname, notes that it's an array, and shows that it's empty explicitly.
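
A sketch of building the blessed-array form: the class name goes in port 0, and each element is wrapped in its own braces with a distinct port. The blessed_array_ports() function is hypothetical, ignores the empty-array special case described above, and does not escape the values.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed);

# Hypothetical builder for a blessed array's port string.
# Port 0 holds the class name; elements are braced individually so
# they lay out at 90 degrees to the class-name box.
sub blessed_array_ports {
    my ($object) = @_;
    my $n = 0;
    my @cells;
    for my $element (@$object) {
        my $port = '<port' . ++$n . '>';
        push @cells, '{' . $port . (ref($element) ? '' : $element) . '}';
    }
    return '{<port0>' . blessed($object) . '|{' . join('|', @cells) . '}}';
}

my $x   = 'target';
my $foo = bless [1, \$x, 's'], 'Foo';
print blessed_array_ports($foo), "\n";
# prints "{<port0>Foo|{{<port1>1}|{<port2>}|{<port3>s}}}"
```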

HASHES

Hashes are similar to arrays, with the twist that we need to have two parallel sets of boxes which correspond to the keys and values. In addition, we have the same four cases we did for arrays:

Unblessed, laid out vertically (key to right of value)
Unblessed, laid out horizontally (key above value)
Blessed, laid out vertically (key below value)
Blessed, laid out horizontally (key to right of value)

Unblessed hashes should (ideally) simply be pairs of rows of boxes - one for key, one for value - with either values or edges in each "key" box. Setting up port strings for this is a bit more difficult.

# Hash assumed to contain (A=>1,B=>\$x,C=>"s").
# Horizontal:
$hports = "{<port1>A|<port2>1}|{<port3>B|<port4>}|{<port5>C|<port6>s}";

# Vertical:
$vports = "{<port1>A|<port3>B|<port5>C}|{<port2>1|<port4>|<port6>s}";

Switching from horizontal to vertical requires us to separate the keys from the values.
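
The difference between the two layouts is easiest to see when both strings are built from the same hash. In this hypothetical sketch (hash_ports() is not the module's code), keys are taken in sorted order, each key and value gets consecutive port numbers, and only the final grouping step differs between orientations.

```perl
use strict;
use warnings;

# Hypothetical builder for an unblessed hash's port string.
# Horizontal keeps each key with its value; vertical separates
# all keys from all values.
sub hash_ports {
    my ($orientation, $hash) = @_;
    my (@keys, @values, @pairs);
    my $n = 0;
    for my $key (sort keys %$hash) {
        my $value = $hash->{$key};
        my $kcell = '<port' . ++$n . '>' . $key;
        my $vcell = '<port' . ++$n . '>' . (ref($value) ? '' : $value);
        push @keys,   $kcell;
        push @values, $vcell;
        push @pairs,  '{' . $kcell . '|' . $vcell . '}';
    }
    return join('|', @pairs) if $orientation eq 'horizontal';
    return '{' . join('|', @keys) . '}|{' . join('|', @values) . '}';
}

my $x = 'target';
my %h = (A => 1, B => \$x, C => 's');
print hash_ports('horizontal', \%h), "\n";
# prints "{<port1>A|<port2>1}|{<port3>B|<port4>}|{<port5>C|<port6>s}"
print hash_ports('vertical', \%h), "\n";
# prints "{<port1>A|<port3>B|<port5>C}|{<port2>1|<port4>|<port6>s}"
```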

Adding a class name presents some problems. dot is not absolutely symmetric when it comes to parsing complex port strings; in some cases, it carefully lines up all the edges of boxes internal to a record; other times it doesn't. Rather than continue to try to kludge around this, it seemed the better part of valor to simply accept what it would do prettily and ignore the rest. In laying out blessed hashes (and following our self-imposed standard), we can either have

a single box on the left containing the class name for hashes laid out vertically
a single box at the top containing the class name for hashes laid out horizontally

Anything else both significantly increases the complexity of the interface ("let's see, arrays should be horizontal, and hashes should be horizontal with names on top, so I code ... uh ...") and, well, doesn't work very well. So we stick with these two basic layouts and keep it pretty and simple.

# Object is a hash blessed into class "Foo".
# Hash assumed to contain (A=>1,B=>\$x,C=>"s").
# Horizontal, name on top:
$hports =
"{<port0>Foo|{{<port1>A|<port2>1}|{<port3>B|<port4>}|{<port5>C|<port6>s}}}";

# Vertical, name on left:
$vports =
"<port0>Foo|{<port1>A|<port3>B|<port5>C}|{<port2>1|<port4>|<port6>s}";

Note that we also have to change how we add the braces to the keys and values when switching where the name is, in addition to separating or associating the keys and values as needed.

The good thing is that once this is all worked out, no one else has to care anymore. It just works and looks nice.
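
Seen side by side, the blessed forms are the unblessed port strings with a class-name cell attached in one of two ways. The hypothetical add_class_box() helper below captures just that wrapping step; it assumes the unblessed string has already been built for the same orientation.

```perl
use strict;
use warnings;

# Hypothetical helper: attach a class-name box (port 0) to an
# already-built unblessed hash port string.
sub add_class_box {
    my ($class, $orientation, $body) = @_;
    # Horizontal: name on top, so the body is nested one level deeper.
    return '{<port0>' . $class . '|{' . $body . '}}' if $orientation eq 'horizontal';
    # Vertical: name on the left, as one more top-level cell.
    return '<port0>' . $class . '|' . $body;
}

my $hbody = '{<port1>A|<port2>1}|{<port3>B|<port4>}|{<port5>C|<port6>s}';
my $vbody = '{<port1>A|<port3>B|<port5>C}|{<port2>1|<port4>|<port6>s}';
print add_class_box('Foo', 'horizontal', $hbody), "\n";
print add_class_box('Foo', 'vertical',   $vbody), "\n";
```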

GLOBS

Globs, from the layout point of view, look pretty much like blessed hashes. The one exception is an empty glob, which we display as just a plaintext node.

In the interest of coding as little as possible, we just reuse the hash code. We construct a tiny pair of wrapper methods which add the necessary information to the parameter list and then call the common module.

CODE references

CODE references are the simplest. We just say that they're code, and add on the class name if they're blessed. The mainline code's done all the nasty work of actually figuring out the code ref's name, so we don't have to worry any further.
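
For readers curious how a code reference's name can be recovered at all, here is one way to do it with the core B module. This is a different technique from the dumpvar.pl-derived code the module uses, shown only as an illustration; code_name() and the Demo package are made up for the example.

```perl
use strict;
use warnings;
use B ();

# Hypothetical helper: return the fully-qualified name of a code
# reference by inspecting its glob via the core B module.
sub code_name {
    my ($code) = @_;
    my $gv = B::svref_2object($code)->GV;
    return $gv->STASH->NAME . '::' . $gv->NAME;
}

sub Demo::hello { return 'hi' }

print code_name(\&Demo::hello), "\n";   # prints "Demo::hello"
print code_name(sub { 1 }), "\n";       # anonymous subs end in "::__ANON__"
```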

HANDING TEXT TO DOT

dot is a C program and therefore can get extremely upset (as in segfault upset) about text that is too long. In addition, it will become very testy if the text contains characters which it considers significant in constructing labels and the like.

It is necessary to clean up and shorten any text that dot will be expected to put into a node. The _dot_escape method is used to do this.

Note that the limit on strings is actually not very large; setting a really big Fuzz will probably make dot segfault when it tries to draw your graph.
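
The kind of cleanup involved can be sketched as follows. This is not the module's _dot_escape method; escape_record_text() is a hypothetical stand-in that assumes the characters needing protection are the ones dot treats as record-label syntax (braces, vertical bars, angle brackets, and the backslash itself), and that reuses the documented Fuzz default of 40 as its length limit.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cleanup step: shorten the text first,
# then backslash-escape characters that are significant in record labels.
sub escape_record_text {
    my ($text, $max_length) = @_;
    $max_length = 40 unless defined $max_length;
    # Truncate before escaping so an escape sequence is never cut in half.
    $text = substr($text, 0, $max_length) if length($text) > $max_length;
    $text =~ s/([\\{}|<>])/\\$1/g;
    return $text;
}

print escape_record_text('a|b'), "\n";      # prints "a\|b"
print escape_record_text('{x}'), "\n";      # prints "\{x\}"
```

Real text may need more than this (embedded quotes and newlines, for instance), so treat the character list as a starting point rather than a complete one.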

BUGS

Cannot catch pointers to individual array or hash elements yet and display the containing items, even though it tries.

BUGS EXPOSED IN DOT

Data structures which point directly to themselves will cause dot to discard all input in some cases. There's currently no fix for this; you can call the was_null() method for now, which will tell you the graph was null and let you decide what to do.

It isn't possible (in current releases of dot) to code a record label which contains no text (e.g.: {<port1>}); this generates a zero-width box. This has been worked around by placing a single period in places where nothing at all would have been preferable. The graphviz developers have developed a patch for dot that corrects the problem, but it is not yet in a released version, though it is in CVS.

OTHER DOT CONSIDERATIONS

The record type is officially deprecated, and it would probably be a good idea to convert the labels to HTML format. The current implementation has been updated to work with dot 2.40.1; there's no guarantee that future versions won't break the record type again.

AUTHOR

Joe McMahon <mcmahon@ibiblio.org>

COPYRIGHT

Copyright (C) 2001-2002, Joe McMahon

This module is free software; you can redistribute it or modify it under the same terms as Perl itself.