NAME
GraphViz::Data::Structure - Visualise data structures
SYNOPSIS
use GraphViz::Data::Structure;
my $gvds = GraphViz::Data::Structure->new($data_structure);
print $gvds->graph()->as_png;
DESCRIPTION
This module makes it easy to visualise data structures, even recursive or circular ones.
It is provided as an alternative to GraphViz::Data::Grapher. Differences:
- GraphViz::Data::Structure handles structures of arbitrary depth and complexity, automatically following links using a standard graph traversal algorithm.
- GraphViz::Data::Grapher creates graphics of individual substructures (arrays, scalars, hashes) which keep the substructure type and data together; GraphViz::Data::Structure does this by shape alone.
- GraphViz::Data::Structure encapsulates object info (if any) directly into the node being used to represent the class.
- GraphViz::Data::Grapher colors its graphs; GraphViz::Data::Structure doesn't by default.
- GraphViz::Data::Structure can parse out globs and CODE references (almost as well as the debugger does).
REPRESENTING DATA STRUCTURES AS GRAPHS
GraphViz::Data::Structure tries to draw data structure diagrams with a minimum of complexity and a maximum of elegance. To this end, the following design choices were made:
- Strings, scalars, filehandles, and code references are represented as plain text.
- Empty hashes and arrays are represented as Perl represents them in code: hashes as {}, and arrays as [], except if they are blessed (see below).
- Arrays are laid out as sets of boxes, in the order in which they were found in the existing data structure (left-to-right or top-to-bottom, depending on overall graph layout).
- Hashes are laid out as pairs of sets of boxes, with the keys in alphabetically-sorted order top-to-bottom or left-to-right.
- Blessed items have a box added to them in parallel, containing the name of the class and its type (scalar/array/hash).
- Code references are decoded to determine their fully-qualified package name and are output as plaintext nodes.
- Globs pointed to by references are disassembled and their individual parts dumped.
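The design choices above depend on telling the kinds of item apart. The following is a minimal sketch of that classification using the core Scalar::Util module; classify_item() and its category names are hypothetical and are not part of this module's interface.

```perl
use strict;
use warnings;
use Scalar::Util qw(blessed reftype);

# Hypothetical helper: report how a value might be categorised.
# Returns a (kind, class) pair; class is undef for unblessed values.
sub classify_item {
    my ($item) = @_;
    return ('plain', undef) unless ref $item;   # strings, numbers, undef
    my $class = blessed($item);                 # undef if not an object
    my $type  = reftype($item);                 # underlying type, ignoring bless
    my %kind_for = (
        SCALAR => 'scalar ref',
        REF    => 'scalar ref',                 # reference to a reference
        ARRAY  => 'array',
        HASH   => 'hash',
        CODE   => 'code',
        GLOB   => 'glob',
    );
    return ($kind_for{$type} // 'other', $class);
}

my @samples = ('text', [1, 2], {a => 1}, \"s", sub { 1 }, \*STDOUT,
               bless({}, 'Foo'));
for my $item (@samples) {
    my ($kind, $class) = classify_item($item);
    printf "%-10s %s\n", $kind,
        defined $class ? "(blessed into $class)" : '';
}
```

reftype() is used rather than ref() so that a blessed item still reports its underlying type, which is what decides whether it is drawn as a scalar, an array, or a hash.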
ALGORITHM
The algorithm is a standard recursive depth-first treewalk; we determine how the current node should be added to the current graph, add it, and then call ourselves recursively to determine how all nodes below this one should be visualized. Edges are added after the subnodes are added to the graph.
Items "within" the current subnode (array and hash elements which are not references) are rendered inside a cell in the aggregate corresponding to their position. References are represented by an edge linking the appropriate position in the aggregate to the appropriate subnode.
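As a rough illustration of the treewalk just described, here is a self-contained sketch that records nodes and edges in plain arrays instead of a GraphViz object. The walk() function, the node naming scheme, and the use of a "seen" hash to stop on circular data are assumptions made for the example, not the module's actual code.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical sketch: walk a structure depth-first, recording one node per
# distinct reference and one edge per link. %$seen stops infinite recursion
# on circular data.
sub walk {
    my ($item, $seen, $nodes, $edges) = @_;
    return unless ref $item;               # non-references live inside cells
    my $name = 'node' . refaddr($item);    # made-up naming scheme
    return $name if $seen->{$name}++;      # already added; just link to it
    push @$nodes, $name;

    my $type = reftype($item);
    my @children =
          $type eq 'ARRAY' ? @$item
        : $type eq 'HASH'  ? map { $item->{$_} } sort keys %$item
        : $type eq 'REF'   ? ($$item)
        :                    ();

    # Add the subnodes first, then the edges that point at them.
    my @targets = grep { defined }
                  map  { walk($_, $seen, $nodes, $edges) } @children;
    push @$edges, map { [$name, $_] } @targets;
    return $name;
}

my @list = (1, 2);
push @list, \@list;                        # directly circular
my $data = { list => \@list, name => 'x' };

my (%seen, @nodes, @edges);
walk($data, \%seen, \@nodes, \@edges);
printf "%d nodes, %d edges\n", scalar @nodes, scalar @edges;
```

Keying the "seen" hash on the reference address means a structure reached by two different paths is added once and simply gains a second incoming edge.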
This code does its data-structure unwrapping in a manner very similar to that used by dumpvar.pl, the code used by the debugger to display data structures as text. The initial structure treewalk was written in isolation; the dumpvar.pl code was integrated only after it was recognized that there was more to life than hashes, arrays, and scalars. The dumpvar.pl code to decode globs and code references was used almost as-is.
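For readers curious how a code reference can be decoded to a fully-qualified name at all, the sketch below uses the core B module. This is a stand-in technique chosen for the example and is not necessarily how dumpvar.pl or this module performs the lookup; code_name() and My::Demo::hello are hypothetical names.

```perl
use strict;
use warnings;
use B ();

# Sketch: recover "Package::name" for a code reference by inspecting the
# glob that the compiled sub is attached to.
sub code_name {
    my ($code) = @_;
    my $gv = B::svref_2object($code)->GV;
    return $gv->STASH->NAME . '::' . $gv->NAME;
}

sub My::Demo::hello { return 'hi' }

print code_name(\&My::Demo::hello), "\n";
print code_name(sub { 1 }), "\n";    # anonymous subs typically report __ANON__
```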
Code was added to attempt to spot references to array or hash elements, but this code still does not work as desired. Array and hash element references still appear to be scalars to the current algorithm.
GLOBAL SETTINGS
GraphViz::Data::Structure::Debug
Set this to a true value to turn on some debugging messages output to STDERR. Defaults to false, and should probably be left that way unless you're reworking init().
# Turn on GraphViz::Data::Structure debugging.
$GraphViz::Data::Structure::Debug = 1;
CLASS METHODS
new()
This is the constructor. It takes one mandatory argument, which is the data structure to be visualised. It returns a GraphViz::Data::Structure object, the name of the top node, and a list defining the 'to' port for this top node (an empty list if there is no 'to' port).
# Graph a data structure, creating a GraphViz object.
# The new GraphViz::Data::Structure object, the name of
# the top node in the structure, and the "in" port are returned.
my ($gvds, $top_name, @port) =
GraphViz::Data::Structure->new($structure);
# With a filename argument, as_png() writes the file itself.
$gvds->graph()->as_png("my.png");
If you so desire, you can use the returned information to join other graphs up to the top of the graph contained in this object by calling graph() to extract the GraphViz object and calling other GraphViz primitives on that object. Most of the time you'll only care about the GraphViz::Data::Structure object and not the additional info.
Optional parameters
You can specify any, none, or all of the following optional keyword parameters:
GraphViz
You can specify your own GraphViz object, in which the graph will be built. GraphViz::Data::Structure nodes all start with the string gvds; if you avoid using nodes with similar names, you should not have any nodename collisions.
# Create a graph of a data structure, using your own GraphViz object
my ($gvds, $top_name, @port) =
    GraphViz::Data::Structure->new($structure,
                                   GraphViz => GraphViz->new());
$gvds->graph()->as_png("my.png");
Depth
If the Depth parameter is supplied, GraphViz::Data::Structure stops at the designated level. If any references are found at this level, plaintext "..." nodes are constructed for them. By default there is no limit.
# Stop after reaching level 7.
my ($gvds, $top_name, @port) =
    GraphViz::Data::Structure->new($structure,
                                   Depth => 7);
$gvds->graph()->as_png("my.png");
This can be useful if you have a very large data structure, but showing just the upper levels is sufficient for your purposes.
Fuzz
If your data structure has large pieces of text in it, you will probably want to limit the size of the text displayed to keep GraphViz from creating huge unwieldy nodes. Fuzz allows you to specify the maximum length of any text to be inserted into blocks; the default value is 40 characters.
# Trim any text to 20 characters or less.
my ($gvds, $top_name, @port) =
    GraphViz::Data::Structure->new($structure,
                                   Fuzz => 20);
$gvds->graph()->as_png("my.png");
Be aware: large values for Fuzz will result in long character strings being passed to dot, which will eventually segfault if the strings are long enough.
Orientation
You can choose to have your records laid out so that arrays and hashes are either laid out horizontally, with class labels at the top, or vertically, with class labels on the left. Default is horizontal.
# Stack items vertically.
my ($gvds, $top_name, @port) =
    GraphViz::Data::Structure->new($structure,
                                   Orientation => 'vertical');
$gvds->graph()->as_png("my.png");
You cannot mix horizontal and vertical layouts in the same graph.
Other parameters
GraphViz supports a number of other parameters at the graph level; any parameters that GraphViz::Data::Structure doesn't understand itself will be passed on to GraphViz.
# Add a title and change the default font:
my ($gvds, $top_name, @port) =
    GraphViz::Data::Structure->new($structure,
                                   graph => {label    => 'My graph',
                                             fontname => 'Helvetica'});
add()
add(), called as a class method, simply calls new(), supporting all of the new() parameters as usual.
# Create a graph (replicates the new() call). Parameters default.
my ($gvds, $top_name, @ports) =
GraphViz::Data::Structure->add($structure);
INSTANCE METHODS
graph()
graph() returns a GraphViz object, loaded with the nodes and edges corresponding to any data structure passed in via new() and/or add(). You can make any of the standard GraphViz calls to this object.
Methods include as_ps, as_hpgl, as_pcl, as_mif, as_pic, as_gd, as_gd2, as_gif, as_jpeg, as_png, as_wbmp, as_ismap, as_imap, as_vrml, as_vtx, as_mp, as_fig, as_svg. See the GraphViz documentation for more information. The most common methods are:
# Print out a PNG-format file
print $gvds->graph->as_png();
# Print out a PostScript-format file
print $gvds->graph->as_ps();
# Print out a dot file, in "canonical" form:
print $gvds->graph->as_canon();
was_null()
was_null() checks to ensure that your data structure didn't generate a graph that was too complex for dot to handle. Directly self-referential structures (e.g., @a = (1,\@a,3)) seem to be the only offenders in this area; if your structure isn't directly self-referential -- by far the most likely situation -- you won't need to use was_null() at all.
was_null() forces a dot run to get the "canonical" form of the graph back, which can be computationally expensive; avoid it if possible.
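Because was_null() is expensive, a cheap pre-check in pure Perl may help decide whether it is worth calling. The sketch below assumes "directly self-referential" means an array or hash that holds a reference to itself as one of its own elements; is_directly_self_referential() is a hypothetical helper, not part of this module.

```perl
use strict;
use warnings;
use Scalar::Util qw(refaddr reftype);

# Hypothetical pre-check: true if an array or hash contains a reference
# to itself among its immediate elements.
sub is_directly_self_referential {
    my ($ref) = @_;
    return 0 unless ref $ref;
    my $type    = reftype($ref);
    my @members = $type eq 'ARRAY' ? @$ref
                : $type eq 'HASH'  ? values %$ref
                :                    ();
    my $self = refaddr($ref);
    return (grep { ref $_ && refaddr($_) == $self } @members) ? 1 : 0;
}

my @loop = (1);
push @loop, \@loop, 3;                 # same shape as @a = (1,\@a,3)
my @flat = (1, [2], 3);

print "loop: ", is_directly_self_referential(\@loop), "\n";
print "flat: ", is_directly_self_referential(\@flat), "\n";
```

This only inspects the top level of the structure passed in; a nested container that points to itself would need the same check applied during a full traversal.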
add()
add(), called as an instance method, simply adds new nodes and edges (corresponding to a new data structure) to an existing GraphViz::Data::Structure object.
You can specify the Fuzz, Label, and Depth arguments, just as you would for new(). You cannot specify GraphViz, Orientation, or any of the GraphViz parameters that are used to create a GraphViz object; add() uses the pre-existing GraphViz object in the GraphViz::Data::Structure object to add new nodes.
# Create a graph (replicates the new() call).
my ($gvds, $top_name, @ports) =
    GraphViz::Data::Structure->add($structure);
# Add a second structure; nodes will be merged as necessary.
# Distinct variable names avoid redeclaring $gvds, $top_name, and @ports.
my ($gvds2, $top_name2, @ports2) = $gvds->add($other_structure);
DOT INPUT - LAYOUT DETAILS
Port strings and shape=record nodes are the key to visualizing the data structures in a readable way. The examples in the dot documentation are some help, but a certain amount of experimentation was needed to determine exactly how the port strings needed to be set up so that the desired layout was achieved.
Port strings do two things: they determine where edges come in and where they go out, and they allow you to position items relative to one another inside a node. This conflation of function makes creating port strings that Do What You Want a little more difficult.
A little study of port strings seems to indicate that just alternating items will cause them to be laid out horizontally, while putting them in braces and alternating seems to yield a vertical layout:
# Horizontal port string for 1 2 3:
$ports = "1|2|3";
# Vertical port string for 1 2 3:
$ports = "{1}|{2}|{3}";
This works fine for very simple sets of boxes in a line (which, from studying the examples, seems to be the principal thing that the original GraphViz implementors used). Anything more complicated (such as getting paired sets of boxes to all line up smartly) takes a bit of extra work.
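The two simple patterns above can be generated mechanically. The helper below is a sketch only; simple_ports() and its orientation argument are names invented for this example.

```perl
use strict;
use warnings;

# Hypothetical helper following the two patterns above: bare alternation
# for one orientation, braced alternation for the other.
sub simple_ports {
    my ($orientation, @items) = @_;
    my @fields = $orientation eq 'vertical'
        ? map { '{' . $_ . '}' } @items
        : @items;
    return join '|', @fields;
}

print simple_ports('horizontal', 1, 2, 3), "\n";
print simple_ports('vertical',   1, 2, 3), "\n";
```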
SCALARS
Scalars are represented either by plaintext nodes (for non-reference values) or record nodes (for references); they don't need ports, because we'll be linking at most one edge out, and there's only one "thingy" to link to in a scalar. However, we do have to deal with blessed scalars as well, which need to have both their class name and value in the node, but need to look different than arrays.
If a scalar's value is a reference, we add a record-style node and link it to the value. If the scalar is blessed, we put the class name and the scalar's value both in the same node by constructing a multi-line string with the class name on top, tagged appropriately, and the value on the bottom.
ARRAYS
Arrays have to be handled four different ways:
- Unblessed, laid out vertically
- Unblessed, laid out horizontally
- Blessed, laid out vertically
- Blessed, laid out horizontally
Unblessed arrays should (ideally) simply be rows of boxes, with either values or edges in each box. We can set up port strings for this fairly easily:
# Array assumed to contain (1,\$x,"s").
$ports = "<port1>1|<port2>|<port3>s";
This gives a nice row of boxes, with all the cells lined up nicely in either horizontal or vertical orientations. We don't need extra fiddling with the port string to get them to look right.
Things become a bit more complex for blessed arrays, though, because we want to include the class name as well in the record. We want to make sure that the class name itself isn't confused with any of the data items, so it needs to be off in a box by itself, parallel to the boxes defining the array. This means laying out a box the length of the whole array above the boxes defining the array in a horizontal layout, and a box the height of the whole array to the left of the boxes defining the array in a vertical layout.
Fortunately (again), the same basic port string works in both orientations.
# Object is an array blessed into class "Foo", containing (1,\$x,"s").
# Horizontal:
$ports = "{<port0>Foo|{{<port1>1}|{<port2>}|{<port3>s}}}";
Note that we use the alternating braced items to get the array to lay out at 90 degrees from the box containing the class name. This particular string was arrived at after a fair amount of twiddling in dotty and seems to be the simplest port layout that works.
Empty arrays, if they're unblessed, are just shown as a "[]" plaintext node. If they're blessed, we set up a record that looks sort of like a two-element array, but contains the classname, notes that it's an array, and shows that it's empty explicitly.
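To make the array port-string shapes concrete, here is a small sketch that builds them from a list of cell texts. array_ports() is a hypothetical helper; it assumes the caller passes an empty string for any element that is a reference (so the cell carries only a port for the outgoing edge) and undef for the class name when the array is unblessed.

```perl
use strict;
use warnings;

# Hypothetical builder for the non-empty array port strings described above.
# $class is undef for an unblessed array; @cells holds the text of each
# element, with '' standing in for reference elements.
sub array_ports {
    my ($class, @cells) = @_;
    my $n = 0;
    my @fields = map { '<port' . ++$n . '>' . $_ } @cells;

    # Unblessed: a plain row of cells, each with its own port.
    return join('|', @fields) unless defined $class;

    # Blessed: class name in its own box (port0), with the braced cells
    # laid out at 90 degrees to it.
    my $body = join '|', map { '{' . $_ . '}' } @fields;
    return '{<port0>' . $class . '|{' . $body . '}}';
}

# Array assumed to contain (1,\$x,"s").
print array_ports(undef, '1', '', 's'), "\n";
print array_ports('Foo', '1', '', 's'), "\n";
```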
HASHES
Hashes are similar to arrays, with the twist that we need to have two parallel sets of boxes which correspond to the keys and values. In addition, we have the same four cases we did for arrays:
- Unblessed, laid out vertically (key to right of value)
- Unblessed, laid out horizontally (key above value)
- Blessed, laid out vertically (key below value)
- Blessed, laid out horizontally (key to right of value)
Unblessed hashes should (ideally) simply be pairs of rows of boxes - one for key, one for value - with either values or edges in each "value" box. Setting up port strings for this is a bit more difficult.
# Hash assumed to contain (A=>1,B=>\$x,C=>"s").
# Horizontal:
$hports = "{<port1>A|<port2>1}|{<port3>B|<port4>}|{<port5>C|<port6>s}";
# Vertical:
$vports = "{<port1>A|<port3>B|<port5>C}|{<port2>1|<port4>|<port6>s}";
Switching from horizontal to vertical requires us to separate the keys from the values.
Adding a class name presents some problems. dot is not absolutely symmetric when it comes to parsing complex port strings; in some cases, it carefully lines up all the edges of boxes internal to a record; other times it doesn't. Rather than continue to try to kludge around this, it seemed the better part of valor to simply accept what it would do prettily and ignore the rest. In laying out blessed hashes (and following our self-imposed standard), we can either have
- a single box on the left containing the class name for hashes laid out vertically
- a single box at the top containing the class name for hashes laid out horizontally
Anything else both significantly increases the complexity of the interface ("let's see, arrays should be horizontal, and hashes should be horizontal with names on top, so I code ... uh ...") and, well, doesn't work very well. So we stick with these two basic layouts and keep it pretty and simple.
# Object is a hash blessed into class "Foo".
# Hash assumed to contain (A=>1,B=>\$x,C=>"s").
# Horizontal, name on top:
$hports =
"{<port0>Foo|{{<port1>A|<port2>1}|{<port3>B|<port4>}|{<port5>C|<port6>s}}}";
# Vertical, name on left:
$vports =
"<port0>Foo|{<port1>A|<port3>B|<port5>C}|{<port2>1|<port4>|<port6>s}";
Note that we also have to change how we add the braces to the keys and values when switching where the name is, in addition to separating or associating the keys and values as needed.
The good thing is that once this is all worked out, no one else has to care anymore. It just works and looks nice.
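The four hash layouts shown above can be produced by one routine. The following sketch is illustrative only: hash_ports() is a hypothetical name, keys are sorted as described earlier, and reference values are rendered as empty cells so that only the port remains.

```perl
use strict;
use warnings;

# Hypothetical builder for the four hash port-string shapes shown above.
# $orientation is 'horizontal' or 'vertical'; $class is undef if unblessed.
sub hash_ports {
    my ($hash, $orientation, $class) = @_;
    my $n = 0;
    my (@keys, @values, @pairs);
    for my $key (sort keys %$hash) {
        my $value = $hash->{$key};
        my $k = '<port' . ++$n . '>' . $key;
        my $v = '<port' . ++$n . '>' . (ref $value ? '' : $value);
        push @keys,   $k;
        push @values, $v;
        push @pairs,  '{' . $k . '|' . $v . '}';
    }

    if ($orientation eq 'vertical') {
        # Keys and values are separated into two parallel groups.
        my $body = '{' . join('|', @keys) . '}|{' . join('|', @values) . '}';
        return defined $class ? '<port0>' . $class . '|' . $body : $body;
    }

    # Horizontal: each key stays paired with its value.
    my $body = join '|', @pairs;
    return defined $class ? '{<port0>' . $class . '|{' . $body . '}}' : $body;
}

my $x = 'target';
my %h = (A => 1, B => \$x, C => 's');
print hash_ports(\%h, 'horizontal'),        "\n";
print hash_ports(\%h, 'vertical'),          "\n";
print hash_ports(\%h, 'horizontal', 'Foo'), "\n";
print hash_ports(\%h, 'vertical',   'Foo'), "\n";
```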
GLOBS
Globs, from the layout point of view, look pretty much like blessed hashes. The only exception for globs is if there's nothing in the glob, we want to display it just as a plaintext node.
In the interest of coding as little as possible, we just reuse the hash code. We construct a tiny pair of wrapper methods which add the necessary information to the parameter list and then call the common module.
CODE REFERENCES
CODE references are the simplest. We just say that they're code, and add on the class name if they're blessed. The mainline code's done all the nasty work of actually figuring out the code ref's name, so we don't have to worry any further.
HANDING TEXT TO DOT
dot is a C program and therefore can get extremely upset (as in segfault upset) about text that is too long. In addition, it will become very testy if the text contains characters which it considers significant in constructing labels and the like.
It is necessary to clean up and shorten any text that dot will be expected to put into a node. The _dot_escape method is used to do this.
Note that the limit on strings is actually not very large; setting a really big Fuzz will probably make dot segfault when it tries to draw your graph.
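The kind of cleanup involved can be sketched as follows. This is not the module's _dot_escape method, whose details are not shown here; trim_and_escape(), the "..." truncation marker, and the exact set of escaped characters are assumptions made for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the cleanup step: shorten the text, then
# backslash-escape characters that are significant in record labels.
sub trim_and_escape {
    my ($text, $max) = @_;
    $max = 40 unless defined $max;             # the documented Fuzz default
    if (length($text) > $max) {
        $text = substr($text, 0, $max) . '...';   # marker is our own choice
    }
    $text =~ s/([{}|<>\\"])/\\$1/g;            # record-label metacharacters
    $text =~ s/\n/\\n/g;                       # newlines become dot escapes
    return $text;
}

print trim_and_escape('key|value {x}'), "\n";
print trim_and_escape('a' x 100, 20), "\n";
```

Truncating before escaping keeps the visible length predictable, at the cost of the escaped string being slightly longer than the limit.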
BUGS
Cannot catch pointers to individual array or hash elements yet and display the containing items, even though it tries.
BUGS EXPOSED IN DOT
Data structures which point directly to themselves will cause dot to discard all input in some cases. There's currently no fix for this; you can call the was_null() method for now, which will tell you the graph was null and let you decide what to do.
It isn't possible (in current releases of dot) to code a record label which contains no text (e.g.: {<port1>}); this generates a zero-width box. This has been worked around by placing a single period in places where nothing at all would have been preferable. The graphviz developers have developed a patch for dot that corrects the problem, but it is not yet in a released version, though it is in CVS.
OTHER DOT CONSIDERATIONS
The record type is officially deprecated, and it would probably be a good idea to convert the labels to HTML format. The current implementation has been updated to work with dot 2.40.1; there's no guarantee that future versions won't break the record type again.
AUTHOR
Joe McMahon <mcmahon@ibiblio.org>
COPYRIGHT
Copyright (C) 2001-2002, Joe McMahon
This module is free software; you can redistribute it or modify it under the same terms as Perl itself.