NAME

Bio::ConnectDots::SimpleGraph

SYNOPSIS

use SimpleGraph;

my $graph=new Bio::ConnectDots::SimpleGraph;
# read pairs of nodes from STDIN
while (<>) {
  my($node1,$node2)=split;
  $graph->add_edge($node1,$node2);
}
my @nodes=graph->nodes;	    # get list of nodes
my @edges=graph->edges;	    # get list of edges
for each $node (@nodes) {
  my @neighbors=$node->neighbors; # get list of neighboring nodes
}

DESCRIPTION

This is a simple, hopefully fast undirected graph package. The only reason this exists is that the standard CPAN Graph pacakge, Graph::Base, is seriously broken. The package implements a small and eclectic assortment of standard graph algorithms that we happened to need for our applications.

This module is a subclass of Class::AutoClass (available at CPAN). AutoClass auotgenerates simple accessor and mutator methods (aka get and set methods). It also automates class initialization.

Nodes can be any Perl values, including object references. Edges are pairs of nodes.

(Caveat: be careful with values that contain embedded instances of $; (the character Perl uses to separate components of multi-dimensional subscripts), because we use this in the text representation of edges.

The main data structures are:

An edge (x,y) is represented canonically as a two element list in
which the lexically smaller value is first.  Eg, the node ('b','a')
is represented as ['a','b'].  

The graph contains 

1) A hash mapping the text representation of a node to the node
itself.  This is mostly relevant when the node is a reference.

2) A hash mapping the text representation of a node to a list of the node's neighbors.

3) A hash mapping the text representation of an edge to the edge itself.

KNOWN BUGS AND CAVEATS

This is still a work in progress.

AUTHOR - Nat Goodman

Email natg@shore.net

COPYRIGHT

Copyright (c) 2003 Institute for Systems Biology (ISB). All Rights Reserved. This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

APPENDIX

Conventions for nodes and edges

A node can be any Perl values, including an object, ARRAY, or HASH reference. When nodes are references, the software often works with the text representaion of the reference, ie, what you get if you print the reference. This can be confusing. Sorry. For example if a node is the HASH

{name=>'caspase-9',symbol=>'CASP9'}

The text representation would be something like

HASH(0x804c830)

When nodes are scalar values, eg, a string, the value and the text representation are the same. This is a common case in test programs and examples, but less common in real applications.

An edge is represented internally as an ARRAY ref of two nodes, in which the lexically smaller value is first. Actually, the first node is the one whose text representation is lexically smaller.

When passing edges as arguments to SimpleGraph methods, the edge can be represented in several ways.

1) An ARRAY ref of the nodes, eg, ['a','b'].

2) A list of the two nodes, eg, ('a','b')

3) Form (1) or (2) using the text represention of the node instead of the node itself

You needn't worry about which node is lexically smaller. SimpleGraph performs this calculation internally.

When SimpleGraph returns edges as results, they are always in form (1), ie, as ARRAY refs of nodes in correct lexical order.

General conventions for methods

When methods return lists, we generally check the context (via wantarray) and return an ARRSY or ARRAY ref as appropriate. We're not 100% consistent in this (sorry), so check the code if you have doubts.

We often define singular and plural forms of methods, eg, node and nodes. These differ in how they behave in a scalar context. The singular form assumes you want one answer and returns that, while the plural form assumes you want a list of answers are returns it as an ARRAY ref. We're not 100% consistent in this (sorry), so check the code if you have doubts.

The rest of the documentation describes the methods.

Constructors

Title   : new (inherited from Class::AutoClass)
Usage   : my $graph=new Bio::ConnectDots::SimpleGraph;
Function: Create new Bio::ConnectDots::SimpleGraph object
Returns : Newly created object
Args    : (optional)
          nodes=>ARRAY of nodes, eg, ['a','b','c']
          edges=>ARRAY of edges, see add_edges for details

Basic node and edge operations

Title   : add_nodes, add_node
Usage   : $graph->add_nodes('a','b');
          $graph->add_node('a')) {
Function: Add nodes to graph. Nodes that are already in graph
          are ignored.
Args    : ARRAY of nodes.
Returns : Nothing useful
Note    : Singular and plural forms are synonymous

Title   : add_edges, add_edge
Usage   : $graph->add_edges('a','b',['b','c']);
          $graph->add_edge('c','d')) {
Function: Add edges to graph. 
          Edges that are already in graph are not added again, but
          are placed in a separate 'duplicate edges' list.
          Automatically adds any nodes that are not yet in the graph.
Args    : ARRAY of edges in any of the forms described in the
          previous section.  The forms can be mixed as shown in
          the Usage here.
Returns : Nothing useful
Note    : Singular and plural forms are synonymous

Title   : nodes
Usage   : my @nodes=$graph->nodes;
          if (@{$graph->nodes('a','b')}==2) {
            print "a, b are both nodes\n";
          }
Function: Return all nodes or the given ones.  
          With no args returns all nodes.  
          With args, returns the nodes corresponding to each arg, or
          undef if the arg is not a node.  Useful for testing whether
          a given value is a node in the graph.
Args    : (optional)
          ARRAY of nodes or text representations of nodes
Returns : ARRAY or ARRAY ref of nodes (for args that correspond to
          nodes), or undef (for args that are not nodes)

Title   : edges
Usage   : my @edges=$graph->edges;
          if (@{$graph->edges('a','b',['b','c'])}==2) {
            print "[a,b] and [b,c] are both edges\n";
          }
Function: Return all edges or the given ones.  
          With no args returns all edges.  
          With args, returns the edges corresponding to each arg, or
          undef if the arg is not a edge.  Useful for testing whether
          a given value is a edge in the graph.
Args    : (optional)
          One or more edges in any of the forms described in the
          previous section.  The forms can be mixed as shown in
          the Usage here.
Returns : ARRAY or ARRAY ref of edges for args that correspond to
          edges), or undef (for args that are not edges)

Title   : node
Usage   : if ($graph->node('a')) {
            print "a is a node\n";
          }
Function: Test whether a value is a node in the graph, or map the
          text representation of a node to the node itself.  The
          method can also be fed a list of values (like the 'nodes'
          method) and it will test all of them.
Args    : Usually, a single node.
          The function also accepts a list of nodes.
Returns : In scalar context (the usual case): the node corresponding
          to the arg (if there's just one), or the node corresponding
          to the first arg (if a list of args were provided, which is
          kind of dumb in this case), or undef if the arg is not a
          node.

          In array context, it behaves just like 'nodes', returning
          an ARRAY of nodes (for args that correspond to nodes), or
          undef (for args that are not nodes)

Title   : edge
Usage   : if ($graph->edge('a','b')) {
            print "a,b is a edge\n";
          }
          if ($graph->edge(['a','b'])) {
            print "[a,b] is a edge\n";
          }
Function: Test whether a value is a edge in the graph, or map the
          text representation of a edge to the edge itself.  The
          method can also be fed a list of edges (like the 'edges'
          method) and it will test all of them.
Args    : Usually, a single edge.  Same format as 'edges'
          The function also accepts a list of edges, exactly like 
          'edges'
Returns : In scalar context (the usual case): the edge corresponding
          to the arg (if there's just one), or the or the edge
          corresponding to the first arg (if a list of args were
          provided, which is kind of dumb in this case), or undef if
          the arg is not a edge.

          In array context, it behaves just like 'edge's, returning
          an ARRAY of edges (for args that correspond to edges), or
          undef (for args that are not edges)

Title   : has_nodes, has_node
Usage   : if ($graph->has_nodes('a','b')) {
            print "a, b are both nodes\n";
          }
          if ($graph->has_node('a')) {
            print "a is a node\n";
          }
Function: Return true is all args are nodes.
Args    : ARRAY of nodes or text representations of nodes
Returns : Boolean
Note    : Singular and plural forms are synonymous

Title   : has_edges
Usage   : if ($graph->has_edges('a','b',['b','c'])) {
            print "[a,b] and [b,c] are both edges\n";
          }
          if ($graph->has_edge('a','b')) {
            print "[a,b] is an edge\n";
          }
Function: Return true is all args are edges.
Args    : ARRAY of edges in the forms described in the section above
Returns : Boolean
Note    : Singular and plural forms are synonymous

Title   : neighbors, neighbor
Usage   : my @nodes=$graph->neighbors($node)
          my @nodes=$graph->neighbors($node,'node')
          my @edges=$graph->neighbors($edge,'edge');
Function: Return the node or edge neighbors of a given node or edge.
Args    : (mandatory)
          $source: node or edge whose neighbors are sought
          (optional)
          $what: the word 'node' or 'edge' (actually, anything starting
                 with 'n' or 'e' will do)
                 default: 'node'
Returns : ARRAY or ARRAY ref of nodes or edges
Note    : Singular and plural forms are synonymous. This may not be
          right.

Title   : dup_edges
Usage   : my @dups=$graph->dup_edges;
Function: Return duplicate edges
Args    : None
Returns : ARRAY or ARRAY ref of edges that have been added more than
          once.

Graph properties

Title   : is_connected
Usage   : if ($graph->is_connected) {
            print "graph has only one connected component\n";
          }
Function: Return true if the graph is connected
Args    : None
Returns : Boolean

Title   : is_empty
Usage   : if ($graph->is_empty) {
            print "graph has no nodes or edges\n";
          }
Function: Return true if the graph is empty, ie, has no nodes or edges
Args    : None
Returns : Boolean

Title   : is_tree
Usage   : if ($graph->is_tree) {
            print "graph is a tree\n";
          }
Function: Return true if the graph is a tree, ie, it's connected and
          has no cycles
Args    : None
Returns : Boolean

Title   : is_forest
Usage   : if ($graph->is_forest) {
            print "graph is a forest\n";
          }
Function: Return true if the graph is a forest, ie, it has no cycles
          but may not be connected
Args    : None
Returns : Boolean

Title   : is_cyclic
Usage   : if ($graph->is_cyclic) {
            print "graph contains at least one cycle\n";
          }
Function: Return true if the graph is a cyclic.
Args    : None
Returns : Boolean

Title   : density
Usage   : my $density=$graph->density
Function: Compute graph 'density' which is the number of edges
          divided by the maximum possible number of edges
Args    : None
Returns : Number

Graph operations

Title   : subgraph
Usage   : my $subgraph=$graph->subgraph('a','b','c');
Function: Compute node subgraph. Constructs a new graph whose nodes
          are the arguments, and whose edges are the edges of the
          original graph that only involve the given nodes.
Args    : ARRAY of nodes or text representations of nodes
Returns : New graph

Title   : neighbor_subgraph
Usage   : my $subgraph=$graph->subgraph('a');
Function: Construct node subgraph graph whose nodes are the given
          node and its neighbors.  are the arguments, and whose edges
          are the edges of the original graph that only involve the
          given nodes.
Args    : Node or text representations of node
Returns : New graph

Title   : union
Usage   : my $union=$graph->union($other_graph);
Function: Construct new graph whose nodes are the union of the nodes
          of the current graph and $other_graph, and whose edges are
          the union of the edges of the current graph and
          $other_graph.
Args    : $other_graph: a graph
Returns : New graph

Title   : intersection
Usage   : my $intersection=$graph->intersection($other_graph);
Function: Construct new graph whose nodes are the intersection of the
          nodes of the current graph and $other_graph, and whose
          edges are the intersection of the edges of the current
          graph and $other_graph.
Args    : $other_graph: a graph
Returns : New graph

Graph algorithms

Title   : traversal
Usage   : my $traversal=$graph->traversal('a','depth first','node');
          my @nodes;
          while (my $node=$traversal->get_next) {
            push(@nodes,$node);
          }
          my $traversal=$graph->traversal('a','depth first','node');
          my @nodes=$traversal->get_all;
Function: Do node or edge traversal in depth or breadth first order.
Args    : (optional)
          $start: starting node or edge for traversal
                  default: software picks arbitrary start
          $order: 'depth first' or 'breadth first' (actually,
                  anything starting with 'd' or 'b' will do)
                  default: 'depth first'
          $what: 'node' or 'edge' (actually, anything starting
                 with 'n' or 'e' will do)
                 default: 'node'
Returns : SimpleGraph::Traversal object
          This is an iterator with the following methods:

          get_next: get next item in traversal or undef if 
                    traversal is exhausted
          get_this: get current item in traversal
          get_all : get all remaining items in traversal as
                    ARRAY (in array context) or ARRAY ref
          has_next: return true if there are more items in
                    traversal, else undef
          reset   : restart traversal

Note    : It's also possible, and perhaps easier, to perform a
          traversal by creating a SimpleGraph::Traversal object
          directly.  The constructor is

          new Bio::ConnectDots::SimpleGraph::Traversal(-graph=>$graph,-start=>$start,
                                     -order=>$order,-what=>$what)

Title   : node_traversal
Usage   : my $traversal=$graph->node_traversal('a','depth first');
          my @nodes;
          while (my $node=$traversal->get_next) {
            push(@nodes,$node);
          }
          my $traversal=$graph->node_traversal('a','depth first');
          my @nodes=$traversal->get_all;
Function: Do node traversal in depth or breadth first order.
          Wrapper for 'traversal' method. See above.
Args    : (optional)
          $start: starting node for traversal
                  default: software picks arbitrary start
          $order: 'depth first' or 'breadth first' (actually,
                  anything starting with 'd' or 'b' will do)
                  default: 'depth first'
Returns : SimpleGraph::Traversal object

Title   : edge_traversal
Usage   : my $traversal=$graph->edge_traversal(['a','b'],'depth first');
          my @edges;
          while (my $edge=$traversal->get_next) {
            push(@edges,$edge);
          }
          my $traversal=$graph->edge_traversal(['a','b'],'depth first');
          my @edges=$traversal->get_all;
Function: Do edge traversal in depth or breadth first order.
          Wrapper for 'traversal' method. See above.
Args    : (optional)
          $start: starting edge for traversal
                  default: software picks arbitrary start
          $order: 'depth first' or 'breadth first' (actually,
                  anything starting with 'd' or 'b' will do)
                  default: 'depth first'
Returns : SimpleGraph::Traversal object

Title   : components
Usage   : my @components=$graph->components;
          for my $component (@components) {
            my @nodes=$component->nodes;
            my @edges=$component->edges;
          }
Function: Compute the connected components of the graph.  A connected
          component is a maximal connected subgraph.  'Connected'
          means you can get from any node of the component to any
          other by following a path.  'Maximal' means that every node
          you can reach from the component is in the component.
Args    : None
Returns : ARRAY or ARRAY ref of SimpleGraphs
Note    : The software caches the components once computed, so it's efficient
          to call this repeatedly.

Title   : shortest_paths
Usage   : my $paths=$graph->shortest_paths;
          while(my($endpoints,$path)=each %$paths) {
            my($start,$stop)=split($;,$endpoints);
            my @nodes_on_path=@$path;
            print "Path from $start to $stop: @$nodes_on_path\n";
          }
          -- OR --
          my ($paths,$distances)=$graph->shortest_paths(-want_distances=>1);
          while(my($endpoints,$path)=each %$paths) {
            my $distance=$distances->{$endpoints};
            my($start,$stop)=split($;,$endpoints);
            my @nodes_on_path=@$path;
            print "Path from $start to $stop, distance $distance: @$nodes_on_path\n";
          }
Function: Compute shortest path between each pair of nodes.
Args    : (optional)
          -max=>Maximum path length to compute
                Paths longer than this are not found
          -want_distances=>Boolean. If true, distances are also returned
Returns : HASH whose keys are pairs of nodes (encoded in the standard
          Perl manner as two strings joined by $;) and whose values
          are paths. Each path is an ARRAY ref of nodes starting at
          the first endpoint and ending at the second.

          The first endpoint is always lexically smaller (lt) than
          the second.

          If -want_distances is specified, the result is a pair of
          HASHes, one containing paths and the other containing
          distances.

Title   : distances
Usage   : my $distances=$graph->distances;
          while(my($endpoints,$distance)=each %$distances) {
            my($start,$stop)=split($;,$endpoints);
            print "Path from $start to $stop has distance $distance\n";
          }
          -- OR --
          my ($paths,$distances)=$graph->distances(-want_paths=>1);
          while(my($endpoints,$distance)=each %$distances) {
            my $path=$paths->{$endpoints};
            my($start,$stop)=split($;,$endpoints);
            my @nodes_on_path=@$path;
            print "Path from $start to $stop, distance $distance: @$nodes_on_path\n";
          }
Function: Compute distance between each pair of nodes. This is the
          length of the shortest path
Args    : (optional)
          -max=>Maximum path length to compute
                Distances longer than this are not found
          -want_paths=>Boolean. If true, paths are also returned
Returns : HASH whose keys are pairs of nodes (encoded in the standard
          Perl manner as two strings joined by $;) and whose values
          are distances.

          The first endpoint is always lexically smaller (lt) than
          the second.

          If -want_paths is specified, the result is a pair of
          HASHes, one containing paths and the other containing
          distances.

Title   : connected_nodesets
Usage   : my @nodesets=$graph->connected_nodesets;
          for my $nodeset (@nodesets) {
            my @nodes=@$nodeset;
          }
Function: Compute all sets of nodes that form connected subgraphs. 
          A connected nodeset is a set of nodes such that it's
          possible to get from any node to any other by following a
          path that only includes nodes in the set. 
Args    : None
Returns : ARRAY or ARRAY ref of nodeset, where each nodeset is an ARRAY
          ref of nodes.  
Note    : Use with caution.  The number of nodesets is very
          large for graphs that are highly connected.

Title   : connected_subgraphs
Usage   : my @subgraphs=$graph->connected_subgraphs;
Function: Compute all connected subgraphs of the current graph.
Args    : None
Returns : ARRAY or ARRAY ref of subgraphs
Note    : Use with caution.  The number of connected subgraphs is
          very large for graphs that are highly connected.