NAME
Algorithm::Tree::NCA - Constant time retrieval of Nearest Common Ancestor
SYNOPSIS
use Algorithm::Tree::NCA;
my $tree = ...;
my $nca = new Algorithm::Tree::NCA(-tree => $tree);
my $x = $tree->get_node(...);
my $y = $tree->get_node(...);
my $z = $nca->nca($x,$y);
DESCRIPTION
This package provides constant-time retrieval of the Nearest Common Ancestor (NCA) of nodes in a tree. The implementation is based on the algorithm by Harel and which can, after linear-time preprocessing, retrieve the nearest common ancestor of two nodes in constant time.
To implement the algorithm it is necessary to store some data for each node in the tree.
- - A node number assigned to the node in a pre-order fashion
- - A number to identify the run of the node ("ALGORITHM")
- - The leader for each run, which should be retrievable through its node number
- - A magic number ("ALGORITHM")
- - The parent node for each node
- - The maximum number assigned to a any node in the subtree
All data above, with the exception of the node number, is stored in an array inside the Algorithm::Tree::NCA
object.
The node number has to be stored in the actual tree node in some manner (alternative solutions would be to slow to give constant-time retrieval), which requires a set method and a get method for the nodes. Since the most common case is using hashes to represent nodes, there are default implementations of the set and get methods.
The default set method is:
sub _set_method {
my($node,$value) = @_;
$node->{'_nca_number'} = $value;
}
and the default get method is:
sub _get_method {
my($node) = @_;
return $node->{'_nca_number'};
}
If have chosen another representation of your nodes, you can provide alternative set and get methods by using the -set and -get options when creating the Algorithm::Tree::NCA
object.
CLASS AND OBJECT METHODS
EXAMPLES
ALGORITHM
This section describes the algorithm used for preprocessing and for nearest common ancestor retrieval. It does not provide any intuition to why the algorithm works, just a description how it works. For the algorithm description, it is assumed that the nodes themself contain all necessary information. The algorithm is described in a Pascal-like fashion. For detailed information about the algorithm, please have a look in [1] or [2].
The height of a non-zero integer is the number of zeros at the right end of the integer. The least significant set bit (LSSB) of a non-zero number is the the number with only the least significant bit set (surprise). For instance, here is the LSSB and the height of some numbers:
Number LSSB Height
-------- -------- ------
01001101 00000001 0
01001100 00000100 2
Important to note here is that for numbers i and j, height(i) < height(j) if and only if LSSB(i) < LSSB(j), which means that we can replace a test of height(i) < height(j) with LSSB(i) < LSSB(j). Since LSSB(i) is easier to compute, this will speed up the computation.
Preprocessing the tree
Preprocessing the tree requires the computation of three numbers: the node number, the run, and a magic number. It also requires computation of the leader of each run. These computations are done in two recursive descents and ascents of the tree.
Procedure Preprocess(root:Node)
Var x,y : Integer; (* Dummy variables *)
Begin
(x,y) := Enumerate(root,nil,1);
ComputeMagic(root,root,0);
End;
In the first phase, we enumerate the tree, compute the run for each node, the max number assigned to a node in the subtree, and also the parent of each node. If the parent is already available through other means, that part is redundant. The run of a node is the number of the node in the subtree with the largest height.
Function Enumerate(node,parent:Node; num:Integer) : (Integer,Integer)
Var run : Integer;
Begin
node.parent := parent;
node.number := num;
node.run := num;
run := num;
num := num + 1;
Foreach child in node.children Do
(num,run) := Enumerate(child,node,num);
If height(run) > height(node.run) Then
node.run := run;
Done
node.max := num;
Return (num,node.run)
End;
In the second phase, we compute the leader for each run (which we can since we know the run for each node) and the magic number. The leader has to be stored so that we can access is through a node number, so we store it in an array.
VAR Leader : Array [1..NODE_COUNT] of NodePtr;
The leader for each run can either be stored for each node (which we assume here), or only stored in the node where the node.run == node.number
. We can then compute the leader of any node
through Leader(node.run)
, which requires less storage if Leader
is implemented as a spare array.
The magic number of a node is the bitwise or of all run:s of nodes in the path leading from the root node to the node.
Procedure ComputeMagic(node, current_leader:Node; magic:Integer)
Begin
node.magic = magic | LSSB(node.run);
If node.run != leader.run Then
Leader(node.number) = node
Else
Leader(node.number) = current_leader
Foreach child in node.children Do
ComputeMagic(child, Leader(node.number), node.magic)
Done
End;
Constant-time retrieval of the nearest common ancestor
To find the NCA of two nodes, we map the nodes to a binary tree and find the NCA b there (which is easy). We then do some bitwise arithmetics to find the bit j where the magic numbers of the two nodes have a 1
in common and that has a greater height than b.
Function NCA(x,y:Node) : Node;
Begin
b := BinNCA(x,y); (* b = 10111000 *)
m := (NOT b) AND (b - 1); (* m = 00000111 *)
c := x.magic AND y.magic;
u := c AND (NOT m);
j := LSSB(u);
x1 := Closest(x,j);
y1 := Closest(y,j);
If x1.number < y1.number Then
Return x1
Else
Return y1
End;
Retrieving the nearest common ancestor in a complete binary tree is easy assuming you have a special numbering of the nodes. We number each node with a path number such that the root node is numbered 10000000
and for each choice down the tree, we use (for example) 0
for left and 1
for right. Assuming that the nodes are not on the same path from the root, we can then:
- a. compute the XOR of the path numbers,
- b. find the most significant
1
in the XOR, which is where the paths differ; - c. take the part before (i.e., high end) the most significant
1
in one of the path number (either one, since these parts are equal), - d. add a
1
after, and - e. set the lowest part after the
1
to all zeroes.
The value returned is the path number of the node that is the nearest common ancestor. (In this implementation, the mapping from node number to run is a mapping to a binary tree where the run is the path number.)
Function BinNCA(x,y:Node) : Integer
Var k,m,r : Integer;
Begin
(* Check that neither is the ancestor of the other *)
If x.number <= y.number and y.number < x.max Then
Return x.run
If y.number <= x.number and x.number < y.max Then
Return y.run
(* Suppose x.run = 10110--- and y.run = 10111--- *)
(* Then x.run XOR y.run = 00001---, and further: *)
k := MSSB(x.run XOR y.run); (* k = 00001000 *)
m := k XOR (k - 1); (* m = 00001111 *)
r := (NOT m) AND x.run; (* r = 10110000 *)
Return r OR k; (* result: 10111000 *)
End;
To find the node closest to a node n but on the same run as the NCA z we need the j supplied by the NCA
function above.
Function Closest(n:Node; j:Integer) : Node;
Begin
l := LSSB(n.magic);
If l = j Then Return x
k := MSSB((j - 1) AND x.magic);
u := ((NOT ((k - 1) OR k)) AND x.run) OR k
w := Leader(u);
Return w.parent; (* z = w.parent *)
End;
REFERENCES
- [1] Fast algorithms for finding nearest common ancestor by D. Harel and R. E. Tarjan.
- [2] On finding lowest common ancestor: simplifications and parallelizations by B. Schieber and U. Vishkin.
- [3] Algorithms on strings, trees, and sequences by Dan Gusfield.
AUTHOR
Mats Kindahl <matkin@acm.org>
SEE ALSO
perl.