NAME
DBIx::Tree::NestedSet
SYNOPSIS
Implements a "Nested Set" parent/child tree.
DESCRIPTION
This module implements a "Nested Set" parent/child tree and is focused (at least in my mind) on offering methods that make developing web applications easier. It should be generally useful, though.
See the "SEE ALSO" section for resources that explain the advantages and features of a nested set tree. This module gives you arbitrary levels of categorization, lets you attach metadata to a category through simple method arguments, and stores everything via DBI. It's been tested on MySQL, but I've taken pains to avoid MySQL-specific SQL statements.
The key trade-off is that a nested set tree is "expensive" to update, because inserts, deletes and node moves can touch a large part of the tree. Conversely, it is "cheap" to query, because nearly every read (getting children, getting parents, getting siblings, etc.) can be done *with one SQL query*. So if you're developing apps that require many reads and few updates to a tree (like pretty much every web app I've ever built), a nested set should offer significant performance advantages over the recursive queries required by an adjacency list model.
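To make the read side concrete, here's a minimal sketch in plain Perl. It holds the "food" tree from the WHY? section in a hypothetical in-memory array (no database), numbered the way a nested set stores it, and finds a whole subtree with one range test. The descendants_of() name is made up for illustration. This is not how DBIx::Tree::NestedSet is implemented internally.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of a nested set table, using the "food" tree
# from the WHY? section. Illustration only, not the module's internals.
my @nodes = (
    { id => 1, name => 'Food',           lft => 1, rght => 12 },
    { id => 2, name => 'Beans and Nuts', lft => 2, rght => 11 },
    { id => 3, name => 'Beans',          lft => 3, rght => 6  },
    { id => 5, name => 'Black Beans',    lft => 4, rght => 5  },
    { id => 4, name => 'Nuts',           lft => 7, rght => 10 },
    { id => 6, name => 'Pecans',         lft => 8, rght => 9  },
);

# Every descendant of a node has its lft strictly inside the node's
# (lft, rght) interval, so one range test finds the whole subtree. In SQL
# that is a single self-join, roughly:
#   SELECT child.* FROM nested_set parent, nested_set child
#   WHERE parent.id = ? AND child.lft > parent.lft AND child.lft < parent.rght
#   ORDER BY child.lft
sub descendants_of {
    my ($id) = @_;
    my ($parent) = grep { $_->{id} == $id } @nodes;
    return () unless $parent;
    my @found = grep { $_->{lft} > $parent->{lft} && $_->{lft} < $parent->{rght} } @nodes;
    return sort { $a->{lft} <=> $b->{lft} } @found;
}

print join(', ', map { $_->{name} } descendants_of(2)), "\n";
# prints: Beans, Black Beans, Nuts, Pecans
```

No recursion is involved. The same range test works at any depth, which is exactly what the adjacency list model can't do in one query.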
Whew. Say that fast three times.
You'll need to create a table in your database and then pass options to new(). See the "Table Definition" section for an example "create table" statement.
METHODS
new
new() accepts a number of parameters. You MUST pass new() a valid DBI handle.
- dbh
The DBI handle returned by DBI->connect().
- left_column_name
The name of the column that describes the left hand side of a node. Defaults to "lft".
- right_column_name
The name of the column that describes the right hand side of a node. Defaults to "rght".
- table_name
The name of the table that holds the nested set. Defaults to "nested_set".
- No_RaiseError
By default this module turns on the "RaiseError" attribute in $dbh. Setting "No_RaiseError" to a true value (because you don't want RaiseError enabled, or because you turn it on elsewhere) disables this behavior.
- no_locking
Setting this option to a true value disables table locking for methods that alter the tree stored via DBI. Currently we lock the entire table, because most "editing" methods can change every row on even minor changes.
- no_alter_table
Don't do the automagical table altering used to create columns on the fly. See "add_child_to_right" for a description of how this module stores metadata. Turning off automagical table altering will probably increase performance, but you won't be able to introduce new metadata fields whenever you like when adding or updating nodes.
With automagical table altering off, the module will error out if you try to add metadata that doesn't have a column defined for it in the DBI table. You have been warned.
It probably makes sense to turn off automagical table altering once the application is in production and you're done with development, but that depends on how you build your app.
- trace
Turns on DBI's trace() at the level you specify here and prints some additional debugging info to STDERR.
Example:
#Create a nested set tree with the defaults
my $tree=DBIx::Tree::NestedSet->new(dbh=>$dbh);
get_root
Gets the id of the "root" node of the tree.
add_child_to_right
This will add a child to the "right" of all its siblings.
Takes the following parameters as a hash:
- id
The ID of the parent node to add the child to. If you don't give an ID, or the ID isn't valid, the child is added under the root node.
Any other parameter you pass in the hash is stored in a column of the same name. If that column doesn't exist yet, the module alters the table to add it. Example:
Say you have a table that looks like:
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | mediumint(9) | | PRI | NULL | auto_increment |
| lft | mediumint(9) | | MUL | 0 | |
| rght | mediumint(9) | | MUL | 0 | |
| name | varchar(255) | | MUL | | |
+----------+--------------+------+-----+---------+----------------+
and you execute:
$tree->add_child_to_right(id=>$tree->get_root(),name=>'Foo Name',template=>'Bar');
Then the module will create a node named "Foo Name" as the rightmost child of the root. The "template" column will be created and "Bar" will be stored in this node's "template" column. The table would then look like:
+----------+--------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+----------+--------------+------+-----+---------+----------------+
| id | mediumint(9) | | PRI | NULL | auto_increment |
| lft | mediumint(9) | | MUL | 0 | |
| rght | mediumint(9) | | MUL | 0 | |
| name | varchar(255) | | MUL | | |
| template | varchar(255) | | MUL | | |
+----------+--------------+------+-----+---------+----------------+
Feel free to tweak the columns after the module creates them (or create them in advance; it doesn't really matter). You may want to add indexes if you're going to be doing other selects on the nested_set table.
This table-altering behavior lets you store metadata about a node simply, with the tradeoff that your metadata may be "flat" and potentially poorly normalized.
Returns the id of the newly added child.
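Here's a rough sketch of the column-adding idea, assuming a hypothetical helper alter_statements() that compares the keys you pass against the table's existing columns. It uses the varchar(255) type shown in the table above and the column-name rule from the BUGS section. The module's real logic may well differ.

```perl
use strict;
use warnings;

# Hypothetical helper: given the columns a table already has and the
# parameters passed to an add_*() style call, return the ALTER TABLE
# statements needed to store any new metadata keys.
sub alter_statements {
    my ($table, $existing, %params) = @_;
    # MySQL compares column names case-insensitively, so normalize with lc.
    my %have = map { lc($_) => 1 } @$existing;
    my @sql;
    for my $key (sort keys %params) {
        next if $key eq 'id';    # 'id' selects the parent node; it is not metadata
        # Column names get interpolated into SQL, so refuse anything unusual.
        die "Unsafe column name: $key\n" unless $key =~ /^[_A-Za-z\d]+$/;
        next if $have{ lc $key };
        push @sql, "ALTER TABLE $table ADD COLUMN $key varchar(255)";
    }
    return @sql;
}

my @sql = alter_statements('nested_set', [qw(id lft rght name)],
    id => 1, name => 'Foo Name', template => 'Bar');
print "$_\n" for @sql;
# prints: ALTER TABLE nested_set ADD COLUMN template varchar(255)
```

Only "template" needs a new column here, since "name" already exists. That matches the before/after tables above.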
add_child_to_left
Same as add_child_to_right, except this puts the child to the left of its siblings.
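Sketching the insert shows why adds are "expensive". Assume a hypothetical in-memory tree. A new leaf is placed by first shifting every lft/rght value at or beyond the insertion point right by 2, then slotting the leaf into the gap. For a rightmost child the gap sits at the parent's old rght. For a leftmost child it sits just inside the parent's lft. This is the textbook nested set insert, not necessarily the exact SQL the module runs.

```perl
use strict;
use warnings;

# Hypothetical in-memory tree: id => { name, lft, rght }.
my %tree = (
    1 => { name => 'root', lft => 1, rght => 4 },
    2 => { name => 'A',    lft => 2, rght => 3 },
);
my $next_id = 3;

# Open a two-slot gap at position $pos: every lft/rght value >= $pos moves
# right by 2. In SQL this is two UPDATEs that can touch most of the table,
# e.g. "UPDATE nested_set SET rght = rght + 2 WHERE rght >= ?".
sub open_gap {
    my ($pos) = @_;
    for my $node (values %tree) {
        $node->{lft}  += 2 if $node->{lft}  >= $pos;
        $node->{rght} += 2 if $node->{rght} >= $pos;
    }
}

# Rightmost child: the new leaf takes the parent's current rght value.
sub insert_right {
    my ($parent_id, $name) = @_;
    my $pos = $tree{$parent_id}{rght};
    open_gap($pos);
    my $id = $next_id++;
    $tree{$id} = { name => $name, lft => $pos, rght => $pos + 1 };
    return $id;
}

# Leftmost child: the new leaf goes just inside the parent's lft.
sub insert_left {
    my ($parent_id, $name) = @_;
    my $pos = $tree{$parent_id}{lft} + 1;
    open_gap($pos);
    my $id = $next_id++;
    $tree{$id} = { name => $name, lft => $pos, rght => $pos + 1 };
    return $id;
}

insert_right(1, 'B');    # root(1,6) A(2,3) B(4,5)
insert_left(1, 'Z');     # root(1,8) Z(2,3) A(4,5) B(6,7)
for my $id (sort { $tree{$a}{lft} <=> $tree{$b}{lft} } keys %tree) {
    printf "%-4s %d %d\n", @{ $tree{$id} }{qw(name lft rght)};
}
```

Note that the left insert renumbered every existing node. That renumbering is why the module locks the table during edits unless you set no_locking.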
edit_node
Edits a node and exhibits the same "table altering" behavior as add_child_to_right. Pass in parameters as a hash; "id" controls which node you're editing.
Example:
#All other values are retained, we're just changing the name of the node
#with the id in "$edit_id"
$tree->edit_node(id=>$edit_id,name=>'New Name');
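Here's a minimal sketch of how edit_node()-style arguments can become a single parameterized UPDATE that touches only the columns you pass, which is why "all other values are retained". build_update() is a hypothetical helper. With DBI you would then run $dbh->do($sql, undef, @bind).

```perl
use strict;
use warnings;

# Hypothetical helper: turn id => ..., column => value pairs into an
# UPDATE statement plus bind values. Columns not passed are left alone.
sub build_update {
    my ($table, %args) = @_;
    my $id = delete $args{id} or die "id is required\n";
    my @cols = sort keys %args;
    die "nothing to update\n" unless @cols;
    # Column names are interpolated, so apply the same name rule as BUGS.
    /^[_A-Za-z\d]+$/ or die "Unsafe column name: $_\n" for @cols;
    my $set = join ', ', map { "$_ = ?" } @cols;
    return ("UPDATE $table SET $set WHERE id = ?", @args{@cols}, $id);
}

my ($sql, @bind) = build_update('nested_set', id => 42, name => 'New Name');
print "$sql\n";    # UPDATE nested_set SET name = ? WHERE id = ?
print "@bind\n";   # New Name 42
```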
get_id_by_key
Looks up one or more nodes by a key name and key value. Takes two parameters:
- key_name
The name of the database column you're doing a lookup on.
- key_value
The value you want to look up.
If there is more than one node found, we return an array reference. Otherwise we return a scalar. If nothing is found, you'll get a non-true value.
Example:
my $node=$tree->get_id_by_key(key_name=>'name',key_value=>'Foo Name');
if(ref $node){
#We have more than one id returned.
} else {
#We have a single id/node.
}
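Because get_id_by_key() can return three different shapes, it can be handy to fold the result into a plain list. ids_from_lookup() below is a hypothetical helper, and the literal values stand in for real lookup results.

```perl
use strict;
use warnings;

# Hypothetical helper: get_id_by_key() returns an arrayref (several
# matches), a plain scalar (one match) or a false value (no match).
# Fold all three cases into an ordinary list of ids.
sub ids_from_lookup {
    my ($result) = @_;
    return ()       unless $result;
    return @$result if ref $result eq 'ARRAY';
    return ($result);
}

# Stand-ins for what get_id_by_key() might return:
for my $r ([3, 7], 5, undef) {
    my @ids = ids_from_lookup($r);
    print scalar(@ids), " id(s): @ids\n";
}
```

With a real tree you might write my @ids = ids_from_lookup($tree->get_id_by_key(key_name=>'name',key_value=>'Foo Name')); and then simply loop over @ids.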
get_self_and_parents_flat
This will get a node and its parents, up to the root node. Takes the id of the starting node as a hash parameter (id).
Returns an arrayref of hashrefs (AoH). The hashrefs will have as keys the column names of the table, including those automatically added by the add_*() and edit_node() methods.
This method does NOT return a "nested hash" or "nested array" of nodes, hence the "flat" in the method name.
Additionally there will be a "level" hashkey that's the level of the node, with level 1 being the root.
Example:
my $self_and_parents=$tree->get_self_and_parents_flat(id=>$starting_id);
foreach(@$self_and_parents){
print 'ID: '.$_->{id}.' is at level '.$_->{level}."\n";
}
Besides arrays of hashrefs being easy to use, this structure is PERFECT for passing to HTML::Template::param(). Returns a non-true value if a node doesn't have parents.
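Here's a sketch of why parents and levels are cheap, again on a hypothetical in-memory copy of the food tree. The rows whose interval encloses the starting node are the node and its ancestors, and sorting them by lft gives each one its level. The root-first ordering and the self_and_ancestors() name are assumptions of this sketch.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the food tree (no database).
my @nodes = (
    { id => 1, name => 'Food',           lft => 1, rght => 12 },
    { id => 2, name => 'Beans and Nuts', lft => 2, rght => 11 },
    { id => 3, name => 'Beans',          lft => 3, rght => 6  },
    { id => 5, name => 'Black Beans',    lft => 4, rght => 5  },
    { id => 4, name => 'Nuts',           lft => 7, rght => 10 },
    { id => 6, name => 'Pecans',         lft => 8, rght => 9  },
);

# Rows whose (lft, rght) interval encloses the node are the node itself plus
# its ancestors. Sorted by lft the root comes first, and each row's position
# in that order is its level (root = 1). In SQL, roughly:
#   SELECT parent.* FROM nested_set node, nested_set parent
#   WHERE node.id = ? AND parent.lft <= node.lft AND parent.rght >= node.rght
#   ORDER BY parent.lft
sub self_and_ancestors {
    my ($id) = @_;
    my ($node) = grep { $_->{id} == $id } @nodes;
    return unless $node;
    my @chain = sort { $a->{lft} <=> $b->{lft} }
                grep { $_->{lft} <= $node->{lft} && $_->{rght} >= $node->{rght} } @nodes;
    my $level = 0;
    return [ map { +{ %$_, level => ++$level } } @chain ];
}

my $chain = self_and_ancestors(6);
foreach (@$chain) {
    print 'ID: ' . $_->{id} . ' (' . $_->{name} . ') is at level ' . $_->{level} . "\n";
}
```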
get_parents_flat
Same as get_self_and_parents_flat but excludes the starting node.
delete_self_and_children
Similar to get_self_and_children_flat, but deletes the starting node and all of its descendants. Returns an arrayref of the IDs that were deleted, or a non-true value if none were.
Example:
my $ids=$tree->delete_self_and_children(id=>$delete_from);
This deletes the node whose ID is in $delete_from along with all of its children, and $ids will contain an arrayref of the deleted IDs.
delete_children
Similar to delete_self_and_children, but leaves the starting id untouched. This method just deletes the children (recursively) of the starting node.
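Deleting shows the other side of the trade-off. This hypothetical delete_subtree() works on an in-memory copy of the food tree. It removes every row whose lft falls inside the starting node's interval, then shifts everything to its right left by the width of the removed block so the numbering stays contiguous. It's a sketch of the standard technique, not the module's code.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the food tree, keyed by id.
my %tree = (
    1 => { name => 'Food',           lft => 1, rght => 12 },
    2 => { name => 'Beans and Nuts', lft => 2, rght => 11 },
    3 => { name => 'Beans',          lft => 3, rght => 6  },
    4 => { name => 'Nuts',           lft => 7, rght => 10 },
    5 => { name => 'Black Beans',    lft => 4, rght => 5  },
    6 => { name => 'Pecans',         lft => 8, rght => 9  },
);

sub delete_subtree {
    my ($id) = @_;
    my $node = $tree{$id} or return;
    my ($l, $r) = @{$node}{qw(lft rght)};
    my $width = $r - $l + 1;
    # 1. Every row whose lft lies inside [lft, rght] is the node or a descendant.
    my @doomed = sort { $a <=> $b }
                 grep { $tree{$_}{lft} >= $l && $tree{$_}{lft} <= $r } keys %tree;
    delete @tree{@doomed};
    # 2. Close the hole: values to the right of the old rght shift left by $width.
    for my $n (values %tree) {
        $n->{lft}  -= $width if $n->{lft}  > $r;
        $n->{rght} -= $width if $n->{rght} > $r;
    }
    return \@doomed;
}

my $ids = delete_subtree(3);    # Beans and Black Beans
print "deleted ids: @$ids\n";   # deleted ids: 3 5
```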
get_self_and_children_flat
Nearly identical to get_self_and_parents_flat, except it retrieves the starting node and its children, recursively.
It additionally takes a depth parameter, which specifies how far down the tree from the starting node to go.
Example:
my $self_and_children=$tree->get_self_and_children_flat(id=>$start_id,depth=>2);
Will retrieve an AoH starting from $start_id going down a maximum of 2 levels.
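One way to honor a depth limit is to walk the subtree in lft order while keeping a stack of the rght values you're still inside. The stack size is then the node's depth below the start. This sketch assumes depth counts levels below the starting node (so depth=>2 returns the start, its children and its grandchildren), which may not match the module's exact semantics. subtree_to_depth() is a hypothetical name.

```perl
use strict;
use warnings;

# Hypothetical in-memory copy of the food tree (no database).
my @nodes = (
    { id => 1, name => 'Food',           lft => 1, rght => 12 },
    { id => 2, name => 'Beans and Nuts', lft => 2, rght => 11 },
    { id => 3, name => 'Beans',          lft => 3, rght => 6  },
    { id => 5, name => 'Black Beans',    lft => 4, rght => 5  },
    { id => 4, name => 'Nuts',           lft => 7, rght => 10 },
    { id => 6, name => 'Pecans',         lft => 8, rght => 9  },
);

sub subtree_to_depth {
    my ($id, $depth) = @_;    # undef $depth means "no limit"
    my ($start) = grep { $_->{id} == $id } @nodes;
    return [] unless $start;
    my @sub = sort { $a->{lft} <=> $b->{lft} }
              grep { $_->{lft} >= $start->{lft} && $_->{rght} <= $start->{rght} } @nodes;
    my (@out, @open);    # @open holds rght values of nodes we are still inside
    for my $n (@sub) {
        # Leave any subtrees that ended before this node starts.
        pop @open while @open && $open[-1] < $n->{lft};
        my $rel = scalar @open;    # 0 for the starting node itself
        push @out, +{ %$n, depth => $rel } if !defined $depth || $rel <= $depth;
        push @open, $n->{rght};
    }
    return \@out;
}

my $got = subtree_to_depth(1, 2);
print '  ' x $_->{depth}, $_->{name}, "\n" for @$got;
```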
swap_nodes
Takes two parameters: first_id and second_id. It will "swap" the nodes represented by these ids, essentially replacing one node with the other. Children will tag along and order will be preserved. swap_nodes() can be used to reorder nodes in a tree OR swap nodes to different levels within a tree.
Example:
$tree->swap_nodes(first_id=>$first_id,second_id=>$second_id);
$first_id and $second_id will be "swapped" in the tree.
get_hashref_of_info_by_id
Returns a hashref of the information associated with the node whose ID you pass. Umm... except "level", which the other get_* methods do return; computing the level here would probably be expensive.
This is probably dumb, but in this case you don't pass the ID as a hash, because this method only ever takes one argument. Returns undef if no node with that ID is found.
Example:
my $node_info=$tree->get_hashref_of_info_by_id($node_id);
print $node_info->{id};
create_report
Returns a very simple report (in a scalar) of the tree. Takes a few parameters:
- id
The id to start the report from. If none is given, it'll start from the root node.
- indent_level
The number of spaces to indent each level with. Defaults to 2 spaces per level.
Example:
my $report=$tree->create_report(indent_level=>4);
print $report;
Will create a report starting from the "root" with 4 spaces of indentation per level.
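If you already have a flat AoH with a "level" key (like the get_*_flat methods return), an indented report takes only a few lines. simple_report() is a hypothetical stand-in that borrows create_report's default of 2 spaces per level. The module's actual report format may include more detail.

```perl
use strict;
use warnings;

# Hypothetical report builder: indent each node's name by
# ($level - 1) * $indent_level spaces, one node per line.
sub simple_report {
    my ($nodes, $indent_level) = @_;
    $indent_level = 2 unless defined $indent_level;
    return join '', map {
        my $pad = ' ' x (($_->{level} - 1) * $indent_level);
        "$pad$_->{name}\n";
    } @$nodes;
}

my $flat = [
    { name => 'Food',           level => 1 },
    { name => 'Beans and Nuts', level => 2 },
    { name => 'Beans',          level => 3 },
    { name => 'Nuts',           level => 3 },
];
print simple_report($flat, 4);
```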
TABLE DEFINITION
The base "nested_set" table definition is below. Columns will be added when you pass extra parameters to methods noted above.
You can add columns you're going to use proactively, and/or "tweak" the columns after you've let this module create them. Just make sure that you use valid SQL column names for the attributes you pass to the edit_node() and add_*() methods.
########################################
CREATE TABLE nested_set (
id mediumint(9) NOT NULL auto_increment,
lft mediumint(9) NOT NULL default '0',
rght mediumint(9) NOT NULL default '0',
PRIMARY KEY (id),
KEY lft (lft),
KEY rght (rght)
);
########################################
This module has only been tested on MySQL, though I suspect it should work as-is on many different RDBMSs. Let me know if you're using it successfully.
WHY?
I've implemented a couple different nested tree models in the past, from a flat "one column per level" monstrosity to a typical "adjacency list" parent/child model.
The "one column per level" model was a BEAR to work with, especially when it came to adding more levels, editing/deleting children and creating parent lists.
An "adjacency list" is the typical "id/parent_id" model, as illustrated below:
food food_id parent_id
================== ======= =========
Food 001 NULL
Beans and Nuts 002 001
Beans 003 002
Nuts 004 002
Black Beans 005 003
Pecans 006 004
(That table was ripped off directly from DBIx::Tree)
The recursive queries involved with "adjacency list" models always bugged me and I couldn't get acceptable performance metrics without caching bits of the tree.
The "nested set" model appears, theoretically, to be perfect for most of the web applications I develop: it's very fast to create lists of children and parents, at the cost of much more complicated and processor-intensive updating.
I've also taken pains to create methods that are useful for web application development but not specific to it.
If you have an application that sees many reads of a nested tree but not as many writes or updates, the "nested set" model this module implements should offer significant performance benefits over an adjacency list.
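To connect the two models, here's a sketch that converts the food adjacency list above into nested set numbers with one depth-first walk. A node's lft is assigned on the way down and its rght on the way back up. The recursion happens once, at conversion time. After that, reads are plain range queries. This is a standard numbering technique, not code from this module, and number_nodes() is a hypothetical name.

```perl
use strict;
use warnings;

# The food table from above as an adjacency list: id => [name, parent_id].
my %adj = (
    1 => ['Food',           undef],
    2 => ['Beans and Nuts', 1],
    3 => ['Beans',          2],
    4 => ['Nuts',           2],
    5 => ['Black Beans',    3],
    6 => ['Pecans',         4],
);

# Group children by parent (sorted by id so the output is stable).
my %children;
for my $id (sort { $a <=> $b } keys %adj) {
    my $parent = $adj{$id}[1];
    push @{ $children{$parent} }, $id if defined $parent;
}

# Depth-first walk: lft on the way down, rght on the way back up.
my %set;
my $counter = 0;
sub number_nodes {
    my ($id) = @_;
    my $lft = ++$counter;
    number_nodes($_) for @{ $children{$id} || [] };
    $set{$id} = { name => $adj{$id}[0], lft => $lft, rght => ++$counter };
}
number_nodes(1);

printf "%-15s %2d %2d\n", @{ $set{$_} }{qw(name lft rght)}
    for sort { $set{$a}{lft} <=> $set{$b}{lft} } keys %set;
```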
SEE ALSO
DBIx::Tree, which implements an "adjacency list" model of nested trees.
DBIx::Tree::NestedSet::Manage, which is included with this distribution, implements a CGI::Application and HTML::Template based system for managing trees via DBIx::Tree::NestedSet and exposes most DBIx::Tree::NestedSet methods.
http://www.intelligententerprise.com/001020/celko.jhtml
http://www.dbmsmag.com/9603d06.html
http://www.dbmsmag.com/9604d06.html
http://www.dbmsmag.com/9605d06.html
http://www.dbmsmag.com/9606d06.html
For those last three links, the "Nested Set" discussion starts about halfway through the articles.
BUGS
Yes. I'm sure there are some. Please contact me if you find any.
Things to avoid:
Keep the names of columns, the table, and any automagically added metadata keys matching m/^[_A-Za-z\d]+$/ (A-Z, a-z, digits and the underscore), and don't use SQL reserved words.
TODO
Create methods to get children that DO implement "nested array" trees.
Do benchmarking to see how a nested set model performs under various scenarios.
Maybe create a "traversal" system other than the very simple:
my $nodes=$tree->get_self_and_children_flat(id=>$tree->get_root);
foreach my $node (@$nodes){
#do something with the hashref that represents this node.
}
AUTHOR
Dan Collis Puro, Geekuprising.com. Email: dan at geekuprising dot com.
This model was inspired by the perlmonks.org thread below:
http://www.perlmonks.org/index.pl?node_id=354049
See "Tilly's" response in particular. I'm "Hero Zzyzzx".
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.