NAME
Tree::RB::XS - Similar API to Tree::RB implemented in C
SYNOPSIS
NOTICE: This module is very new and you should give it some thorough testing before trusting it with production data.
my $tree= Tree::RB::XS->new;
$tree->put($_ => $_) for 'a'..'z';
say $tree->get('a');
$tree->delete('a');
say $tree->get('a', GET_GT); # finds 'b'
$tree->delete('a','z');
$tree->delete($tree->min, $tree->max);
# optimize for integer comparisons
$tree= Tree::RB::XS->new(key_type => KEY_TYPE_INT);
$tree->put(1 => "x");
# optimize for floating-point comparisons
$tree= Tree::RB::XS->new(key_type => 'float', allow_duplicates => 1);
$tree->put(rand() => 1);
$tree->delete(0, $tree->get_node_lt(.5));
# optimize for byte strings
$tree= Tree::RB::XS->new(key_type => KEY_TYPE_BSTR);
$tree->put("test" => "x");
# inspect the node objects
say $tree->min->key;
say $tree->nth(0)->key;
my $node= $tree->min;
my $next= $node->next;
$node->prune;
DESCRIPTION
This module is a wrapper for a Red/Black Tree implemented in C. It's primary features over other search trees on CPAN are optimized comparisons of keys (speed), O(log N)
node-by-index lookup (which allows the tree to act as an array), and the option to allow duplicate keys while preserving insertion order.
The API is close but not identical to Tree::RB.
The
get
method in this module is not affected by array context, unless you request "compat_list_get".resort
is not implemented (would be lots of effort, and unlikely to be needed)tie-hash interface and hseek are not implemented. (not hard, but does anyone need it?)
Tree structure is not mutable via the attributes of Node, nor can nodes be created independent from a tree.
Many functions have official names changed, but aliases are provided for compatibility.
CONSTRUCTOR
new
my $tree= Tree::RB::XS->new( %OPTIONS );
...->new( sub($x,$y) { $x cmp $y });
Options:
- "key_type"
-
Choose an optimized storage for keys. The default is "KEY_TYPE_ANY" which stores whole perl scalars. All the other types are faster, but perl scalars give the fewest surprises.
- "compare_fn"
-
Choose a custom key-compare function. The default depends on
key_type
. If this is a perl coderef, thekey_type
is forced to be "KEY_TYPE_ANY". Avoid using a coderef if possible. - "allow_duplicates"
-
Whether to allow two nodes with the same key. Defaults to false.
- "compat_list_get"
-
Whether to enable full compatibility with Tree::RB's list-context behavior for "get". Defaults to false.
ATTRIBUTES
key_type
The key-storage strategy used by the tree. Read-only; pass as an option to the constructor.
This is one of the following values: "KEY_TYPE_ANY", "KEY_TYPE_INT", "KEY_TYPE_FLOAT", "KEY_TYPE_BSTR", or "KEY_TYPE_USTR". See the description in EXPORTS for full details on each. If importing constants is annoying, you can specify these simply as "any"
, "int"
, "float"
, "bstr"
, and "ustr"
.
Integers are of course the most efficient, followed by floats, followed by byte-strings and unicode-strings, followed by 'ANY' (which stores a whole perl scalar). BSTR and USTR both save an internal copy of your key, so might be a bad idea if your keys are extremely large and nodes are frequently added to the tree.
compare_fn
Specifies the function that compares keys. Read-only; pass as an option to the constructor.
This is one of: "CMP_PERL", "CMP_INT", "CMP_FLOAT", "CMP_MEMCMP", "CMP_UTF8", "CMP_NUMSPLIT", or a coderef.
If set to a perl coderef, it should take two parameters and return an integer indicating their order in the same manner as Perl's cmp
. Note that this forces key_type => KEY_TYPE_ANY
. Beware that using a custom coderef throws away most of the speed gains from using this XS variant over plain Tree::RB.
If not provided, the default comparison is chosen to match the key_type
, with the defult KEY_TYPE_ANY
using Perl's own cmp
comparator.
Patches welcome, for anyone who wants to expand the list of optimized built-in comparison functions.
allow_duplicates
Boolean, read/write. Controls whether "insert" will allow additional nodes with keys that already exist in the tree. This does not change the behavior of "put", only "insert". If you set this to false, it does not remove duplicates that already existed. The initial value is false.
compat_list_get
Boolean, read/write. Controls whether "get" returns multiple values in list context. I wanted to match the API of Tree::RB
, but I can't bring myself to make an innocent-named method like 'get' change its behavior in list context. So, by deault, this attribute is false and get
always returns one value. But if you set this to true, get
changes in list context to also return the Node, like is done in "lookup" in Tree::RB.
size
Returns the number of elements in the tree.
root_node
Get the root node of the tree, or undef
if the tree is empty.
Alias: root
min_node
Get the tree node with minimum key. Returns undef if the tree is empty.
Alias: min
max_node
Get the tree node with maximum key. Returns undef if the tree is empty.
Alias: max
nth_node
Get the Nth node in the sequence from min to max. N is a zero-based index. You may use negative numbers to count down form max.
Alias: nth
METHODS
get
my $val= $tree->get($key);
->get($key, $mode);
Fetch a value from the tree, by its key. Unlike "get" in Tree::RB, this always returns a single value, regardless of list context. But, you can set compat_list_get to make get
an alias for lookup
.
Mode can be used to indicate something other than an exact match: "GET_EQ", "GET_EQ_LAST", "GET_LE", "GET_LE_LAST", "GET_LT", "GET_GE", "GET_GT".
get_node
Same as "get", but returns the node instead of the value. In trees with duplicate keys, this always returns the first node. (nodes with identical keys are preserved in the order they were added)
Aliases with built-in mode constants:
- get_node_last
- get_node_le
- get_node_le_last
- get_node_lt
- get_node_ge
- get_node_gt
get_all
my @values= $tree->get_all($key);
In trees with duplicate keys, this method is useful to return the values of all nodes that match the key. This can be more efficient than stepping node-to-node for small numbers of duplicates, but beware that large numbers of duplicate could have an adverse affect on Perl's stack.
lookup
Provided for compatibility with Tree::RB. Same as "get" in scalar context, but if called in list context it returns both the value and the node from "get_node". You can also use Tree::RB's lookup-mode constants of "LUEQUAL", etc.
put
my $old_val= $tree->put($key, $new_val);
Associate the key with a new value. If the key previously existed, this returns the old value, and updates the tree to reference the new value. If the tree allows duplicate keys, this will replace all nodes having this key (but only return one of them).
insert
Insert a new node into the tree, and return the index at which it was inserted. If "allow_duplicates" is not enabled, and the node already existed, this returns -1 and does not change the tree. If allow_duplicates
is enabled, this adds the new node after all nodes of the same key, preserving the insertion order.
delete
my $count= $tree->delete($key);
->delete($key1, $key2);
->delete($node1, $node2);
->delete($start, $tree->get($limit, GET_LT));
Delete any node with a key identical to $key
, and return the number of nodes removed. If you supply two keys (or two nodes) this will delete those nodes and all nodes inbetween; $key1
is searched with mode GET_GE
and $key2
is searched with mode GET_LE
, so the keys themselves do not need to be found in the tree. The keys (or nodes) most be given in ascending order, else no nodes are deleted.
If you want to delete a range *exclusive* of one or both ends of the range, just use the /get
method with the desired mode to look up each end of the nodes that you do want removed.
iter
my $iter= $tree->iter;
->iter($from_key);
while (my $node= $iter->()) { ... }
while (my $node= $iter->next) { ... }
Return an iterator object that traverses the tree. The iterator is a blesed coderef, so you can either call it as a fuction or call the ->next
method. If the $from_key
is provided, this starts from $tree->get($key, GET_GE)
.
rev_iter
Like iter
, but the ->next
method walks backward to smaller key values, and if the initial value is provided and duplicate keys are enabled, this starts on the right-most match of the key instead of the left-most match.
NODE OBJECTS
The nodes returned by the methods above have the following attributes:
- key
-
The sort key. Read-only, but if you supplied a reference and you modify what it points to, you will break the sorting of the tree.
- value
-
The data associated with the node. Read/Write.
- prev
-
The previous node in the sequence of keys. Alias
predecessor
forTree::RB::Node
compat. - next
-
The next node in the sequence of keys. Alias
successor
forTree::RB::Node
compat. - tree
-
The tree this node belongs to. This becomes
undef
if the tree is freed or if the node is pruned from the tree. - left
-
The left sub-tree.
- left_leaf
-
The left-most leaf of the sub-tree. Alias
min
forTree::RB::Node
compat. - right
-
The right sub-tree.
- right_leaf
-
The right-most child of the sub-tree. Alias
max
forTree::RB::Node
compat. - parent
-
The parent node, if any.
- color
-
0 = black, 1 = red.
- count
-
The number of items in the tree rooted at this node (inclusive)
And the following methods:
- prune
-
Remove this single node from the tree. The node will still have its key and value, but all attributes linking to other nodes will become
undef
. - strip
-
Remove all children of this node, optionally calling a callback for each. For compat with "strip" in Tree::RB::Node.
- as_lol
-
Return sub-tree as list-of-lists. (array of arrays rather?) For compat with "as_lol" in Tree::RB::Node.
EXPORTS
Key Types
Export all with ':key_type';
- KEY_TYPE_ANY
-
This
key_type
causes the tree to store whole Perl scalars for each node. Its default comparison function is Perl's owncmp
operator. - KEY_TYPE_INT
-
This
key_type
causes the tree to store keys as Perl's integers, which are either 32-bit or 64-bit depending on how Perl was compiled. Its default comparison function puts the numbers in non-decreasing order. - KEY_TYPE_FLOAT
-
This
key_type
causes the tree to store keys as Perl's floating point type, which are either 64-bit doubles or 80-bit long-doubles. Its default comparison function puts the numbers in non-decreasing order. - KEY_TYPE_BSTR
-
This
key_type
causes the tree to store keys as byte strings. The default comparison function is the standard Libcmemcmp
. - KEY_TYPE_USTR
-
Same as
KEY_TYPE_BSTR
but reads the bytes from the supplied key as UTF-8 bytes. The default comparison function is alsomemcmp
even though this does not sort Unicode correctly. (for correct unicode, useKEY_TYPE_ANY
, but it's slower...)
Comparison Functions
Export all with ':cmp'
- CMP_PERL
-
Use Perl's
cmp
function. This forces the keys of the nodes to be stored as Perl Scalars. - CMP_INT
-
Compare keys directly as C integers. This is the fastest option.
- CMP_FLOAT
-
Compare the keys directly as C 'double' values.
- CMP_UTF8
-
Compare the keys as UTF8 byte strings, using Perl's internal
bytes_cmp_utf8
function. - CMP_MEMCMP
-
Compare the keys using C's
memcmp
function. - CMP_NUMSPLIT
-
Compare using the equivalent of this coderef:
sub { my @a_parts= split /([0-9]+)/, $_[0]; my @b_parts= split /([0-9]+)/, $_[1]; my $i= 0; while ($i < @a_parts || $i < @b_parts) { no warnings 'uninitialized'; my $cmp= ($i & 1)? ($a_parts[$i] <=> $b_parts[$i]) : ($a_parts[$i] cmp $b_parts[$i]); return $cmp if $cmp; ++$i; } return 0; }
except the XS implementation is not limited by the integer size of perl, and operates directly on the strings without splitting anything. (i.e. much faster)
This results in a sort where integer portions of a string are sorted numerically, and any non-digit segment is compared as a string. This produces sort-orders like the following:
2020-01-01 2020-4-7 2020-10-12
or
14.4.2 14.14.0
If the
key_type
isKEY_TYPE_BSTR
this will sort the string portions usingmemcmp
, else they are sorted with Perl's unicode-aware sort.
Lookup Mode
Export all with ':get'
- GET_EQ
-
This specifies a node with a key equal to the search key. If duplicate keys are enabled, this specifies the left-most match (least recently added). Has alias
LUEQUAL
to match Tree::RB. - GET_EQ_LAST
-
Same as
GET_EQ
, but if duplicate keys are enabled, this specifies the right-most match (most recently inserted). - GET_GE
-
This specifies the same node of
GET_EQ
, unless there are no matches, then it falls back to the left-most node with a key greater than the search key. Has aliasLUGTEQ
to match Tree:RB. - GET_LE
-
This specifies the same node of
GET_EQ
, unless there are no matches, then it falls back to the right-most node with a key less than the search key. Has aliasLULTEQ
to match Tree::RB. - GET_LE_LAST
-
This specifies the same node of
GET_EQ_LAST
, unless there are no matches, then it falls back to the right-most node with a key less than the search key. - GET_GT
-
Return the first node greater than the key, or
undef
if the key is greater than any node. Has aliasLUGREAT
to match Tree::RB. - GET_LT
-
Return the right-most node less than the key, or
undef
if the key is less than any node. Has aliasLULESS
to match Tree::RB. - GET_NEXT
-
Look for the last node matching the specified key (returning
undef
if not found) then return$node->next
. This is the same asGET_GT
except it ensures the key existed. Has aliasLUNEXT
to match Tree::RB. - GET_PREV
-
Look for the first node matching the specified key (returning
undef
if not found) then return$node->prev
. This is the same asGET_LT
except it ensures the key existed. Has aliasLUPREV
to match Tree::RB.
AUTHOR
Michael Conrad <mike@nrdvana.net>
VERSION
version 0.02
COPYRIGHT AND LICENSE
This software is copyright (c) 2021 by Michael Conrad.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.