NAME
Ref::Store - Store objects, index by object, tag by objects - all without leaking.
SYNOPSIS
my $table = Ref::Store->new();
Store a value under a simple string key, maintain the value as a weak reference. The string key will be deleted when the value is destroyed:
$table->store("key", $object);
Store $object
under a second index ($fh
), which is a globref; $fh
will automatically be garbage collected when $object
is destroyed.
{
open my $fh, ">", "/foo/bar";
$table->store($fh, $object, StrongKey => 1);
}
# $fh still exists with a sole reference remaining in the table
Register an attribute type (foo_files
), and tag $fh
as being one of $foo_files
, $fh
is still dependent on $object
# assume $fh is still in scope
$table->register_kt("foo_files");
$table->store_a(1, "foo_files", $fh);
Store another foo_file
open my $fh2, ">", "/foo/baz"
$table->store_a(1, "foo_files", $fh);
# $fh2 will automatically be deleted from the table when it goes out of scope
# because we did not specify StrongKey
Get all foo_file
s
my @foo_files = $table->fetch_a(1, "foo_files");
# @foo_files contains ($fh, $fh2);
Get rid of $object
. This can be done in one of the following ways:
# Implicit garbage collection
undef $object;
# Delete by value
$table->purge($object);
# Delete by key ($fh is still stored under the foo_keys attribute)
$table->purgeby($fh);
# remove each key for the $object value
$table->unlink("key");
$table->unlink($fh); #fh still exists under "foo" files
Get rid of foo_file
entries
# delete, by attribute
$table->purgeby_a(1, "foo_files");
# delete a single attribute from all entries
$table->unlink_a(1, "foo_files");
# dissociate the 'foo_files' attribtue from each entry
$table->dissoc_a(1, "foo_files", $fh);
$table->dissoc_a(1, "foo_files", $fh2);
# implicit garbage collection:
undef $fh;
undef $fh2;
For a more detailed walkthrough, see Ref::Store::Walkthrough
DESCRIPTION
Ref::Store provides an efficient and worry-free way to index objects by arbitrary data - possibly other objects, simple scalars, or whatever.
It relies on magic and such to ensure that objects you put in the lookup table are not maintained there unless you want them to be. In other words, you can store objects in the table, and delete them without having to worry about what other possible indices/references may be holding down the object.
If you are looking for something to store Data, then direct your attention to KiokuDB, Tangram or Pixie - these modules will store your data.
However, if you are specifically wanting to maintain garbage-collected and reference counted perl objects, then this module is for you. continue reading.
FEATURES
The problem
I've had quite the difficult task of explaining what this module actually does.
Specifically, Ref::Store
is intended for managing and establishing dependencies and lookups between perl objects, which at their most basic levels are opaque entities in memory that are reference counted.
The lifetime of an object ends when its reference count hits zero, meaning that no data structure or code is referring to it.
When a new object is created (normally done through bless
), it has a reference count of 1. As more copies of the object reference are made, the reference count of the object increases.
What this also means is that each time an object is inserted into a hash as a value, its reference count increases - and as a result, the object (and all other objects which it contains) will not be destroyed until it is removed from all those hashes.
Perl core offers the concept of a weak reference, and is exposed to perl code via Scalar::Util's weaken. Weak references allow the maintaining of an object reference without actually increasing the object's reference count.
Internally (in the Perl core), the object maintains a list of all weak references referring to it, an when the object's own reference count hits zero, those weak references are changed to undef
.
As this relates to hash entries, the value of the entry becomes undef
, but the entry itself is not deleted.
use Scalar::Util qw(weaken);
my %hash;
my $object = \do { my $o };
weaken($hash{some_key} = $object);
undef $object;
#The following will be true:
exists$hash{some_key} && $hash{some_key} == undef;
When iterating over the hash keys, one must then check to see if the value is undefined - which is often a messy solution.
If the hash's values are constantly being changed and updated, but not frequently iterated through, this can cause a significant memory leak.
Additionally, weakref semantics are cumbersome to deal with in pure perl. The weakness of a reference only applies to that specific variable; therefore, the general semantics of my $foo = $bar
, where $foo is understood to be an exact replica of $bar
are broken. If $bar
is a weak reference, $foo
does not retain such a property, and will increase the reference count of whatever foo and bar refer to.
Dealing with collection items, especially tied hashes and arrays becomes even more cumbersome. In some perls, doing the following with a tied hash will not work:
weaken($foo);
$tied_hash{some_key} = $foo;
# $foo is copied to the tied hash's STORE method.
This bug has since been fixed, but not everyone has the luxury of using the newest and shiniest Perl.
Other modules and their featuresets
In the event that weak reference keys are needed, there are several modules that implement solutions:
Tie::RefHash::Weak is a pure-perl tied hash which maintains weak keys. However it will not necessarily do the same for values, even when weaken
is used explicitly, because of aforementioned bugs. It is also significantly slow due to its TIE
interface.
Hash::Util::FieldHash is part of core since perl 5.10, and depends on something known as Hash uvar
magic, a new feature introduced in 5.10. It does not suffer from the slowness of Tie::RefHash::Weak, and allows you to (manually) create weak values. However, keys used for FieldHash
are permanently associated with the hash they have been keyed to, even if the entry or hash have been deleted.
Hash::FieldHash attempts to eliminate the version dependencies and caveats of Hash::Util::FieldHash
, but restricts the keys to only being references.
Ref::Store's featureset
Ref::Store
supports all features in the above mentioned modules, and the following
- By-Value garbage collection
-
When a value is stored as a weak reference (see the next section) and itsreference count hits zero, then the entire hash entry is deleted; the table does not collect stale entries
- By-Value deletion
-
Unlike normal hashes,
Ref::Store
can quickly and efficiently delete a value and all its dependent keys given just the value. No need to remember any lookup key under which the value has been stored - Multi-Value keys (Attributes)
-
Traditional hashes only allow storing of a single value under a single key. For multi-value storage under a single key, one must fiddle with nested data structures.
Like single-value keys, attributes may be reference objects, and will be discarded from the hash when the last remaining value's reference count hits zero.
Additionally, the same reference attribute object can serve as a key for multiple independent value sets, allowing to logically decouple collections which are dependent on a single object.
- User defined retention policy
-
For each key and value, it is possible to define whether either the key, the value, or both should be strong or weak references.
Ref::Store
by default stores keys and values as weak references, but can be modified on a per-operation basis.Values can also be stored multiple times under multiple keys with different retention policies: maintain a single 'primary' index, and multiple 'secondary' indices.
- Speed
-
Most of the features are implemented in C, some because of speed, and others because there was no pure-perl way to do it (See the
PP
backend, which is fairly buggy) - Works on perl 5.8
-
This has been tested and developed for perl 5.8.8 (which is the perl provided in el5).
Here are some caveats
- No Hash Syntax
-
Ref::Store
is a proper object. You cannot use the table as an actual hash (though it would be easy enough to wrap it in thetie
interface, it would be signficantly slower, and suffer from other problems related totie
) - Slow/Untested thread cloning
-
While attempts have been made to make this module thread-safe, there are strange messages about leaked scalars and unbalanced string tables when dealing with threads.
- Values Restricted to References
-
Values themselves must be reference objects. It's easy enough, however, to do
$table->store("foo", \"foo"); ${$table->fetch("foo")} eq "foo";
- Slower than
Hash::Util::FieldHash
-
This module is about 40% slower than
Hash::Util::FieldHash
- Must use Attribute API for multi-entry lookup objects
-
If you wish to use the same key twice, the key must be used with the Attribute API, and not the (faster) key API
API
LOOKUP TYPES
There are three common lookup types by which values can be indexed and mapped to.
A Lookup Type is just an identifier by which one can fetch and store a value. The uniqueness of identifiers is dependent on the lookup type. Performance for various lookup types varies.
Each lookup type has a small tag by which API functions pertaining to it can be identified
- Value-specific operations
-
These functions take a value as their argument, and work regardless of the lookup type
- Simple Key (SK)
-
This is the quickest and simplest key type. It can use either string or object keys. It support. The functions it supports are
- store($key, $value, %options)
-
Store
$value
under lookup <$key>. Key can be an object reference or string.A single value can be stored under multiple keys, but a single key can only be linked to a single value.
Options are two possible hash options:
- StrongKey
-
If the key is an object reference, by default it will be weakened in the databse, and when the last reference outside the database is destroyed, an implicit "unlink" will be called on it. Setting
StrongKey
to true will disable this behavior and not weaken the key object.A strong key is still deleted if its underlying value gets deleted
- StrongValue
-
By default the value is weakened before it is inserted into the database, and when the last external reference is destroyed, an implicit "purge" is performed. Setting this to true will disable this behavior and not weaken the value object.
It is important to note the various rules and behaviors with key and value storage options.
There are two conditions under which an entry (key and value) may be deleted from the table. The first condition is when a key or value is a reference type, and its referrent goes out of scope; the second is when either a key or a value is explicitly deleted from the table.
It is helpful to think of entries as a miniature version of implicit reference counting. Each key represents an inherent increment in the value's reference count, and each key has a reference count of one, represented by the amount of values it actually stores.
Based on that principle, when either a key or a value is forced to leave the table (either explicitly, or because its referrant has gone out of scope), its dependent objects decrease in their table-based implicit references.
Consider the simple case of implicit deletion:
{ my $key = "string": my $value = \my $foo $table->store($key, $foo); }
In which case, the string
"string"
is deleted from the table as $foo goes out of scope.The following is slightly more complex
my $value = \my $foo; { my $key = \my $baz; $table->store($key, $value, StrongValue => 1); }
In this case,
$value
is removed from the table, because its key object's referrant ($baz
) has gone out of scope. Even thoughStrongValue
was specified, the value is not deleted because its own referrant ($foo
) has been destroyed, but rather because its table-implicit reference count has gone down to 0 with the destruction of$baz
The following represents an inverse of the previous block
my $key = \my $baz; { my $value = \my $foo; $table->store($key, $value, StrongKey => 1); }
Here
$value
is removed from the table because naturally, its referrant,$foo
has been destroyed.StrongKey
only maintains an extra perl reference to$baz
.However, by specifying both
StrongKey
andStrongValue
, we are able to completely disable garbage collection, and nothing gets deleted{ my $key = \my $baz; my $value = \my $foo; $table->store($key, $value, StrongKey => 1, StrongValue => 1); }
This method is also available as
store_sk
.It is an error to call this method twice on the same lookup <-> value specification.
- fetch($key)
-
Returns the value object indexed under
$key
, if any. Also available underfetch_sk
- lexists($key)
-
Returns true if
$key
exists in the database. Also available aslexists_sk
- unlink($key)
-
Removes
$key
from the database. If$key
is linked to a value, and that value has no other keys linked to it, then the value will also be deleted from the databse. Also available asunlink_sk
$table->store("key1", $foo); $table->store("key2", $foo); $table->store("key3", $bar); $table->unlink("key1"); # $foo is not deleted because it exists under "key2" $table->unlink("key3"); # $bar is deleted because it has no remaining lookups
- purgeby($key)
-
If
$key
is linked to a value, then that value is removed from the database via "purge". Also available aspurgeby_sk
.These two blocks are equivalent:
# 1 my $v = $table->fetch($k); $table->purge($v); # 2 $table->purgeby($k);
- Typed Keys
-
Typed keys are like simple keys, but with more flexibility. Whereas a simple key can only store associate any string with a specific value, typed keys allow for associating the same string key with different values, so long as the type is different. A scenario when this is useful is associating IDs received from different libraries, which may be identical, to different values.
For instance:
use Library1; use Library1; my $hash = Ref::Store->new(); $hash->register_kt('l1_key'); $hash->register_kt('l2_key'); #later on.. my $l1_value = Library1->get_handle(); my $l2_value = Library2->get_handle(); #assume that this is possible: $l1_value->ID == $l2_value->ID(); $hash->store_kt($l1_value->ID(), 'l1_key', $l1_value); $hash->store_kt($l2_value->ID(), 'l2_key', $l2_value);
Note that this will only actually work for string keys. Object keys can still only be unique to a single value at a time.
All functions described for "Simple Keys" are identical to those available for typed keys, except that the
$key
argument is transformed into two arguments;thus:
store_kt($key, $type, $value); fetch_kt($key, $type);
and so on.
In addition, there is a function which must be used to register key types:
- Attributes
-
Whereas keys map value objects according to their identities, attributes map objects according to arbitrary properties or user defined tags. Hence an attribute allows for a one-to-many relationship between a lookup index and its corresponding value.
The common lookup API still applies. Attributes must be typed, and therefore all attribute functions must have a type as their second argument.
A suffix of
_a
is appended to all API functions. In addition, the following differences in behavior and options exist- store_a($attr, $type, $value, %options)
-
Like "store", but option hash takes a
StrongAttr
option instead of aStrongKey
option, which is the same. Attributes will be weakened for all associated values ifStrongAttr
was not specified during any insertion operation. - fetch_a($attr, $type)
-
Fetch function returns an array of values, and not a single value.
thus:
my $value = $hash->fetch($key); #but my @values = $hash->fetch_a($attr,$type);
However, storing an attribute is done only one value at a time.
- dissoc_a($attr, $type, $value)
-
Dissociates an attribute lookup from a single value. This function is special for attributes, where a single attribute can be tied to more than a single value.
- unlink_a($attr, $type)
-
Removes the attribtue from the database. Since multiple values can be tied to the same attribute, this can potentially remove many values from the DB. Be sure to use this function with caution
It is possible to use attributes as tags for boolean values or flags, though the process right now is somewhat tedious (eventually this API will be extended to allow less boilerplate)
use constant ATTR_FREE => "attr_free"; use constant ATTR_BUSY => "attr_busy"; $hash->register_kt(ATTR_FREE); $hash->register_kt(ATTR_BUSY); $hash->store_a(1, ATTR_FREE, $value); #Value is now tagged as 'free'; #to mark the value as busy, be sure to inclusively mark the busy tag first, #and then remove the 'free' mark. otherwise the value will be seen as destroyed #and associated references removed: $hash->store_a(1, ATTR_BUSY, $value); $hash->dissoc_a(1, ATTR_FREE, $value); #mark as free again: $hash->store_a(1, ATTR_FREE, $value); $hash->dissoc_a(1, ATTR_BUSY, $value);
The complexities come from dealing with a triadic value for a tag. A tag for a value can either be true, false, or unset. so
0, ATTR_FREE
is valid as well.
CONSTRUCTION
- new(%options)
-
Creates a new Ref::Store object. It takes a hash of options:
- keyfunc
-
only in PP backend
This function is responsible for converting a key to something 'unique'. The default implementation checks to see whether the key is a reference, and if so uses its address, otherwise it uses the stringified value. It takes the user key as its argument
Ref::Store will try and select the best implementation (
Ref::Store::XS
andRef::Store::PP
, in that order). You can override this by seting$Ref::Store::SelectedImpl
to a package of your choosing (which must be loaded).
ITERATION
While Ref::Store
is not a simple collection object, you can still iterate over values using the following formats
#Like keys %hash:
my @keyspecs = $refstore->klist();
my @values = $refstore->vlist();
#Like each %hash;
$refstore->iterinit(); #Initialize the iterator
while ( my ($lookup_type, $lookup_prefix, $my_key, $my_value) = $refstore->iter )
{
#do stuff.. but don't modify the list!
}
#Cleanup internal iterator state
$refstore->iterdone();
Because of the various types of keys which can be stored, all iteration functions return a 'key specification' - a list of three elements:
- Lookup Type
-
This is one of
REF_STORE_KEY
for key lookups, andREF_STORE_ATTRIBUTE
for attribute lookups (depending on whether this was stored with store or store_a).The type is necessary to determine which lookup function to call in the event that a more detailed operation is needed.
- Prefix/Type
-
For store functions which use a type (store_kt and store_a), this is the type. Useful to determine the parameter to call for further operations
- User Key
-
This is the actual key which was passed in to the store function. This can be a string or a reference.
A simpler API which copies its return values in lists are provided:
- vlist()
-
Returns a list of all value objects. This list is a copy.
- klist()
-
Returns a list of arrayrefs containing key specifications
- iterinit()
-
Initialize the internal iterator state. This must be called each time before anything else happens. Like perl hashes,
Ref::Store
maintains a global iterator state; callingiterinit
resets it.It is an error to initialize a possibly active iterator without explicitly destroying the previous one. Also, modifying the table during iteration could lead to serious problems. Unlike perl hashes, it is not safe to delete the currently iterated-over element; this is because the iteration is emulated and gathered from multiple hash states, and deletion of one element can trigger subsequent deletions.
If you happen to clobber an old iterator, this module will throw a warning. To avoid accidentally doing so, call "iterdone" after you have finished iterating.
- iter()
-
Compare to each. Returns a 4-element list, the first three of which are the key specification, and the last is a value specification. For attribute lookups, this is an arrayref of values indexed under the attribute, and for key lookups this is the sole value.
Once this function returns the empty list, there are no more pairs left to traverse.
- iterdone()
-
Call this function if you break out of a loop, or to explicitly de-initialize the iterator state.
A more complex iterator API is provided. Unlike the simpler API, this does not copy its return values, thus saving memory
DEBUGGING/INFORMATIONAL
Often it is helpful to know what the table is holding and indexing, possibly because there is a bug or because you have forgotten to delete something.
The following functions are available for debugging
- vexists($value)
-
Returns true if
$value
exists in the database. The database internally maintains a hash of values. When functioning properly, a value should never exist without a key lookup, but this is still alpha software - vlookups($value)
-
not yet implemented
Returns an array of stringified lookups for which this value is registered
- lexists(K)
-
Returns true if the lookup
K
exists. See the "API" section for lookup-specific parameters forK
- is_empty
-
Returns true if there are no lookups and no values in the database
- dump
-
Prints a tree-like representation of the database. This will recurse the entire database and print information about all values and all lookup types. In addition, for object references, it will print the reference address in decimal and hexadecimal, the actual SV address of the reference, and whether the reference is a weak reference.
THREAD SAFETY
Ref::Store
is tested as being threadsafe the XS backend.
Due to tricky limitations and ninja-style coding necessary to properly facilitate threads, thread-safety is not supported in the PP backend, though YMMV (on my machine, PP thread tests segfault).
Thread safety is quite difficult since reference objects are keyed by their memory addresses, which change as those objects are duplicated.
USAGE APPLICATIONS
This module caters to the common, but very narrow scope of opaque perl references. It cares nothing about what kind of objects you are using as keys or values. It will never dereference your object or call any methods on it, thus the only requirement for an object is that it be a perl reference.
Using this module, it is possible to create arbitrarily complex, but completely leak free dependency and relationship graphs between objects.
Sometimes, expressing object lifetime dependencies in terms of encapsulation is not desirable. Circular references can happen, and the hierarchy can become increasingly complex and deep - not just logically, but also syntactically.
This becomes even more unwieldy when the same object has various sets of dependants.
A good example would be a socket server proxy, which accepts requests from clients, opens a second connection to a third-party server to process the client request, forwards the request to the client's intended origin server, and then finally relays the response back to the client.
For most simple applications there is no true need to have multiple dynamically associated and deleted object entries. The benefits of this module become apparent in design and ease of use when larger and more complex, event-oriented systems are in use.
In shorter terms, this module allows you to reliably use a Single Source Of Truth for your object lookups. There is no need to synchronize multiple lookup tables to ensure that there are no dangling references to an object you should have deleted
BUGS AND CAVEATS
Probably many.
The XS backend (which is also the default) is the more stable version. The pure- perl backend is more of a reference implementation which has come to lag behind its XS brother because of magical voodoo not possible in a higher level language like Perl.
Your system will need a working C compiler which speaks C99. I have not tested this on
It is not advisable to store one table within another. Doing so may cause strange things to happen. It would make more sense to have a table being module-global anyway, though.
Ref::Store
is really not a lightweight object.It is currently not possible to store key objects with prefixes/types. If you need prefixed object keys, use the attribute API.
On my system, making a circular link between two objects will not work properly in terms of garbage collection; this means the following
$foo = \do bless { my $o }; $bar = \do bless { my $o }; #both objects $rs->store($foo, $bar); $rs->store($bar, $foo);
Will not work on the PP backend. It is fully supported in the default XS backend, however.
The iteration and
klist
APIs are not yet implemented in the PP versionKeyfunc and friends are not fully implemented, either
AUTHOR
Copyright (C) 2011 by M. Nunberg
You may use and distribute this program under the same terms as perl itself
SEE ALSO
- Hash::Util::FieldHash
-
Ref::Store implements a superset of Hash::Util::FieldHash, but the latter is most likely quicker. However, it will only work with perls newer than 5.10
- Tie::RefHash::Weak
- Variable::Magic
-
Perl API for magic interface, used by the
PP
backend - KiokuDB, Tangram, Pixie
2 POD Errors
The following errors were encountered while parsing the POD:
- Around line 1485:
You forgot a '=back' before '=head2'
You forgot a '=back' before '=head2'
- Around line 1617:
You forgot a '=back' before '=head1'
You forgot a '=back' before '=head1'