NAME
ObjectDBI - Perl Object Persistence in an RDBMS using DBI
SYNOPSIS
Serializes, queries, unserializes and deletes perl objects in an SQL driven, DBI accessible RDBMS.
MODULE
Created to use RDBMS-es as perl object storage, even when very primitive RDBMS-es are involved. The advantage is portability of your project (you don't have to transport a dbm or flat files with your project that is also RDBMS-based), a certain degree of discoverability (you can use plain SQL yourself to see what's inside the database), and searchability (metadata and data don't get equated). You must create a table for this storage, and (an idea for) the SQL for that is given below:
SQL
PostgreSQL or Oracle:
create sequence perlobjectseq;
create table "perlobjects" (
obj_id integer unique not null,
obj_pid integer references perlobjects (obj_id),
obj_gpid integer references perlobjects (obj_id),
obj_name varchar(255),
obj_type varchar(64),
obj_value varchar(255)
);
MySQL:
create table perlobjects (
obj_id integer unique not null auto_increment,
obj_pid integer references perlobjects (obj_id),
obj_gpid integer references perlobjects (obj_id),
obj_name varchar(255),
obj_type varchar(64),
obj_value varchar(255)
);
Indexes:
create index perlobjects_name_i on perlobjects (obj_name);
create index perlobjects_type_i on perlobjects (obj_type);
create index perlobjects_value_i on perlobjects (obj_value);
create index perlobjects_pid_i on perlobjects (obj_pid);
create index perlobjects_gpid_i on perlobjects (obj_gpid);
Now before y'all start shouting; obviously, given your particular type of RDBMS, your mileage may vary with respect to this SQL code, and you may not have primary or foreign keys. You may not have indexes or sequences, and you may even have an easier way to store infinite strings. This is all up to you, your cleverness and your needs.
If you plan to store perlhashes with keys of more than 255 character length (which is an unwise thing in itself), for example, then you might consider making 'obj_name' a bit longer. If you plan to store values with characters outside of the 32-126 range and you're using Postgres, then you might want to change the data type of 'obj_value' from 'varchar' to 'bytea'. This module isn't here to lecture you - just to make things easy.
If you're using MySQL, you'll have problems without a sequence, so you'll have to make the 'obj_id' field auto-incrementing. For those users, a special piece of code is added to withdraw the id of an object after the fact of its insertion.
If you're using a RDBMS that doesn't do sequences OR auto-incrementing, then IDs are generated out of thin air. Be prepared to work with large numbers though. If your RDBMS can't handle those - well, then I'm at my wit's end: please provide a 'sequencefnc' to the constructor.
API
my $objectdbi = ObjectDBI->new (%options)
Returns a blessed instance of this module. The arguments provide the object with a hash of options, which can be:
'dbh' => DBI database handle.
'dbiuri' => DBI database connection URI.
'dbiuser' => DBI database connection user.
'dbipass' => DBI database connection password.
'dbioptions' => DBI database connection options.
'table' => Table name used ('perlobjects' is the default).
'sequence' => Sequence name for easily retrieving new IDs.
'sequencesql' => Sequence SQL for retrieving a new ID.
'sequencefnc' => A function ref to be used to retrieve a new ID.
'overwrite' => Overwrite objects of the same type and name.
'chunksize' => A number defining at what length values will get split.
'debug' => Setting it at anything will make STDERR a busy stream.
About sequences: the first available method given will be used. So please do yourself a favour, avoid confusion, and use only one of the available methods out of 'sequence', 'sequencesql' and 'sequencefnc'.
About chunksize: the default value is 255. If you set it to zero, that'll be interpreted as 'infinite'. If you set it to anything else, make sure it matches the storage size of the 'obj_value' field in the RDBMS.
Also about both sequences and chunksize: in the case of postgres and oracle, autodiscovery of these items will be performed in case they're not given.
my $id = $objectdbi->put ($ref[,$name[,$overwrite]])
Store a reference in the database, perhaps under a certain name. If 'overwrite' is set (either in the object or as a parameter), and the object with given type and name already exists, it is removed prior to this object being written. Returns the ID of the object of the newly created object.
my @ids = $objectdbi->find ([$type],[$name],[$value])
Find IDs for objects that match type and/or name and/or value.
my $ref = $objectdbi->get ($id) or $objectdbi->get ($type, $name)
Returns the fully deserialized object with the given ID, or find the first object that matches type and name.
my ($type, $name, $id) = $objectdbi->get_meta ($id)
Returns an array of type and name for an object with given ID.
my @refs = $objectdbi->get_all (@ids)
Auxillary method. Returns an array or array reference of objects with the given IDs.
my @ids = $objectdbi->query ($querystring)
Queries the database with a specific query string. The syntax for this query 'language' is as follows:
expressions are separated by logical operators ('&&', '||') and round braces ('(' and ')') determine precedence.
expressions are made up of a path, and optionally an operator ('==', '!=') and a value.
a path is a series of elements, representing hash-keys or array-indexes separated by a forward slash ('/').
both paths and values may be enclosed in single or double quotes, so as to forego escaping of certain characters or whitespace.
an element can have wildcards (an asterisk ('*')), or be a wildcard in itself.
a back slash escapes all tokens, one character at a time.
outside of quoted strings and path elements, whitespace is ignored.
my @types = $objectdbi->get_types ()
Returns a distinct list of all object types known to the database.
my $n = $objectdbi->del ($id) or $objectdbi->del ($type, $name)
Deletes an object by the given ID, or deletes the first object which matches type and name. Returns zero or non zero depending on whether the operation failed or was successful, respectively.
my $n = $objectdbi->del_all (@ids)
Auxillary method. Deletes all objects with given IDs.
my $n = $objectdbi->count ([type], [query])
Returns the amount of objects that answer to these conditions.
my $dbh = $objectdbi->get_dbh ()
Returns the DBI database handle.
CURSORS
Cursors are there to obtain lists of objects in a 'streaming' (as opposed to 'buffered') fashion. When the list of objects is (potentially) too long to retrieve all at once, you'd use a cursor and iterate through it.
my $cursor = $objectdbi->cursor([query], [type]);
or
my $cursor = ObjectDBI::Cursor->new($objectdbi, [query], [type]);
Usage:
my $cursor = $objectdbi->cursor("foo");
while (my $ref = $cursor->next()) {
print Dumper($ref);
}
Bear in mind that the query given when initializing the cursor is optional. When no query is given, you simply iterate through the entire set of objects.
my $err = $cursor->skip($amount);
or
my $err = $cursor->skip_forward($amount);
Skips this many places in the cursor.
my $err = $cursor->skip_backward($amount);
Skips this many places in the cursor in the opposite direction.
my $err = $cursor->first();
Move to the first possible position of the cursor. Doesn't need to be called when your previous call is $cursor->new(), because a new cursor is always at its first possible position.
my $err = $cursor->last();
Move to the last possible position of the cursor.
my $ref = $cursor->next();
Move to the next position of the cursor. Returns the object at the current position.
my $ref = $cursor->prev();
Move to the previous position of the cursor. Returns the object at the current position.
my $ref = $cursor->current();
Returns the object at the current position.
my ($name, $type, $id) = $cursor->meta_data();
Returns the meta data (name, type, id) of the object at the current position.
my $ref = $cursor->del();
Deletes the current element under the cursor and moves to cursor one up. A small concurrency warning is in order here: when two conncurrent users use cursors, and one of them start to manipulate the state of the db, the other one *might* risk suddenly dropping out of it.
SAMPLE USAGE
Below are a few code examples demonstrating this package's usage. They assume various databases on localhost; please adapt according to your configuration.
Storing and Retrieving
use ObjectDBI;
use Data::Dumper;
my $ref = bless({ foo => 'bar' }, 'Foobar');
my $objectdbi = ObjectDBI->new(
dbiuri => 'DBI:Oracle:SID=MYSID;host=localhost',
dbiuser => 'user',
dbipass => 'pass'
) || die "Could not connect to db";
my $id = $objectdbi->put($ref, 'myref');
my $ref2 = $objectdbi->get($id);
print Dumper($ref2);
Using Queries
use ObjectDBI;
my $ref = { foo => { bar => 'foobar' }};
my $objectdbi = ObjectDBI->new(
dbiuri => 'DBI:Pg:dbname=mydb'
) || die "Could not connect to db";
$objectdbi->put($ref);
my @ids = $objectdbi->query("foo/bar=='foobar' || foo/*=='foo*'");
print @ids;
Seeing Cross Referencing in Action
use ObjectDBI;
use Data::Dumper;
my $objectdbi = ObjectDBI->new(
dbiuri => 'DBI:Pg:dbname=mydb'
) || die "Could not connect to db";
my $hash = { foo => [ 'bar' ] };
$hash->{'foobar'} = $hash->{foo};
print Dumper($hash);
my $id = $objectdbi->put($hash);
my $ref = $objectdbi->get($id);
print Dumper($ref);
NOTES
Blessing objects vs. loading modules
Bear in mind that when an object becomes blessed during deserialization, the module in question hasn't necessarily been loaded, and this module will not do it for you either (since it doesn't know where you store that code). So calling methods on a deserialized object may require you to do some additional module usage. Not loading a module and yet calling a method on a blessed reference of it, can lead to cryptic error messages.
ObjectDBI vs. Perl TIE
This module doesn't implement a perl TIE interface. There's Tie::DBI for that. You could probably re-implement Tie::DBI on top of this module, though.
ObjectDBI vs. Tangram
I didn't know Tangram existed when I made this module. Upon brief examination of Tangram, I think the differences between ObjectDBI and Tangram are as follows:
Tangram is huge. ObjectDBI is simpler (and more immature, probably).
Tangram is much more geared toward a Tangram-specific query language, while ObjectDBI is geared toward storing and searching by name and type. ObjectDBI does have a (limited) query language of its own, though.
Tangram stores objects as a whole, which requires potentially unlimited storage in a field. Not all RDBMS supply this feature.
Tangram requires you to specify what values of an object you want stored for searching. ObjectDBI has no such limitations and preserves the amorphousness that is inherent to the world of perl objects.
ObjectDBI database tables will be a lot bigger (as in: more rows) statistically than Tangram database tables (which will have bigger rows).
Transactions
Transactions could be implemented as follows:
my $id;
$objectdbi->get_dbh()->begin_work();
if ($id = $objectdbi->put($ref)) {
$objectdbi->get_dbh()->commit();
} else {
$objectdbi->get_dbh()->rollback();
}
DEPENDENCIES
DBI => 1.3
BUGS
People using this library with MySQL must extra alert for bugs: I don't and won't use it; yet I've written special code for it. More specifically, people using something other than Oracle or Postgres must be extra alert for bugs. Your feedback is appreciated.
When storing long values, the breaking up of them into pieces impairs search capabilities; fragments that you're looking for might have been broken up.
TODO
Break my head over that breaking up of long values. I can't just concatenate a random amount of strings to work with in SQL, so I'm thinking perhaps having a backend perl stored procedure. But that would do away with using this library in any other way than with postgres. Ah, I'm stuck with it.
CHANGELOG
This changelog is incomplete was only started in the transition from ObjectDBI-0.08 to ObjectDBI-0.09
0.09 Added queries to cursors. Fixed the sql logic in __tree_to_sql.
0.10 Added two tests - hey ! I know how to make tests now !
0.11 Skip those tests, unless you're running SQL::Statement and MLDBM, or a Postgres DB called perlobjects. This kind of defies the purpose of testing. Also, simplified the 'get object' query, so that more primitive databases can handle the SQL. Should take care of that for all queries. Strictly on demand though, although I can imagine that the kind of queries you can fabricate using the query parser, can be a bit much for any top-notch SQL optimizer.
0.12 Changed the test scripts at the behest of Slaven Rezic. I'm really sorry for wasting everyone's time with this, but the whole 'testing' thing is still a bit foreign to me.
0.13 Implemented debugging of SQL statements. A beginning is made. Also, adjusted test 2 to be self sufficient.
0.14 Lots more cursor functionality.
COLOFON
Written by KJ Hermans (kees@pink-frog.com) Sep 2010.