NAME

Paranoid::BerkeleyDB -- BerkeleyDB CDS Object

VERSION

$Id: BerkeleyDB.pm,v 0.85 2011/12/08 07:30:26 acorliss Exp $

SYNOPSIS

use Paranoid::BerkeleyDB;

$db = Paranoid::BerkeleyDB->new(DbDir => '/tmp', DbName => 'foo.db', 
                                DbMode => 0640);
$rv = $db->addDb($dbname);

$val = $db->getVal($key);
$val = $db->getVal($key, $dbname);

$rv = $db->setVal($key, $val);
$rv = $db->setVal($key, $val, $dbname);

@keys = $db->getKeys();
@keys = $db->getKeys($dbname);
@keys = $db->getKeys(undef, \&sub);
@keys = $db->getKeys($dbname, \&sub);

$db->purgeDb();
$db->purgeDb($dbname);

@dbs = $db->listDbs();

$db->cds_lock;
$db->cds_unlock;

# Close environment & databases
$db = undef;

DESCRIPTION

This module provides an OO-based wrapper for BerkeleyDB that creates Concurrent Data Store (CDS) databases. CDS is a feature of Berkeley DB v3.x and higher that allows concurrent use of Berkeley DBs. It provides multiple-reader, single-writer locking, and multiple databases can share the same environment.

This module hides much of the complexity of the API (as provided by the BerkeleyDB(3) module). Conversely, it also severely limits the options and flexibility of the module and libraries. In short, if you want a quick and easy way for local processes to have concurrent access to Berkeley DBs without learning bdb internals, this is your module. If you want full access to all of the bdb features and tuning/scalability features, you'd better learn bdb.

One particularly nice feature of this module, however, is that it's fork-safe. That means you can open a CDS db in a parent process, fork, and continue r/w operations without fear of corruption or lock contention due to stale filehandles.
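As a rough illustration of the fork-safe behaviour described above, the sketch below forks a child that writes through the object it inherited from the parent. The helper name write_in_child is hypothetical, and the code assumes that setVal returns a true value on success.

```perl
use strict;
use warnings;

# Hypothetical helper: fork, write one key/value pair from the child
# through the inherited object, and report the outcome to the parent.
# Assumes $db is a Paranoid::BerkeleyDB object opened before the fork
# and that setVal returns a true value on success.
sub write_in_child {
    my ( $db, $key, $val ) = @_;

    my $pid = fork;
    die "fork failed: $!\n" unless defined $pid;

    if ( $pid == 0 ) {

        # Child: reuse the parent's object directly.
        my $ok = $db->setVal( $key, $val );
        exit( $ok ? 0 : 1 );
    }

    # Parent: wait for the child and translate its exit status.
    waitpid $pid, 0;
    return $? == 0;
}
```

After the child exits, the parent can keep using the same object, for example to read the key back with getVal.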

The cds_lock and cds_unlock methods are also provided to allow mass changes as an atomic operation. Since the environment is always created with a single global write lock (regardless of how many databases exist within the environment), changes made while holding the lock can span multiple databases.
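A change spanning two databases might be sketched as follows. The helper name move_key is hypothetical; the code assumes both databases have already been added to the object and that setVal returns a true value on success.

```perl
use strict;
use warnings;

# Hypothetical helper: move one key from database $from to database $to
# while holding the environment-wide write lock, so no other process can
# read or write between the copy and the delete.
# Assumes $db is a Paranoid::BerkeleyDB object that knows both databases.
sub move_key {
    my ( $db, $key, $from, $to ) = @_;
    my $moved = 0;

    $db->cds_lock;
    my $val = $db->getVal( $key, $from );

    # Only remove the source key once the copy has succeeded.
    if ( defined $val and $db->setVal( $key, $val, $to ) ) {
        $db->setVal( $key, undef, $from );    # undef deletes the key
        $moved = 1;
    }
    $db->cds_unlock;

    return $moved;
}
```

The lock protects against interleaving with other processes only; it does not roll back a partial change if the process dies midway.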

SUBROUTINES/METHODS

new

$db = Paranoid::BerkeleyDB->new(DbDir => '/tmp', DbName => 'foo.db');

This class method is the object instantiator. Two arguments are required: DbDir, which is the path to the directory where the database files will be stored, and DbName, which is the filename of the database itself. If DbDir doesn't exist it will be created for you automatically.

DbMode is optional, and if omitted defaults to 0700. This affects the database directory, files, and lockfile.

This method will create a BerkeleyDB Environment and will support multiprocess transactions.

Any errors in the operation will be stored in Paranoid::ERROR.

addDb

$rv = $db->addDb($dbname);

This method adds another database to the current object and environment. Calling this method does require an exclusive write lock to the database to prevent race conditions.
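One possible way to add a database only when it is not already present is sketched below. The helper name ensure_db is hypothetical, and the code assumes that listDbs returns plain database names and that addDb returns a true value on success.

```perl
use strict;
use warnings;

# Hypothetical helper: make sure database $name is attached to $db.
# Returns 1 if the database is already listed or was added, 0 otherwise.
# Assumes listDbs returns names as strings and addDb is true on success.
sub ensure_db {
    my ( $db, $name ) = @_;

    # Skip the write lock taken by addDb when nothing needs to be done.
    return 1 if grep { $_ eq $name } $db->listDbs;

    return $db->addDb($name) ? 1 : 0;
}
```

On a 0 return the error text should be available from Paranoid::ERROR, as noted for addDb itself.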

Any errors in the operation will be stored in Paranoid::ERROR.

getVal

$val = $db->getVal($key);
$val = $db->getVal($key, $dbname);

This method retrieves the string associated with the passed key. Called with one argument, the method uses the default database. Otherwise, a second argument naming the specific database is required.

Requesting a nonexistent key, or any key from a nonexistent database, will result in undef being returned. In the latter case an error message will also be set in Paranoid::ERROR.
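Because undef signals both a missing key and an unknown database, callers often want a fallback value. A minimal sketch, using the hypothetical helper name get_or_default:

```perl
use strict;
use warnings;

# Hypothetical helper: look up $key and fall back to $default when
# getVal returns undef (missing key or unknown database).
# $dbname is optional; when omitted the default database is used.
sub get_or_default {
    my ( $db, $key, $default, $dbname ) = @_;

    # Pass the database name only when the caller supplied one.
    my @where = defined $dbname ? ($dbname) : ();
    my $val = $db->getVal( $key, @where );

    return defined $val ? $val : $default;
}
```

Stored values such as "0" or the empty string are defined, so they are returned as-is and do not trigger the fallback.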

setVal

$rv = $db->setVal($key, $val);
$rv = $db->setVal($key, $val, $dbname);

This method adds or updates an associative pair. If the passed value is undef the key is deleted from the database. If no database is explicitly named it is assumed that the default database is the one to work on.

Requesting a non-existent key or from a nonexistent database will result in an undef being returned. In the case of the latter an error message will also be set in Paranoid::ERROR.
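Deleting by storing undef can be wrapped for a list of keys. The helper name delete_keys is hypothetical, and the sketch assumes setVal returns a true value on success.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every key in @$keys by storing undef.
# Returns the number of setVal calls that reported success.
# $dbname is optional; when omitted the default database is used.
sub delete_keys {
    my ( $db, $keys, $dbname ) = @_;

    my @where = defined $dbname ? ($dbname) : ();
    my $count = 0;

    for my $key (@$keys) {

        # An undef value asks setVal to remove the key.
        $count++ if $db->setVal( $key, undef, @where );
    }

    return $count;
}
```

Each setVal call here takes and releases the write lock on its own; to make the whole batch atomic, the loop could be run between cds_lock and cds_unlock.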

getKeys

@keys = $db->getKeys();
@keys = $db->getKeys($dbname);
@keys = $db->getKeys(undef, \&sub);
@keys = $db->getKeys($dbname, \&sub);

If this method is called without the optional subroutine reference it will return all the keys in the hash in hash order. If a subroutine reference is passed, it will be called as each key/value pair is iterated over, with three arguments:

&$subRef($dbObj, $key, $value);

with $dbObj being a handle to the current database object. You may use this ref to make changes to the database. Anytime a code reference is handed to this method it is automatically opened with a write lock under the assumption that this might be a transformative operation.
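A callback does not have to modify anything. The sketch below uses one to copy a database into an in-memory hash; the helper name snapshot_db is hypothetical, and the callback deliberately ignores $dbObj.

```perl
use strict;
use warnings;

# Hypothetical helper: copy every key/value pair of one database into a
# plain hash reference by way of the getKeys callback.
# Pass undef as $dbname to read the default database.
sub snapshot_db {
    my ( $db, $dbname ) = @_;
    my %copy;

    $db->getKeys(
        $dbname,
        sub {
            my ( $dbObj, $key, $value ) = @_;    # $dbObj is unused here
            $copy{$key} = $value;
        } );

    return \%copy;
}
```

Since passing a code reference causes a write lock to be taken, other processes are blocked for the duration of the iteration; for a plain list of keys, calling getKeys without a callback avoids that.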

purgeDb

$db->purgeDb();
$db->purgeDb($dbname);

This method purges all associative pairs from the designated database. If no database name was passed then the default database will be used. This method returns the number of records purged, or -1 if an invalid database was requested.
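The -1 return value is easy to mistake for a count, so a caller may want to check it explicitly. A minimal sketch, with the hypothetical helper name purge_checked:

```perl
use strict;
use warnings;

# Hypothetical helper: purge a database and separate the "invalid
# database" result (-1) from a real record count.
# Returns the number of purged records, or undef on failure.
sub purge_checked {
    my ( $db, $dbname ) = @_;

    my $count = defined $dbname ? $db->purgeDb($dbname) : $db->purgeDb();

    if ( !defined $count or $count < 0 ) {
        warn "purge failed for "
            . ( defined $dbname ? $dbname : 'the default database' ) . "\n";
        return undef;
    }

    return $count;
}
```

A return of 0 is a valid result here and simply means the database was already empty.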

listDbs

@dbs = $db->listDbs();

This method returns a list of databases accessible by this object.
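Combined with getKeys, this list can be used for a quick per-database summary. The helper name key_counts is hypothetical, and the sketch assumes that every name returned by listDbs is accepted by getKeys as a database name.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash reference mapping each database
# name known to $db to the number of keys it currently holds.
# Assumes names from listDbs can be passed straight to getKeys.
sub key_counts {
    my ($db) = @_;
    my %counts;

    for my $name ( $db->listDbs ) {
        my @keys = $db->getKeys($name);
        $counts{$name} = scalar @keys;
    }

    return \%counts;
}
```

The counts are taken one database at a time, so they are only a consistent snapshot if no other process writes in between.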

cds_lock

$db->cds_lock;

This method places a global write lock on the shared database environment. Since environments are created with a global lock (covering all databases in the environment) no writes or reads can be done by other processes until this is unlocked.

cds_unlock

$db->cds_unlock;

This method releases the global write lock on the shared database environment.
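Since a lock that is never released blocks every other process, it may be worth pairing cds_lock and cds_unlock in a wrapper that unlocks even when the enclosed code dies. A minimal sketch, with the hypothetical helper name with_cds_lock:

```perl
use strict;
use warnings;

# Hypothetical helper: run $code while holding the global write lock and
# release the lock afterwards, even if $code throws an exception.
# $code receives $db as its only argument.
sub with_cds_lock {
    my ( $db, $code ) = @_;

    $db->cds_lock;
    my $ok  = eval { $code->($db); 1 };
    my $err = $@;
    $db->cds_unlock;    # always release before reporting any failure

    die $err unless $ok;
    return 1;
}
```

This only covers exceptions raised inside the process; a process that is killed outright can still leave a stale lock behind.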

DESTROY

A DESTROY method is provided which should sync and close an open database, as well as release any locks.

DEPENDENCIES

o Paranoid

o Paranoid::Debug

o Paranoid::Filesystem

o Paranoid::Lockfile

o BerkeleyDB

BUGS AND LIMITATIONS

Race conditions, particularly on database creation/opens, are worked around by the use of external lock files and flock advisory file locks. Lockfiles are not used during normal operations on the database.

While CDS allows for safe concurrent use of database files, it makes no allowances for recovery from stale locks. If a process exits badly and fails to release a write lock (which causes all other process operations to block indefinitely) you have to intervene manually. The brute force intervention would mean killing all accessing processes and deleting the environment files (files in the same directory called __db.*). Those will be recreated by the next process to access them.

Berkeley DB provides a handy CLI utility called db_stat(1). It can provide some statistics on your shared database environment via invocation like so:

db_stat -m -h .

The last argument, of course, is the directory in which the environment was created. The example above would work fine if your working directory was that directory.

You can also show all existing locks via:

db_stat -N -Co -h .

SEE ALSO

BerkeleyDB(3)

HISTORY

2011/12/06: Added fork-safe operation

AUTHOR

Arthur Corliss (corliss@digitalmages.com)

LICENSE AND COPYRIGHT

This software is licensed under the same terms as Perl itself. Please see http://dev.perl.org/licenses/ for more information.

(c) 2005, Arthur Corliss (corliss@digitalmages.com)