NAME
Genezzo::Block::RDBlock.pm - Row Directory Block tied hash class. A class that lets you treat the contents of a block (byte buffer) as a hash.
Note: This implementation is almost, but not quite, a pushhash. The push hash implementation is Genezzo::Row::RSBlock. It also forms the basis of a tied array in Genezzo::Block::RDBArray.
SYNOPSIS
use Genezzo::Block::RDBlock;
use Genezzo::Block::Std;
local $Genezzo::Block::Std::DEFBLOCKSIZE = 500;
my $buff = "\0" x 500; # construct an empty byte buffer
my %tied_hash = ();
my $tie_val =
tie %tied_hash, 'Genezzo::Block::RDBlock', (refbufstr => \$buff);
# pushhash style
# (note that the "PUSH" pseudo key is not supported)...
my $newkey = $tie_val->HPush("this is a test");
# or array style, your choice
my $pushcount = $tie_val->PUSH(qw(push lots of data));
$tied_hash{$newkey} = "update this entry";
# a hash that supports array style FETCHSIZE
my $getcount = $tie_val->FETCHSIZE(); # Note: not HCount
DESCRIPTION
RDBlock is the basis for persistent tied hashes, pushhashes, and tied arrays. After the hash is tied to the byte buffer, the buffer can be written to persistent storage. The storage is designed such that inserts/appends/pushes are fairly efficient, and deletes are inexpensive. The pctfree/pctused parameters allow some tuning to reserve space in the buffer for updates that "grow" existing values. Updates that do not change the packed size of data are about as efficient as insert/appends -- just the cost to copy your bytes into the buffer -- but updates that do change the size of stored values can require a large amount of byte shifting to open up storage space. Also, the buffer does not grow to accomodate large values. Wrapper classes are necessary to specify mechanisms for packing complex data structures and techniques to split objects across multiple buffers.
ARGUMENTS
- refbufstr (Required) - a reference to the byte buffer used for storage.
- blocksize (Optional) - the size of the supplied byte buffer. Default is $Genezzo::Block::Std::DEFBLOCKSIZE.
- pctfree (Optional) - the percentage of space kept free for future updates. Default is 30 (percent).
- pctused (Optional) - after the block is full, the percentage of space that must be open before inserts are re-enabled. Default is 50 (percent).
CONCEPTS
The structure and techniques for the Row Directory Block are described in Chapter 14, "The Tuple-Oriented File System", of "Transaction Processing: Concepts and Techniques" by Jim Gray and Andreas Reuter, 1993.
A tuple is a collection of values -- in the standard vernacular you would call it a "row" in a database. The refbufstr argument to the hash constructor is a "block", a fixed-size contiguous buffer of bytes. When you write (STORE
) a value into the RDBlock hash, it writes an entry into the block as a byte string, and reads (FETCH
) work in an analogous fashion.
The RDBlock data structures refer to stored values as "rows", but the basic STORE
and FETCH
only understand how to store and retrieve individual byte strings. Wrapper classes for RDBlock must marshall/unmarshall (Freeze/Thaw) between simple strings and more complex data structures.
The block has some header and footer information, plus a row directory, a data structure that records the offsets, extents, and status information of the stored row data. While the physical location of row data in a block may change as other rows are added, deleted or modified, the row keeps the same hash key.
Each row has an associated "status" bitfield, which is some combination of the following values:
- deleted
-
set if row is deleted. Deleted rows are simply marked as deleted, but the physical storaged is not immediately recouped.
- data
-
set for data rows, unset for metadata. All information stored via the standard public interfaces is data. You can manipulate the private interfaces to store "metadata", additional rows that describe, for example, block contents, transaction information, or data relationships, but are invisible to the public interfaces. By convention, row 0 is always a metadata row.
- lock
-
set if row is locked. Not used in this base class -- provided for subclasses that must supply and maintain the appropriate metadata to identify locker and transaction information.
- head
-
set if the stored value is the very first part of a row.
- tail
-
set if the stored value is the very last part of a row. If
STORE
writes a complete value it sets both head and tail to true. The base class only writes rows that fit in a single block, so both head and tail are always set.These flags are useful if you wish to write subclasses with rows that span multiple blocks. Neither head nor tail is set if only the middle section of a multi-part row is stored.
- isnull
-
If you supply
STORE
with a value of undef, it writes a marker for a zero-length string and sets this flag.FETCH
will correctly return an undef.Note: When packing more complex data structures, make sure to use an encoding that distinguishes between undefs and zero-length strings. A simple scheme for packing an array of strings is to prefix the packed array with a bitstring that specifies which entries are null.
FUNCTIONS
RDBlock support all standard hash operations, with the exception that you cannot create or insert a user key -- you must push new entries and use the generated key or basic iteration to retrieve your data.
It also supports three additional public methods: an array style PUSH
and FETCHSIZE
, plus a PushHash style HPush. Note that these methods are associated with the tie value (i.e. the blessed ref for the RDBlock class), not the tied hash. Finally, it has five "private" methods that may be of use in constructing subclasses: push_one, packdeleted, offset2hkey, lastkey, prevkey
- PUSH this, LIST
-
PUSH appends the list to the end of the hash and returns the number of items it pushed.
- FETCHSIZE
-
Returns the total number of valid, undeleted data items in the hash.
- HPush
-
HPush returns the new key for each pushed value. It only accepts a single argument, not a list.
my $newkey = $tie_val->HPush("this is a test");
Note that there is not a corresponding "pop" operation.
EXPORT
DATAROW, RowStat
LIMITATIONS
The storage mechanism uses network longs (32 bits?) to describe the lengths of rows and offsets within the block. (That seems pretty large -- maybe it should use shorts to restrict blocksize and row piece length to 64K? Or init should take an optional module name for block type that lets us vary the row directory, header and footer sizing).
TODO
- use row directory rowlen vs len/value for row storage
- meta row - should binary search for meta id
- unicode support
AUTHOR
Jeffrey I. Cohen, jcohen@genezzo.com
SEE ALSO
Copyright (c) 2003-2007 Jeffrey I Cohen. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Address bug reports and comments to: jcohen@genezzo.com
For more information, please visit the Genezzo homepage at http://www.genezzo.com