NAME
Genezzo::Row::RSTab.pm - Row Source TABle tied hash class.
SYNOPSIS
use Genezzo::Row::RSTab;
# see Tablespace.pm -- implementation and usage is tightly tied
# to genezzo engine...
# make a factory for rsfile
my $fac2 = make_fac2('Genezzo::Row::RSFile');
my %args = (
    factory   => $fac2,
    # need tablename, bufcache, etc...
    tablename => ...,
    tso       => ...,
    bufcache  => ...,
);
my %td_hash;
my $tie_val =
    tie %td_hash, 'Genezzo::Row::RSTab', %args;
# pushhash style
my @rowarr = ("this is a test", "and this is too");
my $newkey = $tie_val->HPush(\@rowarr);
@rowarr = ("update this entry", "and this is too");
$td_hash{$newkey} = \@rowarr;
my $getcount = $tie_val->HCount();
DESCRIPTION
RSTab is a hierarchical pushhash (see Genezzo::PushHash::hph) class that stores perl arrays as rows in a table, writing them into a block (byte buffer) via Genezzo::Row::RSFile and Genezzo::Block::RDBlock.
ARGUMENTS
- tablename (Required) - the name of the table
- tso (Required) - tablespace object from Genezzo::Tablespace
- bufcache (Required) - buffer cache object from Genezzo::BufCa::BCFile
CONCEPTS
Logically, a table is made of rows, and rows are vectors of columns. Physically (at least from an OS implementation viewpoint), a table is made up of blocks stored in files. The RSTab hierarchical pushhash (hph) uses an RSFile factory, though it could be constructed as an hph of arbitrary depth. The basic HPush mechanism takes an array, flattens it into a string, and pushes the string into one of the underlying blocks.
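The flatten-and-push idea can be sketched in plain Perl without any Genezzo modules. The names pack_row, unpack_row and push_row, the length-prefixed column layout, the "block/slot" key format and the 64-byte block size are all assumptions made for illustration; this is not the RSTab or RDBlock implementation.

```perl
use strict;
use warnings;

# Hypothetical block size, kept small so the behaviour is easy to follow.
my $BLOCKSIZE = 64;

# Flatten an array of defined scalars into one string: each column is
# stored as a 32-bit big-endian length followed by its bytes.
sub pack_row {
    my ($row) = @_;
    return join '', map { pack('N', length($_)) . $_ } @$row;
}

# Inverse of pack_row: walk the string and rebuild the array.
sub unpack_row {
    my ($str) = @_;
    my @cols;
    my $pos = 0;
    while ($pos < length($str)) {
        my $len = unpack('N', substr($str, $pos, 4));
        push @cols, substr($str, $pos + 4, $len);
        $pos += 4 + $len;
    }
    return \@cols;
}

# Push a row into the first block with enough free space, adding a new
# block when none fits. Returns a "block/slot" key, or nothing if the
# packed row is larger than a whole block.
sub push_row {
    my ($blocks, $row) = @_;
    my $packed = pack_row($row);
    return if length($packed) > $BLOCKSIZE;
    for my $i (0 .. $#{$blocks}) {
        my $blk = $blocks->[$i];
        if ($blk->{used} + length($packed) <= $BLOCKSIZE) {
            push @{ $blk->{rows} }, $packed;
            $blk->{used} += length($packed);
            return "$i/" . $#{ $blk->{rows} };
        }
    }
    push @$blocks, { used => length($packed), rows => [$packed] };
    return $#{$blocks} . '/0';
}

my @blocks;
my $key = push_row(\@blocks, ["this is a test", "and this is too"]);
my ($blkno, $slot) = split m{/}, $key;
my $row = unpack_row($blocks[$blkno]{rows}[$slot]);
print "$key: ", join(' | ', @$row), "\n";
```

In this sketch a row that does not fit in one block is simply rejected, which mirrors the limitation of a plain push into a single block.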
While the RSTab API is primarily intended as a row-based interface, it has some extensions to directly manipulate the underlying blocks. These extensions are useful for building specialized index mechanisms (see Genezzo::Index) like B-trees, or for supporting rows that span multiple blocks.
Basic PushHash
You can use RSTab as a persistent hash of arrays of scalars if you like. The arrays and scalars can be of arbitrary length (as long as they fit in your datafiles).
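To show the calling pattern without a tablespace or buffer cache, here is a toy in-memory stand-in. ToyPushHash is a hypothetical class written for this example, not part of Genezzo; it keeps rows in memory, is not persistent, and implements only the tie methods the example needs plus HPush and HCount.

```perl
use strict;
use warnings;

# Hypothetical in-memory stand-in for a pushhash; not a Genezzo class.
package ToyPushHash;

sub TIEHASH {
    my ($class) = @_;
    return bless { next => 1, rows => {} }, $class;
}

# Return the stored row (an array ref), or undef for an unknown key.
sub FETCH {
    my ($self, $key) = @_;
    return $self->{rows}{$key};
}

# Store a copy of the row under the given key.
sub STORE {
    my ($self, $key, $row) = @_;
    $self->{rows}{$key} = [@$row];
    return;
}

# Push a row and let the hash choose the key, which is returned.
sub HPush {
    my ($self, $row) = @_;
    my $key = $self->{next}++;
    $self->{rows}{$key} = [@$row];
    return $key;
}

# Number of rows currently stored.
sub HCount {
    my ($self) = @_;
    return scalar keys %{ $self->{rows} };
}

package main;

my %td_hash;
my $tie_val = tie %td_hash, 'ToyPushHash';

my $newkey = $tie_val->HPush(["this is a test", "and this is too"]);
$td_hash{$newkey} = ["update this entry", "and this is too"];
print "$td_hash{$newkey}[0] (rows: ", $tie_val->HCount(), ")\n";
```

The point of the push style is that the caller supplies only the row; the key comes back from HPush and can then be used like any other hash key.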
SQL DBI-style interface
RSTab is designed to efficiently support prepare/execute/fetch operations against tables. What distinguishes this API from a standard hash is that the "prepare" operation generates a custom, stateful iterator that understands filters and range selection. A filter is simply a predicate which is applied to every row -- rows which pass are returned to the caller, and rows which fail are "filtered out". Range selection is similar, but works with start and stop keys -- the iterator returns only the rows whose values fall within that range. In general, range selection is driven by a separate indexing mechanism that positions the fetch to retrieve just the range, which is more efficient than fetching all rows and filtering out those outside it.
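The idea of a stateful iterator with a filter and start/stop keys can be sketched with a closure over an ordinary hash. The prepare function and its filter, start and stop options below are hypothetical names for this example, not the RSTab interface. The sketch also walks every key and tests the bounds itself, so it does not show the index-driven positioning described above.

```perl
use strict;
use warnings;

# Hypothetical "prepare": returns a stateful iterator (a closure) over a
# hash of array refs. Options: filter (predicate applied to each row),
# start and stop (inclusive bounds on the key, compared as strings).
sub prepare {
    my ($table, %opt) = @_;
    my @keys = sort keys %$table;   # snapshot of the keys in sorted order
    my $pos  = 0;                   # iterator state kept between fetches
    return sub {
        while ($pos < @keys) {
            my $key = $keys[$pos++];
            next if defined $opt{start} && $key lt $opt{start};
            if (defined $opt{stop} && $key gt $opt{stop}) {
                $pos = @keys;       # past the range: nothing more to fetch
                last;
            }
            my $row = $table->{$key};
            next if $opt{filter} && !$opt{filter}->($row);
            return ($key, $row);
        }
        return;                     # exhausted
    };
}

my %t = (
    k01 => ['apple', 3],
    k02 => ['pear',  10],
    k03 => ['plum',  7],
    k04 => ['fig',   12],
    k05 => ['lime',  20],
);

# Rows with key in k02..k04 whose second column is at least 10.
my $fetch = prepare(\%t,
    start  => 'k02',
    stop   => 'k04',
    filter => sub { $_[0][1] >= 10 },
);

my @got;
while (my ($key, $row) = $fetch->()) {
    push @got, $key;
}
print "@got\n";
```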
HPHRowBlk - Row and Block operations
HPHRowBlk is a special pushhash subclass with certain direct block manipulation methods. One very useful function is HSuck, which provides support for rows that span multiple blocks. While the standard HPush fails if a row exceeds the space in a single block, the HSuck API lets the underlying blocks consume the row in pieces -- each block "sucks up" as much of the row as it can. The RSTab HPush is re-implemented on top of HSuck to support large rows.
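The "each block takes what it can" behaviour can be illustrated with a small sketch. The functions suck_row and fetch_row, the fragment list of [block, offset, length] triples, and the 16-byte block size are assumptions for this example; the real HSuck interface and its on-disk fragment bookkeeping are not shown here.

```perl
use strict;
use warnings;

my $BLOCKSIZE = 16;   # hypothetical, tiny so that a row must span blocks

# Let each block take as much of the packed row as its free space allows.
# Returns an array ref of [block number, offset in block, length]
# fragments, in the order needed to reassemble the row.
sub suck_row {
    my ($blocks, $packed) = @_;
    my @frags;
    my $offset = 0;
    my $i      = 0;
    while ($offset < length($packed)) {
        $blocks->[$i] = '' unless defined $blocks->[$i];   # add a block
        my $free = $BLOCKSIZE - length($blocks->[$i]);
        if ($free > 0) {
            my $piece = substr($packed, $offset, $free);
            push @frags, [ $i, length($blocks->[$i]), length($piece) ];
            $blocks->[$i] .= $piece;
            $offset += length($piece);
        }
        $i++;
    }
    return \@frags;
}

# Reassemble a row from its fragments.
sub fetch_row {
    my ($blocks, $frags) = @_;
    return join '',
        map { substr($blocks->[ $_->[0] ], $_->[1], $_->[2]) } @$frags;
}

my @blocks = ('x' x 10);            # block 0 already holds 10 bytes
my $packed = 'A' x 30;              # larger than any single block
my $frags  = suck_row(\@blocks, $packed);

print join(', ', map { "block $_->[0]: $_->[2] bytes" } @$frags), "\n";
print "reassembled ok\n" if fetch_row(\@blocks, $frags) eq $packed;
```

Here the 30-byte row cannot fit in any one 16-byte block, so a single-block push would have to fail, while the piecewise approach spreads it over three blocks.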
Counting, Estimation, Approximation
RSTab has some support for count estimation, inspired by some of Peter Haas' work: "Sequential Sampling Procedures for Query Size Estimation" (ACM SIGMOD 1992), "Online Aggregation" (with J. Hellerstein and H. Wang, ACM SIGMOD 1997), and "Ripple Joins for Online Aggregation" (with J. Hellerstein, ACM SIGMOD 1999). It could use support for confidence intervals, so drop me a line if you understand the Central Limit Theorem and the Hoeffding and Chebyshev inequalities. Knowledge of change-points and time-series is also a plus.
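As a rough illustration of what a sampled count with a confidence interval might look like, the sketch below scales the mean row count of a few sampled blocks up to the whole table and attaches a Central Limit Theorem style interval. This is textbook simple random sampling, not RSTab's estimator and not the procedures from the papers cited above. The function name estimate_count is hypothetical, the sketch ignores the finite-population correction, and the normal approximation is weak for samples this small.

```perl
use strict;
use warnings;

# Estimate the total number of rows in a table from the row counts of a
# sample of its blocks. Returns (estimate, half-width of an approximate
# 95% confidence interval based on the normal approximation).
sub estimate_count {
    my ($total_blocks, @sample_counts) = @_;
    my $n = @sample_counts;
    die "need at least two sampled blocks\n" if $n < 2;

    my $sum = 0;
    $sum += $_ for @sample_counts;
    my $mean = $sum / $n;

    my $ss = 0;
    $ss += ($_ - $mean) ** 2 for @sample_counts;
    my $var = $ss / ($n - 1);                      # sample variance

    my $se = $total_blocks * sqrt($var / $n);      # std. error of the total
    return ($total_blocks * $mean, 1.96 * $se);
}

# 100 blocks in the table; four sampled blocks held 10, 12, 8 and 10 rows.
my ($est, $half) = estimate_count(100, 10, 12, 8, 10);
printf "about %.0f rows, plus or minus %.0f\n", $est, $half;
```

A Hoeffding or Chebyshev bound would replace the 1.96 multiplier with a more conservative one that does not rely on the normal approximation.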
FUNCTIONS
RSTab supports all standard hph hierarchical pushhash operations, with the extension that it manipulates arrays of scalars, not individual scalars.
EXPORT
LIMITATIONS
various
TODO
- rownum filter support to move to separate package
- $href: remove - need a dict function to return allfileused via tso
- HSuck: need a way to specify packing method
- HSuck: fix trailing zero replacement
- NextCount: fix quitloop
- localPush/Store: qualify length packstr as percentage of blocksize (1/3?)
- localStore: race condition on rowstat
- localFetchDelete: frag flag info, delete status. Could express this function as a generalized "RowSplice" (as distinct from RDBlkA::HSplice, which is a block splice operator). Would need to be able to splice based upon column number/array offset, as well as substring byte offset -- the inverse functionality of PackRow2/HSuck
- DBI - support Bind and projection (returning only certain specified columns, versus all columns)
- _init: change to use TSTableAFU support versus href->{filesused}
- need support for constraints that "mutate" supplied values, e.g. manipulate numeric precision or supply default values for columns. Also need support for foreign keys in delete.
AUTHOR
Jeffrey I. Cohen, jcohen@genezzo.com
SEE ALSO
Genezzo::PushHash::HPHRowBlk, Genezzo::PushHash::hph, Genezzo::PushHash::PushHash, Genezzo::Tablespace, Genezzo::Row::RSFile, Genezzo::Row::RSBlock, Genezzo::Block::RDBlock, Genezzo::BufCa::BCFile, Genezzo::BufCa::BufCaElt, perl(1).
Copyright (c) 2003, 2004, 2005 Jeffrey I Cohen. All rights reserved.
This program is free software; you can redistribute it and/or modify
it under the terms of the GNU General Public License as published by
the Free Software Foundation; either version 2 of the License, or
any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU General Public License for more details.
You should have received a copy of the GNU General Public License
along with this program; if not, write to the Free Software
Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Address bug reports and comments to: jcohen@genezzo.com
For more information, please visit the Genezzo homepage at http://www.genezzo.com