NAME
NoSQL::PL2SQL::Simple - Implementation of NoSQL::PL2SQL
SYNOPSIS
BEGIN {
package MyArbitraryClass ;
use base qw( NoSQL::PL2SQL::Simple ) ;
}
use CGI ;
my $collection = new MyArbitraryClass ;
## Writing to and modifying the database
$collection->save( CGI->new->Vars ) ;
$collection->delete( CGI->new->Vars->{recordid} ) ;
## Accessing the database
@allemails = values %{ { $collection->contactemail } } ;
$user = $collection->contactemail('jim@tqis.com')->record ;
@neighbors = $collection->contactaddress('Ann Arbor')->records ;
@classmates = $collection->query(
contactcity => 'Ann Arbor',
graduationyear => '1989',
)->records ;
NoSQL::PL2SQL::Simple VERSUS NoSQL::PL2SQL
NoSQL::PL2SQL performs background persistence for perl objects using SQL and a relational database: Persistence is enabled with a simple designation. Everything else happens automatically in the background. As a result, NoSQL::PL2SQL has practically no defined API. So it is inadequate for users who are looking for an alternative to SQL (as in NoSQL).
NoSQL::PL2SQL::Simple solves that problem and provides a complete API to access stored data. In effect, these two modules perform a division of labor: NoSQL::PL2SQL is responsible for storing the data, and NoSQL::PL2SQL::Simple is responsible for access.
The comparison is also dependent on the data architecture needs.
- Two-Dimensional Tabular Data
-
Given that RDB's call their data containers tables, the two-dimensional tabular data structure tends to dominate traditional data architecture. While a table is visually easy to comprehend. A more abstract model extends a one-dimensional array: Where, instead of scalars, each array element is a set of NVP's (name-value-pairs). Two-dimensional data is tabular if each NVP set shares a common set of names. Non-tabular two-dimensional data is still a tough fit for a conventional RDB.
- Multi-Dimensional Tabular Data
-
A multi-dimensional table is best described in terms of a spreadheet, where one cell contains a list (or any set). The description is trickier in terms of an array of NVP's. But these data types are getting more and more common as CSV data representation starts to give way to XML and JSON. RDB's can handle complex data by using relational tables. (the R in RDB.) But RDB data definitions tend to have scaling problem as the data gets more and complex.
- Non-Tabular Data
-
It's possible to cobble together an RDB solution for two-dimensional non-tabular data. But, beyond that, developers are entering a world of pain. In this realm, NoSQL::PL2SQL provides a clear advantage.
NoSQL::PL2SQL is designed for non-tabular data where traditional RDB's are not very useful. NoSQL::PL2SQL::Simple requires some form of tabular data, and combines the advantages of both NoSQL::PL2SQL and RDB SQL.
DESCRIPTION
In a traditional RDB, the data structure definition is external, separate from the content data. In NoSQL::PL2SQL::Simple, the two exist side-by-side. Both data content and definition have OO representations.
A data definition can be associated with any arbitrary perl class that subclasses NoSQL::PL2SQL::Simple. In many cases, the subclass is no more than a name that identifies the definition. The data definition is encapsulated in the class's instantiation or instance. (I'm using fairly precise nomenclature.) Many of NoSQL::PL2SQL::Simple's methods are called via the instance (or container).
Data content is encapsulated in class objects. Methods to access or modify data content are called via these objects. Both objects and the single instance are blessed into the same class name. Use appropriate variable names to distinguish each in your code.
Constructor
new()
According to the wisdom of OO pattern science, the constructor is used to create the instance. And a factory (generally applied to the instance) is used to create the objects.
db()
new()
no arguments and returns an instance of the data definition class. As is typical in perl, the method NoSQL::PL2SQL::Simple::new()
is the conventional constructor. However, NoSQL::PL2SQL::Simple is invoked by creating a subclass, which may need its own constructor. Consequently, the following two statements are equivalent:
$instance = NoSQL::PL2SQL::Simple->new ;
$instance = NoSQL::PL2SQL::Simple->db ;
The instance has several functions:
- As a data definition, with methods to alter the definition
- As a factory, with methods to create objects
- As a data source, with methods to query the data
All these methods are detailed in the following sections.
Data Definition Methods
Tabular data is tabular because each element (a data object) has a common structure. The entire data set can be laid on a grid with identifiable, pre-defined column names. Data elements are laid out as rows which can be easily added or deleted.
I use the term NVP set (an associative array in perl) to generalize these elements, and the term tabular requires that each NVP set use the same names. It's helpful if the reader can visualize this more abstract model, because NoSQL::PL2SQL::Simple allows much more flexibility (or variation) among each NVP, so the result can be much less tabular than data stored in a traditional RDB.
The difference is that in NoSQL::PL2SQL::Simple, only some names (or columns or fields) need to be commonly defined within each element (or object). These names are determined by the data definition which are properties of the instance described above.
As an example, consider an application that needs to save each user's session state. If the application is complex, with numerous interfaces, this data is going to be quite unstructured as the state definition gets more complicated. Nevertheless, there are a handful of common elements, say: SessionID, UserID, Email, and Password. Theoretically, this could be done within a strict tabular structure by marshalling the fuzzy stuff into a single BLOB value. (Actually, this approach is not uncommon.)
## A simple application for saving a complex session
BEGIN {
package TQIS::Session ;
use base qw( NoSQL::PL2SQL::Simple ) ;
}
$instance = TQIS::Session->new ;
The data definition is itself an NVP set data object. This is perl, so it's accessed as a hash reference.
## display the data definition
print join "\n", %$instance, '' ;
hash reference, associative array, or NVP set are interchangable terms. Each name (or key) in this set is the same name required in each data object (or element). Each associated value is a data type. The data types are intrinsic to NoSQ::PL2SQL::Simple, three are currently defined. There's a little magic under the hood, so the best way to add data definitions are the following three methods:
addTextIndex()
addNumberIndex()
addDateIndex()
Here's how it's done in our example:
$instance->addNumberIndex( qw( UserID ) ) ;
$instance->addTextIndex( qw( Email Password ) ) ;
In this example SessionID will be an internal, automatically generated key. Since these definitions do not specify uniqueness, the code to enforce a unique UserID is shown later in "Unique Keys".
Factory Methods
Generally, an instance needs a data definition before it's available for factory methods.
record()
As described above, the constructor creates an instance that represents the data definition. Data objects are created using a factory method applied to the instance. record()
is that factory method. Because of this special significance, it is heavily overloaded.
$session = { ... } ; ## A tabular data object
$sessionid = 231 ; ## An assigned id I made up
$object = $instance->record( $session ) ; ## Returns an object copy
$object = $instance->record( $sessionid ) ; ## Returns the stored object
$object = $instance->record(
$sessionid => $session ) ; ## overwrites a stored object
The same record()
factory method is used to read, write, or overwrite a data object, depending on the arguments. Naturally, developers can create conventional read()
and write()
methods in a subclass.
As a factory, record()
is always called via the instance.
Query Methods
Earlier, I compared NoSQL::PL2SQL::Simple to a solution that marshalls the non-tabular data into a single BLOB value. NoSQL::PL2SQL::Simple does not perform any marshalling, so the resulting data storage is more accessible and portable. But it should be obvious that the data marshalled into the BLOB is not available for querying or searching. And this limitation also applies to NoSQL::PL2SQL::Simple.
Since the query operations are tightly bound to the data definition, it follows that the query methods are called on the instance.
query()
Reading and writing data objects does not require a very complicated API. (NoSQL::PL2SQL has practicallyj none). The power and complexity of NoSQL::PL2SQL::Simple lies in its query capabilities. So this section will be more detailed. Most of the API consists of a single method, query()
. Naturally, this method is overloaded, so several variations are described.
AUTOLOAD()
Additionally, query()
is aliased by AUTOLOAD()
. AUTOLOAD is not necessarily a universally loved perl feature, but it can improve code readability.
recordID() ;
Since NoSQL::PL2SQL::Simple doesn't inherently support unique keys, all query methods return an array. recordID()
is available when you absolutely positively need a single scalar.
@match = $instance->query( Email => 'jim@tqis.com' ) ;
@match = $instance->Email('jim@tqis.com') ; ## AUTOLOAD equivalent
warn "entry not found" unless @match ;
warn "duplicate entries found" if @match > 1 ;
$sessionID = $match[0] ;
## single scalar requirement
$sessionID ||= $instance->query( Email => 'jim@tqis.com' )->recordID ;
$session = $instance->record( $sessionID ) ;
This example demonstrates several concepts: First, the definition name can be used as though it were a method definition, thus omitting the first argument. Second, $sessionID
is an automatically generated unique key that is required to use the record()
factory method. NoSQL::PL2SQL::Simple includes an idiom that is a little cleaner.
@session = $instance->query( Email => 'jim@tqis.com' )->records ;
@session = $instance->Email('jim@tqis.com')->records ; ## AUTOLOAD
warn "entry not found" unless @session ;
warn "duplicate entries found" if @session > 1 ;
$session = $sessions[0] ;
$session = $instance->query( Email =>'jim@tqis.com' )->record ;
$session = $instance->Email('jim@tqis.com')->record ; ## AUTOLOAD
query() can support more than one qualifier. This use has no AUTOLOAD equivalent.
@session = $instance->query(
Email => 'jim@tqis.com,
Password => 'in80gres' )->records ;
warn "invalid login" unless @session ;
warn "contact system adminstrator" if @session > 1 ; ## uh-oh
## query()'s "and" logic is built in.
## Roll your own "or" logic as follows:
@results = $instance->query( Email => 'jim@tqis.com' ) ;
push @results, $instance->query( Password => 'in80gres' ) ;
%results = map { $_ => 1 } @results ; ## filter duplicates
@results = keys %results ;
If query()
is called with no arguments, the entire data set is returned. This invocation is typically used to rebuild after changing the data definition.
@keys = $instance->query ;
@everything = $instance->query->records ; ## memory intensive
When passed with a single argument, query()
behaves similarly, except each element's key is accompanied by its associated NVP value.
%email = $instance->query('Email') ;
%email = $instance->Email ; ## AUTOLOAD equivalent
print "select your email address from below\n" ;
printf "%d\t%s\n", @ea while @ea = each %email ;
## an even more ludicrous example:
%passwords = $instance->query('Password') ;
@email = $instance->query( Email => CGI->new->Vars->{email} ) ;
print "select your password from below\n" ;
map { printf "%s\n", $passwords{$_} } @email ;
Many of these extended query options are designed to access data with minimal time and resources.
Object Methods
Perl objects data can usually do not require accessor methods. For an object consisting of NVP's, data is accessed as follows:
$value = $object->{name} ;
Developers are expected to subclass NoSQL::PL2SQL::Simple, and so may elect to write their own accessors.
NoSQL::PL2SQL::Simple stores and returns an object identical to what was originally saved. But there are hidden properties (taking advantage of perl's TIE feature) that require accessors, for example SQLObjectID
.
SQLObjectID()
The SQLObjectID()
method is inherited from NoSQL::PL2SQL, and returns the objects unique internal key:
$sessionID = $object->SQLObjectID ;
save()
delete()
These methods are aliases for instance methods:
$instance->record( $object->SQLObjectID, $object ) ; ## not very elegant
$object->save ; ## terse alternative
$instance->delete( $object->SQLObjectID ) ;
$object->delete ; ## equivalent
NoSQL::PL2SQL::Simple has many shortcuts, but be careful
$o = bless {}, ref $instance ;
$o->save ; ## This won't work!
## This would work
defined $o->SQLObjectID? $o->save: $instance->save( $o ) ;
## But obviously simpler
$instance->save( $o ) ;
perl's bless feature adds lots of magical capabilities to a reference, and its TIE feature adds even more. As shown, the SQLObjectID()
method returns undefined for untied objects. But the recommended approach is to avoid explicitly calling bless()
at all. NoSQL::PL2SQL always blesses tied objects, which avoids the possibility of blessed untied objects.
reindex()
As described, one use of the query()
method is to reindex all the records, or synchronizing the index table to reflect the data table. This process is necessary whenever something is added to the data definition. reindex()
does not take multiple arguments.
## This brute force solution wastes many resources on
## pointless reading and writing
map { $instance->record( $_ )->save } $instance->query ;
## This alternative modifies the specific index entry for
## each record.
## This operation must be repeated for each new data definition
map { $instance->record( $_ )->reindex('contactemail') } $instance->query ;
keyValues()
Under the hood, NoSQL::PL2SQL::Simple is primarily NoSQL::PL2SQL with an indexing subsystem included. Most of the indexing is transparent. However, The keyValues()
method is used to manipulate the index directly. This method needs to be used to maintain many-to-many data relationships.
Naturally, keyValues()
is overloaded, but its use is strightforward. The method requires at least one argument, the name (or column or key).
## The session object has been redefined to include groups.
## A user may be in several groups.
## Start by modifying the data definition
$instance->addNumberIndex( qw( GroupID ) ) ;
## Hypothetically extract a list of groups. Google App data is
## similarly structured.
$object = $instance->record( $sessionID ) ;
@groups = map { $_->{id} } @{ $object->{Groups} } ;
$object->keyValues( GroupID => @groups ) ;
print "List of Groups:\n" ;
print join "\n", $object->keyValues('GroupID'), '' ;
In this particular example, state data will be constantly updated. Unfortunately, keyValues()
always needs to be explicitly called. So all of the following code is now required to save session data:
$sessiondata = ... ; ## unblessed, untied raw data
$object = $instance->save( $sessionID, $sessiondata ) ;
@groups = map { $_->{id} } @{ $object->{Groups} } ;
$object->keyValues('GroupID')->clear ;
$object->keyValues( GroupID => @groups ) ;
Relationships defined by the keyValues()
method are intended to be persistent. See the discussion under CAVEATS.
clear()
The clear()
method is indirectly applied to a NoSQL::PL2SQL::Simple object as follows:
@groups = $object->keyValues('GroupID') ;
$object->keyValues('GroupID')->clear ; ## Deletes all keys
$object->keyValues('GroupID')->clear( $groups[0] ) ; ## Selective
DATA SOURCE CLASSES
NoSQL::PL2SQL uses a single table to store arbitrarily structured data. There is no need to create different tables for different types of objects. Although NoSQL::PL2SQL::Simple requires a pair of tables, the data structure definition is independent of these tables, so one pair of tables can be used for numerous implementations. In fact, a completely normalized database can be built without using separate tables.
For simplicity, the previous examples had no code that defines the data source. To keep things simple, subclass NoSQL::PL2SQL::Simple in a separate Data Source Class and define the data source there.
package TQIS::PL2SQL::MyData ; ## An arbitrary class name
use base qw( NoSQL::PL2SQL::Simple ) ; ## Do not change this line
use NoSQL::PL2SQL::DBI::SQLite ; ## Use one of the available
## drivers.
my @dsn = () ; ## Do not change this line
## data source subclasses override this dsn() method
sub dsn {
return @dsn if @dsn ; ## Do not change this line
my %tables ;
$tables{objectdata} = 'aTableName' ; ## Personal preference
$tables{querydata} = 'anotherTableName' ; ## Ditto
push @dsn, new NoSQL::PL2SQL::DBI::SQLite $tables{objectdata} ;
$dsn[0]->connect( 'dbi:SQLite:dbname=:memory:', '', '') ;
push @dsn, $dsn[0]->table( $tables{querydata} ) ;
return @dsn ; ## Do not change this line
}
## Each of the following classes can have independent data structure
## definitions. After data definition, the classes below can be used
## without additional code.
package MyArbitraryClass ;
use base qw( TQIS::PL2SQL::MyData ) ;
package TQIS::HighSchools ;
use base qw( TQIS::PL2SQL::MyData ) ;
package TQIS::HighSchoolFriends ;
use base qw( TQIS::PL2SQL::MyData ) ;
package TQIS::GPRC::Members ;
use base qw( TQIS::PL2SQL::MyData ) ;
This sample code can be written into a single file and installed in perl's class path. However, before proceeding, make sure to run the following installation code:
use TQIS::PL2SQL::MyData ;
TQIS::PL2SQL::MyData->loadschema ;
OTHER FEATURES
Index Mapping
In its earliest incarnations, this module was used to store form data submitted from website forms. At one time, these forms were created by Adobe DreamWeaver, where the form fields were renamed everytime the form was updated. The resulting hack remains, and is described here as Index Mapping.
A more hypothetical situation is that the data is updated by users who submit a spreadsheet. Upon submission, each spreadsheet row is added as a new record, using the definition in the column headers. After some successful period of production, suddenly one of the column names is changed from Email to ContactEmail.
## First solution: Change the code
## Find all occurences of "save()" or "record()" equivalents, and
## change as follows:
$instance->record( CGI->new->Vars, ContactEmail => 'Email' ) ;
$instance->save( CGI->new->Vars, ContactEmail => 'Email' ) ;
## Alternate solution: Change the data definition
$instance->addTextIndex( qw( ContactEmail ) ) ;
map { $instance->record( $_ )->reindex( Email => ContactEmail )
$instance->query ;
The second solution is probably more maintainable, except that the data definition has become cluttered. The ultimate fix requires some understanding of the NoSQL::PL2SQL innards, but it looks like this:
my $datadef = NoSQL::PL2SQL::SQLObject(
ref( $instance ),
$instance->dsn->[0], 0 ) ;
delete $datadef->{Email} ;
undef $datadef ;
Index Mapping is also useful for a similar problem. EG., in a contact manager, the input data may specify a Work Email, Home Email, and Other Email. When a user is queried by email, it shouldn't matter where that address was originally entered. The answer is similar to the First Solution in the previous example.
$instance->addTextIndex( qw( Email ) ) ; ## Ideally during installation
$instance->save( CGI->new->Vars,
'Work Email' => 'Email',
'Home Email' => 'Email',
'Other Email' => 'Email'
) ;
Archiving Records
NoSQL::PL2SQL has a feature called incremental updates: Whenever an object is modified, only the modifications are written to the data source. This feature is not included in NoSQL::PL2SQL::Simple. When an object is updated, the old data is deleted and the replacement is written as new data. From a performance standpoint, it's slightly faster simply to write out new data, at the expense of increased data storage requirements. But if data storage isn't an issue, NoSQL::PL2SQL::Simple can act like a time machine archiving each version of stored data.
This feature can be easily turned on using the following data definition:
$instance->addNumberIndex( qw( archive ) ) ; ## on installation
@history = $instance->archive( $sessionID )->records ; ## at runtime
Ultimately, the archive feature isn't quite so efficient because additional write operations are performed to insure that the $sessionID
value remains constant.
IMPLEMENTATION EXAMPLES
Most of this section is still under construction.
Unique Keys
The example at the beginning of this document discussed an application that uses NoSQL::PL2SQL::Simple to save state data. This application relies on an externally supplied UserID that must be unique. This code ensures that uniqueness.
$instance = TQIS::Session->new ;
$sessiondata = ... ; ## unblessed, untied raw data
@args = $sessionID? ( $sessionID ):
$instance->UserID( $sessiondata->{UserID} ) ;
$instance->save( @args, $sessiondata ) ;
CAVEATS
keyValues()
When a record is saved, its indexes are all rebuilt. This implementation deletes all the index records associated with the record, and then automatically inserts new records to reflect replacement values. So the manual keyValues()
operation must be manually repeated upon every save, as shown in the code below.
$sessiondata = ... ; ## unblessed, untied raw data
$object = $instance->save( $sessionID, $sessiondata ) ;
@groups = map { $_->{id} } @{ $object->{Groups} } ;
$object->keyValues('GroupID')->clear ; ## currently unnecessary
$object->keyValues( GroupID => @groups ) ;
This example works fine, because the elements in @groups
are accessible for this manual operation. In practical terms, if users select their own groups, the group relationships should always be defined within $sessiondata
. However, if there are other operations that define the group-user relationships, that data needs to be more persistent.
The keyValues()
method is extended so that the following workaround is available:
$sessiondata = ... ; ## unblessed, untied raw data
@groups = $instance->record( $sessionID )->keyValues('GroupID') ;
## potential race condition occurs here
$object = $instance->save( $sessionID, $sessiondata ) ;
$object->keyValues( GroupID => @groups ) ;
The problem with this approach is a potential race condition if other users insert new group relationships during this operation.
Ultimately, the solution is to flag index records that are manually defined to guarantee their persistence. This feature is not particularly complicated, but requires a change to the underlying data structure. Because this effort involves NoSQL::PL2SQL::DBI and worrying about backward compatibility, it is planned for the next major release.
EXPORT
None by default.
SEE ALSO
AUTHOR
Jim Schueler, <jim@tqis.com>
COPYRIGHT AND LICENSE
Copyright (C) 2012 Jim Schueler
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.9 or, at your option, any later version of Perl 5 you may have available.
1 POD Error
The following errors were encountered while parsing the POD:
- Around line 907:
Unterminated C<...> sequence