NAME
Array::To::Moose - Build Moose objects from a data array
VERSION
This document describes Array::To::Moose version 0.0.1
SYNOPSIS
use Array::To::Moose;
# or
use Array::To::Moose qw(array_to_moose set_class_ind set_key_ind
throw_nonunique_keys throw_multiple_rows );
Array::To::Moose exports the function array_to_moose() by default, and the convenience functions set_class_ind(), set_key_ind(), throw_nonunique_keys() and throw_multiple_rows() if requested.
array_to_moose
array_to_moose() builds Moose objects from suitably sorted two-dimensional arrays of data of the type returned by, e.g., DBI::selectall_arrayref(), i.e. a reference to an array containing a reference to an array for each row of data fetched.
Example 1
package Car;
use Moose;
has 'make' => (is => 'ro', isa => 'Str');
has 'model' => (is => 'ro', isa => 'Str');
has 'year' => (is => 'ro', isa => 'Int');
package CarOwner;
use Moose;
has 'last' => (is => 'ro', isa => 'Str');
has 'first' => (is => 'ro', isa => 'Str');
has 'Cars' => (is => 'ro', isa => 'ArrayRef[Car]');
...
# in package main:
use Array::To::Moose;
# In this dataset Alex owns two cars, Jim one, and Alice three
my $data = [
[ qw( Green Alex Ford Focus 2011 ) ],
[ qw( Green Alex VW Jetta 2009 ) ],
[ qw( Green Jim Honda Civic 2007 ) ],
[ qw( Smith Alice Buick Regal 2012 ) ],
[ qw( Smith Alice Toyota Camry 2008 ) ],
[ qw( Smith Alice BMW X5 2010 ) ],
];
my $CarOwners = array_to_moose(
data => $data,
desc => {
class => 'CarOwner',
last => 0,
first => 1,
Cars => {
class => 'Car',
make => 2,
model => 3,
year => 4,
} # Cars
} # Car Owners
);
print $CarOwners->[2]->Cars->[1]->model; # prints "Camry"
In the above example, array_to_moose() returns a reference to an array of CarOwner objects, $CarOwners.
If a hash of CarOwner objects is required, a "key => ..." entry must be added to the descriptor hash. For example, to construct a hash of CarOwner objects whose key is the owner's first name (unique for every person in the example data), the call becomes:
my $CarOwnersH = array_to_moose(
data => $data,
desc => {
class => 'CarOwner',
key => 1, # note key
last => 0,
first => 1,
Cars => {
class => 'Car',
make => 2,
model => 3,
year => 4,
} # Cars
} # Car Owners
);
print $CarOwnersH->{Alex}->Cars->[0]->make; # prints "Ford"
Similarly, to construct the Cars sub-objects as hash sub-objects (and not an array as above), define CarOwner as:
package CarOwner;
use Moose;
has 'last' => (is => 'ro', isa => 'Str' );
has 'first' => (is => 'ro', isa => 'Str' );
has 'Cars' => (is => 'ro', isa => 'HashRef[Car]'); # Was 'ArrayRef[Car]'
and, noting that the car make is unique within the $data dataset, we could construct the reference to an array of CarOwner objects with the call:
$CarOwners = array_to_moose(
data => $data,
desc => {
class => 'CarOwner',
last => 0,
first => 1,
Cars => {
class => 'Car',
key => 2, # note key
model => 3,
year => 4,
} # Cars
} # Car Owners
);
print $CarOwners->[2]->Cars->{BMW}->model; # prints 'X5'
Example 2 - Use with DBI
The main rationale for writing Array::To::Moose
is to make it easy to build Moose objects from data extracted from relational databases, especially when the database query involves multiple tables with one-to-many relationships to each other.
As an example, consider a database which models patients making visits to a clinic on multiple occasions, and on each visit, having a doctor run some tests and diagnose the patient's complaint. In this model, the database Patient table would have a one-to-many relationship with the Visit table, which in turn would have a one-to-many relationship with the Test table.
The corresponding Moose model has nested Moose objects which reflect those one-to-many relationships, i.e., multiple Visit objects per Patient object and multiple Test objects per Visit object, declared as:
package Test;
use Moose;
has 'name' => (is => 'rw', isa => 'Str');
has 'result' => (is => 'rw', isa => 'Str');
package Visit;
use Moose;
has 'date' => (is => 'rw', isa => 'Str' );
has 'md' => (is => 'rw', isa => 'Str' );
has 'diagnosis' => (is => 'rw', isa => 'Str' );
has 'Tests' => (is => 'rw', isa => 'HashRef[Test]' );
package Patient;
use Moose;
has 'last' => (is => 'rw', isa => 'Str' );
has 'first' => (is => 'rw', isa => 'Str' );
has 'Visits' => (is => 'rw', isa => 'ArrayRef[Visit]' );
In the main program:
use DBI;
use Array::To::Moose;
...
my $sql = q{
SELECT
P.Last, P.First
,V.Date, V.Doctor, V.Diagnosis
,T.Name, T.Result
FROM
Patient P
,Visit V
,Test T
WHERE
-- join clauses
P.Patient_key = V.Patient_key
AND V.Visit_key = T.Visit_key
...
ORDER BY
P.Last, P.First, V.Date
};
my $dbh = DBI->connect(...);
my $data = $dbh->selectall_arrayref($sql);
# rows of @$data contain:
# Last, First, Date, Doctor, Diagnosis, Name, Result
# at positions: [0] [1] [2] [3] [4] [5] [6]
my $patients = array_to_moose(
data => $data,
desc => {
class => 'Patient',
last => 0,
first => 1,
Visits => {
class => 'Visit',
date => 2,
md => 3,
diagnosis => 4,
Tests => {
class => 'Test',
key => 5,
name => 5,
result => 6,
} # tests
} # visits
} # patients
);
print $patients->[2]->Visits->[0]->Tests->{BP}->result; # prints '120/80'
Note: We used the Test name as the key for the Visit's 'Tests' attribute, as the tests have unique names within any one Visit. (See t/5.t)
DESCRIPTION
As shown in the above examples, the general usage is:
package MyClass;
use Moose;
...
use Array::To::Moose;
...
my $data_ref = $dbh->selectall_arrayref($sql); # for example
my $object_ref = array_to_moose(
data => $data_ref,
desc => {
class => 'MyClass',
key => <key_col>, # only for HashRefs
attrib_1 => <column_number_1>,
attrib_2 => <column_number_2>,
...
SubObject => {
class => 'MySubClass',
...
}
}
);
Where:
$object_ref will contain a reference to an array or hash of MyClass Moose objects. All Moose classes (MyClass, MySubClass, etc.) must already have been defined.
$data_ref is a reference to an array containing references to arrays of scalars (AoA) of the kind returned by DBI::selectall_arrayref().
desc (descriptor) is a reference to a hash which contains several types of data:
class => ... is required and defines the Moose class or package which will contain the data. The user should have defined this class already.
key => ... is required if the Moose object being constructed is to be a hashref, either as the top-level Moose object returned from array_to_moose() or as an "isa => 'HashRef[...]'" sub-object.
attrib => N where attrib is the name of a Moose attribute ("has 'attrib' => ..."), and N is a non-negative integer giving the corresponding zero-indexed column number in the data array where that attribute's data is to be found.
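Because every attribute entry is a zero-indexed column number, a descriptor can be sanity-checked against the width of a data row before the call. The following is a minimal sketch in plain Perl; max_column() is a hypothetical helper written for this illustration and is not part of Array::To::Moose.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the largest column number used in a descriptor,
# descending into sub-object descriptors (hash references)
sub max_column {
    my ($desc) = @_;
    my $max = -1;
    for my $k (keys %$desc) {
        next if $k eq 'class';                   # class name, not a column
        my $v = $desc->{$k};
        my $n = ref($v) eq 'HASH' ? max_column($v) : $v;
        $max = $n if $n > $max;
    }
    return $max;
}

my $desc = {
    class => 'CarOwner', last => 0, first => 1,
    Cars  => { class => 'Car', make => 2, model => 3, year => 4 },
};
my $row = [ qw( Green Alex Ford Focus 2011 ) ];

my $max = max_column($desc);
print "descriptor needs ", $max + 1, " columns; row has ", scalar @$row, "\n";
```

The sketch assumes the default 'class' keyword; a "key => N" entry is treated like any other column number.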
Sub-Objects
array_to_moose() can handle three types of Moose sub-objects, i.e.:
an array of sub-objects:
has 'Sub_Obj' => ( isa => 'ArrayRef[MyObj]' );
a hash of sub-objects:
has 'Sub_Obj' => ( isa => 'HashRef[MyObj]' );
or a single sub-object:
has 'Sub_Obj' => ( isa => 'MyObj' );
The descriptor entry for Sub_Obj in each of these cases is (almost) the same:
desc => {
class => ...
...
Sub_Obj => {
class => 'MyObj',
key => <keycol>, # HashRef[...] only
attrib_a => <N>,
...
} # end SubObj
...
} # end desc
(A HashRef[...] sub-object will also require a key => N entry in the descriptor).
Ordering the data
array_to_moose() does not sort the input data array, and does all processing in a single pass through the data. This means that the data in the array must be sorted properly for the algorithm to work.
For example, in the previous Patient/Visit/Test example, in which there are many Tests per Visit and many Visits per Patient, the data in the Test column(s) must change the fastest, the Visit data slower, and the Patient data the slowest:
Patient Visit Test
------ ----- ----
P1 V1 T1
P1 V1 T2
P1 V1 T3
P1 V2 T4
P1 V2 T5
P2 V3 T6
P2 V3 T7
P2 V4 T8
In SQL this would be accomplished by an ORDER BY clause, e.g.:
ORDER BY Patient.Key, Visit.Key, Test.Key
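To see why the ordering matters for a single-pass algorithm, consider the following minimal sketch in plain Perl. group_adjacent() is a hypothetical helper that only illustrates the idea of starting a new group when a column value changes; it is not the module's actual implementation, and the rows are invented for the illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: single pass over the rows, starting a new group
# whenever the value in column 0 differs from the previous row's
sub group_adjacent {
    my ($rows) = @_;
    my @groups;
    for my $row (@$rows) {
        if (@groups && $groups[-1][0][0] eq $row->[0]) {
            push @{ $groups[-1] }, $row;    # same patient as previous row
        }
        else {
            push @groups, [ $row ];         # patient changed: new group
        }
    }
    return \@groups;
}

my @unsorted = ( [ 'P1', 'V1' ], [ 'P2', 'V3' ], [ 'P1', 'V2' ] );
my @sorted   = sort { $a->[0] cmp $b->[0] || $a->[1] cmp $b->[1] } @unsorted;

printf "unsorted: %d groups\n", scalar @{ group_adjacent(\@unsorted) };
printf "sorted:   %d groups\n", scalar @{ group_adjacent(\@sorted) };
```

With the unsorted rows, patient P1 is split into two groups, which in the Moose setting would correspond to two separate Patient objects for the same patient. If the rows cannot be ordered by the database, sorting them in Perl as shown for @sorted is one option.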
throw_nonunique_keys ()
By default, array_to_moose() does not check the uniqueness of hash key values within the data. If the key values in the data are not unique, existing hash entries will get overwritten, and the sub-object will contain the value from the last data row which contained that key value. For example:
package Employer;
use Moose;
has 'year' => (is => 'rw', isa => 'Str');
has 'name' => (is => 'rw', isa => 'Str');
package Person;
use Moose;
has 'name' => (is => 'rw', isa => 'Str' );
has 'Employers' => (is => 'rw', isa => 'HashRef[Employer]');
...
my $data = [
[ 'Anne Miller', '2005', 'Acme Corp' ],
[ 'Anne Miller', '2006', 'Acme Corp' ],
[ 'Anne Miller', '2007', 'Widgets, Inc' ],
...
];
The call:
my $obj = array_to_moose(
data => $data,
desc => {
class => 'Person',
name => 0,
Employers => {
class => 'Employer',
key => 2, # using employer name as key
year => 1,
} # Employer
} # Person
);
Because the employer was 'Acme Corp' in years 2005 and 2006, array_to_moose() will silently overwrite the 2005 Employer object with the data for the 2006 Employer object:
print $obj->[0]->Employers->{'Acme Corp'}->year, "\n"; # prints '2006'
Calling throw_nonunique_keys() (either with no argument, or with a non-zero argument) enables reporting of non-unique keys. In the above example, array_to_moose() would exit with the error:
Non-unique key 'Acme Corp' in 'Employer' class ...
Calling throw_nonunique_keys(0), i.e. with an argument of zero, will disable subsequent reporting of non-unique keys. (See t/8c.t)
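Duplicate keys can also be looked for in the raw data before any objects are built. The following is a minimal sketch in plain Perl, independent of Array::To::Moose, which counts how often each (person, employer) pair occurs in the example rows; the variable names are chosen for this illustration only.

```perl
use strict;
use warnings;

# Rows of [ person, year, employer ], as in the example above
my $data = [
    [ 'Anne Miller', '2005', 'Acme Corp'    ],
    [ 'Anne Miller', '2006', 'Acme Corp'    ],
    [ 'Anne Miller', '2007', 'Widgets, Inc' ],
];

# Count each (person, employer) pair; a count above one means the employer
# name is not a unique hash key for that person
my %seen;
$seen{ $_->[0] }{ $_->[2] }++ for @$data;

my @dups;
for my $person (sort keys %seen) {
    for my $employer (sort keys %{ $seen{$person} }) {
        push @dups, "$person / $employer" if $seen{$person}{$employer} > 1;
    }
}
print "non-unique key: $_\n" for @dups;
```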
throw_multiple_rows ()
For single-occurrence sub-objects (i.e. ( isa => 'MyObj' )), if the data contains more than one row of data for the sub-object, only the first row will be used to construct the single sub-object and array_to_moose() will not report the fact. E.g.:
package Salary;
use Moose;
has 'year' => (is => 'rw', isa => 'Str');
has 'amount' => (is => 'rw', isa => 'Int');
package Person;
use Moose;
has 'name' => (is => 'rw', isa => 'Str' );
has 'Salary' => (is => 'rw', isa => 'Salary'); # a single object
...
my $data = [
[ 'John Smith', '2005', 23_350 ],
[ 'John Smith', '2006', 24_000 ],
[ 'John Smith', '2007', 26_830 ],
...
];
The call:
my $obj = array_to_moose(
data => $data,
desc => {
class => 'Person',
name => 0,
Salary => {
class => 'Salary',
year => 1,
amount => 2
} # Salary
} # Person
);
would silently assign to Salary the first of the three Salary data rows, i.e. the row for year 2005:
print $obj->[0]->Salary->year, "\n"; # prints '2005'
Calling throw_multiple_rows() (either with no argument, or with a non-zero argument) enables reporting of this situation. In the above example, array_to_moose() will exit with the error:
Expected a single 'Salary' object, but got 3 of them ...
Calling throw_multiple_rows(0), i.e. with an argument of zero, will disable subsequent reporting of this error. (See t/8d.t)
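The same situation can be spotted in the raw data before the call. The following is a minimal sketch in plain Perl, independent of Array::To::Moose, which counts the data rows per person in the example above; it assumes that every row for a person carries distinct sub-object data, as is the case here.

```perl
use strict;
use warnings;

# Rows of [ name, year, amount ], as in the example above
my $data = [
    [ 'John Smith', '2005', 23_350 ],
    [ 'John Smith', '2006', 24_000 ],
    [ 'John Smith', '2007', 26_830 ],
];

# Count the rows seen for each person (column 0)
my %rows_for;
$rows_for{ $_->[0] }++ for @$data;

# People with more than one row cannot map onto a single Salary object
my @multi = grep { $rows_for{$_} > 1 } sort keys %rows_for;
print "$_ has $rows_for{$_} Salary rows\n" for @multi;
```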
set_class_ind (), set_key_ind ()
Problems arise if the Moose objects being constructed contain attributes called class or key, causing ambiguities in the descriptor. (Does key => 5 mean that the attribute key, or the hash key, is in column 5?)
In these cases, set_class_ind() and set_key_ind() can be used to change the keywords for the class => ... and key => ... descriptor entries.
For example:
package Letter;
use Moose;
has 'address' => ( is => 'ro', isa => 'Str' );
has 'class' => ( is => 'ro', isa => 'PostalClass' );
...
set_class_ind('package'); # use "package =>" in place of "class =>"
my $letters = array_to_moose(
data => $data,
desc => {
package => 'Letter', # the Moose class
address => 0,
class => 1, # the attribute 'class'
...
}
);
Read-only Attributes
One of the recommendations of Moose::Manual::BestPractices is to make attributes read-only (is => 'ro') wherever possible. Array::To::Moose supports this by evaluating all the attributes for a given object given in the descriptor, then including them all in the call to new(...) when constructing the object.
For Moose objects with attributes which are sub-objects, i.e. references to a Moose object, or references to an array or hash of Moose objects, this means that the sub-objects must be evaluated before the new() call. The effect of this for multi-level Moose objects is that object evaluations are carried out depth-first.
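The depth-first order can be pictured with a minimal sketch. To keep it runnable without Moose, it uses plain blessed hashes as stand-ins for Moose classes; the classes, data and variable names are assumptions made for this illustration, and the code is not taken from the module.

```perl
use strict;
use warnings;

# Stand-in classes in plain Perl (not Moose); attributes are set only in new()
{
    package Car;
    sub new  { my ($class, %args) = @_; return bless { %args }, $class }
    sub make { $_[0]{make} }
}
{
    package CarOwner;
    sub new   { my ($class, %args) = @_; return bless { %args }, $class }
    sub first { $_[0]{first} }
    sub Cars  { $_[0]{Cars} }
}

my @rows = ( [ 'Alex', 'Ford' ], [ 'Alex', 'VW' ] );

# Depth-first: build the Car sub-objects first ...
my @cars = map { Car->new( make => $_->[1] ) } @rows;

# ... then pass them, together with the scalar attributes, in one new() call,
# so no attribute needs a writer after construction
my $owner = CarOwner->new( first => $rows[0][0], Cars => \@cars );

print $owner->first, ' owns ', scalar @{ $owner->Cars }, " cars\n";
```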
Treatment of NULLs
array_to_moose() uses Array::GroupBy::igroup_by to compare the rows in the data given in data => ..., using the function Array::GroupBy::str_row_equal(), which compares the data as strings.
If the data contains undef values, typically returned from database SQL queries in which DBI maps NULL values to undef, then when str_row_equal() encounters undef elements in corresponding column positions, it will consider the elements equal. When corresponding column elements are defined and undef respectively, the elements are considered unequal.
This truth table demonstrates the various combinations:
-------+------------+--------------+--------------+--------------
row 1 | ('a', 'b') | ('a', undef) | ('a', undef) | ('a', 'b' )
row 2 | ('a', 'b') | ('a', undef) | ('a', 'b' ) | ('a', undef)
-------+------------+--------------+--------------+--------------
equal? | yes | yes | no | no
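The comparison rule in the table can be sketched in plain Perl as follows. rows_equal() is a hypothetical stand-in written for this illustration; it is not the Array::GroupBy implementation, and only mirrors the behaviour described above.

```perl
use strict;
use warnings;

# Hypothetical helper: compares two rows as strings, treating a pair of
# undefs in the same column as equal
sub rows_equal {
    my ($r1, $r2) = @_;
    return 0 unless @$r1 == @$r2;
    for my $i (0 .. $#$r1) {
        my ($x, $y) = ($r1->[$i], $r2->[$i]);
        next     if !defined $x && !defined $y;   # undef vs undef: equal
        return 0 if !defined $x || !defined $y;   # undef vs defined: unequal
        return 0 if $x ne $y;                     # both defined: string compare
    }
    return 1;
}

# The four column pairs from the truth table
my @cases = (
    [ [ 'a', 'b'   ], [ 'a', 'b'   ] ],
    [ [ 'a', undef ], [ 'a', undef ] ],
    [ [ 'a', undef ], [ 'a', 'b'   ] ],
    [ [ 'a', 'b'   ], [ 'a', undef ] ],
);
for my $case (@cases) {
    my $answer = rows_equal(@$case) ? 'yes' : 'no';
    print "$answer\n";
}
```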
EXPORT
array_to_moose by default; and set_class_ind, set_key_ind, throw_nonunique_keys and throw_multiple_rows if requested.
DIAGNOSTICS
set_key_ind() argument not defined
set_class_ind() argument not defined
No class => ...
Class '$class' not defined
Attribute '$name' not in '$class' object
(test if 'class' or 'key' isn't also an attribute)
the '$class' object has an attribute called '$name'
no data in data => ...
found a ref in attribs:
(column#) $msg must be a +ve integer
(column#) greater than # cols in the data
'array_to_moose(desc => ...)' arg has an odd number of members
empty descriptor
no '$CLASS' defined in descriptor:
can't delete '$CLASS' element of descriptor
can't delete '$KEY' element of descriptor
attribute '$name' can't be a '$ref' reference
no attributes with column numbers in descriptor:
Moose attribute '$attr_name' has no type
desc generated a '(ref $sub_obj)' object and not the expected array. desc = ...
Note that it is the user's responsibility to make sure that the types of data in the AoA match the Moose attributes.
DEPENDENCIES
Carp
Params::Validate
Array::GroupBy
SEE ALSO
AUTHOR
Sam Brain <samb@stanford.edu>
COPYRIGHT AND LICENSE
Copyright (c) Stanford University. June 6th, 2010. All rights reserved. Author: Sam Brain <samb@stanford.edu>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.8.8 or, at your option, any later version of Perl 5 you may have available.