NAME

Bio::MAGETAB::Util::Builder - A storage class used to track Bio::MAGETAB object creation.

SYNOPSIS

use Bio::MAGETAB::Util::Builder;
my $is_relaxed = 0;    # 0 = strict parsing (the default); 1 = relaxed mode
my $builder = Bio::MAGETAB::Util::Builder->new({
    relaxed_parser => $is_relaxed,
});

DESCRIPTION

Creation of complex Bio::MAGETAB object hierarchies and DAGs requires a mechanism to track the instantiated objects and manage any updates. This class (and its subclasses) provides that mechanism. Builder objects are created and included in Reader object instantiation, such that the back-end storage engine populated by a given Reader object may be redefined as desired. This base Builder class simply tracks objects in a hash of hashes; this is sufficient for simple parsing of MAGE-TAB documents. See the DBLoader class for an example of a Builder subclass that can be used to populate a Tangram-based relational database schema.

ATTRIBUTES

relaxed_parser

A boolean value (default FALSE) indicating whether the parse should take place in "relaxed mode". The regular parsing mode throws an exception when an object is referenced before it has been declared (e.g., a Protocol REF pointing to a non-existent Protocol Name). Relaxed parsing mode silently autogenerates the missing objects instead.
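The difference between the two modes can be sketched in plain Perl. This is a toy model, not the Bio::MAGETAB implementation: the %protocols store and the resolve_protocol_ref helper are hypothetical names introduced only to illustrate the behaviour described above.

```perl
use strict;
use warnings;

# Toy stand-in for the Builder's object store: protocols keyed by name.
# This is NOT the Bio::MAGETAB implementation, only a model of its behaviour.
my %protocols = ( 'P-EXTRACT-1' => { name => 'P-EXTRACT-1' } );

# Hypothetical helper: resolve a "Protocol REF" value against the store.
sub resolve_protocol_ref {
    my ( $name, $relaxed ) = @_;

    return $protocols{$name} if exists $protocols{$name};

    # Strict mode: a reference to an undeclared object is a fatal error.
    die "Protocol '$name' referenced but not declared\n" unless $relaxed;

    # Relaxed mode: quietly create a placeholder object and carry on.
    return $protocols{$name} = { name => $name };
}

# Strict mode fails on the undeclared protocol ...
my $strict_ok    = eval { resolve_protocol_ref( 'P-MISSING', 0 ); 1 };
my $strict_error = $@;

# ... whereas relaxed mode autogenerates it.
my $autogenerated = resolve_protocol_ref( 'P-MISSING', 1 );
```

Relaxed mode trades safety for convenience: a mistyped reference produces a new, mostly empty object rather than an error, so strict mode is the safer default when validating documents.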

magetab

An optional Bio::MAGETAB container object. If none is passed upon Builder object instantiation, a new Bio::MAGETAB object is created for you. See the Bio::MAGETAB class for details.

authority

An optional authority string to be used in object creation.

namespace

An optional namespace string to be used in object creation.

database

The internal store to use for object lookups. In the base Builder class this is a simple hash reference, and it is unlikely that you will ever want to change the default. This attribute is used in persistence subclasses (such as DBLoader) to point at the underlying storage engine.

METHODS

Each of the Bio::MAGETAB classes can be handled by get_*, create_* and find_or_create_* methods.

get_*

Retrieves the desired object from the database. Takes a hash reference of attribute values and returns the matching object. This method raises an exception if the passed-in attributes do not match any object in the database. See "OBJECT IDENTITY", below, for information on how objects are matched in the database.

create_*

Creates a new object using the passed attribute hash reference and stores it in the database.

find_or_create_*

Attempts to find the desired object in the same way as the get_* methods, and upon failure creates a new object and stores it.
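How the three method families relate to one another can be sketched over a simple hash of hashes. This is a toy model only: the toy_get, toy_create and toy_find_or_create functions and the explicit identity string are assumptions made for illustration, and the real methods take a single attribute hash reference and work out identity themselves.

```perl
use strict;
use warnings;

# Toy model: a hash of hashes keyed first by class, then by an identity string.
my %database;

# Like get_*: return the stored object or raise an exception.
sub toy_get {
    my ( $class, $id ) = @_;
    my $obj = $database{$class}{$id};
    die "No $class matching '$id'\n" unless defined $obj;
    return $obj;
}

# Like create_*: build a new object from the attributes and store it.
sub toy_create {
    my ( $class, $id, $attrs ) = @_;
    my $obj = { %$attrs };
    $database{$class}{$id} = $obj;
    return $obj;
}

# Like find_or_create_*: try the lookup first, create only on failure.
sub toy_find_or_create {
    my ( $class, $id, $attrs ) = @_;
    my $obj = eval { toy_get( $class, $id ) };
    return $obj if defined $obj;
    return toy_create( $class, $id, $attrs );
}

# The second call finds the object created by the first.
my $first  = toy_find_or_create( 'Sample', 'liver 1', { name => 'liver 1' } );
my $second = toy_find_or_create( 'Sample', 'liver 1', { name => 'liver 1' } );

# A plain lookup of an unknown object fails.
my $missing = eval { toy_get( 'Sample', 'kidney 1' ) };
my $error   = $@;
```

In this sketch $first and $second refer to the same stored object, which is the property that lets repeated references in a document resolve to a single instance.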

The following mapping should be used to determine the name of the desired method:

Bio::MAGETAB class                  Method base name
------------------                  ----------------

Bio::MAGETAB::ArrayDesign           array_design
Bio::MAGETAB::Assay                 assay
Bio::MAGETAB::Comment               comment
Bio::MAGETAB::CompositeElement      composite_element
Bio::MAGETAB::Contact               contact
Bio::MAGETAB::ControlledTerm        controlled_term
Bio::MAGETAB::DataAcquisition       data_acquisition
Bio::MAGETAB::DatabaseEntry         database_entry
Bio::MAGETAB::DataFile              data_file
Bio::MAGETAB::DataMatrix            data_matrix
Bio::MAGETAB::Edge                  edge
Bio::MAGETAB::Extract               extract
Bio::MAGETAB::Factor                factor
Bio::MAGETAB::FactorValue           factor_value
Bio::MAGETAB::Feature               feature
Bio::MAGETAB::Investigation         investigation
Bio::MAGETAB::LabeledExtract        labeled_extract
Bio::MAGETAB::MatrixColumn          matrix_column
Bio::MAGETAB::MatrixRow             matrix_row
Bio::MAGETAB::Measurement           measurement
Bio::MAGETAB::Normalization         normalization
Bio::MAGETAB::ParameterValue        parameter_value
Bio::MAGETAB::Protocol              protocol
Bio::MAGETAB::ProtocolApplication   protocol_application
Bio::MAGETAB::ProtocolParameter     protocol_parameter
Bio::MAGETAB::Publication           publication
Bio::MAGETAB::Reporter              reporter
Bio::MAGETAB::SDRF                  sdrf
Bio::MAGETAB::SDRFRow               sdrf_row
Bio::MAGETAB::Sample                sample
Bio::MAGETAB::Source                source
Bio::MAGETAB::TermSource            term_source

Example: a Bio::MAGETAB::DataFile object can be created using the create_data_file method.
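When method names need to be built programmatically, the naming pattern in the table can be approximated with a small helper. The method_base_name function below is hypothetical and not part of Bio::MAGETAB; it appears to reproduce the entries listed above, but the table remains the authoritative mapping.

```perl
use strict;
use warnings;

# Hypothetical helper: derive the method base name from a class name,
# e.g. Bio::MAGETAB::SDRFRow -> sdrf_row.
sub method_base_name {
    my ($class) = @_;
    ( my $base = $class ) =~ s/\ABio::MAGETAB:://;
    $base =~ s/([A-Z]+)([A-Z][a-z])/$1_$2/g;    # SDRFRow  -> SDRF_Row
    $base =~ s/([a-z\d])([A-Z])/$1_$2/g;        # DataFile -> Data_File
    return lc $base;
}

my $method = 'create_' . method_base_name('Bio::MAGETAB::DataFile');

# With a real Builder object one could then call the method by name:
#   my $file = $builder->$method( \%attrs );
```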

In addition to the above, the following method is included to help manage objects stored in relational database backends (see the DBLoader subclass):

update

Passed a list of Bio::MAGETAB objects, this method will attempt to update those objects in any persistent storage engine. This method doesn't have any effect in the base Builder class, but it is very important to the DBLoader subclass. See CAVEATS in the DBLoader class.

OBJECT IDENTITY

For most Bio::MAGETAB classes, identity between objects is fairly easily defined. For example, every Material object has a name attribute which identifies it within a given namespace:authority grouping. However, many classes do not have this simple mechanism. For example, Edge objects have no attributes other than their input and output nodes, and a list of protocol applications. To address this, the Builder module includes a set of identity heuristics defined for each class; in this example, Edge will be identified by examining its input and output nodes. Namespace and authority terms are used to localize objects.

In theory this should all just work. However, the system is complex and so undoubtedly there will be times when this module behaves other than you might expect. Therefore, the current set of heuristics is listed below for your debugging delight:

Bio::MAGETAB class                Identity depends on:
------------------                -------------------
Bio::MAGETAB::ArrayDesign         name accession termSource
Bio::MAGETAB::Assay               name
Bio::MAGETAB::Comment             name value object*
Bio::MAGETAB::CompositeElement    name
Bio::MAGETAB::Contact             firstName midInitials lastName
Bio::MAGETAB::ControlledTerm      category value termSource accession
Bio::MAGETAB::DataAcquisition     name
Bio::MAGETAB::DatabaseEntry       accession termSource
Bio::MAGETAB::DataFile            uri
Bio::MAGETAB::DataMatrix          uri
Bio::MAGETAB::Edge                inputNode outputNode
Bio::MAGETAB::Extract             name
Bio::MAGETAB::Factor              name
Bio::MAGETAB::FactorValue         factor term measurement
Bio::MAGETAB::Feature             blockCol blockRow col row array_design*
Bio::MAGETAB::Investigation       title
Bio::MAGETAB::LabeledExtract      name
Bio::MAGETAB::MatrixColumn        columnNumber data_matrix*
Bio::MAGETAB::MatrixRow           rowNumber data_matrix*
Bio::MAGETAB::Measurement         measurementType value minValue maxValue unit object*
Bio::MAGETAB::Normalization       name
Bio::MAGETAB::ParameterValue      parameter protocol_application*
Bio::MAGETAB::Protocol            name accession termSource
Bio::MAGETAB::ProtocolApplication protocol edge*
Bio::MAGETAB::ProtocolParameter   name protocol
Bio::MAGETAB::Publication         title
Bio::MAGETAB::Reporter            name
Bio::MAGETAB::SDRF                uri
Bio::MAGETAB::SDRFRow             rowNumber sdrf*
Bio::MAGETAB::Sample              name
Bio::MAGETAB::Source              name
Bio::MAGETAB::TermSource          name

Not all the slots are needed for an object to be identified; for example, a Contact object might only have a lastName. Asterisked (*) terms are those which do not correspond to any attribute of the Bio::MAGETAB class. These are typically "container" objects, i.e. those involved in aggregating the target objects. For example, the identity of a given Comment object is tied up with the "object" to which it has been applied. These objects are passed in as part of the object instantiation hash reference, and are discarded prior to object creation. NOTE: These aggregating objects are not processed in any way by Builder; you will need to ensure the objects are correctly linked together yourself.
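The way a partial set of identity slots can still locate an object may be easier to see in a sketch. The code below is illustrative only: the identity_key helper, the key format and the three-class subset of the table are assumptions, and the internal representation used by Builder may differ.

```perl
use strict;
use warnings;

# Illustrative subset of the identity table above.
my %identity_slots = (
    Contact => [qw( firstName midInitials lastName )],
    Edge    => [qw( inputNode outputNode )],
    Comment => [qw( name value object )],
);

# Hypothetical helper: build a lookup key from namespace, authority and
# whichever identity slots were actually supplied. Attributes that are
# not identity slots (e.g. email) do not contribute to the key.
sub identity_key {
    my ( $class, $attrs ) = @_;
    my @parts = map { defined $attrs->{$_} ? "$_=$attrs->{$_}" : () }
        ( 'namespace', 'authority', @{ $identity_slots{$class} } );
    return join ';', $class, @parts;
}

my $key = identity_key(
    'Contact',
    {
        lastName  => 'Smith',
        namespace => 'lab',
        authority => 'example.org',
        email     => 'smith@example.org',
    },
);
# $key is "Contact;namespace=lab;authority=example.org;lastName=Smith"
```

Because namespace and authority are part of the key in this sketch, two Contacts with the same lastName in different namespaces would be treated as distinct objects.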

KNOWN BUGS

The identity of Bio::MAGETAB::ProtocolApplication objects is based solely on the Protocol being applied and the Edge to which it is attached. Ideally, the protocol application date would also be included, but this can create problems for persistence-based Builder subclasses where the exact serialization behavior of DateTime objects needs to be defined (see the DBLoader class). This is a tractable problem, but a fix has been omitted from this release since the use case (the same Protocol applied to a single Edge multiple times on different dates) seems a minor one. The workaround is to split the protocol applications into as many Edges as are needed.

SEE ALSO

Bio::MAGETAB, Bio::MAGETAB::Util::Reader, Bio::MAGETAB::Util::DBLoader

AUTHOR

Tim F. Rayner <tfrayner@gmail.com>

LICENSE

This library is released under version 2 of the GNU General Public License (GPL).