NAME
Bio::MAGETAB::Util::Builder - A storage class used to track Bio::MAGETAB object creation.
SYNOPSIS
use Bio::MAGETAB::Util::Builder;
my $builder = Bio::MAGETAB::Util::Builder->new({
relaxed_parser => $is_relaxed,
});
DESCRIPTION
Creation of complex Bio::MAGETAB object heirarchies and DAGs requires a mechanism to track the instantiated objects, and manage any updates. This class (and its subclasses) provides that mechanism. Builder objects are created and included in Reader object instantiation, such that the back-end storage engine populated by a given Reader object may be redefined as desired. This base Builder class simply tracks objects in a hash of hashes; this is sufficient for simple parsing of MAGE-TAB documents. See the DBLoader class for an example of a Builder subclass that can be used to populate a Tangram-based relational database schema.
ATTRIBUTES
- relaxed_parser
-
A boolean value (default FALSE) indicating whether or not the parse should take place in "relaxed mode" or not. The regular parsing mode will throw an exception in cases where an object is referenced before it has been declared (e.g., Protocol REF pointing to a non-existent Protocol Name). Relaxed parsing mode will silently autogenerate the non-existent objects instead.
- magetab
-
An optional Bio::MAGETAB container object. If none is passed upon Builder object instantiation, a new Bio::MAGETAB object is created for you. See the Bio::MAGETAB class for details.
-
An optional authority string to be used in object creation.
- namespace
-
An optional namespace string to be used in object creation.
- database
-
The internal store to use for object lookups. In the base Builder class this is a simple hash reference, and it is unlikely that you will ever want to change the default. This attribute is used in persistence subclasses (such as DBLoader) to point at the underlying storage engine.
METHODS
Each of the Bio::MAGETAB classes can be handled by get_*, create_* and find_or_create_* methods.
- get_*
-
Retrieve the desired object from the database. Takes a hash reference of attribute values and returns the desired object. This method raises an exception if the passed-in attributes do not match any object in the database. See "OBJECT IDENTITY", below, for information on how objects are matched in the database.
- create_*
-
Creates a new object using the passed attribute hash reference and stores it in the database.
- find_or_create_*
-
Attempts to find the desired object in the same way as the get_* methods, and upon failure creates a new object and stores it.
The following mapping should be used to determine the name of the desired method:
Bio::MAGETAB class Method base name
------------------ ----------------
Bio::MAGETAB::ArrayDesign array_design
Bio::MAGETAB::Assay assay
Bio::MAGETAB::Comment comment
Bio::MAGETAB::CompositeElement composite_element
Bio::MAGETAB::Contact contact
Bio::MAGETAB::ControlledTerm controlled_term
Bio::MAGETAB::DataAcquisition data_acquisition
Bio::MAGETAB::DatabaseEntry database_entry
Bio::MAGETAB::DataFile data_file
Bio::MAGETAB::DataMatrix data_matrix
Bio::MAGETAB::Edge edge
Bio::MAGETAB::Extract extract
Bio::MAGETAB::Factor factor
Bio::MAGETAB::FactorValue factor_value
Bio::MAGETAB::Feature feature
Bio::MAGETAB::Investigation investigation
Bio::MAGETAB::LabeledExtract labeled_extract
Bio::MAGETAB::MatrixColumn matrix_column
Bio::MAGETAB::MatrixRow matrix_row
Bio::MAGETAB::Measurement measurement
Bio::MAGETAB::Normalization normalization
Bio::MAGETAB::ParameterValue parameter_value
Bio::MAGETAB::Protocol protocol
Bio::MAGETAB::ProtocolApplication protocol_application
Bio::MAGETAB::ProtocolParameter protocol_parameter
Bio::MAGETAB::Publication publication
Bio::MAGETAB::Reporter reporter
Bio::MAGETAB::SDRF sdrf
Bio::MAGETAB::SDRFRow sdrf_row
Bio::MAGETAB::Sample sample
Bio::MAGETAB::Source source
Bio::MAGETAB::TermSource term_source
Example: a Bio::MAGETAB::DataFile object can be created using the create_data_file
method.
In addition to the above, the following method is included to help manage objects stored relational database backends (see the DBLoader subclass):
- update
-
Passed a list of Bio::MAGETAB objects, this method will attempt to update those objects in any persistent storage engine. This method doesn't have any effect in the base Builder class, but it is very important to the DBLoader subclass. See CAVEATS in the DBLoader class.
OBJECT IDENTITY
For most Bio::MAGETAB classes, identity between objects is fairly easily defined. For example, all Material objects have a name attribute which identifies it within a given namespace:authority grouping. However, many classes do not have this simple mechanism. For example, Edge objects have no attributes other than their input and output nodes, and a list of protocol applications. To address this, the Builder module includes a set of identity heuristics defined for each class; in this example, Edge will be identified by examining its input and output nodes. Namespace and authority terms are used to localize objects.
In theory this should all just work. However, the system is complex and so undoubtedly there will be times when this module behaves other than you might expect. Therefore, the current set of heuristics is listed below for your debugging delight:
Bio::MAGETAB class Identity depends on:
------------------ -------------------
Bio::MAGETAB::ArrayDesign name accession termSource
Bio::MAGETAB::Assay name
Bio::MAGETAB::Comment name value object*
Bio::MAGETAB::CompositeElement name
Bio::MAGETAB::Contact firstName midInitials lastName
Bio::MAGETAB::ControlledTerm category value termSource accession
Bio::MAGETAB::DataAcquisition name
Bio::MAGETAB::DatabaseEntry accession termSource
Bio::MAGETAB::DataFile uri
Bio::MAGETAB::DataMatrix uri
Bio::MAGETAB::Edge inputNode outputNode
Bio::MAGETAB::Extract name
Bio::MAGETAB::Factor name
Bio::MAGETAB::FactorValue factor term measurement
Bio::MAGETAB::Feature blockCol blockRow col row array_design*
Bio::MAGETAB::Investigation title
Bio::MAGETAB::LabeledExtract name
Bio::MAGETAB::MatrixColumn columnNumber data_matrix*
Bio::MAGETAB::MatrixRow rowNumber data_matrix*
Bio::MAGETAB::Measurement measurementType value minValue maxValue unit object*
Bio::MAGETAB::Normalization name
Bio::MAGETAB::ParameterValue parameter protocol_application*
Bio::MAGETAB::Protocol name accession termSource
Bio::MAGETAB::ProtocolApplication protocol edge*
Bio::MAGETAB::ProtocolParameter name protocol
Bio::MAGETAB::Publication title
Bio::MAGETAB::Reporter name
Bio::MAGETAB::SDRF uri
Bio::MAGETAB::SDRFRow rowNumber sdrf*
Bio::MAGETAB::Sample name
Bio::MAGETAB::Source name
Bio::MAGETAB::TermSource name
Not all the slots are needed for an object to be identified; for example, a Contact object might only have a lastName. Asterisked (*) terms are those which do not correspond to any attribute of the Bio::MAGETAB class. These are typically "container" objects, i.e. those involved in aggregating the target objects. For example, the identity of a given Comment object is tied up with the "object" to which it has been applied. These objects are passed in as part of the object instantiation hash reference, and are discarded prior to object creation. NOTE: These aggregating objects are not processed in any way by Builder; you will need to ensure the objects are correctly linked together yourself.
KNOWN BUGS
The identity of Bio::MAGE::ProtocolApplication objects is based solely around the Protocol being applied, and the Edge to which it is attached. Ideally, the protocol application date would also be included, but this can create problems for persistence-based Builder subclasses where the exact serialization behavior of DateTime objects needs to be defined (see the DBLoader class). This is a tractable problem, but a fix has been omitted from this release since the use case (the same Protocol applied to a single Edge multiple times on different dates) seems a minor one. The workaround is to split the protocol applications into as many Edges as are needed.
SEE ALSO
Bio::MAGETAB Bio::MAGETAB::Util::Reader Bio::MAGETAB::Util::DBLoader
AUTHOR
Tim F. Rayner <tfrayner@gmail.com>
LICENSE
This library is released under version 2 of the GNU General Public License (GPL).