Name
Microarray::DataMatrix - abstraction to matrices of microarray data
Abstract and Overall Logic
Note : This documentation is for Developers only. Clients of concrete subclasses of this package should have no need to consult this documentation, as the API for those subclasses should be fully documented as part of those subclasses.
dataMatrix provides an abstract superclass for a collection of abstract classes that deal with matrices. dataMatrix is useful and meaningful only in the context of those other classes. It provides protected methods for certain primitive operations that its subclasses can use, and public methods that require its immediate subclasses to share the same underlying structure for tracking the state of their matrix, such as which rows and columns have not yet been filtered out.
The collection of classes is structured like this:
                  dataMatrix
                      /\
                     /  \
                    /    \
              ISA  /      \  ISA
                  /        \
                 /          \
                /            \
    smallDataMatrix        bigDataMatrix
              \               /
               \             /
                \           /
                 \         /
          CanBeA  \       /  CanBeA
                   \     /
                    \   /
                     \ /
              anySizeDataMatrix
                      |
                      |  ISA
                      |
         ------------------ - - - - - -
         |                |           |
         |                |           |
         |                |           |
  concreteClassA   concreteClassB  concreteClassX
anySizeDataMatrix provides an abstraction to a dataMatrix whose contents may or may not fit into memory. An object will inherit dynamically, at construction time, from either smallDataMatrix or bigDataMatrix, each of which knows how to deal with a matrix of a particular size. anySizeDataMatrix itself is an abstract class, and will be subclassed by concrete classes that deal with a particular file type of data, which they know how to parse, for example a pclFile. Because dataMatrix, smallDataMatrix, bigDataMatrix and anySizeDataMatrix were developed together as a collection of classes, they are somewhat more intimate with each other than, say, a concrete subclass of anySizeDataMatrix would be with anySizeDataMatrix itself. While the subclasses do stick to the API, and respect the privacy of attributes and methods, the API was developed simultaneously with the subclasses that were using it. Thus it may not be the cleanest API in the world.
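As a rough illustration of choosing a parent class at construction time, the following standalone sketch may help. The package names, the storage method and the row-count threshold are all hypothetical, and the real classes may select their parent quite differently.

```perl
use strict;
use warnings;

{
    package Sketch::Small;    # hypothetical stand-in for smallDataMatrix
    sub storage { return 'memory' }
}

{
    package Sketch::Big;      # hypothetical stand-in for bigDataMatrix
    sub storage { return 'disk' }
}

{
    package Sketch::AnySize;  # hypothetical stand-in for anySizeDataMatrix
    our @ISA;

    sub new {
        my ($class, %args) = @_;

        # Assumed rule for this sketch only: a row-count threshold
        # decides which parent class is used.
        my $parent = $args{numRows} > $args{maxRowsInMemory}
            ? 'Sketch::Big'
            : 'Sketch::Small';

        # Choose the parent at construction time. @ISA belongs to the
        # package, so this affects every object of the class.
        @ISA = ($parent);

        return bless {}, $class;
    }
}

my $matrix = Sketch::AnySize->new(numRows => 10, maxRowsInMemory => 1000);
print $matrix->storage, "\n";
```

Because @ISA is shared by the whole package, a sketch like this only suits programs that hold one such matrix at a time.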
This collection of classes tries to follow the rule that all attribute names are preceded by "$PACKAGE::" in the object's hash. Private attribute names and private methods are preceded by two underscores; protected attributes and protected methods (which can be accessed by subclasses, as well as in $PACKAGE itself) are preceded by a single underscore. Public attributes and methods (which can be accessed anywhere) have no preceding underscores. In actuality, all object attributes are (and should be) private. If subclasses or clients need to manipulate or access them, protected and public methods, respectively, are provided for setting or getting their values. Disobey this interface at your peril!
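A minimal sketch of that naming convention follows, using a hypothetical class and attribute name. It only illustrates how a private attribute key and its protected accessors might look, and is not taken from the module's source.

```perl
use strict;
use warnings;

{
    package Sketch::Matrix;    # hypothetical class, for illustration only

    my $PACKAGE = __PACKAGE__;

    sub new { return bless {}, shift }

    # Protected setter: a single leading underscore, for use by subclasses.
    sub _setAutoDump {
        my ($self, $flag) = @_;

        # The attribute key is prefixed with the package name and, being
        # private, carries two leading underscores.
        $self->{"${PACKAGE}::__autoDump"} = $flag ? 1 : 0;
        return;
    }

    # Protected getter for the same private attribute.
    sub _autoDump { return $_[0]->{"${PACKAGE}::__autoDump"} }
}

my $matrix = Sketch::Matrix->new;
$matrix->_setAutoDump(1);
print join(", ", sort keys %$matrix), "\n";
```

Prefixing the key with the package name keeps attributes of different classes in the hierarchy from colliding in the shared object hash.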
PROTECTED SETTER/MUTATOR METHODS
_setAutoDump
This protected method is used to set the autodump flag, which can be either 1 or 0. This should only be utilized by subclasses, not clients.
Usage:
$self->_setAutoDump(1);
_setValidColumns
This protected setter method receives a reference to a hash whose keys are the indexes of the columns in the matrix that are valid. This method MUST be used when the matrix is first read, to set up all the columns that are initially valid (this call will actually occur in the _init methods, or methods called by them, of bigDataMatrix and smallDataMatrix). The values of the hash will usually be undef, simply to save space. There is no expectation for them to be otherwise.
Usage:
$self->_setValidColumns(\%validColumns);
_setValidRows
This protected setter method receives a reference to a hash whose keys are the indexes of the rows in the matrix that are valid. This method is expected to be used only when the matrix is first read, to set up all the rows that are initially valid (this call will actually occur in the _init methods, or methods called by them, of bigDataMatrix and smallDataMatrix). The values of the hash will usually be undef, simply to save space. There is no expectation for them to be otherwise.
Usage:
$self->_setValidRows(\%validRows);
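One way to build such hashes is sketched below. The matrix dimensions are hypothetical, and zero-based indexes are an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical matrix dimensions, for illustration only.
my $numRows    = 4;
my $numColumns = 3;

# Assigning an empty list to a hash slice creates every key with an
# undef value, so the hash records which indexes are valid and nothing else.
my (%validRows, %validColumns);
@validRows{ 0 .. $numRows - 1 }       = ();
@validColumns{ 0 .. $numColumns - 1 } = ();

print scalar(keys %validRows), " valid rows, ",
      scalar(keys %validColumns), " valid columns\n";

# Inside a subclass these would then be handed to the setters:
#   $self->_setValidRows(\%validRows);
#   $self->_setValidColumns(\%validColumns);
```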
_setErrstr
This protected setter method accepts a scalar, that will correspond to an error that has occurred, and will store it within the object.
Usage:
$self->_setErrstr($error);
_invalidateMatrixRow
This protected mutator method makes a row invalid. This method is not undoable, because the invalidation also deletes the data for the row. Note that the row index MUST correspond to the index of that row in the original file, not whatever row it may currently be (i.e. if rows 1 and 2 were filtered out, row 3 should still be called row 3 when being invalidated, not row 1).
Usage :
$self->_invalidateMatrixRow($row);
_invalidateMatrixColumn
This protected mutator method makes a column invalid. Note that the column index MUST correspond to the index of that column in the original file, not whatever column it may currently be (i.e. if columns 1 and 2 were filtered out, column 3 should still be called column 3 when being invalidated, not column 1).
Usage :
$self->_invalidateMatrixColumn($column);
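The point about original indexes applies to rows and columns alike. The standalone sketch below shows it for rows, with a hypothetical in-memory layout and helper name that are not the module's own storage or API.

```perl
use strict;
use warnings;

# Hypothetical storage: rows keyed by their index in the original file.
my %data      = (0 => [1, 2], 1 => [3, 4], 2 => [5, 6], 3 => [7, 8]);
my %validRows = map { $_ => undef } keys %data;

# Invalidate a row by its original index. The data is deleted too,
# which is why the operation cannot be undone.
sub invalidate_row {
    my ($row) = @_;
    delete $validRows{$row};
    delete $data{$row};
    return;
}

invalidate_row(1);
invalidate_row(2);

# Row 3 is still addressed as row 3, even though only two rows remain.
print "row 3 is still valid\n" if exists $validRows{3};
print join(",", sort { $a <=> $b } keys %validRows), "\n";    # prints "0,3"
```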
PROTECTED GETTER/ACCESSOR METHODS
_autoDump
This protected method returns a boolean to indicate whether autodumping is enabled.
Usage:
if ($self->_autoDump){
# blah
}
_validRowsArrayRef
This protected accessor returns a reference to an array that contains the indexes of all the valid rows.
Usage:
foreach my $row (@{$self->_validRowsArrayRef}){
# do something useful
}
_validColumnsArrayRef
This protected accessor returns a reference to an array that contains the indexes of all the valid columns.
Usage:
foreach my $column (@{$self->_validColumnsArrayRef}){
# do something useful
}
_matrixRowIsValid
This protected accessor returns a boolean to indicate whether a given row in the data matrix is still valid (i.e. has not been filtered out). The row index is with respect to its index in the original file that was used to construct the object.
Usage :
if ($self->_matrixRowIsValid($row)){
# blah
}
_matrixColumnIsValid
This protected accessor returns a boolean to indicate whether a given column in the data matrix is still valid (i.e. has not been filtered out). The column index is with respect to its index in the original file that was used to construct the object.
Usage :
if ($self->_matrixColumnIsValid($column)){
# blah
}
_numColumnsToReport
This protected method returns the number of columns to process after which reporting should be done, if verbose reporting has been indicated. If no value has been set, then the default of 50 is returned.
Usage :
my $numColumnsToReport = $self->_numColumnsToReport;
_numRowsToReport
This protected method returns the number of rows to process after which reporting should be done, if verbose reporting has been indicated. If no value has been set, then the default of 5000 is returned.
Usage :
my $numRowsToReport = $self->_numRowsToReport;
_lineEnding
This protected method returns the appropriate line ending, for text or html reporting. It expects a string, either 'html' or 'text' and will return the appropriate line ending.
Usage:
my $lineEnding = $self->_lineEnding("text");
_centeringMethodIsAllowed
This protected method returns a boolean to indicate whether a centering method is allowed. Allowed methods are 'mean' and 'median'.
Usage :
if ($self->_centeringMethodIsAllowed($method)){
# blah
}
_operatorIsAllowed
This protected method returns a boolean to indicate whether a particular operator is allowed. For each operator, there exists a corresponding method that uses that operator. Such operators are used when filtering rows by their values, e.g. > or <.
Usage :
if ($self->_operatorIsAllowed($operator)){
# blah
}
_methodForOperator
This protected method returns the name of the method that is used to compare two values, based on the operator that was passed in.
Usage :
my $method = $self->_methodForOperator($operator);
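The relationship between operators and comparison methods could be sketched as a lookup table, as below. The class, the operator set and the comparison method names here are hypothetical; the module's real operators and method names are not reproduced.

```perl
use strict;
use warnings;

{
    package Sketch::Filter;    # hypothetical class, for illustration only

    # Hypothetical operator table mapping each operator to a method name.
    my %methodForOperator = (
        '>' => '_greaterThan',
        '<' => '_lessThan',
    );

    sub new { return bless {}, shift }

    sub allowedOperators   { return sort keys %methodForOperator }
    sub _operatorIsAllowed { return exists $methodForOperator{ $_[1] } }
    sub _methodForOperator { return $methodForOperator{ $_[1] } }

    sub _greaterThan { return $_[1] > $_[2] }
    sub _lessThan    { return $_[1] < $_[2] }
}

my $filter   = Sketch::Filter->new;
my $operator = '>';

if ($filter->_operatorIsAllowed($operator)) {
    my $method = $filter->_methodForOperator($operator);

    # Call the comparison method by the name held in the scalar.
    print "5 $operator 3 holds\n" if $filter->$method(5, 3);
}
```

Keeping the mapping in one table means the allowed-operator check and the method lookup cannot drift apart.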
PROTECTED UTILITY METHODS
_rowAverage
This method returns the average of the valid entries in a row, using either the mean or the median, depending on the requested method. The row is passed in as a reference to an array containing the values for the row. If no mean/median could be calculated, then the method returns undef. Only values at valid column indexes within the passed in array are used in the calculation.
Usage:
my $average = $self->_rowAverage(\@row, "mean");
_average
This method calculates either the mean or median of a set of data. It receives the method to use, a reference to an array of all the data values, the sum total of all the data values, and the total number of datapoints. The array is required to calculate the median (and is not assumed to be sorted); the sum and the number of datapoints are required to calculate the mean. The number of datapoints must be non-zero.
Usage:
my $average = $self->_average("mean", \@data, $total, $numDatapoints);
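A standalone sketch of such a calculation is shown below. The function name is hypothetical and the argument order simply follows the Usage line above. Averaging the two middle values for an even number of datapoints is a common convention that is assumed here, not something confirmed for this module.

```perl
use strict;
use warnings;

sub average {
    my ($method, $dataRef, $total, $numDatapoints) = @_;

    # The mean needs only the sum and the count.
    return $total / $numDatapoints if $method eq 'mean';

    # The median needs the values themselves. Sort a copy, since the
    # input is not assumed to be sorted.
    my @sorted = sort { $a <=> $b } @$dataRef;
    my $mid    = int($numDatapoints / 2);

    return $numDatapoints % 2
        ? $sorted[$mid]
        : ($sorted[ $mid - 1 ] + $sorted[$mid]) / 2;
}

my @data = (9, 1, 2);
print average('mean',   \@data, 12, 3), "\n";    # prints 4
print average('median', \@data, 12, 3), "\n";    # prints 2
```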
_centerRow
This protected method takes an array reference to a row, and the average (either mean or median, depending on what was requested), and subtracts that value from every valid value (i.e. for the valid column indexes) in the row.
Usage:
$self->_centerRow(\@row, $average);
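Centering over the valid columns only can be pictured with the small sketch below. The data and the list of valid column indexes are hypothetical, and missing values are not handled.

```perl
use strict;
use warnings;

my @validColumns = (0, 2, 3);         # column 1 has been filtered out
my @row          = (4, 100, 6, 8);

# Mean over the valid columns only: (4 + 6 + 8) / 3 = 6.
my $sum = 0;
$sum += $row[$_] for @validColumns;
my $average = $sum / @validColumns;

# Subtract the average, in place, from each valid entry.
$row[$_] -= $average for @validColumns;

print "@row\n";    # prints "-2 100 0 2"
```

The value in the filtered-out column is left untouched, since it no longer takes part in any calculation.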
_calculateMeansAndStdDeviations
This method expects to receive hashes of the sums of X, the sums of X squared and the number of datapoints, where the keys for each hash are the unique identifiers for a series of numbers, whose mean and standard deviations are to be calculated. It returns references to hashes that hash the same ids to the means and standard deviations. It uses the n-1 version of standard deviation. If a standard deviation cannot be calculated, it will be stored as undef.
Usage:
my ($stddevHashRef, $meansHashRef) = $self->_calculateMeansAndStdDeviations(\%sumOfX, \%sumX2, \%numDataPoints);
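For reference, the mean is sum(X)/n and the n-1 standard deviation can be computed from the running sums as sqrt((sum(X^2) - sum(X)^2/n) / (n - 1)). The sketch below applies those formulas in a hypothetical standalone function. Treating fewer than two datapoints as "cannot be calculated" is an assumption of this example.

```perl
use strict;
use warnings;

sub means_and_stddevs {
    my ($sumXRef, $sumX2Ref, $nRef) = @_;
    my (%mean, %stddev);

    foreach my $id (keys %$nRef) {
        my $n = $nRef->{$id};
        next unless $n;    # no datapoints: nothing to compute

        $mean{$id} = $sumXRef->{$id} / $n;

        # Sample variance (n-1 denominator) from the running sums.
        my $variance = $n > 1
            ? ($sumX2Ref->{$id} - $sumXRef->{$id}**2 / $n) / ($n - 1)
            : undef;

        # Rounding can push the variance slightly below zero, so guard sqrt.
        $stddev{$id} = (defined $variance && $variance >= 0)
            ? sqrt($variance)
            : undef;
    }

    return (\%stddev, \%mean);
}

# Data 2, 4, 4, 6 for id "a": sum 16, sum of squares 72, n = 4.
# Id "b" has a single datapoint, so its deviation is undef.
my ($stddevRef, $meansRef) = means_and_stddevs(
    { a => 16, b => 5 },
    { a => 72, b => 25 },
    { a => 4,  b => 1 },
);

printf "%.4f %.4f\n", $meansRef->{a}, $stddevRef->{a};
```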
_calculateBounds
This method receives two hashes by reference. One is a hash of means, the other a hash of standard deviations. It also receives a multiplier. It then calculates, and returns as hash references, the upper and lower bounds: the mean plus or minus that number of standard deviations. It also receives the line ending it should use if being verbose in its reporting.
Usage:
my ($upperHashRef, $lowerHashRef) = $self->_calculateBounds($stddevHashRef, $meansHashRef, $deviations, $lineEnding);
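The bounds calculation amounts to mean plus or minus the multiplier times the standard deviation for each id. A hypothetical standalone version follows; skipping ids whose deviation is undef is an assumption of this sketch, and the verbose reporting is omitted.

```perl
use strict;
use warnings;

sub bounds {
    my ($stddevRef, $meansRef, $deviations) = @_;
    my (%upper, %lower);

    foreach my $id (keys %$meansRef) {
        # Assumed behaviour: ids without a usable deviation get no bounds.
        next unless defined $stddevRef->{$id};

        my $spread = $deviations * $stddevRef->{$id};
        $upper{$id} = $meansRef->{$id} + $spread;
        $lower{$id} = $meansRef->{$id} - $spread;
    }

    return (\%upper, \%lower);
}

my ($upperRef, $lowerRef) = bounds(
    { a => 2,  b => undef },
    { a => 10, b => 7 },
    3,
);

print "a: $lowerRef->{a} to $upperRef->{a}\n";    # prints "a: 4 to 16"
```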
_giveOverrideMessage
This protected utility method can be used by any subclass that expects its own subclasses to implement certain methods. It can have stub methods, that simply call this method, which will give a standard error message saying that the class 'X' must override method Y.
Usage:
$self->_giveOverrideMessage();
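The stub pattern described above might look roughly like this. The class names, the stub method and the exact message wording are hypothetical, and whether the real method dies or merely prints is not stated here, so the use of die is an assumption.

```perl
use strict;
use warnings;

{
    package Sketch::Base;    # hypothetical abstract class

    sub new { return bless {}, shift }

    sub _giveOverrideMessage {
        my ($self) = @_;
        my $class = ref($self) || $self;

        # caller(1) describes the stub that called this method; element 3
        # is that stub's fully qualified subroutine name.
        my $method = (caller(1))[3];

        die "Class '$class' must override method $method\n";
    }

    # Stub that concrete subclasses are expected to override.
    sub numRows { return $_[0]->_giveOverrideMessage }
}

{
    package Sketch::Concrete;    # hypothetical subclass that forgot to override
    our @ISA = ('Sketch::Base');
}

my $object = Sketch::Concrete->new;
eval { $object->numRows };
my $message = $@;
print $message;
```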
PUBLIC ACCESSOR METHODS
allowedOperators
This public method returns a sorted array of all the allowed operators that may be used by methods (in subclasses) that employ the operators for whatever reason (their interface should indicate that they employ such operators).
Usage :
my @operators = $matrix->allowedOperators;
PUBLIC SETTER METHODS
setNumColumnsToReport
This method accepts a positive integer: the number of columns to process, during a filtering/transformation method that is carried out on a column basis, before progress is indicated. If a client has not set this value, then it defaults to 50.
Usage :
$matrix->setNumColumnsToReport(50);
setNumRowsToReport
This method accepts a positive integer: the number of rows to process, during a filtering/transformation method that is carried out on a row basis, before progress is indicated. If a client has not set this value, then it defaults to 5000.
Usage :
$matrix->setNumRowsToReport(5000);
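How such a reporting interval is typically used can be sketched as follows. The loop, the message text and the row count are hypothetical and only illustrate reporting once every N processed rows.

```perl
use strict;
use warnings;

my $numRowsToReport = 5000;    # the documented default
my $lineEnding      = "\n";    # assumed line ending for text reporting
my @reports;

for my $rowsProcessed (1 .. 12000) {
    # ... process one row here ...

    # Report once for every $numRowsToReport rows processed.
    if ($rowsProcessed % $numRowsToReport == 0) {
        push @reports, "Processed $rowsProcessed rows$lineEnding";
    }
}

print @reports;
```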
AUTHOR
Gavin Sherlock
sherlock@genome.stanford.edu