TRL::Microarray - A Perl module for creating and manipulating microarray objects
use TRL::Microarray;
my $data_file = quantarray_file->new("/test_data.csv");
my $array = cgh_array->new('my array',$data_file);
my $aaData = $array->format_cgh_data;
TRL::Microarray is an object-oriented Perl module for creating microarray data objects and analysing the results. The module currently supports only the analysis of CGH-microarrays, and only the Axon 'GenePix' and Perkin-Elmer 'ScanArray' data file formats, although it has been designed with the intention of handling all types of microarrays and data file formats.
How it works
The Microarray object contains several levels of microarray-associated data, organised in a (fairly) intuitive way. First, there's the data that you have obtained from a microarray scanner, in the form of a data file. This is imported into Microarray as a Data_File object. Support for different data file formats is built into the Data_File class, and creating new classes for your favourite scanner/software output is relatively simple. Data extracted from the microarray spots are then imported into individual array_spot objects. Next, replicate spots are collated into array_feature objects. Most of the quality control functions, which operate on parameters such as signal intensity and spot size, are built into this final process, so that an array_feature object only contains data from spots that have passed the QC assessments. Finally, sub-classes of Microarray (such as cgh_array) provide methods for adding genetic data to each feature, and also methods for basic data processing (such as returning signal ratios, or ratio normalisation).
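The collation step described above can be pictured with a small, self-contained sketch. It does not use TRL::Microarray at all: the spot records are plain hashes standing in for array_spot objects, and the field names (id, ch1, ch2, passed_qc), the sub names and the sample values are all assumptions made for illustration. Averaging the per-spot log2 ratios is likewise just one plausible way of summarising replicates, not necessarily what the module does.

```perl
use strict;
use warnings;

# Group QC-passing spots by feature id. Each spot is a plain hash here,
# a hypothetical stand-in for an array_spot object.
sub collate_features {
    my @spots = @_;
    my %features;
    for my $spot (@spots) {
        next unless $spot->{passed_qc};    # rejected spots never reach a feature
        push @{ $features{ $spot->{id} } }, $spot;
    }
    return \%features;
}

# Mean of the per-spot log2(ch1/ch2) ratios for one feature's replicates.
sub feature_log2_ratio {
    my ($replicates) = @_;
    my $sum = 0;
    $sum += log( $_->{ch1} / $_->{ch2} ) / log(2) for @$replicates;
    return $sum / @$replicates;
}

# Made-up example data: clone_C has no spot that passed QC.
my @spots = (
    { id => 'clone_A', ch1 => 20000, ch2 => 10000, passed_qc => 1 },
    { id => 'clone_A', ch1 => 20000, ch2 => 10000, passed_qc => 1 },
    { id => 'clone_B', ch1 =>   900, ch2 =>  1000, passed_qc => 0 },
    { id => 'clone_B', ch1 =>  8000, ch2 => 16000, passed_qc => 1 },
    { id => 'clone_C', ch1 =>   500, ch2 =>   700, passed_qc => 0 },
);

my $features = collate_features(@spots);
for my $id ( sort keys %$features ) {
    printf "%s\t%d spot(s)\t%.3f\n",
        $id, scalar @{ $features->{$id} }, feature_log2_ratio( $features->{$id} );
}
```

In this sketch a feature whose replicates all fail QC simply does not appear in the result, which mirrors the statement that a feature only contains data from spots that passed.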
Creating microarray objects
The microarray object is created by providing a barcode (or name) and a data file. It is assumed the data file contains minimal information about the feature identities (i.e. name or id). In the case of a CGH-microarray, that means the BAC clone name/synonym at each spot. For (currently unsupported) cDNA or oligo arrays, that would mean a gene name, cDNA accession, or oligo name. Most of the functions between initialising the objects and returning formatted data can be accessed, and default settings can be changed (see below).
Data File
The data file can be passed to Microarray either as a file name, filehandle object, or data_file object. If a filehandle is passed, the filename also needs to be set.
$array = cgh_array->new('my array','/file'); # will set the file format to the default
$data_file = quantarray_file->new('/file'); # simple way of specifying the file format...
$array = cgh_array->new('my array',$data_file); # ...and loading into microarray
$array = cgh_array->new('my array');
$array->scan_type('quantarray_file'); # another way of specifying the file format...
$array->data_file('/file'); # ...and loading into microarray
$data_file = quantarray_file->new('/file',$Fh); # can pass a filename and filehandle to the data file
$array = cgh_array->new('my array',$data_file);
Feature Identification
- genetic_data_source
Defines where Microarray can find the genetic data related to a feature. Set to either 'data_file' (default) or 'database'. For more information, see the documentation for sub-classes of TRL::Microarray that deal with specific microarray platforms (for example, TRL::Microarray::CGH_Microarray).
- blank_feature
Defines how 'empty' spots are described in the data file. Default 'n/a'.
- prefix
Set to 'y' if the feature id is prefixed in some way (for instance, we use prefixes to distinguish different methods used to prepare the same sample for microarray spotting). Default 'n'.
Changing Default Settings
Many parameters are used in the process of defining features and in their quality control. Below is an overview of the methods that set them. As well as setting these parameters individually, you can set several of them in one call using the set_param() method.
Spot Quality Control
There are various (mostly self-explanatory) methods for setting spot quality control thresholds, listed below:
- low_signal
Default = 5000
- high_signal
Default = 60000
- min_diameter
Default = 80
- max_diameter
Default = 150
- min_pixels
Default = 80
- signal_quality
Varies depending on the data file format used; for the ScanArray format, this refers to the percentage of spot pixels that are more than 2 standard deviations above the background. Default = 95
- percen_sat
The method percen_sat() refers to the percentage of spot pixels that have a saturated signal. Default = 10
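As an illustration of how thresholds like these might be combined, here is a self-contained sketch of a spot QC check. It is not the module's code: the sub name spot_passes_qc, the spot hash fields, and the direction of each comparison (for example, that both channels must lie between low_signal and high_signal) are assumptions. Only the parameter names and default values come from the list above.

```perl
use strict;
use warnings;

# Default thresholds as documented above.
my %qc_defaults = (
    low_signal     => 5000,
    high_signal    => 60000,
    min_diameter   => 80,
    max_diameter   => 150,
    min_pixels     => 80,
    signal_quality => 95,
    percen_sat     => 10,
);

# Return 1 if a spot (a plain hash with hypothetical field names) passes
# every check, 0 otherwise. Extra key/value pairs override the defaults.
sub spot_passes_qc {
    my ( $spot, %override ) = @_;
    my %qc = ( %qc_defaults, %override );

    # Assumption: both channels must fall inside the signal window.
    for my $signal ( @{$spot}{qw(ch1 ch2)} ) {
        return 0 if $signal < $qc{low_signal} || $signal > $qc{high_signal};
    }
    return 0 if $spot->{diameter} < $qc{min_diameter};
    return 0 if $spot->{diameter} > $qc{max_diameter};
    return 0 if $spot->{pixels} < $qc{min_pixels};
    return 0 if $spot->{signal_quality} < $qc{signal_quality};
    return 0 if $spot->{percen_sat} > $qc{percen_sat};
    return 1;
}

# Made-up spot: fails the default low_signal check on ch2.
my %spot = (
    ch1 => 12000, ch2 => 4000, diameter => 110,
    pixels => 120, signal_quality => 98, percen_sat => 0,
);
print spot_passes_qc( \%spot ) ? "pass\n" : "fail\n";
print spot_passes_qc( \%spot, low_signal => 3000 ) ? "pass\n" : "fail\n";
```

Merging the overrides over a defaults hash keeps each threshold adjustable on its own while still allowing several to be changed in a single call.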
Feature Analysis
- normalisation
Set to either 'y' or 'n', to include ratio normalisation. Note: this is only base-level normalisation, not signal normalisation. For CGH-microarrays, this is a subtraction of the modal log2 ratio. Default = 'y'
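To make "subtraction of the modal log2 ratio" concrete, the following self-contained sketch estimates the mode of a set of log2 ratios and subtracts it from each value. How the module actually estimates the mode is not stated here, so the approach below (binning the values at a fixed width and taking the centre of the fullest bin) is an assumption, as are the sub names and the 0.05 bin width.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Estimate the mode of continuous values by binning them and returning
# the centre of the most populated bin. Ties go to the lowest bin.
sub modal_value {
    my ( $values, $bin_width ) = @_;
    $bin_width ||= 0.05;    # hypothetical default bin width
    my %count;
    $count{ floor( $_ / $bin_width ) }++ for @$values;
    my ($best) = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
    return ( $best + 0.5 ) * $bin_width;
}

# Shift every log2 ratio so that the modal value sits at zero.
sub normalise_log2 {
    my ( $values, $bin_width ) = @_;
    my $mode = modal_value( $values, $bin_width );
    return [ map { $_ - $mode } @$values ];
}

# Made-up log2 ratios: most values cluster just above 0.2.
my @log2_ratios = ( 0.21, 0.22, 0.23, 0.90, -0.50 );
my $normalised  = normalise_log2( \@log2_ratios );
printf "%.3f\n", $_ for @$normalised;
```

Centring on the mode rather than the mean assumes that most features are unchanged, so that gains and losses in a minority of features do not shift the baseline.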
Formatting Data Output
The data fields to be output are defined using the format_headers() method. The fields that can be returned are defined by the sub-classes of TRL::Microarray that deal with specific microarray platforms. For example, TRL::Microarray::CGH_Microarray provides the default fields Name, location, log2_mor and log2_rom. Additional fields are ch1, ch2, BAC name, BAC synonym, chromosome, band, start, and end. The format_data method returns a 2D array, where the first array contains the data field headings, and subsequent arrays contain the data requested. If genetic information about a feature is obtained from a database, then a database handle must be passed to the format_data() method. Beware that features will not be returned in any specific order.
$aaFormatted_Data = $array->format_cgh_data($dbh);
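Because the rows come back in no particular order, it can help to sort them before writing them out. The sketch below works on a hand-built 2D array of the documented shape (headings first, then one array per feature) and so runs without the module. The sub name table_to_tsv and all of the values are made up for illustration; only the four default column names come from the text above.

```perl
use strict;
use warnings;

# Turn a 2D array (header row first) into tab-separated text, keeping the
# header on top and sorting the data rows by their first column.
sub table_to_tsv {
    my ($table) = @_;
    my ( $header, @rows ) = @$table;
    my @sorted = sort { $a->[0] cmp $b->[0] } @rows;
    return join '', map { join( "\t", @$_ ) . "\n" } $header, @sorted;
}

# Hand-built stand-in for the structure returned by the formatting method.
my $aaFormatted_Data = [
    [ 'Name',    'location', 'log2_mor', 'log2_rom' ],
    [ 'clone_B', 'loc_2',    -0.41,      -0.39 ],
    [ 'clone_A', 'loc_1',    0.02,       0.03 ],
];

print table_to_tsv($aaFormatted_Data);
```

Sorting on the first column is only one option; any column index could be used in the comparison.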
Access to Spot Data
All of the microarray data can be independently accessed in one of two ways. First, data can be obtained directly from the data file object. In fact, you could use this module just to simplify the data input process for your own applications, without using any of the other functions of Microarray. Individual spot objects can be returned by referring to their spot index (which is usually also the order in which they appear in the data file), or all spot objects can be returned as a list. See TRL::Microarray::Spot and TRL::Microarray::Feature for more information.
my $spot = $data_file->get_spots(1);
my $aAll_Spots = $data_file->get_spots;
Data file methods
- file_name
Returns either the file name or the full path, depending on what you provided when creating the Data_File object.
- get_header_info
Returns information from the data file header. For example, in the ScanArray format, the header contains information about the scan, such as laser power, PMT, etc.
Access to Feature Data
Alternatively, you can access the feature data, which collates replicate spot data. Either individual feature objects can be returned and array_feature methods applied to them, or all feature objects/ids can be returned as a list.
$feature = $array->get_feature('feature1'); # returns a single feature object
$aFeature_Objects = $array->get_feature_objects; # returns a list of feature objects
$aFeature_Names = $array->get_feature_ids; # returns a list of feature ids
$hFeatures = $array->get_all_features; # returns a hash of features; key=feature_id, value=feature object
This module is under continued development for our laboratory's microarray facility. If you would like to contribute to the development of Microarray, whether to add more advanced features of data analysis, or simply to add support for other microarray platforms/scanners, please contact the author.
TRL::Microarray::Microarray_File, TRL::Microarray::Feature, TRL::Microarray::Spot
Christopher Jones, Translational Research Laboratories, Institute for Women's Health, University College London.
Copyright 2006 by Christopher Jones, University College London
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.