NAME
Microarray - A Perl module for creating and manipulating DNA Microarray experiment objects
SYNOPSIS
use Microarray;
my $oArray = microarray->new($barcode,$data_file);
# QC filtering of our data
$oArray->set_param(min_diameter=>100,min_snr=>10,low_signal=>1000,high_signal=>62500);
$oArray->set_reporter_data;
# print plots
$oArray->print_ma_plot('/ma_plot.png',scale=>50);
# direct access to spot and clone level data
my $oData_File = $oArray->data_file; # the data_file object
my $oSpot = $oData_File->get_spots(123); # returns a single spot object
my $oReporter = $oArray->get_reporter('RP11-354D4'); # returns a single reporter object
DESCRIPTION
DNA microarrays (http://en.wikipedia.org/wiki/Dna_microarray), also known as 'gene chips' or 'DNA chips', are an experimental tool used in genetic research and related disciplines. They consist of thousands of DNA probes immobilised on a solid surface (such as a glass slide) and enable high-resolution, high-throughput analysis of parameters such as gene expression, genetic variation, or chromosome copy number variants.
Typically
A single Microarray experiment (typically) generates large quantities of data which (typically) requires some form of post-processing before the data can be interpreted or visualised. The processing of microarray data is (typically) handled by a Bioinformatician (http://en.wikipedia.org/wiki/Bioinformatics), and the favourite computer programming language of a Bioinformatician is (typically) Perl. However, until now the poor Bioinformatician has (typically) had to use a statistical programming language like R (http://www.r-project.org) - not because it is intrinsically better for the job than Perl, but rather because there were no CPAN modules that helped the Bioinformatician to perform these tasks lazily, impatiently and with hubris.
Microarray is a suite of object-oriented Perl Modules for the analysis of microarray experiment data. These include modules for handling common formats of microarray files, modules for the programmatic abstraction of a microarray experiment, and for the output of a variety of images describing microarray experiment data. Hopefully, this suite of modules will help Bioinformaticians to (typically) handle their data with laziness, impatience and hubris.
How it works
The Microarray object contains several levels of microarray-associated data, organised in a (fairly) intuitive way. First, there's the data that you have obtained from a microarray scanner, in the form of a data file. This is imported into Microarray as a Data_File object. Support for different data file formats is built into the Data_File class, and creating new classes for your favourite scanner/software output is relatively simple. Data extracted from the microarray spots are then imported into individual array_spot objects. Next, replicate spots are collated into array_reporter objects. Most of the quality control functions, which operate on parameters such as signal intensity and spot size, are built into this final process, so that an array_reporter object only returns data from spots that have passed the QC assessments. Post-processing of the data is then performed using the Microarray::Analysis module, and finally the data are visualised using the Microarray::Image module.
METHODS
Creating microarray objects
The microarray object is created by providing a barcode (or name) and a data file. It is assumed the data file contains minimal information about the reporter identities (i.e. name or id). In the case of a CGH-microarray, that means the BAC clone name/synonym at each spot. For cDNA or oligo arrays, that would mean a gene name, cDNA accession, or oligo name. Most of the functions between initialising the objects and returning formatted data can be accessed, and default settings can be changed (see below).
Data File
The data file can be passed to Microarray either as a file name, filehandle object, or data_file object. If a filehandle is passed, the filename also needs to be set.
$oArray = microarray->new($barcode,'my_file'); # will try to guess the file format
or
$oData_File = quantarray_file->new('my_file'); # create the data file...
$oData_File = quantarray_file->new('my_file',$Fh); # ...or create it from a filename and a filehandle
$oArray = microarray->new($barcode,$oData_File); # ...then load into microarray
Data file methods
- file_name
-
The file name or the full path, depending on which of these you provided when creating the Data_File object.
- get_header_info
-
Information from the data file header. For example, in the ScanArray format the header contains information about the scan, such as laser power, PMT, etc.
Reporter Identification
- blank_feature
-
Defines how 'empty' spots are described in the data file. Default 'n/a'
- prefix
-
Set to 'y' if the reporter id is prefixed in some way (for instance, we use prefixes to distinguish different methods used to prepare the same sample for microarray spotting). Default 'n'
Changing Default Settings
There are many parameters that are used for spot quality control. Below is an overview of the methods used. As well as setting these parameters individually, you can set several of them in one call using the set_param() method.
$oArray->set_param(min_diameter=>100,min_snr=>10);
Spot Quality Control
There are various (mostly self-explanatory) methods for setting spot quality control thresholds, listed below.
- low_signal, high_signal
-
Defaults = 5000, 60000
- min_diameter, max_diameter
-
Defaults = 80, 150
- min_pixels
-
Default = 80
- signal_quality
-
Varies depending on the data file format used; for the ScanArray format, this refers to the percentage of spot pixels that are more than 2 standard deviations above the background (default = 95); for BlueFuse this corresponds to the spot confidence value.
- percen_sat
-
The method percen_sat() refers to the percentage of spot pixels that have a saturated signal. Default = 10. Not relevant to BlueFuse format.
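As a rough illustration of how thresholds like these combine, here is a minimal sketch in plain Perl. It does not use the Microarray classes: the spot records are hypothetical hashes, the field names (signal, diameter, pixels, saturated) are invented for the example, and treating every limit as inclusive is an assumption. Only the threshold values are taken from the defaults listed above.

```perl
use strict;
use warnings;

# Thresholds copied from the documented defaults.
my %qc = (
    low_signal   => 5000,
    high_signal  => 60000,
    min_diameter => 80,
    max_diameter => 150,
    min_pixels   => 80,
    percen_sat   => 10,
);

# Hypothetical spot records (plain hashes, not Microarray spot objects).
my @spots = (
    { id => 1, signal => 12000, diameter => 110, pixels => 95,  saturated => 0  },
    { id => 2, signal => 3000,  diameter => 110, pixels => 95,  saturated => 0  },  # too dim
    { id => 3, signal => 20000, diameter => 170, pixels => 200, saturated => 2  },  # too large
    { id => 4, signal => 59000, diameter => 100, pixels => 90,  saturated => 25 },  # too saturated
);

# Returns 1 if the spot is inside every limit, 0 otherwise.
# Inclusive boundaries are an assumption of this sketch.
sub passes_qc {
    my ($spot, $qc) = @_;
    return 0 if $spot->{signal}    < $qc->{low_signal};
    return 0 if $spot->{signal}    > $qc->{high_signal};
    return 0 if $spot->{diameter}  < $qc->{min_diameter};
    return 0 if $spot->{diameter}  > $qc->{max_diameter};
    return 0 if $spot->{pixels}    < $qc->{min_pixels};
    return 0 if $spot->{saturated} > $qc->{percen_sat};
    return 1;
}

my @passed = map { $_->{id} } grep { passes_qc($_, \%qc) } @spots;
print "passed: @passed\n";
```

With the example data only spot 1 survives; each of the others fails exactly one test, which makes it easy to see which threshold is responsible.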
Signal Analysis
- normalisation
-
Set to either 'y' or 'n', to include ratio normalisation. Note: this is only base-level normalisation, not signal normalisation. For CGH-microarrays, this is a subtraction of the modal log2 ratio. Default = 'y'
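The idea of subtracting the modal log2 ratio can be sketched in plain Perl as follows. This is not the module's own code: the function name modal_value, the use of fixed-width bins to estimate the mode, and the bin width of 0.1 are all assumptions made for the example, and the ratios are invented.

```perl
use strict;
use warnings;
use POSIX qw(floor);

# Estimate the mode of a set of continuous values by counting them in
# fixed-width bins and returning the centre of the most populated bin.
# Binning is an assumption of this sketch, not the module's documented method.
sub modal_value {
    my ($values, $bin_width) = @_;
    my %count;
    $count{ floor($_ / $bin_width) }++ for @$values;
    # Most populated bin first; ties go to the lower bin.
    my ($best) = sort { $count{$b} <=> $count{$a} || $a <=> $b } keys %count;
    return ($best + 0.5) * $bin_width;
}

# Hypothetical log2 ratios: most reporters sit near 0.25, two are outliers.
my @log2_ratios = (0.21, 0.24, 0.27, 0.22, 1.05, -0.80, 0.26);

my $mode       = modal_value(\@log2_ratios, 0.1);
my @normalised = map { $_ - $mode } @log2_ratios;

printf "modal log2 ratio: %.2f\n", $mode;
printf "%.2f\n", $_ for @normalised;
```

After the subtraction the bulk of the ratios is centred on zero, while genuine gains and losses keep their distance from it.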
Access to Spot Data
All of the microarray data can be independently accessed in one of two ways. First, data can be obtained directly from the data file object, and in fact you could use this module just to simplify the data input process for your own applications and not use any of the other functions of Microarray. Individual spot objects can be returned by referring to their spot index (which is usually also the order they appear in the data file) or all spot objects can be returned as a list. See Microarray::Spot and Microarray::Reporter for more information.
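my $oSpot = $oData_File->get_spots(1); # a single spot object, by spot index
my $aAll_Spots = $oData_File->get_spots; # all spots (assumed here to be an array reference, as the $a prefix suggests)
my $number_of_spots = $aAll_Spots->[0]; # first element is not a spot, but the number of spots
my $oSpot1 = $aAll_Spots->[1]; # array index = spot index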
Access to Reporter Data
Alternatively, you can access the reporter data, which collates replicate spot data. Either individual reporter objects can be returned and array_reporter methods applied to them, or all reporter objects/ids can be returned as a list.
$oReporter = $oArray->get_reporter('reporter1'); # returns a single reporter object
$aReporter_Objects = $oArray->get_reporter_objects; # returns a list of reporter objects
$aReporter_Names = $oArray->get_reporter_ids; # returns a list of reporter ids
$hReporters = $oArray->get_all_reporters; # returns a hash of reporters; key=reporter_id, value=reporter object
- set_reporter_data
-
Each Spot object is attributed to a Reporter object, and the QC process is performed on the filled Reporter objects.
- should_reject_unique
-
If you call this method before set_reporter_data(), any reporters for which only a single spot passed QC will be rejected.
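The collation of replicate spots into reporters, and the effect of rejecting reporters with a single surviving spot, can be sketched in plain Perl like this. It does not call the Microarray classes. The spot list is hypothetical (only the id RP11-354D4 appears elsewhere in this document), the $reject_unique flag stands in for should_reject_unique, and summarising replicates by their mean is an assumption of the example.

```perl
use strict;
use warnings;

# Hypothetical spots that have already passed QC: [reporter id, log2 ratio].
my @passed_spots = (
    [ 'RP11-354D4',  0.10 ],
    [ 'RP11-354D4',  0.14 ],
    [ 'RP11-1A1',    0.90 ],   # only one replicate survived QC
    [ 'RP11-2B2',   -0.52 ],
    [ 'RP11-2B2',   -0.48 ],
);

# Group the replicate values under their reporter id.
my %reporter;
push @{ $reporter{ $_->[0] } }, $_->[1] for @passed_spots;

# Stand-in for should_reject_unique: drop reporters with one passing spot.
my $reject_unique = 1;

my %mean_log2;
for my $id (sort keys %reporter) {
    my @ratios = @{ $reporter{$id} };
    next if $reject_unique && @ratios < 2;
    my $sum = 0;
    $sum += $_ for @ratios;
    $mean_log2{$id} = $sum / @ratios;   # mean of replicates (an assumption)
}

printf "%s\t%.2f\n", $_, $mean_log2{$_} for sort keys %mean_log2;
```

Setting $reject_unique to 0 keeps RP11-1A1 in the output, with its single value reported as the mean.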
Image Output
Microarray will output QC/QA plots of the data as PNG files, using the Microarray::Image::QC_Plots module. Simply call any of the following methods to create the relevant plot, passing any plot parameters if required.
$oArray->print_ma_plot($file_path,scale=>50);
Mac OS X users beware - for some unknown reason, Apple's Preview application does not render the scatter or MA plots properly.
- plot_ma
-
Draws an MA plot: the log2 ratio of the two channels (M) against their mean log2 intensity (A).
- plot_intensity_scatter
-
A simple intensity scatter of channel1 signal vs channel2 signal.
- plot_log2_heatmap
-
A spatial plot of the log2 values from each spot of the array.
- plot_intensity_heatmap
-
A spatial plot of the signal intensity of each spot of the array.
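For readers unfamiliar with MA plots, the two quantities that are plotted can be computed as in the sketch below. This uses the standard definitions, M = log2(ch1/ch2) and A = (log2(ch1) + log2(ch2))/2. The helper names log2 and ma_values and the signal values are hypothetical, and this is not the code used by Microarray::Image.

```perl
use strict;
use warnings;

# Base-2 logarithm; the argument must be positive.
sub log2 { return log($_[0]) / log(2) }

# Returns (M, A) for one spot, given the two channel signals.
# Assumes both signals are positive (e.g. after background handling).
sub ma_values {
    my ($ch1, $ch2) = @_;
    my $m_val = log2($ch1) - log2($ch2);            # log2 ratio
    my $a_val = 0.5 * (log2($ch1) + log2($ch2));    # mean log2 intensity
    return ($m_val, $a_val);
}

# Hypothetical channel1/channel2 signals for two spots.
for my $spot ([8000, 2000], [4096, 4096]) {
    my ($m_val, $a_val) = ma_values(@$spot);
    printf "ch1=%d ch2=%d  M=%.2f  A=%.2f\n", @$spot, $m_val, $a_val;
}
```

A spot with equal signal in both channels has M = 0 and so lies on the horizontal axis of the plot; a fourfold difference between channels gives M = 2 or M = -2.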
TESTING
This distribution is not yet fully tested. There are 8 test scripts that cover 14 of the 18 modules included in this distribution, although only 10 of those modules are covered in detail. The data files required to run the majority of the tests are not included in this distribution because of their size; instead, they are available for download from our laboratory's web site at the following address:
FUTURE DEVELOPMENT
This module is under continued development for our laboratory's microarray facility. If you would like to contribute to the development of Microarray, whether to add more advanced features of data analysis, or simply to add support for other microarray platforms/scanners, please contact the author.
SEE ALSO
Microarray::File, Microarray::Reporter, Microarray::Spot, Microarray::Analysis, Microarray::Image
AUTHOR
Christopher Jones, Gynaecological Cancer Research Laboratories, UCL EGA Institute for Women's Health, University College London.
http://www.instituteforwomenshealth.ucl.ac.uk/AcademicResearch/Cancer/trl/index.html
c.jones@ucl.ac.uk
COPYRIGHT AND LICENSE
Copyright 2008 by Christopher Jones, University College London
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.