NAME
MSMSOutput - An object implementing common display/output methods for masses
SYNOPSIS
use MSMSOutput;
DESCRIPTION
The MSMSOutput Perl object supports common display and output methods for masses obtained from mass spectrometry-related computations.
It is released under the LGPL license (see source code).
ATTRIBUTES
- spectrum
-
A reference to a hash such as computed by MassCalculator::getFragmentMasses or an object of class MSMSTheoSpectrum.
- expSpectrum
-
A reference to an experimental spectrum such as required by MassCalculator::matchClosest or an object of class ExpSpectrum. When this parameter is specified, the constructor assumes that the spectrum hash contains data about the match with this experimental spectrum.
- massIndex
-
The mass index in the experimental peak vectors, default 0. If the expSpectrum parameter is an ExpSpectrum object, this index is read directly from the object.
- intensityIndex
-
The intensity index in the experimental peak vectors, default 1. If the expSpectrum parameter is an ExpSpectrum object, this index is read directly from the object.
- tol
-
Relative mass error tolerance; this parameter is optional. When it is not specified, all the matched masses found by the match algorithm are preserved. When it is specified, the new tolerance is applied to them.
This parameter is mainly useful for matches obtained via matchSpectrumClosest, which does not apply any mass tolerance.
- minTol
-
Absolute mass error, default value 0.2 Da. This parameter is used only when the tol parameter is specified, see above.
- intSel
-
This parameter controls how the peak intensities are normalized, see function normalizeIntensities.
Parameter intSel is only used when expSpectrum is set.
- prec
-
The number of digits after the decimal point for the masses. Default precision is 3 digits.
- modifLvl
-
Controls how the modifications are highlighted in the vector splitPept defined below, see also function annotatePept.
- cmp
-
This parameter is a reference to a comparison function used for sorting fragment names. If cmp is not set, the function cmpFragTypes is used instead.
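To illustrate how the tol and minTol attributes could interact, here is a minimal sketch in plain Perl. It is not the module's code: the helper name withinTolerance is hypothetical, and the sketch assumes that tol is expressed in ppm relative to the theoretical mass and that a match is kept when either the relative or the absolute criterion holds. The text above only states that tol is relative and minTol is absolute (default 0.2 Da).

```perl
use strict;
use warnings;

# Hypothetical helper, not part of MSMSOutput: decides whether an
# experimental mass is close enough to a theoretical mass.
# Assumptions: $tol is a relative tolerance in ppm, $minTol is an
# absolute tolerance in Da, and either criterion is sufficient.
sub withinTolerance {
    my ($theoMass, $expMass, $tol, $minTol) = @_;
    $minTol = 0.2 unless defined $minTol;    # default stated in the text
    my $absErr = abs($theoMass - $expMass);
    return 1 if $absErr <= $minTol;
    my $relErrPpm = $absErr / $theoMass * 1.0e6;
    return $relErrPpm <= $tol ? 1 : 0;
}

# Illustrative masses only
print withinTolerance(1000.0, 1000.15, 50),  "\n";    # within the 0.2 Da floor
print withinTolerance(5000.0, 5000.30, 100), "\n";    # 60 ppm, within 100 ppm
print withinTolerance(1000.0, 1000.50, 100), "\n";    # 0.5 Da and 500 ppm
```

The absolute floor matters mostly for small masses, where a purely relative tolerance would become very narrow.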
METHODS
new(%h|$MSMSOutput)
Constructor. %h is a hash of attribute=>value pairs and $MSMSOutput is an InSilicoSpectro::InSilico::MSMSOutput object, from which the attributes are copied.
To prepare for the actual output, which is produced by specialized methods, the constructor builds a dedicated data structure. For users who want to create new methods via inheritance or code modification, this data structure is described below:
my $table = new InSilicoSpectro::InSilico::MSMSOutput(...);
$table->{peptideMass} is the precursor peptide mass.
$table->{peptide} is the precursor peptide sequence.
$table->{modif} is the precursor peptide modification string.
$table->{splitPept} is a reference to a vector of the same length
as the peptide sequence that contains each
amino acid with annotated modifications (see
parameter modifLvl above).
$table->{intSel} is the value of the intSel parameter.
$table->{mass}{term} contains the terminal fragments.
$table->{mass}{intern} contains the internal fragments.
$table->{mass}{term}[i][0] contains the name of the ith fragment type.
$table->{mass}{term}[i][j] contains the mass of the jth fragment of type i.
$table->{mass}{intern}[i][0] contains the name of the ith fragment type
$table->{mass}{intern}[i][j,j+1] contains a description of the internal
fragment followed by its mass, j>0.
$table->{match} has the same structure as $table->{mass} but it
contains the matched experimental masses. How the
masses are matched depends on the match function
that was called.
$table->{intens} has the same structure as $table->{match} but it
contains the normalized intensities of the matched
experimental peaks.
See also the code of the method tabSepSpectrum for a simple example of how this data structure can be used.
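As a rough illustration of how this layout can be traversed, the following sketch builds a small stand-in hash by hand and walks the terminal and internal fragment tables. The helper names termLines and internLines are hypothetical, the masses are placeholders rather than computed values, and treating a missing mass as undef is an assumption.

```perl
use strict;
use warnings;

# Hand-built stand-in for the structure described above. The masses
# are placeholders for illustration, not computed values.
my $table = {
    peptide => 'ACK',
    mass    => {
        term => [
            [ 'b', 10.1, 20.2, 30.3 ],
            [ 'y', 11.1, 21.2, 31.3 ],
        ],
        intern => [
            [ 'immo', 'A', 5.5, 'C', 6.6, 'K', 7.7 ],
        ],
    },
};

# Returns one tab-separated line per terminal fragment type.
sub termLines {
    my ($t) = @_;
    my @lines;
    foreach my $frag (@{ $t->{mass}{term} }) {
        my ($name, @masses) = @$frag;    # cell 0 is the fragment type name
        # Assumption: a missing mass is stored as undef
        push @lines, join("\t", $name, map { defined $_ ? $_ : '' } @masses);
    }
    return @lines;
}

# Returns one "type, description, mass" line per internal fragment.
sub internLines {
    my ($t) = @_;
    my @lines;
    foreach my $frag (@{ $t->{mass}{intern} }) {
        my $name = $frag->[0];
        # Cells j and j+1 hold the description and the mass, j > 0
        for (my $j = 1; $j + 1 < scalar(@$frag); $j += 2) {
            push @lines, join("\t", $name, $frag->[$j], $frag->[$j + 1]);
        }
    }
    return @lines;
}

print "$_\n" for termLines($table), internLines($table);
```

The same loops apply to $table->{match} and $table->{intens}, since they are described as having the same structure.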
tabSepSpectrum($nColIntern)
This method returns a string containing a tab-separated tabular representation of the theoretical spectrum. Matched masses, if present, are ignored.
Because it is usually more appropriate to instantiate the object with modifLvl set to 1 (or 0) before calling this method, the output table also includes a string giving the peptide modifications as they would be obtained with modifLvl set to 2. The peptide mass is included as well.
The string computed by tabSepSpectrum is appropriate for loading into a spreadsheet or for use as an intermediary format for a custom output format. For the latter reason, we try to make it simple to parse and, in particular, we add a 'TERMINAL' tag at the beginning of the N-/C-terminal fragment masses and an 'INTERNAL' tag at the beginning of the internal ones. Moreover, if match data are available, the matched theoretical masses are followed by the matched experimental masses and intensities in parentheses, which should be easy to read and to parse with elementary regular expressions.
The only parameter is:
- $nColIntern
-
Number of groups of 3 columns in the second table for the internal fragments. Default is 2.
Example:
my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
print $msms->tabSepSpectrum();
latexSpectrum($nColIntern)
This method returns a string containing a simple LaTeX table that represents the tabular structure generated by tabSpectrum. This table should be fairly easy to edit afterwards to meet specific style requirements. Matched masses, if present, are ignored.
Internal fragments (only immonium ions for the time being) are output in a separate table since their number is different from the peptide length.
The only parameter is:
- $nColIntern
-
Number of groups of 3 columns in the second table for the internal fragments. Default is 2.
Example:
my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
print $msms->latexSpectrum(3);
htmlTerm(%h)
This method returns a string containing the lines of an HTML table representing a tabular structure such as generated by tabSpectrum; only the N-/C-terminal fragments are considered, see the sister function htmlIntern for the internal fragments.
Since this method is likely to be used for generating HTML pages automatically, we give the user some flexibility to change the appearance of the output table (manual editing is not an option). Moreover, the <table> tag is not included in the returned string, so that you can choose the table styles you want.
The named parameters are:
- colLineFunc
-
A reference to a function that changes the line colors in the table to make it more readable. This package exports two functions for this purpose: chooseColorLineNum and chooseColorFrag (see their respective descriptions).
You can define your own function if you need another logic. Such a function has four parameters: fragment type for the current line, fragment type of the previous line, a reference to color 1 and another to color 2 to exchange them.
The default function is chooseColorFrag.
- css
-
If css is defined, CSS styles are used instead of old-fashioned inline color and font specifications. See function htmlCSS.
- lineCol1, lineCol2
-
The two colors used for the lines, default colors are '#DDFFFF' and '#EEEEEE'.
- boldTitle
-
Peptide sequence in bold if set to any value.
- bgTitle
-
Background color for the peptide sequence, default '#CCFFCC'.
- boldFrag
-
Fragment names in bold if set to any value.
- bgFrag
-
Background color for the fragment names, default '#FFFFBB'.
Example:
my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
print "<html><head></head><body><table border=0 cellspacing=5>\n";
print "\n",$msms->htmlTerm(boldTitle=>1, bgFrag=>'#FFFFBB', bgTitle=>'#99CCFF',
colLineFunc=>\&chooseColorFrag);
print "</table></body></html>\n";
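A user-defined colLineFunc, as described above, receives the current fragment type, the previous fragment type, and references to the two colors. The sketch below is one possible such function; its name chooseColorSeries is hypothetical, and it assumes that the caller reads the current line color from the first color reference after the call, since the text does not specify a return value.

```perl
use strict;
use warnings;

# Hypothetical colLineFunc: swap the two line colors whenever the ion
# series letter (first character of the fragment type) changes.
# Assumption: the caller uses $$col1 as the color of the current line.
sub chooseColorSeries {
    my ($curType, $prevType, $col1, $col2) = @_;
    my $cur  = substr($curType, 0, 1);
    my $prev = (defined $prevType && length $prevType)
        ? substr($prevType, 0, 1)
        : $cur;    # first line: nothing to compare with, keep the colors
    if ($cur ne $prev) {
        ($$col1, $$col2) = ($$col2, $$col1);
    }
    return;
}

# Stand-alone demonstration with the default colors from the text
my ($c1, $c2) = ('#DDFFFF', '#EEEEEE');
my $prev;
foreach my $type ('b', 'b-H2O', 'b++', 'y', 'y-NH3') {
    chooseColorSeries($type, $prev, \$c1, \$c2);
    print "$type\t$c1\n";
    $prev = $type;
}
```

Such a function would then be passed by reference, as in colLineFunc=>\&chooseColorSeries.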
htmlIntern(%h)
This method returns a string containing the lines of an HTML table representing a tabular structure such as generated by tabSpectrum; only internal fragments are considered, see the sister function htmlTerm for the N-/C-terminal fragments.
Since this method is likely to be used for generating HTML pages automatically, we give the user some flexibility to change the appearance of the output table (manual editing is not an option). Moreover, the <table> tag is not included in the returned string, so that you can choose the table styles you want.
The named parameters are:
- css
-
If css is defined, CSS styles are used instead of old-fashioned inline color and font specifications. See function htmlCSS.
- bgIntern
-
The color used for the lines, default '#EEEEEE'.
- boldTitle
-
Column titles in bold if set to any value.
- bgTitle
-
Background color for the column titles, default '#CCFFCC'.
- boldFrag
-
Fragment names in bold if set to any value.
- bgFrag
-
Background color for the fragment names, default '#FFFFBB'.
- nColIntern
-
Number of groups of 3 columns in the second table for the internal fragments. Default is 2.
Example:
my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
print "<table border=0 cellspacing=5>\n";
print "\n",$msms->htmlIntern(boldTitle=>1);
print "</table>\n";
plotSpectrumMatch(%h)
This method generates images that represent matches between theoretical and experimental spectra. Such images are intended for use in user interfaces, typically web interfaces. To fit rather diverse requirements, a large number of parameters can be set to change the colors and appearance of the plots.
The named parameters are:
- fname
-
The file name of the generated image.
- fhandle
-
An open file handle for writing the generated image. It has priority over parameter fname and the file handle will be set in binmode.
- format
-
The graphic file format. If not specified, the function will return the image object for further processing (see GD documentation). The supported file formats are the ones of GD.
- fontChoice
-
The size of the graphics is controlled via the choice of the font. The fontChoice parameter is a string 'class:size', where class selects the type of font and size selects its size.
The GD native fonts are selected by setting class equal to 'default'. The size for the 'default' class must be one of 'Tiny', 'Small', 'MediumBold', 'Large', or 'Giant'. Default font is 'default:Large'.
Alternatively, it is possible to give the name of a file containing the definition of a TrueType font as the class (absolute path), in which case size is the point size.
- inCellBorder
-
Number of pixels between lines and text, default 1.
- style
-
Two styles are supported for the match graphics: 'circle' and 'square'. Default is 'circle' except when modifLvl was 2 in tabSpectrum, where it is 'square'.
- plotIntern
-
If this parameter is set to any value, and at least one internal fragment mass exists, the internal fragments are represented in the graphics.
- nColIntern
-
Number of columns used to display internal fragments, default is 2.
- colorScale
-
This parameter defines a list of intensity thresholds and the corresponding colors used when highlighting the table cells to indicate fragment matches. Thresholds must be in increasing order of intensity.
colorScale is a reference to a vector of values, each threshold is associated with 8 values in the following order:
- threshold value
- red intensity (cell color)
- green intensity (cell color)
- blue intensity (cell color)
- legend text
- red intensity (legend text color)
- green intensity (legend text color)
- blue intensity (legend text color)
These eight values are repeated for each threshold and the number of thresholds is not limited. The threshold values must be adapted to the intensity normalization (see function tabSpectrum).
By default, plotSpectrumMatch generates a color scale that adapts to the normalization and contains 5 bins: blue (less intense), red, orange, yellow, green (most intense).
- legend
-
When this parameter is set to 'right', a legend is added at the right of the graphics. When it is set to 'bottom', a legend is added under the graphics.
The legend is made of the color scale and, for each intensity bin, a count of the matched peaks versus the number of experimental peaks. This count gives an indication of the quality of the match. It is important to note that it is not uncommon for an experimental peak to match several theoretical masses and therefore the count, which considers each mass once, may be slightly different from what is read from the graphics. The two represent different points of view: that of the theoretical masses and that of the experimental masses.
- changeColModifAA
-
Except when tabSpectrum was called with modifLvl equal to 2, plotSpectrumMatch displays one character per amino acid only, i.e. the asterisk indicating the presence of a modification is suppressed. When changeColModifAA is set to any value, plotSpectrumMatch displays the modified amino acids in another color. If not set, the modified amino acids are over-lined.
- modifAAColor
-
A reference to a vector of three values (R, G, B) used to define the color for modified amino acids, default blue.
- bgColor
-
A reference to a vector of three values (R, G, B) used to define the graphics background color, default white.
- textColor
-
A reference to a vector of three values (R, G, B) used to define the text color, default black.
- lineColor
-
A reference to a vector of three values (R, G, B) used to define the line color, default black.
Example:
my $msms = new InSilicoSpectro::InSilico::MSMSOutput(spectrum=>\%spectrum, prec=>2, modifLvl=>1,
expSpectrum=>\@peaks, intSel=>'order', tol=>$tol, minTol=>$minTol);
$msms->plotSpectrumMatch(fname=>$peptide, format=>'png', fontChoice=>'default:Large',
changeColModifAA=>1, legend=>'bottom');
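The flat layout of the colorScale vector (eight values per threshold) is easy to get wrong, so the following sketch shows one way to build and check such a vector. The helper names checkColorScale and cellColor, the thresholds, and the colors are all hypothetical. The sketch also assumes that each threshold is the upper bound of its bin, which the text does not state explicitly.

```perl
use strict;
use warnings;

# Hypothetical color scale: 8 values per threshold, in the order
# listed above. Thresholds and colors are illustrative only.
my @colorScale = (
    # threshold, cell R, G, B, legend text, legend text R, G, B
    0.25,   0,   0, 255, 'low',    255, 255, 255,
    0.75, 255, 165,   0, 'medium',   0,   0,   0,
    1.00,   0, 200,   0, 'high',     0,   0,   0,
);

# Checks the layout: a non-empty multiple of 8 values with strictly
# increasing thresholds.
sub checkColorScale {
    my ($scale) = @_;
    my $n = scalar(@$scale);
    return 0 if $n == 0 || $n % 8 != 0;
    for (my $i = 8; $i < $n; $i += 8) {
        return 0 if $scale->[$i] <= $scale->[$i - 8];
    }
    return 1;
}

# Returns [R, G, B] of the first bin whose threshold is >= $intensity.
# Assumption: thresholds are upper bounds; intensities above the last
# threshold fall into the last bin.
sub cellColor {
    my ($scale, $intensity) = @_;
    my $last = scalar(@$scale) - 8;
    for (my $i = 0; $i <= $last; $i += 8) {
        if ($intensity <= $scale->[$i] || $i == $last) {
            return [ @$scale[ $i + 1 .. $i + 3 ] ];
        }
    }
    return;
}

print checkColorScale(\@colorScale), "\n";
print join(',', @{ cellColor(\@colorScale, 0.5) }), "\n";
```

A vector that passes such a check would be given to plotSpectrumMatch as colorScale=>\@colorScale.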
FUNCTIONS
cmpFragTypes
This function can be used to sort fragment type names. Fragment type names are assumed to follow these rules:
- internal fragments
-
They are named after their generic name, only immonium ions are supported so far and they are named 'immo'.
- N-/C-terminal fragments
-
They must comply with the pattern
ion&charge - loss1 -loss2 - ...
For instance, singly charged b ions are simply named 'b' and their doubly and triply charged counterparts are named 'b++' and 'b+++'. This is the ion&charge part of the pattern above.
The losses may occur once or several times; multiple losses are indicated in parentheses preceded by the multiplicity. Examples are:
b-H2O b-3(H2O) b++-H2O-NH3 b++-3(H2O)-NH3 y-H2O-2(H3PO4)-NH3
The order on fragment type names is defined as follows: (1) immonium ions always come after N-/C-terminal fragments; (2) N-/C-terminal fragment types are compared by a sequence of comparisons that continues as long as the compared values are equal. The first comparison is on the ion type (a,b,y,...), followed by a comparison on the charge. If ion types and charges are equal, comparisons are made on the losses. The fragment that has fewer loss types is considered smaller. If the two fragment types have the same number of loss types, the losses are sorted lexicographically and the first ones are compared on their names; if the names are the same, the comparison is on the multiplicity; if the multiplicities are the same, the second losses are compared, etc.
Asterisks that are used for signaling multiple possible losses are ignored in the comparisons.
Since this function is defined in package MSMSOutput and it is used in other packages with function sort (and predefined variables $a and $b), we had to use prototypes ($$). Therefore it can no longer be exported by the package MSMSOutput and you have to call it via MSMSOutput::cmpFragTypes.
Example:
foreach (sort MSMSOutput::cmpFragTypes ('y','b','y++','a','b-NH3','b-2(NH3)','b++-10(NH3)','b-H2O-NH3','immo(Y)', 'b++','y-NH3*','y-H2O*','z')){ print $_,"\n"; }
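The ordering rules above can be sketched as a simplified, stand-alone re-implementation. This is not the code of cmpFragTypes: the names parseFragType, cmpParsed, and sortFragTypes are hypothetical, and the sketch assumes that a name without '+' signs is singly charged and that immonium names are compared as plain strings among themselves.

```perl
use strict;
use warnings;

# Splits a fragment type name into ion, charge, and sorted losses.
sub parseFragType {
    my ($name) = @_;
    (my $clean = $name) =~ tr/*//d;    # asterisks are ignored
    return { intern => 1, name => $clean } if $clean =~ /^immo/;
    my ($ionCharge, @lossStr) = split /\s*-\s*/, $clean;
    my ($ion, $plus) = $ionCharge =~ /^([^+]+)(\+*)$/;
    my @losses;
    foreach my $l (@lossStr) {
        # "3(H2O)" means three losses of H2O; a bare "H2O" means one
        my ($mult, $loss) = $l =~ /^(\d+)\((.+)\)$/ ? ($1, $2) : (1, $l);
        push @losses, [ $loss, $mult ];
    }
    @losses = sort { $a->[0] cmp $b->[0] } @losses;
    return { intern => 0, ion => $ion, charge => length($plus) || 1,
             losses => \@losses };
}

# Compares two parsed names following the rules in the text.
sub cmpParsed {
    my ($x, $y) = @_;
    return $x->{intern} <=> $y->{intern} if $x->{intern} != $y->{intern};
    return $x->{name} cmp $y->{name} if $x->{intern};
    my $r = $x->{ion} cmp $y->{ion}
        || $x->{charge} <=> $y->{charge}
        || scalar(@{ $x->{losses} }) <=> scalar(@{ $y->{losses} });
    return $r if $r;
    for my $i (0 .. $#{ $x->{losses} }) {
        $r = $x->{losses}[$i][0] cmp $y->{losses}[$i][0]
            || $x->{losses}[$i][1] <=> $y->{losses}[$i][1];
        return $r if $r;
    }
    return 0;
}

# Parses every name once, then sorts the original names.
sub sortFragTypes {
    my %p = map { $_ => parseFragType($_) } @_;
    return sort { cmpParsed($p{$a}, $p{$b}) } @_;
}

print "$_\n" for sortFragTypes('y', 'b', 'y++', 'a', 'b-NH3', 'b-2(NH3)',
    'b++-10(NH3)', 'b-H2O-NH3', 'immo(Y)', 'b++', 'y-NH3*', 'y-H2O*', 'z');
```

Parsing each name once before sorting keeps the comparison itself cheap; the module's own function is instead written to be passed directly to sort.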
annotatePept($pept, $modif, $modifLvl)
Returns a vector whose cells contain each amino acid of the peptide sequence, annotated with its modifications, if any.
This function is exported for allowing users to prepare peptide sequences for display purposes. The parameters are:
- $pept
-
The peptide sequence.
- $modif
-
The modification string or the modification vector.
- $modifLvl
-
Controls how the modifications are highlighted in the returned vector.
If not set or set to 0, this parameter causes the modified amino acids not to be indicated. If set to 1, the modified amino acids are marked by an asterisk. If set to 2, the modified amino acids are followed by the name of the modification between curly brackets.
Example:
print join('', annotatePept('ACCTK', '::Cys_CAM:Cys_CAM:::', 2)), "\n";
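The three annotation levels can be illustrated with a small stand-alone sketch. It is not the module's implementation: the name annotateSketch is hypothetical, only modification strings are handled (not modification vectors), and the sketch assumes that the string holds one colon-separated field for the N-terminus, one per amino acid, and one for the C-terminus, with the terminal fields ignored here.

```perl
use strict;
use warnings;

# Hypothetical re-implementation of the annotation levels described
# above. Assumption: field 0 of the modification string is the
# N-terminus, fields 1..n the amino acids, and the last field the
# C-terminus; terminal modifications are ignored in this sketch.
sub annotateSketch {
    my ($pept, $modif, $modifLvl) = @_;
    $modifLvl ||= 0;
    my @fields = split /:/, $modif, -1;    # -1 keeps trailing empty fields
    my @aa     = split //, $pept;
    my @out;
    for my $i (0 .. $#aa) {
        my $m = $fields[$i + 1];
        $m = '' unless defined $m;
        if ($m eq '' || $modifLvl == 0) {
            push @out, $aa[$i];                     # no annotation
        }
        elsif ($modifLvl == 1) {
            push @out, $aa[$i] . '*';               # asterisk marker
        }
        else {
            push @out, $aa[$i] . '{' . $m . '}';    # modification name
        }
    }
    return @out;
}

for my $lvl (0, 1, 2) {
    print join('', annotateSketch('ACCTK', '::Cys_CAM:Cys_CAM:::', $lvl)), "\n";
}
```

Each returned cell corresponds to one amino acid, which matches the description of splitPept as a vector of the same length as the peptide sequence.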
normalizeIntensities($intSel, $expSpectrum, $normInt, [$massIndex, [$intensityIndex]])
Normalizes experimental peak intensities. The parameters are:
- $intSel
-
This parameter controls how the peak intensities are normalized. Default choice is 'order' for relative order; other possible choices are 'relative' for relative intensity, 'original' for no normalization, and 'log' for logarithmic transform.
- $expSpectrum
-
The experimental spectrum.
- $normInt
-
A reference to a hash that will contain the normalized intensities (keys are the original intensities).
- $massIndex
-
The mass index in the experimental peak vectors, default 0. If the $expSpectrum parameter is an ExpSpectrum object, this index is read directly from the object.
- $intensityIndex
-
The intensity index in the experimental peak vectors, default 1. If the $expSpectrum parameter is an ExpSpectrum object, this index is read directly from the object.
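To make the four normalization choices more concrete, here is a minimal stand-alone sketch. It is not the module's code and its exact formulas are assumptions: 'order' is taken to be the rank of the intensity divided by the number of peaks, 'relative' the intensity divided by the largest intensity, and 'log' the natural logarithm. The name normalizeSketch is hypothetical and, unlike the real function, it takes no mass index.

```perl
use strict;
use warnings;

# Hypothetical sketch of the normalization modes. Assumed formulas:
#   'order'    rank of the intensity (1 = weakest) divided by peak count
#   'relative' intensity divided by the largest intensity
#   'log'      natural logarithm (intensities must be > 0)
#   'original' intensity left unchanged
# $peaks is a reference to a list of [mass, intensity] vectors and
# $normInt a hash reference filled with original => normalized pairs.
sub normalizeSketch {
    my ($intSel, $peaks, $normInt, $intensityIndex) = @_;
    $intSel         = 'order' unless defined $intSel;
    $intensityIndex = 1       unless defined $intensityIndex;
    my @int = sort { $a <=> $b } map { $_->[$intensityIndex] } @$peaks;
    return unless @int;
    my $n   = scalar(@int);
    my $max = $int[-1];
    for my $rank (0 .. $#int) {
        my $v = $int[$rank];
        if    ($intSel eq 'order')    { $normInt->{$v} = ($rank + 1) / $n }
        elsif ($intSel eq 'relative') { $normInt->{$v} = $v / $max }
        elsif ($intSel eq 'log')      { $normInt->{$v} = log($v) }
        else                          { $normInt->{$v} = $v }
    }
    return;
}

# Illustrative peaks only: [mass, intensity]
my @peaks = ([ 100.0, 10 ], [ 200.0, 40 ], [ 300.0, 20 ]);
my %norm;
normalizeSketch('relative', \@peaks, \%norm);
print "$_ => $norm{$_}\n" for sort { $a <=> $b } keys %norm;
```

Whatever the exact formulas used by the module, the thresholds of a user-defined color scale have to be chosen on the same scale as the normalized intensities.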
htmlCSS(%h)
This function returns a string that can be used to define a CSS style sheet, which is then used by the tables created by functions htmlTerm and htmlIntern. To give you more flexibility, the <style> tags are not included in the string, so that you can add the styles returned by htmlCSS wherever you like.
Alternatively, you can choose not to use this function and define totally different styles!
The named parameters are:
- lineCol1, lineCol2
-
The two colors used for the lines, default colors are '#DDFFFF' and '#EEEEEE'.
- boldTitle
-
Peptide sequence in bold if set to any value.
- bgTitle
-
Background color for the peptide sequence, default '#CCFFCC'.
- boldFrag
-
Fragment names in bold if set to any value.
- bgFrag
-
Background color for the fragment names, default '#FFFFBB'.
- bgIntern
-
The color used for the lines in the internal fragments table, default '#EEEEEE'.
Example:
my $msms = new InSilicoSpectro::InSilico::MSMSOutput(...);
print "<html>\n<head>\n<style type=\"text/css\">\n";
print InSilicoSpectro::InSilico::MSMSOutput::htmlCSS(boldTitle=>1);
print "</style>\n</head>\n<body><table border=0 cellspacing=5>\n";
print "\n",$msms->htmlTerm(css=>1);
print "</table><br><br><table border=0 cellspacing=5>\n";
print "\n",$msms->htmlIntern(css=>1);
print "</table></body></html>\n";
chooseColorLineNum
Function for HTML output that alternates the line colors for every line.
chooseColorFrag
Function for HTML output that changes the line color when the type of fragment changes; b-H2O and b-2(H2O) are considered the same type by this function.
plotLegendOnly(%h)
This function plots the color scale only and should be used if you do not want to display it for each match plot. Note that the legend generated by plotSpectrumMatch contains extra information that is specific to the match, i.e. the count of matched peaks per intensity bin. This information is not reported if you decide to save space and only display the color scale once.
The named parameters are (see plotSpectrumMatch for detailed explanations):
- fname
-
The file name of the generated image.
- fhandle
-
An open file handle for writing the generated image. It has priority over parameter fname and the file handle will be set in binmode.
- format
-
The graphic file format.
- fontChoice
-
The size of the graphics is controlled via the choice of the font.
- inCellBorder
-
Number of pixels between lines and text, default 1.
- colorScale
-
This parameter defines a list of intensity thresholds and the corresponding colors used when highlighting the table cells to indicate fragment matches.
- lineColor
-
A reference to a vector of three values (R, G, B) used to define the line color, default black.
- intSel
-
If no user-defined color scale is provided, a default color scale is used instead. To adjust this scale properly to the intensity normalization method, indicate which normalization is in use via parameter intSel. Possible values are listed in function normalizeIntensities.
EXAMPLES
See programs starting with testMSMSOut in folder InSilicoSpectro/InSilico/test/.
AUTHORS
Jacques Colinge, Upper Austria University of Applied Science at Hagenberg