NAME
MS::Reader::MzIdentML - A simple but complete mzIdentML parser
SYNOPSIS
use MS::Reader::MzIdentML;
my $idents = MS::Reader::MzIdentML->new('idents.mzIdentML');
# spectrum/peptide-level results
while (my $result = $idents->next_spectrum_result) {
    # $result is an MS::Reader::MzIdentML::SpectrumIdentificationResult object
}
# protein-level results
while (my $grp = $idents->next_protein_group) {
    # $grp is an MS::Reader::MzIdentML::ProteinAmbiguityGroup object
}
# multi-analysis file
my $n = $idents->n_ident_lists;
for my $i (0 .. $n - 1) {
    $idents->goto_ident_list($i);
    while (my $result = $idents->next_spectrum_result) {
        # $result is an MS::Reader::MzIdentML::SpectrumIdentificationResult object
    }
}
DESCRIPTION
MS::Reader::MzIdentML is a parser for the HUPO PSI standard mzIdentML format for mass spectrometry search results. It aims to provide complete access to the data contents without being overburdened by detailed class infrastructure. Convenience methods are provided for accessing commonly used data. Users who want to extract data not accessible through the available methods should examine the data structure of the parsed object; the dump() method of MS::Reader::XML, from which this class inherits, provides an easy way to do so.
Currently this module is only partially complete. The parsing routines are functional, but much of the data has no direct accessor method and must be reached by traversing the underlying data structure. Hopefully this situation will improve in the future.
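Once dump() has shown where a field lives, a small recursive search can pull it out without hard-coding the full path to it. The sketch below assumes the parsed objects are built from nested hash and array references (check this against the dump() output first); find_key and the example record are hypothetical and are not part of MS::Reader.

```perl
use strict;
use warnings;
use Scalar::Util qw(reftype refaddr);

# Hypothetical helper: collect every value stored under $key anywhere in
# a tree of hash and array references (blessed or not). The %$seen table
# skips references already visited, so cyclic structures cannot recurse forever.
sub find_key {
    my ($node, $key, $found, $seen) = @_;
    $found //= [];
    $seen  //= {};
    my $type = reftype($node) // '';
    return $found if $type eq '' || $seen->{ refaddr($node) }++;
    if ($type eq 'HASH') {
        for my $k (sort keys %$node) {
            push @$found, $node->{$k} if $k eq $key;
            find_key($node->{$k}, $key, $found, $seen);
        }
    }
    elsif ($type eq 'ARRAY') {
        find_key($_, $key, $found, $seen) for @$node;
    }
    return $found;
}

# Invented structure standing in for part of a parsed protein group record
my $data = bless {
    id => 'PAG_1',
    ProteinDetectionHypothesis => [
        { id => 'PDH_1', dBSequence_ref => 'DBSeq_1' },
        { id => 'PDH_2', dBSequence_ref => 'DBSeq_7' },
    ],
}, 'Example::Record';

my $refs = find_key($data, 'dBSequence_ref');   # ['DBSeq_1', 'DBSeq_7']
```

With a real object the analogous call would be find_key($grp, 'dBSequence_ref'), provided the attribute is actually stored under that key in the parsed structure.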
INHERITANCE
MS::Reader::MzIdentML is a subclass of MS::Reader::XML, which in turn inherits from MS::Reader, and it inherits the methods of these parent classes. Please see the documentation for those classes for details of available methods not detailed below.
METHODS
new
my $idents = MS::Reader::MzIdentML->new( $fn,
    use_cache => 0,
    paranoid  => 0,
);
Takes an input filename (required) and an optional argument hash and returns an MS::Reader::MzIdentML object. This constructor is inherited directly from MS::Reader. Available options include:
use_cache: cache fetched records in memory for repeat access (default: FALSE)
paranoid: when loading a saved index from disk, recalculate the MD5 checksum of the raw file each time to make sure it hasn't changed. This typically adds a few seconds to load times. By default, only the file size and mtime are checked.
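The trade-off behind the paranoid option can be illustrated with a minimal sketch using only core Perl modules (Digest::MD5, File::Temp). The sub index_is_current and the layout of the saved metadata hash are hypothetical; this is not how MS::Reader implements the check, only a demonstration of why a size/mtime comparison is cheap but can be fooled, while re-hashing the file is slower but catches same-size edits.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Temp qw(tempfile);

# Hypothetical helper: decide whether a saved index still matches the raw
# file. $meta holds size, mtime and MD5 recorded when the index was built.
sub index_is_current {
    my ($fn, $meta, $paranoid) = @_;
    my @st = stat $fn or return 0;            # missing file: index is stale
    # cheap default check: file size (field 7) and mtime (field 9)
    return 0 if $st[7] != $meta->{size} || $st[9] != $meta->{mtime};
    return 1 if !$paranoid;
    # paranoid mode: re-read the whole file and compare MD5 digests
    open my $fh, '<:raw', $fn or return 0;
    my $md5 = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
    return $md5 eq $meta->{md5} ? 1 : 0;
}

# Record metadata for a small temporary file
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "<MzIdentML/>\n";
close $tmp;
my @info = stat $path;
open my $in, '<:raw', $path or die "Cannot open $path: $!";
my %meta = (
    size  => $info[7],
    mtime => $info[9],
    md5   => Digest::MD5->new->addfile($in)->hexdigest,
);
close $in;

my $fast = index_is_current($path, \%meta, 0);   # size + mtime only
my $slow = index_is_current($path, \%meta, 1);   # also re-hashes contents

# Overwrite with same-length content and restore the recorded mtime:
# the cheap check is fooled, the MD5 check is not.
open my $out, '>:raw', $path or die "Cannot write $path: $!";
print {$out} "<MzIdentMl/>\n";                   # same length, one byte differs
close $out;
utime $meta{mtime}, $meta{mtime}, $path;
my $fast2 = index_is_current($path, \%meta, 0);
my $slow2 = index_is_current($path, \%meta, 1);
```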
next_spectrum_result
while (my $r = $idents->next_spectrum_result) {
    # do something
}
Returns an MS::Reader::MzIdentML::SpectrumIdentificationResult object representing the next spectrum query in the current spectrum identification list (see goto_ident_list), or undef if the end of the list has been reached. Typically used to iterate over each search query in the run.
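The next_* methods follow the usual Perl iterator convention: each call returns the next object, and undef signals that nothing is left, which is what ends the while loop above. The self-contained sketch below shows that contract with a hypothetical make_iterator closure over plain hash references; it does not use the module or reproduce its internals.

```perl
use strict;
use warnings;

# Hypothetical iterator factory: each call to the returned closure yields
# the next record, then undef (via a bare return) once all are consumed.
sub make_iterator {
    my @records = @_;
    my $pos = 0;
    return sub {
        return if $pos > $#records;    # end of records: undef in scalar context
        return $records[$pos++];
    };
}

# Hash references stand in for result objects
my $next = make_iterator(
    { id => 'SIR_1' },
    { id => 'SIR_2' },
    { id => 'SIR_3' },
);

my @seen;
while (my $r = $next->()) {
    push @seen, $r->{id};
}
my $after_end = $next->();    # stays undef once exhausted
```

Because the loop condition tests truthiness, the pattern relies on every real record being a true value, which always holds for objects (blessed references).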
fetch_spectrum_result
my $r = $idents->fetch_spectrum_result($idx);
Takes a single argument (zero-based result index) and returns an MS::Reader::MzIdentML::SpectrumIdentificationResult object representing the result at that index. Throws an exception if the index is out of range.
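Because fetch_spectrum_result (like fetch_protein_group) throws an exception for an invalid index instead of returning undef, code that probes indices it is unsure about may want to trap the error. A sketch of the standard eval pattern, using a hypothetical stand-in fetch_result sub rather than the real method (whose error message text is not documented here):

```perl
use strict;
use warnings;

my @results = ('SIR_1', 'SIR_2');    # stand-in for parsed results

# Hypothetical stand-in for a fetch_* method: zero-based index,
# dies when the index is out of range.
sub fetch_result {
    my ($idx) = @_;
    die "Index $idx out of range\n"
        if $idx < 0 || $idx > $#results;
    return $results[$idx];
}

# Trap the exception instead of letting it end the program. The trailing
# 1 makes eval return true only when the block completed successfully.
sub try_fetch {
    my ($idx) = @_;
    my $r;
    my $ok = eval { $r = fetch_result($idx); 1 };
    return ($r, $ok ? undef : $@);
}

my ($first, $err1) = try_fetch(0);
my ($none,  $err2) = try_fetch(5);
```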
next_protein_group
while (my $g = $idents->next_protein_group) {
    # do something
}
Returns an MS::Reader::MzIdentML::ProteinAmbiguityGroup object representing the next protein group result in the file, or undef if the end of records has been reached. Typically used to iterate over each protein group in the run.
fetch_protein_group
my $g = $idents->fetch_protein_group($idx);
Takes a single argument (zero-based result index) and returns an MS::Reader::MzIdentML::ProteinAmbiguityGroup object representing the protein group at that index. Throws an exception if the index is out of range.
goto_ident_list
$idents->goto_ident_list($idx);
Takes a single argument (zero-based list index) and sets the current spectrum identification list to that index for subsequent calls to next_spectrum_result.
n_ident_lists
my $n = $idents->n_ident_lists;
Returns the number of spectrum identification lists in the file.
fetch_dbsequence_by_id
my $seq = $idents->fetch_dbsequence_by_id( $seq_id );
Given a DBSequence element ID, returns the corresponding MS::Reader::MzIdentML::DBSequence object.
fetch_peptide_by_id
my $pep = $idents->fetch_peptide_by_id( $pep_id );
Given a Peptide element ID, returns the corresponding MS::Reader::MzIdentML::Peptide object.
fetch_peptideevidence_by_id
my $pe = $idents->fetch_peptideevidence_by_id( $pe_id );
Given a PeptideEvidence element ID, returns the corresponding MS::Reader::MzIdentML::PeptideEvidence object.
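mzIdentML links elements by ID: a PeptideEvidence element carries peptide_ref and dBSequence_ref attributes that point at a Peptide and a DBSequence, and the fetch_*_by_id methods exist to follow such references. The sketch below models that lookup chain with plain hashes keyed by element ID; the data, the hash layout and resolve_evidence are invented for illustration and do not reflect how the returned objects store their attributes.

```perl
use strict;
use warnings;

# Invented example data keyed by element ID, standing in for the objects
# returned by fetch_peptide_by_id, fetch_dbsequence_by_id and
# fetch_peptideevidence_by_id. Reference attribute names follow mzIdentML.
my %peptide    = ( PEP_1   => { PeptideSequence => 'PEPTIDEK' } );
my %dbsequence = ( DBSeq_1 => { accession       => 'PROT_A'   } );
my %peptide_evidence = (
    PE_1 => { peptide_ref => 'PEP_1', dBSequence_ref => 'DBSeq_1' },
);

# Follow a PeptideEvidence ID to its peptide sequence and protein accession
sub resolve_evidence {
    my ($pe_id) = @_;
    my $pe = $peptide_evidence{$pe_id}
        or die "No PeptideEvidence with ID $pe_id\n";
    my $pep = $peptide{ $pe->{peptide_ref} }
        or die "Dangling peptide_ref in $pe_id\n";
    my $seq = $dbsequence{ $pe->{dBSequence_ref} }
        or die "Dangling dBSequence_ref in $pe_id\n";
    return ($pep->{PeptideSequence}, $seq->{accession});
}

my ($pepseq, $acc) = resolve_evidence('PE_1');
```

Each hash lookup above corresponds to one fetch_*_by_id call on a parsed file.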
raw_file
my $fn = $idents->raw_file($id);
Takes a single argument (the ID of a raw data source) and returns the path on disk to the corresponding raw file, as recorded in the mzIdentML file.
CAVEATS AND BUGS
The API is at an alpha stage and is not guaranteed to be stable.
Please report bugs or feature requests through the issue tracker at https://github.com/jvolkening/p5-MS/issues.
AUTHOR
Jeremy Volkening <jdv@base2bio.com>
COPYRIGHT AND LICENSE
Copyright 2015-2016 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.