NAME
idr2pept.pl - Extraction of reliable peptide/spectrum matches from Phenyx .idr.xml files
SYNOPSIS
idr2pept.pl [options] idr.xml files
OPTIONS
Run idr2pept.pl -h to list the available options.
DESCRIPTION
The script parses one or more Phenyx .idr.xml files, extracts reliable peptide/spectrum matches and writes them in the .peptSpectra.xml format. The input .idr.xml files may be gzip-compressed.
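Handling plain and gzipped input with the same code path is usually done by checking the gzip magic number (bytes 0x1f 0x8b) rather than relying on the file extension. The following is a minimal Python sketch of that idea. It is not the script's own Perl code, and the function name open_idr is hypothetical.

```python
import gzip


def open_idr(path):
    """Open an .idr.xml file for binary reading, gzipped or not (hypothetical helper)."""
    # Peek at the first two bytes: gzip streams always start with 0x1f 0x8b.
    with open(path, "rb") as fh:
        magic = fh.read(2)
    if magic == b"\x1f\x8b":
        return gzip.open(path, "rb")
    return open(path, "rb")
```

Detecting compression from content rather than from a .gz suffix means a compressed file that was renamed without its suffix is still read correctly.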
Peptide assignments are selected by applying the following thresholds to the identifications found in the .idr.xml file(s):
- maximum peptide p-value
- minimum peptide score
- minimum peptide z-score
- minimum protein score
- minimum number of distinct peptides per protein
- minimum peptide save z-score
To be selected, a peptide must have a p-value smaller than the maximum peptide p-value, and a score and z-score larger than the minimum peptide score and minimum peptide z-score, respectively. In addition, at least the minimum number of distinct peptides satisfying these criteria must match a given protein entry in the database. If fewer than the minimum number of distinct peptides are found for a protein, those of its peptides whose z-score is higher than the minimum save z-score are nonetheless selected.
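The selection rule above can be sketched in Python as follows. This sketch makes several assumptions that go beyond the text. The Match record and all parameter names are hypothetical and do not correspond to the script's option names. Each match is assumed to map to exactly one protein. The "save" rescue is read as applying to peptides that already passed the per-peptide thresholds. The minimum protein score is left out because the description does not say how it combines with the other criteria.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Match:
    """One peptide/spectrum match (hypothetical record, one protein per match)."""
    spectrum: str
    peptide: str
    protein: str
    p_value: float
    score: float
    z_score: float


def select(matches, max_pvalue, min_score, min_z, min_distinct, save_z):
    """Apply the peptide thresholds, then the distinct-peptides-per-protein rule."""
    # Peptide-level criteria (strict inequalities, following the description).
    passing = [
        m for m in matches
        if m.p_value < max_pvalue and m.score > min_score and m.z_score > min_z
    ]

    # Group the passing matches by the protein entry they match.
    by_protein = defaultdict(list)
    for m in passing:
        by_protein[m.protein].append(m)

    selected = []
    for ms in by_protein.values():
        if len({m.peptide for m in ms}) >= min_distinct:
            # Enough distinct peptides: keep every passing match of this protein.
            selected.extend(ms)
        else:
            # Too few distinct peptides: keep only matches above the save z-score.
            selected.extend(m for m in ms if m.z_score > save_z)
    return selected
```

For example, with min_distinct=2 and save_z=6, a protein matched by a single passing peptide is kept only if that peptide's z-score exceeds 6.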
While the file is parsed, each spectrum is associated with the peptide that gives the best match. Alternative interpretations of a spectrum are therefore discarded in favor of the best one.
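Keeping only the best interpretation per spectrum amounts to a single pass that retains the top-ranked match for each spectrum key. The Python sketch below illustrates this. The description does not state which quantity defines the "best" match, so ranking by the highest score is an assumption here. The function name and dictionary keys are hypothetical.

```python
def best_per_spectrum(matches):
    """Keep, for each spectrum, the match with the highest score (assumed criterion)."""
    best = {}
    for m in matches:
        current = best.get(m["spectrum"])
        # Strict '>' keeps the first-seen match when two scores are equal.
        if current is None or m["score"] > current["score"]:
            best[m["spectrum"]] = m
    return best
```

Any other interpretations of the same spectrum are simply never stored, which mirrors the loss of multiple interpretations described above.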
The exported peptides can be restricted to an imposed charge state. All peptides still take part in the selection (including the criterion on the number of distinct peptides per protein), but only those with the imposed charge are printed in the .peptSpectra.xml output.
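The order of operations matters here: the protein-level count uses all charge states, and the charge restriction is applied only afterwards. The simplified Python sketch below shows that ordering. It ignores the score thresholds, and the function name and dictionary keys are hypothetical.

```python
from collections import defaultdict


def select_then_filter_charge(matches, min_distinct, charge=None):
    """Count distinct peptides over all charges, then restrict the export to one charge."""
    # Step 1: distinct peptides per protein, counted across ALL charge states.
    peptides = defaultdict(set)
    for m in matches:
        peptides[m["protein"]].add(m["peptide"])
    kept = [m for m in matches if len(peptides[m["protein"]]) >= min_distinct]

    # Step 2: the imposed charge only limits what is printed.
    if charge is not None:
        kept = [m for m in kept if m["charge"] == charge]
    return kept
```

Suppose a protein is matched by one doubly charged and one triply charged peptide, with min_distinct=2. That protein passes the selection, and only its 2+ peptide is exported when charge=2. Filtering by charge first would have rejected the protein.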
It is also possible to supply a FASTA file containing the sequences of proteins known to be present in the analyzed sample. In this case, a peptide is selected only if it also appears in one of the given sequences. This option is useful when analyzing mixtures of purified proteins for quality control or other purposes: it allows working with relaxed thresholds to increase sensitivity while maintaining high confidence in the selected peptide/spectrum matches.
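The extra condition amounts to a substring test of each peptide against the known protein sequences. The Python sketch below reads FASTA text and performs that test. Both function names are hypothetical. The sketch does a plain exact match (case-insensitive), with no special handling of isobaric residues.

```python
def read_fasta(text):
    """Return the sequences in FASTA-formatted text, discarding header lines."""
    sequences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if current:
                sequences.append("".join(current))
                current = []
        else:
            current.append(line.upper())
    if current:
        sequences.append("".join(current))
    return sequences


def in_known_proteins(peptide, sequences):
    """True if the peptide occurs as a substring of at least one known sequence."""
    return any(peptide.upper() in s for s in sequences)
```

Because sequence lines are joined before matching, a peptide that straddles a line break in the FASTA file is still found.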
Finally, a list of database names can be given to the script when the searches recorded in the .idr.xml files returned results from several databases.
EXAMPLE
./idr2pept.pl example.idr.xml > test.peptSpectra.xml
AUTHOR
Jacques Colinge