NAME

MS::CV - interface to HUPO PSI controlled vocabularies

SYNOPSIS

use 5.012;  # enables say(), used below
use MS::CV qw/:MS :MOD :MI is_a regex_for units_for/;

# use PSI terms directly as constants
if ('MS:1000894' eq MS_RETENTION_TIME) {
    # do something
}

# check for child/parent relationships
say "model param is valid!"
    if (is_a( MS_Q_TRAP, MS_INSTRUMENT_MODEL ));


# PSI:MS conveniently provides cleavage regular expressions
my $pep = 'PEPTIDERPEPTIDEKRAPPLE';
my $re  = regex_for(MS_TRYPSIN);
say $_ for split( $re, $pep );

DESCRIPTION

MS::CV provides a simple interface to the HUPO PSI controlled vocabularies. Currently the MS, MOD, and MI ontologies are indexed and available.

The module uses a functional interface for speed and simplicity. Its primary function is to export sets of constants (one set per ontology) that map term names directly to term IDs.

CONSTANT NAMING

Constant names are autogenerated from the name field of the ontology OBO files. The rules for mapping are defined in the following code:

my $symb = uc( $ontology . '_' . $term->{name} );
$symb =~ s/\W/_/g;
$symb =~ s/^(\d)/_$1/;

For example, the term "CRM spectrum" in the MS ontology becomes MS_CRM_SPECTRUM.
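
As a quick check of these rules, the sketch below wraps the three lines above in a helper and applies it to a few term names that appear in this document. The helper name constant_name is hypothetical and is not exported by MS::CV.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MS::CV) wrapping the mapping rules above
sub constant_name {
    my ($ontology, $name) = @_;
    my $symb = uc( $ontology . '_' . $name );
    $symb =~ s/\W/_/g;       # every non-word character becomes an underscore
    $symb =~ s/^(\d)/_$1/;   # guard against a leading digit
    return $symb;
}

print constant_name( 'MS', 'CRM spectrum' ), "\n";  # MS_CRM_SPECTRUM
print constant_name( 'MI', 'text-mining' ),  "\n";  # MI_TEXT_MINING
print constant_name( 'MS', 'M+H ion' ),      "\n";  # MS_M_H_ION
```

Because spaces, hyphens, and symbols such as "+" all collapse to underscores, distinct term names can end up with the same constant name.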

Very rarely, distinct terms map to the same constant name after these transformations. In this case, increasing integer suffixes are appended to each colliding term. As of this writing, this occurs only for the following terms:

MI_TEXT_MINING

MI:0110 ("text mining") becomes MI_TEXT_MINING_1

MI:1056 ("text-mining") becomes MI_TEXT_MINING_2

UO_MILLI

UO:0000297 ("milli") becomes UO_MILLI_1

UO:0010009 ("milli") becomes UO_MILLI_2

UO_RATIO

UO:0000190 ("ratio") becomes UO_RATIO_1

UO:0010006 ("ratio") becomes UO_RATIO_2
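
The suffixing can be sketched roughly as follows. This is only an illustration, not the module's actual code: it assumes that colliding terms are numbered in order of their sorted IDs, which is consistent with the list above but not confirmed by it. The variable names are hypothetical, and the name given for MS:1000894 is an assumption inferred from the constant MS_RETENTION_TIME.

```perl
use strict;
use warnings;

# Sample id => name pairs; the MS:1000894 name is assumed, the rest are listed above
my %terms = (
    'MI:0110'    => 'text mining',
    'MI:1056'    => 'text-mining',
    'UO:0000190' => 'ratio',
    'UO:0010006' => 'ratio',
    'MS:1000894' => 'retention time',
);

# Group term IDs by the constant name produced by the mapping rules
my %by_symbol;
for my $id ( sort keys %terms ) {
    my ($ontology) = split /:/, $id;
    my $symb = uc( $ontology . '_' . $terms{$id} );
    $symb =~ s/\W/_/g;
    $symb =~ s/^(\d)/_$1/;
    push @{ $by_symbol{$symb} }, $id;
}

# Unique names are kept as-is; colliding names get _1, _2, ... (assumed: ID order)
my %constant_for;
for my $symb ( sort keys %by_symbol ) {
    my @ids = @{ $by_symbol{$symb} };
    if ( @ids == 1 ) {
        $constant_for{ $ids[0] } = $symb;
        next;
    }
    my $n = 0;
    $constant_for{$_} = $symb . '_' . ++$n for @ids;
}

printf "%s => %s\n", $_, $constant_for{$_} for sort keys %constant_for;
```

Note that in this scheme every member of a colliding group receives a suffix, so the bare name (for example MI_TEXT_MINING) is not defined at all.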

In rare cases, we have chosen to override the above suffixing for colliding terms. As of this writing, the following terms are overridden:

MS_M_H_ION

MS:1002820 ("M+H ion") becomes M_PLUS_H_ION

MS:1002821 ("M-H ion") becomes M_MINUS_H_ION

FUNCTIONS

is_a

if ( is_a( $child, $parent ) ) {
    say "model param is valid!";
}

Takes two required arguments (child ID and parent ID) and returns a boolean value indicating whether the first term is a descendant of the second.
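
To illustrate what "descendant" means here, the following sketch walks a toy is_a hierarchy upward from the child term. It is not the module's implementation: the IDs, the %parents table, and the helper name is_descendant are all hypothetical, and the sketch assumes that a term is not counted as its own descendant.

```perl
use strict;
use warnings;

# Toy hierarchy with made-up IDs: child => [ direct parents ]
my %parents = (
    'EX:0003' => ['EX:0002'],
    'EX:0002' => ['EX:0001'],
    'EX:0004' => ['EX:0001'],
);

# Hypothetical helper: returns 1 if $child descends from $parent, else 0
sub is_descendant {
    my ($child, $parent) = @_;
    my @queue = ($child);
    my %seen;
    while ( defined( my $id = shift @queue ) ) {
        next if $seen{$id}++;    # skip terms already visited
        for my $p ( @{ $parents{$id} || [] } ) {
            return 1 if $p eq $parent;
            push @queue, $p;     # keep climbing toward the root
        }
    }
    return 0;
}

print is_descendant( 'EX:0003', 'EX:0001' ) ? "yes\n" : "no\n";  # indirect parent
print is_descendant( 'EX:0004', 'EX:0002' ) ? "yes\n" : "no\n";  # sibling branch
```

The point is that the relationship is transitive: a term matches not only its direct parent but any ancestor further up the hierarchy.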

units_for

my $valid_units = units_for( $term );

Takes one argument (a CV ID) and returns a reference to an array of valid unit terms from the Unit Ontology, or undef if no units are defined.

regex_for

my $re = regex_for(MS_TRYPSIN);
say $_ for split( $re, $peptide );

Takes one argument (a CV ID representing a cleavage enzyme) and returns a regular expression that can be used to split a string based on the specificity of that enzyme.
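
The sketch below shows how such a regular expression behaves with split, without requiring MS::CV to be installed. The pattern is hard-coded on the assumption that it matches what the PSI-MS vocabulary stores for trypsin (cleave after K or R unless followed by P); check it against the value returned by regex_for(MS_TRYPSIN) before relying on it.

```perl
use strict;
use warnings;

# Assumed trypsin pattern: zero-width match after K or R, not before P
my $re  = qr/(?<=[KR])(?!P)/;
my $pep = 'PEPTIDERPEPTIDEKRAPPLE';

# The pattern consumes no characters, so no residues are lost on splitting
my @peptides = split( $re, $pep );
print "$_\n" for @peptides;
# PEPTIDERPEPTIDEK
# R
# APPLE
```

The first R is followed by P, so no cleavage occurs there; the string is cut only after the K and after the second R.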

cv_name

my $name = cv_name( $term );

Takes one argument (a CV ID) and returns the text description of the term.

print_tree

print_tree( 'MS' );

Takes one argument (a CV name) and prints a textual tree representation of the CV hierarchy to STDOUT (or the currently selected output filehandle). This is mainly of use for debugging the use of CV terms in your program, as it includes the constant name exported by this module for each term in the CV.

CAVEATS AND BUGS

Please report any bugs or feature requests to the issue tracker at https://github.com/jvolkening/p5-MS.

AUTHOR

Jeremy Volkening <jdv@base2bio.com>

COPYRIGHT AND LICENSE

Copyright 2016-2017 Jeremy Volkening

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.