NAME
obogaf::parser - a perl5 module to handle obo and gaf files
SYNOPSIS
use obogaf::parser;
my ($graph, $subonto, $res, $stat, $parORchdlist, $newobo);
$graph= build_edges(obofile);
$subonto= build_subonto(edgesfile, namespace);
$stat= make_stat(edgesfile, parentIndex, childIndex);
$parORchdlist= get_parents_or_children_list(edgesfile, parentIndex, childIndex, parORchd);
$newobo= obo_filter(obofile, termsfile);
($res, $stat)= gene2biofun(annfile, geneIndex, classIndex);
($res, $stat)= map_OBOterm_between_release(obofile, annfile, classIndex);
ABSTRACT
obogaf::parser is a perl5 module designed to handle open biological and biomedical ontology (obo) files and gene association files.
DESCRIPTION
obogaf::parser is a perl5 module specifically designed to handle GO and HPO obo (Open Biological and Biomedical Ontology) files and their Gene Annotation Files (gaf files). However, all the obogaf::parser subroutines can be safely used to parse any obo file listed in OBO foundry and any gene annotation file structured like those shown on the GOA website and the HPO website (basically a csv file using tab as separator).
- build_edges - extract edges from an obo file.
my $graph= build_edges(obofile);
obofile: any obo file listed in OBO foundry. The file extension must be ".obo".
output: the graph is returned as a tuple:
subdomain <tab> source-ID <tab> destination-ID <tab> relationship <tab> source-name <tab> destination-name
This means that the graph is returned as a list of edges, where each edge is represented as a pair of vertices in the form source <tab> destination. For each pair of nodes, the subdomain (if any), the relationships for which it is safe to group annotations (i.e. is_a and part_of) and the names of the source/destination obo term IDs are returned as well. The graph is stored as an anonymous scalar.
- build_subonto - extract edges of a specified sub-ontology domain.
my $subonto= build_subonto(edgesfile, namespace);
edgesfile: a graph in the form:
subdomain <tab> source <tab> destination <tab> relationship <tab> source-name <tab> destination-name
This file can be obtained by calling the subroutine build_edges. NB: to run this subroutine, the fields relationship, source-name and destination-name are optional. The field subdomain, instead, is required and must be placed in the first column, otherwise an error message is returned.
namespace: name of the subontology for which the edges must be extracted.
output: the graph is returned as a tuple:
source <tab> destination <tab> relationship
In other words, the graph is returned as a list of edges, where each edge is represented as a pair of vertices in the form source <tab> destination. For each pair of nodes, the relationships is_a and part_of are also returned. The graph is stored as an anonymous scalar.
- make_stat - compute basic statistics on the graph.
my $stat= make_stat(edgesfile, parentIndex, childIndex);
edgesfile: a graph represented as a list of edges, where each edge is stored as a pair of <tab> separated vertices. This file can be obtained by calling the subroutine build_edges.
parentIndex: index of the column containing the parent (source) vertices in the edgesfile.
childIndex: index of the column containing the child (destination) vertices in the edgesfile.
output: statistics about the graph are printed on the shell. More precisely, for each vertex of the graph, the degree, in-degree and out-degree are printed. The vertices are sorted in decreasing order of degree, from the highest to the lowest. Finally, the following statistics are returned as well: 1) number of nodes and edges of the graph; 2) maximum and minimum degree; 3) average and median degree; 4) density of the graph.
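As a rough illustration of the per-vertex statistics described above, the following sketch computes degree, in-degree and out-degree from a small in-memory edge list. It is not the implementation used by make_stat: the helper name degree_table, the sample edges and the 0-based column indices are all assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of obogaf::parser): compute per-vertex
# degree statistics from tab-separated edge lines.
# Column indices are assumed to be 0-based in this sketch.
sub degree_table {
    my ($lines, $parent_idx, $child_idx) = @_;
    my (%in, %out, %seen);
    for my $line (@$lines) {
        my @f = split /\t/, $line;
        my ($parent, $child) = @f[$parent_idx, $child_idx];
        $out{$parent}++;    # the edge leaves the parent (source)
        $in{$child}++;      # the edge enters the child (destination)
        $seen{$_} = 1 for $parent, $child;
    }
    my %stat;
    for my $v (keys %seen) {
        my $i = $in{$v}  // 0;
        my $o = $out{$v} // 0;
        $stat{$v} = { degree => $i + $o, in => $i, out => $o };
    }
    return \%stat;
}

# Made-up edges: source (parent) <tab> destination (child).
my @edges = ("A\tB", "A\tC", "A\tD", "B\tD");
my $stat  = degree_table(\@edges, 0, 1);

# Print vertices by decreasing degree; ties are broken by vertex name.
for my $v (sort { $stat->{$b}{degree} <=> $stat->{$a}{degree} || $a cmp $b } keys %$stat) {
    printf "%s\t%d\t%d\t%d\n", $v, @{ $stat->{$v} }{qw(degree in out)};
}
```

The remaining summary values (maximum, minimum, average and median degree) can be derived from the same table.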
- get_parents_or_children_list - build parents or children list for each node of the graph.
my $parORchdlist= get_parents_or_children_list(edgesfile, parentIndex, childIndex, parORchd);
edgesfile: a graph represented as a list of edges, where each edge is stored as a pair of <tab> separated vertices. This file can be obtained by calling the subroutine build_edges.
parentIndex: index of the column containing the parent (source) vertices in the edgesfile.
childIndex: index of the column containing the child (destination) vertices in the edgesfile.
parORchd: must be parents or children. If $parORchd=parents, a pipe separated list containing the parents of each node of the graph is returned; if $parORchd=children, a pipe separated list containing the children of each node is returned.
output: an anonymous hash storing, for each node of the graph, the list of its children or parents according to the parORchd parameter.
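The shape of the returned hash can be sketched as follows. This is only an illustration of the pipe separated parent/children lists, not the module's own code: the helper name neighbour_list, the sample edges and the 0-based column indices are assumptions.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of obogaf::parser): for each node, collect
# its parents or its children as a pipe separated string.
sub neighbour_list {
    my ($lines, $parent_idx, $child_idx, $mode) = @_;
    die "mode must be 'parents' or 'children'\n"
        unless $mode eq 'parents' || $mode eq 'children';
    my %list;
    for my $line (@$lines) {
        my @f = split /\t/, $line;
        my ($parent, $child) = @f[$parent_idx, $child_idx];
        # Key on the child to list parents, on the parent to list children.
        my ($key, $value) = $mode eq 'parents' ? ($child, $parent) : ($parent, $child);
        push @{ $list{$key} }, $value;
    }
    my %joined;
    $joined{$_} = join('|', @{ $list{$_} }) for keys %list;
    return \%joined;
}

# Made-up edges: source (parent) <tab> destination (child).
my @edges    = ("A\tB", "A\tC", "B\tD", "C\tD");
my $parents  = neighbour_list(\@edges, 0, 1, 'parents');
my $children = neighbour_list(\@edges, 0, 1, 'children');

print "$_\t$parents->{$_}\n"  for sort keys %$parents;
print "$_\t$children->{$_}\n" for sort keys %$children;
```

In this sketch a node without parents (or without children) simply has no entry in the hash; whether the module behaves the same way is not stated here.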
- obo_filter - prune obo file
$newobo= obo_filter(obofile, termsfile);
obofile: any obo file listed in OBO foundry. The file extension must be ".obo".
termsfile: file containing the set of obo terms (new line separated) to which the obo file must be restricted.
output: an anonymous scalar storing the terms listed in termsfile, formatted according to the obo structure.
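The pruning idea can be sketched on a tiny in-memory obo fragment. This is a minimal sketch under stated assumptions, not the module's implementation: the helper name filter_stanzas and the made-up term IDs are hypothetical, stanzas are assumed to be separated by blank lines, and keeping the header and non-Term stanzas unchanged is a choice made for this example only.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of obogaf::parser): keep only the [Term]
# stanzas whose id appears in the %$keep lookup.
sub filter_stanzas {
    my ($obo_text, $keep) = @_;
    $obo_text =~ s/\s+\z//;                      # drop trailing whitespace
    my @blocks = split /\n{2,}/, $obo_text;      # assumed blank-line separated
    my @kept;
    for my $block (@blocks) {
        if ($block =~ /^\[Term\]/) {
            my ($id) = $block =~ /^id:\s*(\S+)/m;
            push @kept, $block if defined $id && $keep->{$id};
        }
        else {
            push @kept, $block;                  # header etc.: kept (sketch choice)
        }
    }
    return join("\n\n", @kept) . "\n";
}

# Made-up obo fragment with two terms.
my $obo = <<'OBO';
format-version: 1.2

[Term]
id: X:0000001
name: root

[Term]
id: X:0000002
name: child
is_a: X:0000001 ! root
OBO

my %keep     = ('X:0000002' => 1);               # terms to retain
my $filtered = filter_stanzas($obo, \%keep);
print $filtered;
```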
- gene2biofun - make annotations adjacency list.
my ($res, $stat)= gene2biofun(annfile, geneIndex, classIndex);
annfile: an annotation file. The file can be either plain text (".txt") or compressed (".gz"). An example of the format of this file can be taken from the GOA website (file with ".gaf.gz" extension) or the HPO website. More generally, any file structured like those mentioned above can be used (basically a ".csv" file using <tab> as separator).
geneIndex: index referring to the column containing the samples (genes/proteins).
classIndex: index referring to the column containing the ontology terms.
output: a list of two anonymous references. The first is an anonymous hash storing, for each gene (or protein), all the associated ontology terms (pipe separated). The second is an anonymous scalar containing basic statistics, such as the total number of unique genes/proteins and annotated ontology terms.
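To make the two returned references concrete, here is a small sketch that groups ontology terms by gene from in-memory annotation lines. It only mimics the described output shape: the helper name group_terms_by_gene, the sample data, the 0-based column indices, the removal of duplicate terms and the wording of the statistics string are assumptions of this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of obogaf::parser): group ontology terms
# by gene and return (hash reference, scalar reference).
sub group_terms_by_gene {
    my ($lines, $gene_idx, $class_idx) = @_;
    my (%terms, %all_terms);
    for my $line (@$lines) {
        next if $line =~ /^!/;            # comment lines in GAF files start with "!"
        chomp(my $copy = $line);
        my @f = split /\t/, $copy;
        my ($gene, $term) = @f[$gene_idx, $class_idx];
        $terms{$gene}{$term} = 1;         # nested hash drops duplicates (sketch choice)
        $all_terms{$term}    = 1;
    }
    my %gene2terms;
    $gene2terms{$_} = join('|', sort keys %{ $terms{$_} }) for keys %terms;
    my $stat = sprintf "genes: %d\nontology terms: %d\n",
        scalar(keys %terms), scalar(keys %all_terms);
    return (\%gene2terms, \$stat);
}

# Made-up annotation lines: database <tab> gene <tab> term.
my @ann = (
    "!gaf-version: 2.1",
    "DB\tgeneA\tTERM:1",
    "DB\tgeneA\tTERM:2",
    "DB\tgeneB\tTERM:2",
    "DB\tgeneA\tTERM:1",
);
my ($res, $stat) = group_terms_by_gene(\@ann, 1, 2);

print "$_\t$res->{$_}\n" for sort keys %$res;    # hash reference: use ->
print ${$stat};                                  # scalar reference: use ${...}
```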
- map_OBOterm_between_release - map ontology terms between different releases.
my ($res, $stat)= map_OBOterm_between_release(obofile, annfile, classIndex);
obofile: an obo file (a new release). This file is used to build the alt_id - id pairing, using alt_id as key. The file extension must be ".obo".
annfile: an annotation file (an old release). The file can be either plain text (".txt") or compressed (".gz").
classIndex: index of the column of the annfile containing the ontology terms to be mapped.
output: a list of two anonymous references. The first is an anonymous scalar storing the annotation file in the same format as the input file, but with the obsolete ontology terms replaced by the updated ones. The second is an anonymous scalar containing some basic statistics, such as the total number of unique ontology terms and the total number of mapped and unmapped altID ontology terms. Finally, all the alt_id - id pairs found (if any) are returned.
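The alt_id - id pairing described above can be sketched as a lookup table followed by a column rewrite. This is an illustration under stated assumptions rather than the module's code: the helper names alt_id_map and remap_terms, the made-up term IDs and the 0-based column index are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build an alt_id => id lookup from obo text.
sub alt_id_map {
    my ($obo_text) = @_;
    my (%alt2id, $current_id);
    for my $line (split /\n/, $obo_text) {
        if ($line =~ /^\[/) {                    # a new stanza begins
            undef $current_id;
        }
        elsif ($line =~ /^id:\s*(\S+)/) {
            $current_id = $1;
        }
        elsif ($line =~ /^alt_id:\s*(\S+)/) {
            $alt2id{$1} = $current_id if defined $current_id;
        }
    }
    return \%alt2id;
}

# Hypothetical helper: replace the terms in one column using the lookup.
sub remap_terms {
    my ($lines, $class_idx, $alt2id) = @_;
    my $mapped = 0;
    my @out;
    for my $line (@$lines) {
        my @f = split /\t/, $line, -1;           # -1 keeps trailing empty fields
        if (exists $alt2id->{ $f[$class_idx] }) {
            $f[$class_idx] = $alt2id->{ $f[$class_idx] };
            $mapped++;
        }
        push @out, join("\t", @f);
    }
    return (\@out, $mapped);
}

# Made-up "new release" stanza and "old release" annotations.
my $obo = <<'OBO';
[Term]
id: X:0000010
name: current term
alt_id: X:0000005
alt_id: X:0000007
OBO

my @old = ("geneA\tX:0000005", "geneB\tX:0000010", "geneC\tX:0000099");
my ($new, $n_mapped) = remap_terms(\@old, 1, alt_id_map($obo));

print "$_\n" for @$new;
print "mapped: $n_mapped\n";
```

Terms with no alt_id entry, such as X:0000099 in this made-up data, are left unchanged by the sketch.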
BUGS
Please report any bugs here.
COPYRIGHT
Copyright (C) 2019 Marco Notaro, all rights reserved.
This program is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.
AUTHOR
Marco Notaro (https://marconotaro.github.io)
SEE ALSO
A step-by-step tutorial showing how to apply obogaf::parser to real biomedical case studies is available at the following link https://obogaf-parser.readthedocs.io.