NAME
similarity_match.pl
SYNOPSIS
Compares a list of annotations against an ontology and suggests the best match for each annotation, using the EBI::FGPT::FuzzyRecogniser module. It can also align one ontology to another. Accepts ontologies in OBO and OWL formats, as well as MeSH ASCII and OMIM txt files.
The script runs non-interactively and the results have to be inspected manually, although anything with a similarity score above roughly 80-90% can be expected to be a valid match.
USAGE
similarity_match.pl (-w owlfile || -o obofile || -m meshfile || -i omimfile) -t targetfile -r resultfile [--obotarget || --owltarget]
The optional '--obotarget' flag specifies that the target file is an OBO ontology. The optional '--owltarget' flag specifies that the target file is an OWL ontology.
INPUT FILES
- ontologies to map the targetfile against
owlfile, obofile, meshfile and omimfile are ontologies in OWL, OBO, MeSH ASCII and OMIM formats, respectively. Only a single file needs to be specified.
- targetfile
The script expects a tab-delimited text file with headers. Only the first column will be used for matching. All other columns will be preserved in the output.
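The expected target file layout can be illustrated with a minimal sketch, assuming the whole file can be read line by line from a filehandle. The subroutine read_target and its return values are hypothetical and do not come from similarity_match.pl. Each line is split at the first tab only, so the value to match is separated from the remaining columns, which stay intact for the output; a one-column line leaves the remainder undefined.

```perl
use strict;
use warnings;

# Hypothetical reader for a tab-delimited target file with a header row.
# Returns the header columns and a list of [value, rest] pairs, where
# value is the first column (used for matching) and rest holds the
# remaining columns verbatim (to be preserved in the output).
sub read_target {
    my ($fh) = @_;
    my $header = <$fh>;
    die "Target file is empty\n" unless defined $header;
    chomp $header;
    my @columns = split /\t/, $header, -1;

    my @rows;
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';    # skip blank lines
        # Split only at the first tab; $rest is undef for a one-column line.
        my ( $value, $rest ) = split /\t/, $line, 2;
        push @rows, [ $value, $rest ];
    }
    return ( \@columns, \@rows );
}

# Usage with an in-memory file (the example data is made up).
my $data = "ANNOTATION\tSAMPLE\nbreast cancer\tS1\nleukemia\tS2\tS3\nmelanoma\n";
open my $fh, '<', \$data or die $!;
my ( $columns, $rows ) = read_target($fh);
close $fh;
print "$_->[0]\n" for @$rows;
```

Note that the ragged second row keeps both of its extra columns ("S2\tS3") in a single string, so nothing beyond the first column is lost.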
OUTPUT
The script produces a single tab-delimited file, named with the -r flag. The file contains the following additional columns:
- SOURCE_ACCESSION
Accession of the source term, if the target file was an ontology.
- SOURCE_LABEL
Label of the source term, if the target file was an ontology.
- SOURCE_VALUE
The annotation (label or synonym, if the target file was an ontology) that achieved the highest similarity against the supplied ontology file.
- MATCHED_ACCESSION
Accession of the matched term that provided the best match.
- MATCHED_LABEL
Matched term's label.
- MATCHED_VALUE
The annotation of the matched term (its label or a synonym) from the supplied ontology file that achieved the highest similarity.
- MATCH_SIMILARITY%
Similarity score of the two matched terms, normalised by the length of the longer of the two strings and expressed as a percentage. Higher is better.
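The normalisation of MATCH_SIMILARITY% can be illustrated with a small sketch. The text above only states that the score is normalised by the length of the longer string; this example assumes, purely for illustration, that the raw score is a Levenshtein edit distance, which may not be the metric EBI::FGPT::FuzzyRecogniser actually uses. The names edit_distance and similarity_percent are hypothetical.

```perl
use strict;
use warnings;
use List::Util qw(min max);

# Levenshtein edit distance (classic dynamic programming, keeping only
# the previous row of the table). ASSUMPTION: used here as the raw score.
sub edit_distance {
    my ( $s, $t ) = @_;
    my @prev = ( 0 .. length $t );
    for my $i ( 1 .. length $s ) {
        my @curr = ($i);
        for my $j ( 1 .. length $t ) {
            my $cost = substr( $s, $i - 1, 1 ) eq substr( $t, $j - 1, 1 ) ? 0 : 1;
            push @curr, min(
                $prev[$j] + 1,              # deletion
                $curr[ $j - 1 ] + 1,        # insertion
                $prev[ $j - 1 ] + $cost,    # substitution
            );
        }
        @prev = @curr;
    }
    return $prev[-1];
}

# Similarity in percent, normalised by the length of the longer string.
sub similarity_percent {
    my ( $x, $y ) = @_;
    my $longer = max( length $x, length $y );
    return 100 if $longer == 0;    # two empty strings count as identical
    return 100 * ( 1 - edit_distance( $x, $y ) / $longer );
}

printf "%.1f\n", similarity_percent( 'leukaemia', 'leukemia' );    # prints 88.9
```

Under this assumption, 'leukaemia' and 'leukemia' differ by a single deletion over nine characters, giving about 88.9%, which sits inside the ~80-90% range mentioned in the SYNOPSIS.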
DESCRIPTION
Function list
- align()
Aligns the two data structures, targetfile and ontology, and writes the results to a file.
- parseFlat()
Custom flat file parser.
- parseFlatColumns()
Splits and rejoins the columns of a flat file. The first column is assigned to the first element, and the ragged end (any leftover columns) is concatenated into the second element, which is undef for a one-column file.
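The best-match idea behind align() can be sketched as a simple loop, under several assumptions. In the script itself scoring is delegated to EBI::FGPT::FuzzyRecogniser, whose API is not reproduced here; instead the sketch takes any scorer as a code reference returning a percentage. The subroutine align_terms, the term hash layout (accession, label, synonyms) and the toy prefix scorer are all hypothetical, and only a subset of the output columns is written.

```perl
use strict;
use warnings;

# Hypothetical best-match loop in the spirit of align(): for every target
# value, score it against the label and synonyms of every ontology term,
# keep the highest-scoring annotation and print one tab-delimited row.
# $scorer is a code reference returning a similarity in percent.
sub align_terms {
    my ( $targets, $terms, $scorer, $out ) = @_;
    die "No ontology terms supplied\n" unless @$terms;

    print {$out} join( "\t",
        qw(SOURCE_VALUE MATCHED_ACCESSION MATCHED_LABEL MATCHED_VALUE MATCH_SIMILARITY%) ),
      "\n";

    for my $value (@$targets) {
        my ( $best_term, $best_annotation, $best_score ) = ( undef, undef, -1 );
        for my $term (@$terms) {
            for my $annotation ( $term->{label}, @{ $term->{synonyms} || [] } ) {
                # Case-insensitive comparison is an assumption of this sketch.
                my $score = $scorer->( lc $value, lc $annotation );
                ( $best_term, $best_annotation, $best_score ) = ( $term, $annotation, $score )
                  if $score > $best_score;
            }
        }
        print {$out} join( "\t",
            $value, $best_term->{accession}, $best_term->{label},
            $best_annotation, sprintf( '%.0f', $best_score ) ), "\n";
    }
    return;
}

# Toy scorer for this demo only: length of the common prefix as a
# percentage of the longer string.
my $prefix_scorer = sub {
    my ( $x, $y ) = @_;
    my $n = 0;
    $n++
      while $n < length($x)
      && $n < length($y)
      && substr( $x, $n, 1 ) eq substr( $y, $n, 1 );
    my $longer = length($x) > length($y) ? length($x) : length($y);
    return $longer ? 100 * $n / $longer : 100;
};

# Made-up terms with placeholder accessions.
my @terms = (
    { accession => 'TEST:0001', label => 'leukemia',         synonyms => ['leukaemia'] },
    { accession => 'TEST:0002', label => 'breast carcinoma', synonyms => ['breast cancer'] },
);
my $result = '';
open my $out, '>', \$result or die $!;
align_terms( [ 'Breast cancer', 'leukaemia' ], \@terms, $prefix_scorer, $out );
close $out;
print $result;
```

Because the comparison uses a strict "greater than", the first annotation reaching the top score wins a tie; passing the scorer in as a code reference keeps the selection logic independent of whichever similarity measure is plugged in.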
ACKNOWLEDGMENTS
Emma Hastings <emma@ebi.ac.uk>
AUTHORS
Tomasz Adamusiak <tomasz@cpan.org>