NAME
Bio::DB::Das::Chado - DAS-style access to a chado database
SYNOPSIS
use Bio::DB::Das::Chado;

# Open up a feature database
$db = Bio::DB::Das::Chado->new(
    -dsn  => 'dbi:Pg:dbname=gadfly;host=lajolla',
    -user => 'jimbo',
    -pass => 'supersecret',
);
@segments = $db->segment(-name => '2L',
-start => 1,
-end => 1000000);
# segments are Bio::Das::SegmentI - compliant objects
# fetch a list of features
@features = $db->features(-type=>['type1','type2','type3']);
# invoke a callback over features
$db->features(-type=>['type1','type2','type3'],
-callback => sub { ... }
);
# get all feature types
@types = $db->types;
# count types
%types = $db->types(-enumerate=>1);
@feature = $db->get_feature_by_name($class=>$name);
@feature = $db->get_feature_by_target($target_name);
@feature = $db->get_feature_by_attribute($att1=>$value1,$att2=>$value2);
$feature = $db->get_feature_by_id($id);
$error = $db->error;
DESCRIPTION
Chado is the GMOD database schema, and a chado database is a specific instance of it. The schema is still somewhat of a moving target, so this package will probably require several updates over the coming months to keep it working.
FEEDBACK
Mailing Lists
User feedback is an integral part of the evolution of this and other GMOD modules. Send your comments and suggestions preferably to one of the GMOD mailing lists. Your participation is much appreciated.
gmod-gbrowse@lists.sourceforge.com
Reporting Bugs
Report bugs to the GMOD bug tracking system at SourceForge to help us keep track of bugs and their resolution.
http://sourceforge.net/tracker/?group_id=27707&atid=391291
AUTHOR - Scott Cain
Email scain@cpan.org
LICENSE
This software may be redistributed under the same license as perl.
APPENDIX
The rest of the documentation details each of the object methods. Internal method names usually begin with an underscore (_).
new
Title : new
Usage : $db = Bio::DB::Das::Chado->new(
            -dsn  => 'dbi:Pg:dbname=gadfly;host=lajolla',
            -user => 'jimbo',
            -pass => 'supersecret',
        );
Function: Open up a Bio::DB::DasI interface to a Chado database
Returns : a new Bio::DB::Das::Chado object
Args :
- -dsn [dsn string]
-
A full dbi dsn string for the database, optionally including host and port information, like "dbi:Pg:dbname=chado;host=localhost;port=5432".
- -user [username]
-
The database user name.
- -pass [password]
-
The user's password for the database.
- -organism [common_name|abbreviation|"Genus species"]
-
Used to specify the organism that the features should be drawn from in Chado instances that have more than one organism. The argument can be the common name, the abbreviation or "Genus species". Since common name and abbreviation are not guaranteed to be unique, if one of those is supplied and it corresponds to more than one organism_id, the Chado adaptor will die. Because the genus and species combination is guaranteed to be unique by table constraints, supplying "Genus species" should always work.
- -srcfeatureslice [1|0] default: 0
-
Setting this to 1 will enable searching for features using a function and a corresponding index that can significantly speed searches, as long as the featureloc_slice function is present in the Chado instance (all "modern" instances of Chado do have this function). Since it is available in nearly all Chado instances, a future release of this adaptor will set the default value of -srcfeatureslice to 1 (on).
- -inferCDS [1|0] default: 0
-
Given mRNA features that have exons and polypeptide features as children, when -inferCDS is set, the Chado adaptor will calculate the intersection of the exons and polypeptide features and create the resulting CDS features. This is generally needed when using gene and mRNA features with glyphs in GBrowse that show subparts, like the gene and processed_transcript glyphs. Since this is almost always required, a future release of this adaptor will switch the default to 1 (on).
- -fulltext [1|0] default: 0
-
This option enables full text searching of various Chado text fields, including feature.name, feature.uniquename, synonym.synonym_sgml, dbxref.accession, and all_feature_names.name (which frequently includes featureprop.value, depending on how all_feature_names is configured). Note that to use -fulltext, you must run the preparation script, gmod_chado_fts_prep.pl, on the database. It may also be a good idea to set up a cron job that keeps the all_feature_names materialized view up to date with the materialized view tool, gmod_materialized_view_tool.pl.
- -recursivMapping [1|0] default: 0
-
In the case where features are mapped to a "small" srcfeature (like a contig) and then that small feature is mapped to a larger feature (like a chromosome), setting -recursivMapping will allow the Chado adaptor to calculate the coordinates of the feature on the larger feature even though it isn't explicitly mapped to it. The Chado adaptor suffers an approximately 20% performance penalty to do this mapping.
- -allow_obsolete [1|0] default: 0
-
If set to 1, allow_obsolete will tell the Chado adaptor to ignore the feature.is_obsolete column when querying to find features.
- -enable_seqscan [1|0] default: 1
-
If set to 0, -enable_seqscan sends a query planner hint to the PostgreSQL server that makes sequential scans on a table more costly. This is generally not necessary, as the query planner in Pg 8+ is smarter than in earlier versions.
- -do2Level [1|0] default: 0
-
do2Level is a flag specifying that at most two "levels" of features should be fetched when getting child features. This flag is generally unnecessary, as Bio::Graphics::Glyph supports specifying on a per-glyph basis what should be fetched. Use of this flag is incompatible with the -recursivMapping flag.
- -reference_class [SO type name]
-
Used to specify what the "base type" is. Typically, this would be chromosome or contig, but setting it is only necessary when features are mapped to more than one srcfeature and you don't want to use the one that is lowest on the graph. For example, suppose you have polypeptides that are mapped to chromosomes and motifs that are mapped to polypeptides. If you want to display the motifs on the polypeptide, you need to set "polypeptide" as the argument for -reference_class.
feature_summary
- Usage
-
$obj->feature_summary()
- Function
-
This function is based on Bio::DB::SeqFeature::Store->feature_summary. The text that follows comes from its documentation:
This method is used to get coverage density information across a region of interest. You provide it with a region of interest, optionally a list of feature types, and a count of the number of bins over which you want to calculate the coverage density. An object is returned corresponding to the requested region. It contains a tag called "coverage" that will return an array ref of "bins" length. Each element of the array describes the number of features that overlap the bin at this position.
Note that this method uses an approximate algorithm that is only accurate to 500 bp, so when dealing with bins that are smaller than 1000 bp, you may see some shifting of counts between adjacent bins.
Although an -iterator option is provided, the method only ever returns a single feature, so this is fairly useless.
- Returns
-
A single feature containing summary data, or an iterator containing that one feature.
- Arguments
-
-seq_id        Sequence ID for the region
-start         Start of region
-end           End of region
-type/-types   Feature type of interest or array ref of types
-bins          Number of bins across region. Defaults to 1000.
-iterator      Return an iterator across the region
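The binning idea behind the "coverage" tag can be sketched in plain Perl. This is a simplified illustration, not the adaptor's code: coverage_bins is a hypothetical helper, features are bare 1-based [start, end] pairs on a single sequence, and the counts are exact rather than using the approximate 500 bp algorithm noted above.

```perl
use strict;
use warnings;

# Hypothetical helper: for each of $bins equal-width bins across
# [$start, $end], count how many features overlap that bin.
sub coverage_bins {
    my ($start, $end, $bins, @features) = @_;
    my $width    = ($end - $start + 1) / $bins;
    my @coverage = (0) x $bins;
    for my $f (@features) {
        my ($fs, $fe) = @$f;
        next if $fe < $start || $fs > $end;    # feature lies outside the region
        $fs = $start if $fs < $start;          # clip the feature to the region
        $fe = $end   if $fe > $end;
        my $first = int(($fs - $start) / $width);
        my $last  = int(($fe - $start) / $width);
        $last = $bins - 1 if $last > $bins - 1;
        $coverage[$_]++ for $first .. $last;
    }
    return \@coverage;
}

my $cov = coverage_bins(1, 1000, 10, [50, 250], [950, 1200], [2000, 3000]);
# $cov is [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
```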
coverage_array
- Usage
-
$obj->coverage_array()
- Function
-
Calculates the coverage/density of a particular feature type over a range.
- Returns
-
A reference to the coverage array or, if called in list context, a two-element list with the reference to the coverage array first and the type it was called with as the second element.
- Arguments
-
seqid start stop type bins
This is based on the method of the same name in Bio::DB::SeqFeature::Store::DBI::mysql
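The scalar/list-context return convention described above is the standard Perl wantarray idiom. A minimal sketch with a hypothetical coverage_array_like function that models only the return shape; the argument names are assumed and the bins are left at zero rather than computed.

```perl
use strict;
use warnings;

# Hypothetical stand-in: models only the documented return convention.
sub coverage_array_like {
    my (%args) = @_;
    my @coverage = (0) x $args{-bins};    # a real method would fill these in
    return wantarray ? (\@coverage, $args{-type}) : \@coverage;
}

my $cov = coverage_array_like(-type => 'gene', -bins => 5);             # scalar context
my ($cov2, $type) = coverage_array_like(-type => 'gene', -bins => 5);   # list context
```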
fulltext
- Usage
-
$obj->fulltext()          # get existing value
$obj->fulltext($newval)   # set new value
- Function
-
Flag to govern the use of full text searching queries
- Returns
-
value of fulltext (a scalar)
- Arguments
-
new value of fulltext (to set)
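fulltext and the other flag methods in this section follow the usual Perl combined getter/setter idiom. A minimal sketch, using a hypothetical My::ChadoFlags class; the adaptor's internal storage is not documented here and may differ.

```perl
use strict;
use warnings;

package My::ChadoFlags;    # hypothetical class, not the adaptor itself

sub new {
    my ($class, %args) = @_;
    return bless({ fulltext => $args{-fulltext} || 0 }, $class);
}

# Combined getter/setter: store a value if one is passed; always return it.
sub fulltext {
    my $self = shift;
    $self->{fulltext} = shift if @_;
    return $self->{fulltext};
}

package main;

my $obj = My::ChadoFlags->new(-fulltext => 1);
$obj->fulltext(0);    # turn full text searching off again
```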
refclass
- Usage
-
$obj->refclass()          # get existing value
$obj->refclass($newval)   # set new value
- Function
-
Get or set the cvterm_id of the reference class
- Returns
-
value of the reference class's cvterm_id (a scalar)
- Arguments
-
new value of the reference class's cvterm_id (to set)
use_all_feature_names
Title : use_all_feature_names
Usage : $obj->use_all_feature_names()
Function: set or return flag indicating that all_feature_names view is present
Returns : 1 if all_feature_names present, 0 if not
Args : to return the flag, none; to set, 1
organism_id
Title : organism_id
Usage : $obj->organism_id()
Function: set or return the organism_id
Returns : the value of the id
Args : to return the id, none; to set, the common name of the organism
If -organism is set when the Chado adaptor is instantiated, this method queries the database with the supplied organism name to cache the organism_id.
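The organism lookup rules described under -organism can be illustrated without a database. In this sketch the organism rows and the resolve_organism_id helper are hypothetical stand-ins for the Chado organism table and the adaptor's query; only the matching and ambiguity rules follow the text.

```perl
use strict;
use warnings;

# Hypothetical rows standing in for the Chado organism table.
my @organisms = (
    { organism_id => 1, genus => 'Drosophila', species => 'melanogaster',
      common_name => 'fruitfly', abbreviation => 'Dmel' },
    { organism_id => 2, genus => 'Drosophila', species => 'pseudoobscura',
      common_name => 'fruitfly', abbreviation => 'Dpse' },
);

# Hypothetical helper: accept "Genus species", a common name or an
# abbreviation, and die unless exactly one organism_id matches.
sub resolve_organism_id {
    my ($name) = @_;
    my %ids;
    for my $org (@organisms) {
        my $binomial = "$org->{genus} $org->{species}";
        $ids{ $org->{organism_id} } = 1
            if $name eq $binomial
            || $name eq $org->{common_name}
            || $name eq $org->{abbreviation};
    }
    my @ids = keys %ids;
    die "No organism matches '$name'\n" unless @ids;
    die "'$name' matches more than one organism_id\n" if @ids > 1;
    return $ids[0];
}

my $id = resolve_organism_id('Drosophila melanogaster');    # 1
# resolve_organism_id('fruitfly') dies: that common name is shared
```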
inferCDS
Title : inferCDS
Usage : $obj->inferCDS()
Function: set or return the inferCDS flag
Returns : the value of the inferCDS flag
Args : to return the flag, none; to set, 1
Often, chado databases will be populated without CDS features, since they can be inferred from the intersection of exons and polypeptide features. Setting this flag tells the adaptor to do the inference to get those derived CDS features (at some small performance penalty).
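The CDS inference amounts to interval intersection. The sketch below assumes each feature has already been reduced to a 1-based [start, end] pair on one strand; infer_cds is a hypothetical helper, not the adaptor's method, which works on Chado featureloc rows.

```perl
use strict;
use warnings;

# Hypothetical helper: clip each exon to the polypeptide's span and keep
# the non-empty pieces, which together form the inferred CDS.
sub infer_cds {
    my ($polypeptide, @exons) = @_;
    my ($pp_start, $pp_end) = @$polypeptide;
    my @cds;
    for my $exon (sort { $a->[0] <=> $b->[0] } @exons) {
        my $start = $exon->[0] > $pp_start ? $exon->[0] : $pp_start;
        my $end   = $exon->[1] < $pp_end   ? $exon->[1] : $pp_end;
        push @cds, [$start, $end] if $start <= $end;    # drop UTR-only exons
    }
    return @cds;
}

my @cds = infer_cds([150, 820], [100, 300], [400, 500], [700, 900]);
# @cds is ([150, 300], [400, 500], [700, 820])
```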
allow_obsolete
Title : allow_obsolete
Usage : $obj->allow_obsolete()
Function: set or return the allow_obsolete flag
Returns : the value of the allow_obsolete flag
Args : to return the flag, none; to set, 1
The chado feature table has a flag column called 'is_obsolete'. Normally, obsolete features should be ignored by GBrowse, but the allow_obsolete flag is provided to allow displaying them.
sofa_id
Title : sofa_id
Usage : $obj->sofa_id()
Function: return or determine the ID to use for SO terms
Returns : the cv.cv_id for the SO ontology to use
Args : to return the id, none; to determine the id, 1
recursivMapping
Title : recursivMapping
Usage : $obj->recursivMapping($newval)
Function: Flag for activating the recursive mapping (deactivated by default)
Returns : value of recursivMapping (a scalar)
Args : on set, new value (a scalar or undef, optional)
Goal : When a clone is mapped on a chromosome, recursive mapping maps the features of the clone onto the chromosome.
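The coordinate arithmetic of recursive mapping can be sketched as follows. lift_over is a hypothetical helper, and it uses 1-based inclusive coordinates for readability; Chado's featureloc table stores interbase fmin/fmax values, so a real implementation would need to adjust for that.

```perl
use strict;
use warnings;

# Hypothetical helper: lift a feature from a small srcfeature (a contig)
# onto the larger srcfeature it is placed on (a chromosome).
# Coordinates are 1-based and inclusive; strands are +1 or -1.
sub lift_over {
    my (%arg) = @_;
    my ($s, $e, $strand)    = @{ $arg{feature} };    # feature on the contig
    my ($cs, $ce, $cstrand) = @{ $arg{contig} };     # contig on the chromosome
    if ($cstrand == 1) {
        return ($cs + $s - 1, $cs + $e - 1, $strand);
    }
    # Reverse-strand contig: count back from the contig's end and flip strand.
    return ($ce - $e + 1, $ce - $s + 1, -$strand);
}

my @loc = lift_over(feature => [10, 20, 1], contig => [1001, 2000, -1]);
# @loc is (1981, 1991, -1)
```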
srcfeatureslice
Title : srcfeatureslice
Usage : $obj->srcfeatureslice
Function: Flag for activating the use of the featureloc_slice function
Returns : value of srcfeatureslice
Args : on set, new value (a scalar or undef, optional)
Desc : Allows the use of the featureloc_slice(srcfeat_id, int, int) function
Important : this and recursivMapping are mutually exclusive
do2Level
Title : do2Level
Usage : $obj->do2Level
Function: Flag for activating the fetching of two levels in segment->features
Returns : value of do2Level
Args : on set, new value (a scalar or undef, optional)
dbh
Title : dbh
Usage : $obj->dbh($newval)
Function: get or set the database handle
Returns : value of dbh (a scalar)
Args : on set, new value (a scalar or undef, optional)
term2name
Title : term2name
Usage : $obj->term2name($newval)
Function: When called with a hashref, sets cvterm.cvterm_id to cvterm.name
mapping hashref; when called with an int, returns the name
corresponding to that cvterm_id; called with no arguments, returns
the hashref.
Returns : see above
Args : on set, a hashref; to retrieve a name, an int; to retrieve the
hashref, none.
Note: should be replaced by Bio::GMOD::Util->term2name
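The three calling styles of term2name amount to dispatching on the argument. This sketch is written as a plain subroutine over a file-level hash for brevity, whereas the real method is called on the adaptor object; the cvterm_id values are made up.

```perl
use strict;
use warnings;

my %term2name;    # cvterm_id => cvterm.name (hypothetical cache)

# hashref => replace the mapping; defined scalar => look up one cvterm_id;
# no argument => return the whole mapping as a hashref.
sub term2name {
    my ($arg) = @_;
    if (ref($arg) eq 'HASH') {
        %term2name = %$arg;
        return \%term2name;
    }
    return $term2name{$arg} if defined $arg;
    return \%term2name;
}

term2name({ 1001 => 'gene', 1002 => 'mRNA' });
my $name = term2name(1002);    # 'mRNA'
```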
name2term
Title : name2term
Usage : $obj->name2term($newval)
Function: When called with a hashref, sets cvterm.name to cvterm.cvterm_id
mapping hashref; when called with a string, returns the cvterm_id
corresponding to that name; called with no arguments, returns
the hashref.
Returns : see above
Args : on set, a hashref; to retrieve a cvterm_id, a string; to retrieve
the hashref, none.
Note: should be replaced by Bio::GMOD::Util->name2term
segment
Title : segment
Usage : $db->segment(@args);
Function: create a segment object
Returns : segment object(s)
Args : see below
This method generates a Bio::Das::SegmentI object (see Bio::Das::SegmentI). The segment can be used to find overlapping features and the raw sequence.
When making the segment() call, you specify the ID of a sequence landmark (e.g. an accession number, a clone or contig), and a positional range relative to the landmark. If no range is specified, then the entire region spanned by the landmark is used to generate the segment.
Arguments are -option=>value pairs as follows:
-name ID of the landmark sequence.
-class A namespace qualifier. It is not necessary for the
database to honor namespace qualifiers, but if it
does, this is where the qualifier is indicated.
-version Version number of the landmark. It is not necessary for
the database to honor versions, but if it does, this is
where the version is indicated.
-start Start of the segment relative to landmark. Positions
follow standard 1-based sequence rules. If not specified,
defaults to the beginning of the landmark.
-end End of the segment relative to the landmark. If not specified,
defaults to the end of the landmark.
The return value is a list of Bio::Das::SegmentI objects. If the method is called in a scalar context and no more than one segment satisfies the request, it is allowed to return that segment. Otherwise, the method must throw a "multiple segment exception".
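The scalar-context rule can be expressed with wantarray. In this sketch segment_like is hypothetical, and plain landmark names stand in for Bio::Das::SegmentI objects.

```perl
use strict;
use warnings;

# Hypothetical lookup: all matches in list context; in scalar context,
# return the single match or die if the request is ambiguous.
sub segment_like {
    my ($name, @landmarks) = @_;
    my @segments = grep { $_ eq $name } @landmarks;
    return @segments if wantarray;
    die "multiple segment exception\n" if @segments > 1;
    return $segments[0];    # undef when nothing matched
}

my @all = segment_like('ctg1', 'ctg1', 'ctg2', 'ctg1');    # two matches
my $one = segment_like('ctg2', 'ctg1', 'ctg2');            # 'ctg2'
```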
features
Title : features
Usage : $db->features(@args)
Function: get all features, possibly filtered by type
Returns : a list of Bio::SeqFeatureI objects
Args : see below
Status : public
This routine will retrieve features in the database regardless of position. It can be used to return all features, or a subset based on their type.
Arguments are -option=>value pairs as follows:
-type List of feature types to return. Argument is an array
of Bio::Das::FeatureTypeI objects or a set of strings
that can be converted into FeatureTypeI objects.
-callback A callback to invoke on each feature. The subroutine
will be passed each Bio::SeqFeatureI object in turn.
-attributes A hash reference containing attributes to match.
The -attributes argument is a hashref containing one or more attributes to match against:
-attributes => { Gene => 'abc-1',
Note => 'confirmed' }
Attribute matching is simple exact string matching, and multiple attributes are ANDed together.
If one provides a callback, it will be invoked on each feature in turn. If the callback returns a false value, iteration will be interrupted. When a callback is provided, the method returns undef.
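The attribute-matching and callback rules above can be modelled over in-memory data. features_like, matches_all and the feature hashes are hypothetical; the adaptor itself runs these filters against the database.

```perl
use strict;
use warnings;

# Hypothetical in-memory features.
my @features = (
    { name => 'abc-1a', attributes => { Gene => 'abc-1', Note => 'confirmed' } },
    { name => 'abc-1b', attributes => { Gene => 'abc-1', Note => 'predicted' } },
    { name => 'xyz-2',  attributes => { Gene => 'xyz-2', Note => 'confirmed' } },
);

# Exact string match on every requested attribute (ANDed together).
sub matches_all {
    my ($feature, $wanted) = @_;
    for my $key (keys %$wanted) {
        my $have = $feature->{attributes}{$key};
        return 0 unless defined $have && $have eq $wanted->{$key};
    }
    return 1;
}

# Return matches, or feed them to a callback until it returns false.
sub features_like {
    my (%args) = @_;
    my @hits = grep { matches_all($_, $args{-attributes} || {}) } @features;
    return @hits unless $args{-callback};
    for my $f (@hits) {
        last unless $args{-callback}->($f);
    }
    return;    # undef in scalar context when a callback is used
}

my @hits = features_like(-attributes => { Gene => 'abc-1', Note => 'confirmed' });
```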
types
Title : types
Usage : $db->types(@args)
Function: return list of feature types in database
Returns : a list of Bio::Das::FeatureTypeI objects
Args : see below
This routine returns a list of feature types known to the database. It is also possible to find out how many times each feature type occurs.
Arguments are -option=>value pairs as follows:
-enumerate if true, count the features
The returned value will be a list of Bio::Das::FeatureTypeI objects (see Bio::Das::FeatureTypeI).
If -enumerate is true, then the function returns a hash (not a hash reference) in which the keys are the stringified versions of Bio::Das::FeatureTypeI and the values are the number of times each feature appears in the database.
NOTE: This currently raises a "not-implemented" exception, as the BioSQL API does not appear to provide this functionality.
get_feature_by_alias, get_features_by_alias
Title : get_features_by_alias
Usage : $db->get_feature_by_alias(@args)
Function: return list of features whose name or synonyms match
Returns : a list of Bio::Das::Chado::Segment::Feature objects
Args : See below
This method finds features matching the criteria outlined by the supplied arguments. Wildcards (*) are allowed. Valid arguments are:
get_feature_by_name, get_features_by_name
Title : get_features_by_name
Usage : $db->get_features_by_name(@args)
Function: return list of features whose names match
Returns : a list of Bio::Das::Chado::Segment::Feature objects
Args : See below
This method finds features matching the criteria outlined by the supplied arguments. Wildcards (*) are allowed. Valid arguments are:
_by_alias_by_name
Title : _by_alias_by_name
Usage : $db->_by_alias_by_name(@args)
Function: return list of features whose names match
Returns : a list of Bio::Das::Chado::Segment::Feature objects
Args : See below
A private method that implements the get_features_by_name and get_features_by_alias methods. It accepts the same args as those methods, plus an additional one (-operation), which is either 'by_alias' or 'by_name', to indicate which rule to use for finding features.
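The documentation says * wildcards are accepted but does not show how they are translated. One plausible approach, offered here purely as an assumption, is to rewrite the pattern into an SQL LIKE pattern for use with a bound placeholder, escaping LIKE's own metacharacters first (backslash is PostgreSQL's default LIKE escape character). wildcard_to_like is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: escape LIKE metacharacters (\ % _) in the user's
# pattern, then map the user-facing * wildcard to LIKE's %.
sub wildcard_to_like {
    my ($pattern) = @_;
    (my $like = $pattern) =~ s/([\\%_])/\\$1/g;
    $like =~ tr/*/%/;
    return $like;
}

my $like = wildcard_to_like('CG_1*');    # 'CG\_1%'
```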
srcfeature2name
returns a srcfeature name given a srcfeature_id
gff_source_db_id
Title : gff_source_db_id
Function: caches the chado db_id from the chado db table
gff_source_dbxref_id
Gets the dbxref_id for features that have an associated GFF source
dbxref2source
returns the source (string) when given a dbxref_id
source_dbxref_list
Title : source_dbxref_list
Usage : $dbxref_id_list = $db->source_dbxref_list()
Function: Gets a list of all dbxref_ids that are used for GFF sources
Returns : a comma delimited string that is a list of dbxref_ids
Args : none
Status : public
This method queries the database for all dbxref_ids that are used to store GFF source terms.
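Because source_dbxref_list returns a comma-delimited string rather than a list, a caller may want to splice it into an IN (...) clause. A hedged sketch with a hypothetical in_clause helper that keeps only integer ids before interpolating them into SQL:

```perl
use strict;
use warnings;

# Hypothetical helper: build "column IN (...)" from a comma-delimited id
# string; falls back to an always-false condition when no ids remain.
sub in_clause {
    my ($column, $id_list) = @_;
    my @ids = grep { /^\d+$/ } split /\s*,\s*/, $id_list;    # integers only
    return @ids ? "$column IN (" . join(',', @ids) . ')' : '1 = 0';
}

my $where = in_clause('fd.dbxref_id', '12, 15,19');    # 'fd.dbxref_id IN (12,15,19)'
```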
search_notes
Title : search_notes
Usage : $db->search_notes($search_term,$max_results)
Function: full-text search on features, ENSEMBL-style
Returns : an array of [$name,$description,$score]
Args : see below
Status : public
This routine performs a full-text search on feature attributes (which attributes depend on implementation) and returns a list of [$name,$description,$score], where $name is the feature ID (accession?), $description is a human-readable description such as a locus line, and $score is the match strength.
** NOT YET ACTIVE: search_notes IS IN TESTING STAGE **
sub search_notes {
    my $self = shift;
    my ($search_string, $limit) = @_;

    # Only interpolate a plain integer into the LIMIT clause
    my $limit_str = (defined $limit && $limit =~ /^\d+$/) ? " LIMIT $limit " : "";

    # so here's the plan:
    # if there is only 1 word, do 1-3
    # 1. search for accessions like $string.'%'--if any are found, quit and return them
    # 2. search for feature.name like $string.'%'--if found, keep and continue
    # 3. search somewhere in analysis like $string.'%'--if found, keep and continue
    # if there is more than one word, don't search accessions
    # 4. search each word anded together like '%'.$string.'%' --if found, keep and continue
    # 5. search somewhere in analysis like '%'.$string.'%'

    # $self->dbh->trace(1);

    # awk-style split: ignores leading whitespace
    my @search_str = split ' ', $search_string;
    my $qsearch_term = $self->dbh->quote($search_str[0]);
    my $like_str = "( (dbx.accession ~* $qsearch_term OR \n"
                 . "  f.name ~* $qsearch_term) ";
    for (my $i = 1; $i < (scalar @search_str); $i++) {
        $qsearch_term = $self->dbh->quote($search_str[$i]);
        $like_str .= "and \n";
        $like_str .= "  (dbx.accession ~* $qsearch_term OR \n"
                   . "   f.name ~* $qsearch_term) ";
    }
    $like_str .= ")";

    my $sth = $self->dbh->prepare("
      select dbx.accession,f.name,0
      from feature f, dbxref dbx, feature_dbxref fd
      where
        f.feature_id = fd.feature_id and
        fd.dbxref_id = dbx.dbxref_id and
        $like_str
      $limit_str
    ");
    # throw is a method (inherited from Bio::Root::Root), not a plain function
    $sth->execute or $self->throw("couldn't execute keyword query");

    my @results;
    while (my ($acc, $name, $score) = $sth->fetchrow_array) {
        $score = sprintf("%.2f", $score);
        push @results, [$acc, $name, $score];
    }
    $sth->finish;
    return @results;
}
attributes
Title : attributes
Usage : @attributes = $db->attributes($id,$name)
Function: get the "attributes" on a particular feature
Returns : an array of strings
Args : feature ID [, attribute name]
Status : public
This method is intended as a "work-alike" to Bio::DB::GFF's attributes method, which has the following returns:
Called in list context, it returns a list. If called in a scalar context, it returns the first value of the attribute if an attribute name is provided, otherwise it returns a hash reference in which the keys are attribute names and the values are anonymous arrays containing the values.
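The Bio::DB::GFF return conventions quoted above can be sketched with wantarray. The attribute store and attributes_like are hypothetical; for list context with no attribute name, the text only says "a list", so returning name => [values] pairs is an assumption.

```perl
use strict;
use warnings;

# Hypothetical attribute store: feature id => { name => [values] }.
my %attr = (
    42 => { Note => ['confirmed', 'curated'], Alias => ['abc-1'] },
);

sub attributes_like {
    my ($id, $name) = @_;
    my $all = $attr{$id} || {};
    if (defined $name) {
        my @values = @{ $all->{$name} || [] };
        return wantarray ? @values : $values[0];    # scalar: first value
    }
    # List context, no name: name => [values] pairs (an assumption).
    # Scalar context, no name: hashref of name => [values].
    return wantarray ? %$all : $all;
}

my @notes = attributes_like(42, 'Note');    # ('confirmed', 'curated')
my $first = attributes_like(42, 'Note');    # 'confirmed'
```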
_segclass
Title : _segclass
Usage : $class = $db->_segclass
Function: returns the perl class that we use for segment() calls
Returns : a string containing the segment class
Args : none
Status : reserved for subclass use
chado_reference_class
Title : chado_reference_class
Usage : $obj->chado_reference_class()
Function: return or determine the ID to use for the GBrowse map reference class
using the cvtermprop table, value = MAP_REFERENCE_TYPE
Returns : the cvterm.name
Args : to return the id, none; to determine the id, 1
See also: default_class, refclass_feature_id
Optionally test that user/config supplied ref class is indeed a proper
chado feature type.
refclass_feature_id
Title : refclass_feature_id
Usage : $self->refclass_feature_id()
Function: Used to store the feature_id of the reference class feature we are working on (e.g. contig, supercontig).
With this feature_id we can filter all requests to be sure we are extracting features located on
the reference class feature.
Returns : A scalar
Args : The feature_id on setting
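The filtering that refclass_feature_id enables can be illustrated with a simple grep. The rows and srcfeature_id values are hypothetical; the adaptor presumably applies the equivalent restriction in its SQL.

```perl
use strict;
use warnings;

# Hypothetical featureloc-like rows.
my @locs = (
    { name => 'exon1', srcfeature_id => 7 },
    { name => 'exon2', srcfeature_id => 7 },
    { name => 'motif', srcfeature_id => 9 },
);
my $refclass_feature_id = 7;    # e.g. the contig currently being viewed

# Keep only features located on the reference class feature.
my @on_ref = grep { $_->{srcfeature_id} == $refclass_feature_id } @locs;
```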
LEFTOVERS FROM BIO::DB::GFF NEEDED FOR DAS
These methods should probably be declared in an interface class that Bio::DB::GFF implements. For instance, the aggregator methods could be described in Bio::SeqFeature::AggregatorI.