SYNOPSIS
use Bio::DB::BigWigSet;
use Bio::DB::BigWig 'binMean';
my $wigset = Bio::DB::BigWigSet->new(-dir          => $dir,
                                     -feature_type => 'summary'
                                    );
my $iterator = $wigset->get_seq_stream(-seq_id => 'I',
                                       -start  => 100,
                                       -end    => 1000,
                                       -type   => 'binding_site');
while (my $summary = $iterator->next_seq) {
    my $arry = $summary->statistical_summary(100);
    print binMean($_),"\n" foreach @$arry;
}
DESCRIPTION
This module provides a convenient way of adding metadata to a directory of BigWig files so that all the BigWig files appear to form a single database of sequence features. The directory should be laid out so that it contains one or more BigWig files and a metadata index file that adds names and attributes to the files.
The metadata file must have a name beginning with "meta"; anything following the initial "meta" is fine. Its format is described below. The metadata file is optional if the BigWig files end with the extension ".bw", in which case they will be added to the collection automatically.
The metadata file is plain text and should be laid out like this:
[file1.bw]
display_name = foobar
type = some_type1
method = my_method1
source = my_source1
some_attribute = value1
another_attribute = value2
[file2.bw]
display_name = barfoo
type = some_type2
method = my_method2
source = my_source2
some_attribute = value3
another_attribute = value4
...
Each stanza begins with the name of one of the BigWig files in the directory, enclosed in brackets. Following this is a series of "attribute = value" pairs, which will be applied to features returned from the corresponding BigWig file. The following attributes have predefined meanings:
Attribute      Value
---------      -----
display_name   The value returned by each feature's display_name()
               method.
name           An alias for display_name.
type           The value returned by each feature's type() method
               (this method will return "$method:$source" if
               type is not defined).
primary_tag    The value returned by each feature's primary_tag()
               and method() methods.
method         An alias for primary_tag.
source         The value returned by each feature's source() and
               source_tag() methods.
Any other attributes are stored in the feature and can be retrieved with the get_all_tags(), get_tag_values() and attributes() methods. See Bio::SeqFeatureI and Bio::SeqFeature::Lite.
Any BigWig files that are present in the directory but not mentioned in the metadata file will be assigned a display_name equal to the name of the file, minus the .bw extension.
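The stanza format and the defaulting rules described above can be illustrated with a small stand-alone parser. This is only a sketch of the documented behavior, not the module's own parser: the helper names parse_meta and effective_attributes are hypothetical, and the handling of blank lines and "#" comments is an assumption.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: parse metadata text into { filename => { attr => value } }.
sub parse_meta {
    my ($text) = @_;
    my (%meta, $current);
    for my $line (split /\n/, $text) {
        next if $line =~ /^\s*(?:#.*)?$/;        # skip blanks and comments (assumed)
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {   # stanza header: [file.bw]
            $current = $1;
            $meta{$current} ||= {};
        }
        elsif (defined $current && $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/) {
            $meta{$current}{$1} = $2;            # attribute = value
        }
    }
    return \%meta;
}

# Hypothetical helper: apply the documented aliases and defaults to one stanza.
sub effective_attributes {
    my ($file, $stanza) = @_;
    my %attr = %{ $stanza || {} };
    # display_name falls back to its alias, then to the file name minus ".bw".
    $attr{display_name} //= $attr{name} // basename($file, '.bw');
    # method is documented as an alias for primary_tag.
    $attr{method} //= $attr{primary_tag};
    # type falls back to "$method:$source" when it is not defined.
    if (!defined $attr{type} && defined $attr{method} && defined $attr{source}) {
        $attr{type} = "$attr{method}:$attr{source}";
    }
    return \%attr;
}

my $meta = parse_meta(<<'END');
[file1.bw]
display_name = foobar
method       = my_method1
source       = my_source1
END

my $f1 = effective_attributes('file1.bw', $meta->{'file1.bw'});
my $f3 = effective_attributes('/data/file3.bw', undef);   # not in the metadata
print "$f1->{display_name} $f1->{type}\n";   # foobar my_method1:my_source1
print "$f3->{display_name}\n";               # file3
```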
The point of this is to allow you to make a set of BigWig files act like a uniform database, and to assign distinguishing types, names and attributes to the features returned from each file.
For example, suppose one of the BigWig files is assigned a type of "polII_binding_site:early_embryo" using a stanza like this:
[random_wigfile.bw]
type = polII_binding_site:early_embryo
You can fetch it from the BigWigSet using this call:
my @summaries = $bigwigset->features(-seq_id=>'chr1',
-start =>1,-end=>50_000_000,
-type => 'polII_binding_site:early_embryo');
See Bio::DB::SeqFeature::Store for more examples of this API.
The directory of BigWigs may be on a remote HTTP or FTP server; simply provide the URL for the remote directory. This will only work if the remote server allows directory listings.
METHODS
Most methods are inherited from Bio::DB::BigWig (see Bio::DB::BigWig). This section describes the differences.
Class Methods
- $bws = Bio::DB::BigWigSet->new('/path/to/directory')
- $bws = Bio::DB::BigWigSet->new(-dir => '/path/to/directory', -feature_type => $type, -fasta => $fasta_file_or_obj)
- $bws = Bio::DB::BigWigSet->new(-index => '/path/to/metadata.txt')
This method creates a new Bio::DB::BigWigSet. If just one argument is provided, it is used as the path to the directory where the BigWig files are stored. In the named-argument form, the following arguments are recognized:
Argument        Description
--------        -----------
-dir            Path to the directory containing BigWig files and
                metadata.
-fasta          A FASTA file path or a sequence accessor to pass to
                each of the BigWig files when they are opened. See
                the Bio::DB::BigWig manual page for more information.
-feature_type   The type of feature to retrieve from the BigWig
                files. One of "summary", "bin", "region" or
                "interval". See the Bio::DB::BigWig manual page for
                more information. If not specified, "summary" is
                assumed.
-index          Path to the metadata file, provided directly.
You may call new() without any arguments, in which case an empty BigWig set is created. You may add BigWig files to the set individually using add_bigwig().
- $count = Bio::DB::BigWigSet->index_dir('/path/to/dir')
Given a directory, this class method creates a skeletal metadata file named "metadata.index" from any BigWig files it finds in the directory. You should customize this file as needed. If the method is called on a directory that already contains one or more metadata files, it will leave intact any stanzas that correspond to existing BigWig files, add new stanzas for BigWig files that are not mentioned in the index, and remove stanzas that no longer correspond to a BigWig file.
Also see the index_bigwigset.pl script that comes with the distribution.
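The update rule described for index_dir() (keep stanzas for files that still exist, add stanzas for new files, drop stale ones) can be sketched in plain Perl. This illustrates the rule only and is not the module's code: reconcile_index and format_index are hypothetical helpers, and writing an empty stanza for a new file is an assumption about what "skeletal" means.

```perl
use strict;
use warnings;

# Hypothetical helper. $existing maps file name => hashref of attributes (the
# stanzas already in the index); $files lists the .bw files now on disk.
sub reconcile_index {
    my ($existing, $files) = @_;
    my %updated;
    for my $file (@$files) {
        # Keep a customized stanza untouched; otherwise start an empty one.
        $updated{$file} = exists $existing->{$file} ? $existing->{$file} : {};
    }
    # Stanzas whose file has disappeared are simply not copied over.
    return \%updated;
}

# Render stanzas in the metadata file layout shown earlier.
sub format_index {
    my ($index) = @_;
    my $out = '';
    for my $file (sort keys %$index) {
        my $attr = $index->{$file};
        $out .= "[$file]\n";
        $out .= "$_ = $attr->{$_}\n" for sort keys %$attr;
        $out .= "\n";
    }
    return $out;
}

my $old = {
    'a.bw'    => { type => 'binding_site' },
    'gone.bw' => { type => 'stale' },
};
my $new = reconcile_index($old, ['a.bw', 'b.bw']);
print format_index($new);
```

Here the stanza for a.bw survives unchanged, b.bw gains an empty stanza, and gone.bw is dropped.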
Accessors
These are accessors for BigWigSet properties.
- $fasta = $bws->fasta_path([$new_path])
Get or set the FASTA file path or sequence accessor.
- $type = $bws->feature_type([$new_type])
Get or set the underlying type of object that the BigWig set will return. One of "summary", "bin", "region" or "interval". See Bio::DB::BigWig.
- $accessor = $bws->dna_accessor()
Returns the object that will be used to access DNA sequences, if a -fasta argument was provided at creation time.
Fetching Features
These are methods used to query the collection of BigWig files managed by the BigWigSet for various kinds of features. They are similar to the like-named methods in Bio::DB::BigWig.
- @features = $bws->features(@args)
This method is the workhorse for retrieving various types of intervals and summary statistics from the BigWig database. It takes a series of named arguments in the format (-argument1 => value1, -argument2 => value2, ...) and returns a list of zero or more BioPerl Bio::SeqFeatureI objects.
The following arguments are recognized:
Argument      Description                            Default
--------      -----------                            -------
-seq_id       Chromosome or contig name defining     All chromosomes/contigs.
              the range of interest.
-start        Start of the range of interest.        1
-end          End of the range of interest.          Chromosome/contig end
-type         Retrieve only features with the        none
              matching type(s). The argument can
              be a scalar or an arrayref.
-name         Retrieve only features with the        none
              indicated name.
-attributes   Retrieve only features that have       none
              matching attributes. The argument
              is a hashref of tag value attributes,
              in which the key is the tag and the
              value is either a simple value, or
              an array reference of values.
-iterator     Boolean, which if true, returns        undef (false)
              an iterator across the list rather
              than the list itself.
The features() method is similar to that of Bio::DB::BigWig, but some of the arguments have slightly different meanings.
-type is a selector that filters the features returned by the type specified in their metadata. You can provide a single value, or an arrayref of several types to filter by. Only BigWig files whose type, as set in the metadata index, matches one or more of the provided types will be consulted for features. The behavior of the features (whether they represent individual wiggle intervals, summaries, or bins) is set by the -feature_type argument passed to the BigWigSet->new() method.
The -attributes argument will filter BigWig data by any combination of metadata tag. Here's how it works:
@features = $bws->features(-seq_id => 'chr1', -attributes => {method => ['ChIP-seq','ChIP-chip'], validated => 1});
This will query BigWig files from the set whose metadata indicates a method of either "ChIP-seq" or "ChIP-chip" and which have a "validated" attribute of 1. Glob matches, such as "ChIP*", are also accepted.
The features returned from this call will return values from display_name(), primary_tag(), source_tag(), and get_tag_values() that correspond to the information specified in the metadata index. For example, the features returned from the example query above will return either "ChIP-seq" or "ChIP-chip" when you call:
($method) = $feature->get_tag_values('method');
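The -attributes filter semantics described above (every tag must match, an array reference lists alternatives, and glob patterns are accepted) can be sketched as follows. This is an illustration, not the module's implementation: glob_to_regex and matches_attributes are hypothetical helpers, and only the "*" and "?" wildcards are handled because the page does not say which glob features are supported.

```perl
use strict;
use warnings;

# Translate a shell-style glob into an anchored regex (only "*" and "?").
sub glob_to_regex {
    my ($glob) = @_;
    my $re = quotemeta $glob;
    $re =~ s/\\\*/.*/g;   # "*" matches any run of characters
    $re =~ s/\\\?/./g;    # "?" matches exactly one character
    return qr/^$re$/;
}

# True if the track metadata satisfies every tag in the filter. A filter
# value may be a scalar or an array ref of alternatives.
sub matches_attributes {
    my ($meta, $filter) = @_;
    for my $tag (keys %$filter) {
        my $have = $meta->{$tag};
        return 0 unless defined $have;
        my $want   = $filter->{$tag};
        my @wanted = ref $want eq 'ARRAY' ? @$want : ($want);
        return 0 unless grep { $have =~ glob_to_regex($_) } @wanted;
    }
    return 1;
}

my %tracks = (
    'a.bw' => { method => 'ChIP-seq',  validated => 1 },
    'b.bw' => { method => 'ChIP-chip', validated => 0 },
    'c.bw' => { method => 'RNA-seq',   validated => 1 },
);
my $filter = { method => 'ChIP*', validated => 1 };
my @hits = grep { matches_attributes($tracks{$_}, $filter) } sort keys %tracks;
print "@hits\n";   # a.bw
```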
- $iterator = $bws->get_seq_stream(@args)
This call takes the same arguments as features() but returns a memory-efficient iterator. Call the iterator's next_seq() method repeatedly to fetch the features one at a time.
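As a sketch of how the iterator might be consumed, the helper below walks the stream and records one mean score per track. It assumes the set was created with -feature_type => 'summary', and that each bin returned by statistical_summary() is a hash with validCount and sumData keys, as described in the Bio::DB::BigWig documentation rather than on this page. The name mean_by_track is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: return { display_name => mean score } for a region.
# $bws is expected to be a Bio::DB::BigWigSet in "summary" mode; %region holds
# the same named arguments that features() accepts.
sub mean_by_track {
    my ($bws, %region) = @_;
    my $stream = $bws->get_seq_stream(%region);
    my %mean;
    while (my $summary = $stream->next_seq) {
        # Ask for a single bin covering the whole region. The validCount and
        # sumData keys are assumed from the Bio::DB::BigWig documentation.
        my ($bin) = @{ $summary->statistical_summary(1) };
        next unless $bin && $bin->{validCount};   # skip tracks with no data
        $mean{ $summary->display_name } = $bin->{sumData} / $bin->{validCount};
    }
    return \%mean;
}
```

A call such as mean_by_track($wigset, -seq_id => 'I', -start => 100, -end => 1000) would then hold only one summary feature in memory at a time.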
- @features = $bws->get_features_by_location($seq_id,$start,$end)
Same as in Bio::DB::BigWig, except that features from all members of the set are returned.
- @features = $bws->get_features_by_name($name)
- @features = $bws->get_feature_by_name($name)
- @features = $bws->get_features_by_alias($name)
Only features from BigWig files whose display_name attribute matches $name will be returned. These three methods all do the same thing.
- @features = $bws->get_features_by_attribute($attributes)
Only features matching the attributes hash will be returned. See features() for a description of how this filter works.
- $feature = $bws->get_feature_by_id($id)
Given an ID returned by calling a feature's primary_id() method this returns the same feature. If used between sessions, it only works as expected if the BigWigSet is created in the same way each time.
Methods for manipulating the BigWig files contained in the set
These are methods that allow you to add BigWig files to the set and manipulate their metadata.
- $bws->readdir($path)
Read the contents of the indicated directory, and combine the information about the BigWig .bw files and metadata indexes into a BigWig set. You may call this repeatedly to combine multiple directories into a BigWigSet.
- $bws->add_bigwig($path)
Given a path to a .bw file, add the BigWig file to the set.
- $bws->remove_bigwig($path)
Given a path to a .bw file, removes it from the set.
- $bws->set_bigwig_attributes($path,$attributes)
Given the path to a BigWig file, assign metadata to it. The second argument is a hash in which the keys are attribute names such as "type" and the values are the values of those attributes.
If the BigWig file is not already part of the set, it is added (as in add_bigwig()).
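Because set_bigwig_attributes() adds files that are not yet in the set, a collection can be assembled without a metadata file. The helper below is a minimal sketch under two assumptions: the name register_tracks is hypothetical, and the attributes are passed as a hash reference, which the $attributes argument suggests but the text does not state.

```perl
use strict;
use warnings;

# Hypothetical helper: register each file together with its metadata.
# $bws is expected to be a Bio::DB::BigWigSet, for example one created by
# calling new() without arguments. $tracks maps path => hashref of attributes.
sub register_tracks {
    my ($bws, $tracks) = @_;
    for my $path (sort keys %$tracks) {
        # Adds the file to the set if needed, then attaches its metadata.
        $bws->set_bigwig_attributes($path, $tracks->{$path});
    }
    return $bws->bigwigs;   # paths now held by the set (list context)
}
```

For example, passing { '/data/early.bw' => { type => 'polII_binding_site:early_embryo' } } as $tracks would make that file retrievable with a -type filter in features().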
- @paths = $bws->bigwigs
Returns the paths to all the BigWig files in the collection.
- $bigwig = $bws->get_bigwig($path)
If the BigWig file is part of the set, opens and returns it.
Using BigWig objects and GBrowse
The Generic Genome Browser version 2.0 (http://www.gmod.org/gbrowse) can treat a set of BigWig files as a track database. A typical configuration will look like this:
[BigWig:database]
db_adaptor = Bio::DB::BigWigSet
db_args = -dir /var/www/data/bigwigs
          -fasta /var/www/data/elegans-ws190.fa
[BigWigIntervals]
feature = ChIP-chip
database = BigWig
glyph = wiggle_whiskers
min_score = -1
max_score = +1.5
key = ChIP-chip datasets
SEE ALSO
Bio::DB::BigWig, Bio::DB::BigFile, Bio::Perl, Bio::Graphics, Bio::Graphics::Browser2
AUTHOR
Lincoln Stein <lincoln.stein@oicr.on.ca>. <lincoln.stein@bmail.com>
Copyright (c) 2010 Ontario Institute for Cancer Research.
This package and its accompanying libraries is free software; you can redistribute it and/or modify it under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0. Refer to LICENSE for the full license text. In addition, please see DISCLAIMER.txt for disclaimers of warranty.