SYNOPSIS

use Bio::DB::BigWigSet;
use Bio::DB::BigWig 'binMean';

my $wigset = Bio::DB::BigWigSet->new(-dir          => $dir,
                                     -feature_type => 'summary'
 );

my $iterator = $wigset->get_seq_stream(-seq_id => 'I',
                                       -start  => 100,
                                       -end    => 1000,
                                       -type   => 'binding_site');
while (my $summary = $iterator->next_seq) {
   my $arry = $summary->statistical_summary(100);
   print binMean($_),"\n" foreach @$arry;
}

DESCRIPTION

This module provides a convenient way of adding metadata to a directory of BigWig files so that all the BigWig files appear to form a single database of sequence features. The directory should be laid out so that it contains one or more BigWig files and a metadata index file that adds names and attributes to the files.

The metadata file must have a name beginning with "meta"; anything following the initial "meta" is fine. Its format is described below. The metadata file is optional if the BigWig files end with the extension ".bw", in which case they will be added to the collection automatically.

The metadata file is plain text and should be laid out like this:

[file1.bw]
display_name = foobar
type         = some_type1
method       = my_method1
source       = my_source1
some_attribute    = value1
another_attribute = value2

[file2.bw]
display_name = barfoo
type         = some_type2
method       = my_method2
source       = my_source2
some_attribute    = value3
another_attribute = value4

...

Each stanza begins with the name of one of the BigWig files in the directory, enclosed in brackets. Following this is a series of "attribute = value" pairs, which will be applied to features returned from the corresponding BigWig file. The following attributes have predefined meanings:

Attribute        Value
---------        -----

display_name     The value returned by each feature's display_name()
                  method.

name             An alias for display_name.

type             The value returned by each feature's type() method
                   (this method will return "$method:$source" if
                   type is not defined).

primary_tag      The value returned by each feature's primary_tag()
                  and method() methods.

method           An alias for primary_tag.

source           The value returned by each feature's source() and
                   source_tag() methods.

Any other attributes are stored in the feature and can be retrieved with the get_all_tags(), get_tag_values() and attributes() methods. See Bio::SeqFeatureI and Bio::SeqFeature::Lite.

Any BigWig files that are present in the directory but not mentioned in the metadata file will be assigned a display_name equal to the name of the file, minus the .bw extension.

The point of this is to allow you to make a set of BigWig files act like a uniform database, and to assign distinguishing types, names and attributes to the features returned from each file.

For example, if one of the BigWig files is assigned a type of "polII_binding_site:early_embryo" using a stanza like this:
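
The stanza layout described above is simple enough to read by hand. As a minimal sketch (this is not the module's own parser, and parse_meta is a hypothetical helper name), the following plain-Perl routine turns metadata text into a hash keyed by file name:

```perl
use strict;
use warnings;

# Hypothetical helper: parse metadata text into { file => { attribute => value } }.
sub parse_meta {
    my ($text) = @_;
    my (%meta, $current);
    for my $line (split /\n/, $text) {
        next unless $line =~ /\S/;                   # skip blank lines
        if ($line =~ /^\s*\[([^\]]+)\]\s*$/) {       # stanza header: [file.bw]
            $current = $1;
            $meta{$current} ||= {};
        }
        elsif (defined $current && $line =~ /^\s*(\S+)\s*=\s*(.*?)\s*$/) {
            $meta{$current}{$1} = $2;                # attribute = value
        }
    }
    return \%meta;
}

my $meta = parse_meta(<<'END');
[file1.bw]
display_name = foobar
type         = some_type1

[file2.bw]
display_name = barfoo
END

print "$_ => $meta->{$_}{display_name}\n" for sort keys %$meta;
```

Lines that are neither a stanza header nor an "attribute = value" pair are silently ignored in this sketch; whether the module itself tolerates such lines is not stated here.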

[random_wigfile.bw]
type = polII_binding_site:early_embryo

You can fetch it from the BigWigSet using this call:

my @summaries = $bigwigset->features(-seq_id=>'chr1',
                                     -start =>1,-end=>50_000_000,
                                     -type  => 'polII_binding_site:early_embryo');

See Bio::DB::SeqFeature::Store for more examples of this API.

The directory of BigWigs may be on a remote HTTP or FTP server; simply provide the URL for the remote directory. This will only work if the remote server allows directory listings.

METHODS

Most methods are inherited from Bio::DB::BigWig (see Bio::DB::BigWig). This section describes the differences.

Class Methods

$bws = Bio::DB::BigWigSet->new('/path/to/directory')
$bws = Bio::DB::BigWigSet->new(-dir => '/path/to/directory', -feature_type => $type, -fasta => $fasta_file_or_obj)
$bws = Bio::DB::BigWigSet->new(-index => '/path/to/metadata.txt')

This method creates a new Bio::DB::BigWigSet. If just one argument is provided, it is used as the path to the directory where the BigWig files are stored. In the named-argument form, the following arguments are recognized:

  Argument          Description
  --------          -----------

  -dir              Path to the directory containing BigWig files
                     and metadata.

  -fasta            A Fasta file path or a sequence accessor to
                     pass to each of the BigWig files when they are
                     opened. See the Bio::DB::BigWig manual page for
                     more information.

  -feature_type     The type of feature to retrieve from the BigWig
                     files. One of "summary", "bin", "region" or
                     "interval". See the Bio::DB::BigWig manual
                     page for more information. If not specified,
                     "summary" is assumed.

  -index            Provide a path to the metadata file directly.

You may call new() without any arguments, in which case an empty BigWig set is created. You may add BigWig files to the set individually using add_bigwig().

$count = Bio::DB::BigWigSet->index_dir('/path/to/dir')

Given a directory, this class method creates a skeletal metadata file named "metadata.index" from any BigWig files it finds in the directory. You should customize this file as needed. If the method is called on a directory that already contains one or more metadata files, it will leave intact any stanzas that correspond to existing BigWig files, add new stanzas for BigWig files that are not mentioned in the index, and remove stanzas that no longer correspond to a BigWig file.

Also see the index_bigwigset.pl script that comes with the distribution.

Accessors

These are accessors for BigWigSet properties.

$fasta = $bws->fasta_path([$new_path])

Get or set the FASTA file path or sequence accessor.

$type = $bws->feature_type([$new_type])

Get or set the underlying type of object that the BigWig set will return. One of "summary", "bin", "region" or "interval". See Bio::DB::BigWig.

$accessor = $bws->dna_accessor()

Returns the object that will be used to access DNA sequences, if a -fasta argument was provided at creation time.

Fetching Features

These are methods used to query the collection of BigWig files managed by the BigWigSet for various kinds of features. They are similar to the like-named methods in Bio::DB::BigWig.

@features = $bws->features(@args)

This method is the workhorse for retrieving various types of intervals and summary statistics from the BigWig database. It takes a series of named arguments in the format (-argument1 => value1, -argument2 => value2, ...) and returns a list of zero or more BioPerl Bio::SeqFeatureI objects.

The following arguments are recognized:

   Argument     Description                         Default
   --------     -----------                         -------

   -seq_id      Chromosome or contig name defining  All chromosomes/contigs.
                the range of interest.

   -start       Start of the range of interest.     1

   -end         End of the range of interest.       Chromosome/contig end

   -type        Retrieve only features with the     none
                  matching type(s). The argument
                  can be a scalar or an arrayref.

   -name        Retrieve only features with the     none
                  indicated name.

   -attributes  Retrieve only features that have    none
                  matching attributes. The argument
                  is a hashref of tag/value
                  attributes, in which the key is
                  the tag and the value is either
                  a simple value, or an array
                  reference of values.

   -iterator    Boolean, which if true, returns     undef (false)
                an iterator across the list rather
                than the list itself.

The features() method is similar to that of Bio::DB::BigWig, but some of the arguments have slightly different meanings.

-type is a selector that filters the features returned by the type specified in their metadata. You can provide a single value, or an arrayref of several types to filter by. Only BigWig files whose type, as set in the metadata index, matches one or more of the provided types will be consulted for features. The behavior of the features (whether they represent individual wiggle intervals, summaries, or bins) is set by the -feature_type argument passed to the BigWigSet->new() method.

The -attributes argument will filter BigWig data by any combination of metadata tags. Here's how it works:

@features = $bws->features(-seq_id      => 'chr1',
                           -attributes  => {method    => ['ChIP-seq','ChIP-chip'],
                                            validated => 1});

This will query BigWig files from the set whose metadata indicates a method of either "ChIP-seq" or "ChIP-chip" and which have a "validated" attribute of 1. Glob matches, such as "ChIP*", are also accepted.

The features returned from this call will return values from display_name(), primary_tag(), source_tag(), and get_tag_values() that correspond to the information specified in the metadata index. For example, the features returned from the example query above will return either "ChIP-seq" or "ChIP-chip" when you call:

$method = $feature->get_tag_values('method');

$iterator = $bws->get_seq_stream(@args)

This call takes the same arguments as features() but returns a memory-efficient iterator. Call the iterator's next_seq() method repeatedly to fetch the features one at a time.
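
As a small illustration of the streaming style, the sketch below tallies features per display name without first building the full list. The helper name count_by_name is hypothetical; it relies only on get_seq_stream(), next_seq() and display_name() as described in this page, and expects $bws to be an already-constructed BigWigSet.

```perl
use strict;
use warnings;

# Hypothetical helper: count features per display name, one feature at a time.
# $bws is expected to be a Bio::DB::BigWigSet (or any object with get_seq_stream()).
sub count_by_name {
    my ($bws, @args) = @_;
    my $stream = $bws->get_seq_stream(@args);
    my %count;
    while (my $feature = $stream->next_seq) {
        $count{ $feature->display_name }++;
    }
    return \%count;
}

# Typical use (region and type are placeholders):
#   my $counts = count_by_name($bws, -seq_id => 'chr1', -start => 1, -end => 100_000,
#                              -type   => 'polII_binding_site:early_embryo');
#   printf "%s\t%d\n", $_, $counts->{$_} for sort keys %$counts;
```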

@features = $bws->get_features_by_location($seq_id,$start,$end)

Same as in Bio::DB::BigWig, except that features from all members of the set are returned.

@features = $bws->get_features_by_name($name)
@features = $bws->get_feature_by_name($name)
@features = $bws->get_features_by_alias($name)

Only features from BigWig files whose display_name attribute matches $name will be returned. These three methods all do the same thing.

@features = $bws->get_features_by_attribute($attributes)

Only features matching the attributes hash will be returned. See features() for a description of how this filter works.
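
The matching rule described under features() can be sketched in plain Perl. This is only an illustration of the documented behavior, not the module's internal code: matches_filter and glob_to_regex are hypothetical names, and only the "*" wildcard is handled.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a glob such as "ChIP*" into an anchored regex.
sub glob_to_regex {
    my ($glob) = @_;
    my $re = join '', map { $_ eq '*' ? '.*' : quotemeta($_) } split //, $glob;
    return qr/^$re$/;
}

# True if every tag in the filter matches at least one of its allowed values.
sub matches_filter {
    my ($meta, $filter) = @_;
    for my $tag (keys %$filter) {
        my $have = $meta->{$tag};
        return 0 unless defined $have;
        my $want   = $filter->{$tag};
        my @wanted = ref($want) eq 'ARRAY' ? @$want : ($want);
        return 0 unless grep { $have =~ glob_to_regex($_) } @wanted;
    }
    return 1;
}

my %file_meta = (method => 'ChIP-seq', validated => 1);
my $filter    = { method => ['ChIP-seq', 'ChIP-chip'], validated => 1 };
print matches_filter(\%file_meta, $filter)              ? "match\n" : "no match\n";
print matches_filter(\%file_meta, { method => 'ChIP*' }) ? "match\n" : "no match\n";
```

Different tags are combined with AND, while the values listed for a single tag are alternatives, which mirrors the "ChIP-seq or ChIP-chip, and validated" example.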

$feature = $bws->get_features_by_id($id)

Given an ID returned by calling a feature's primary_id() method this returns the same feature. If used between sessions, it only works as expected if the BigWigSet is created in the same way each time.

Methods for manipulating the BigWig files contained in the set

These are methods that allow you to add BigWig files to the set and manipulate their metadata.

$bws->readdir($path)

Read the contents of the indicated directory, and combine the information about the BigWig .bw files and metadata indexes into a BigWig set. You may call this repeatedly to combine multiple directories into a BigWigSet.

$bws->add_bigwig($path)

Given a path to a .bw file, add the BigWig file to the set.

$bws->remove_bigwig($path)

Given a path to a .bw file, removes it from the set.

$bws->set_bigwig_attributes($path,$attributes)

Given the path to a BigWig file, assign metadata to it. The second argument is a hash in which the keys are attribute names such as "type" and the values are the values of those attributes.

If the BigWig file is not already part of the set, it is added (as in add_bigwig()).
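
A set can therefore be assembled entirely in code. The sketch below is a hedged example: register_tracks is a hypothetical helper, the paths are placeholders, and passing the attributes as a hash reference is an assumption based on the description above.

```perl
use strict;
use warnings;

# Hypothetical helper: attach metadata to each file, adding it to the set if needed.
# $tracks maps a .bw path to a hashref of attributes (hashref form is an assumption).
sub register_tracks {
    my ($bws, $tracks) = @_;
    for my $path (sort keys %$tracks) {
        # As described above, this also adds the file if it is not yet in the set.
        $bws->set_bigwig_attributes($path, $tracks->{$path});
    }
    my @paths = $bws->bigwigs;
    return scalar @paths;        # number of files now in the set
}

# Typical use (requires Bio::DB::BigWigSet; paths are placeholders):
#   my $bws = Bio::DB::BigWigSet->new();
#   register_tracks($bws, {
#       '/data/bigwigs/early.bw' => { display_name => 'early',
#                                     type => 'polII_binding_site:early_embryo' },
#   });
```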

@paths = $bws->bigwigs

Returns the paths to all the BigWig files in the collection.

$bigwig = $bws->get_bigwig($path)

If the BigWig file is part of the set, opens and returns it.

Using BigWig objects and GBrowse

The Generic Genome Browser version 2.0 (http://www.gmod.org/gbrowse) can treat a BigWig file as a track database. A typical configuration will look like this:

 [BigWig:database]
 db_adaptor    = Bio::DB::BigWigSet
 db_args       = -dir /var/www/data/bigwigs
                 -fasta  /var/www/data/elegans-ws190.fa

 [BigWigIntervals]
 feature  = ChIP-chip
 database = BigWig
 glyph    = wiggle_whiskers
 min_score = -1
 max_score = +1.5
 key       = ChIP-chip datasets

SEE ALSO

Bio::DB::BigWig, Bio::DB::BigFile, Bio::Perl, Bio::Graphics, Bio::Graphics::Browser2

AUTHOR

Lincoln Stein <lincoln.stein@oicr.on.ca>. <lincoln.stein@bmail.com>

Copyright (c) 2010 Ontario Institute for Cancer Research.

This package and its accompanying libraries are free software; you can redistribute them and/or modify them under the terms of the GPL (either version 1, or at your option, any later version) or the Artistic License 2.0. Refer to LICENSE for the full license text. In addition, please see DISCLAIMER.txt for disclaimers of warranty.