NAME

Bio::ToolBox::db_helper::useq

DESCRIPTION

This module provides support for USeq files to the Bio::ToolBox package. USeq files are zip archives representing either intervals or scores. They may be used much like bigWig or bigBed files. More information about USeq files may be found at http://useq.sourceforge.net/useqArchiveFormat.html. USeq files use the extension .useq.

USAGE

The module requires Bio::DB::USeq to be installed.

In general, this module should not be used directly. Use the methods available in Bio::ToolBox::db_helper or Bio::ToolBox::Data.

All subroutines are exported by default.

open_useq_db

This subroutine opens a USeq file as a database connection. Pass the local path to a USeq file (.useq extension). It returns the opened Bio::DB::USeq database object.

collect_useq_scores

This subroutine will collect only the data values from a binary useq file for the specified database region. The positional information of the scores is not retained, and the values are best further processed through some statistical method (mean, median, etc.).

The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.

The subroutine returns an array or array reference of the requested dataset values found within the region of interest.
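Because the returned values carry no positional information, they are usually reduced to a single statistic. The following is a minimal sketch of that step, assuming a made-up list of scores in place of the real return value; the mean and median helpers are illustrative and are not part of this module.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Stand-in for the list returned by collect_useq_scores; values are made up.
my @scores = (2.0, 4.0, 4.0, 10.0);

# Arithmetic mean of a list of numbers; returns nothing for an empty list.
sub mean {
    my @v = @_;
    return unless @v;
    return sum(@v) / @v;
}

# Median of a list of numbers; averages the two middle values for even counts.
sub median {
    my @v = sort { $a <=> $b } @_;
    return unless @v;
    my $mid = int( @v / 2 );
    return @v % 2 ? $v[$mid] : ( $v[ $mid - 1 ] + $v[$mid] ) / 2;
}

printf "mean=%.2f median=%.2f\n", mean(@scores), median(@scores);
```

In practice the score list would come from the collection subroutine rather than a literal.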

collect_useq_position_scores

This subroutine will collect the score values from a binary useq file for the specified database region keyed by position.

The subroutine is passed a parameter array reference. See "Data Collection Parameters Reference" below for details.

The subroutine returns a hash or hash reference of the defined dataset values found within the region of interest keyed by position. The feature midpoint is used as the key position. When multiple features are found at the same position, a simple mean (for score methods) or sum (for count methods) is returned.
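The keying and combining rule described above can be sketched in plain Perl. This is an illustration only, not the module's implementation: the feature tuples are invented, the helper name scores_by_midpoint is hypothetical, truncating the midpoint to an integer is an assumption, and count methods are modeled as one tally per feature.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical features as [start, stop, score]; not read from a real file.
my @features = ( [ 100, 109, 2.0 ], [ 101, 108, 4.0 ], [ 200, 201, 1.0 ] );

# Key values by feature midpoint, then combine values sharing a position:
# mean for the score method, sum for count methods.
sub scores_by_midpoint {
    my ( $method, @feats ) = @_;
    my %at;
    for my $f (@feats) {
        my ( $start, $stop, $score ) = @$f;
        # Assumption: the midpoint is truncated to an integer position.
        my $mid = int( ( $start + $stop ) / 2 );
        # Count methods tally one per feature; the score method keeps the value.
        push @{ $at{$mid} }, $method eq 'score' ? $score : 1;
    }
    my %pos2score;
    for my $pos ( keys %at ) {
        my @v = @{ $at{$pos} };
        $pos2score{$pos} = $method eq 'score' ? sum(@v) / @v : sum(@v);
    }
    return \%pos2score;
}

my $by_score = scores_by_midpoint( 'score', @features );
my $by_count = scores_by_midpoint( 'count', @features );
for my $pos ( sort { $a <=> $b } keys %$by_score ) {
    printf "%d\tscore=%.1f\tcount=%d\n", $pos, $by_score->{$pos}, $by_count->{$pos};
}
```

Here the first two features share a midpoint, so their scores are averaged under the score method and tallied under the count method.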

Data Collection Parameters Reference

The data collection subroutines are passed an array reference of parameters. The recommended way to collect data is through the "get_segment_score" method in Bio::ToolBox::db_helper, rather than calling these subroutines directly.

The parameters array reference includes these items:

1. chromosome
2. start coordinate
3. stop coordinate

Coordinates use the BioPerl-style 1-based system.

4. strand

Use the standard BioPerl representation: -1, 0, or 1.

5. strandedness

A scalar value representing the desired strandedness of the data to be collected. Acceptable values include "sense", "antisense", or "all". Only those scores which match the indicated strandedness are collected.

6. score method

Acceptable values include score, count, ncount, and pcount.

* score returns the basepair coverage of alignments over the region of interest.

* count returns the number of alignments that overlap the search region.

* pcount, or precise count, returns the count of alignments whose start and end fall within the region.

* ncount, or named count, returns an array of alignment read names. Use this to avoid double-counting paired-end reads by counting only unique names. Reads are taken if they overlap the search region.

7. A database object.

Not used here.

8. Path to USeq files

Additional USeq files may be appended to the list when merging. Opened USeq file objects are cached.
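As a sketch of how the positional list above fits together, the following builds a parameter array reference in plain Perl. The helper build_useq_params, its named arguments, and the file name example.useq are hypothetical and exist only for illustration; the defaults chosen for strand and strandedness are assumptions, not documented behavior.

```perl
use strict;
use warnings;

# Hypothetical helper that assembles the positional parameter list.
sub build_useq_params {
    my %arg = @_;
    my $method       = $arg{method}       // 'score';
    my $strandedness = $arg{strandedness} // 'all';
    my $strand       = $arg{strand}       // 0;

    # Validate against the values listed in this document.
    die "unknown method '$method'\n"
        unless grep { $_ eq $method } qw(score count ncount pcount);
    die "unknown strandedness '$strandedness'\n"
        unless grep { $_ eq $strandedness } qw(sense antisense all);

    return [
        $arg{chromo},          # 1. chromosome
        $arg{start},           # 2. start coordinate (1-based)
        $arg{stop},            # 3. stop coordinate
        $strand,               # 4. strand: -1, 0, or 1
        $strandedness,         # 5. strandedness
        $method,               # 6. score method
        undef,                 # 7. database object, not used here
        @{ $arg{files} },      # 8. one or more paths to USeq files
    ];
}

my $params = build_useq_params(
    chromo => 'chr1',
    start  => 1000,
    stop   => 2000,
    method => 'score',
    files  => ['example.useq'],
);

# With the module installed and a real file, the reference would be passed
# as collect_useq_scores($params).
print scalar(@$params), " items\n";
```

Keeping the file paths at the end of the list is what allows additional files to be appended when merging.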

SEE ALSO

Bio::ToolBox::Data::Feature, Bio::ToolBox::db_helper, Bio::DB::USeq

AUTHOR

Timothy J. Parnell, PhD
Dept of Oncological Sciences
Huntsman Cancer Institute
University of Utah
Salt Lake City, UT, 84112

This package is free software; you can redistribute it and/or modify it under the terms of the Artistic License 2.0.