NAME

Bio::ViennaNGS::Bam - High-level access to BAM files

SYNOPSIS

use Bio::ViennaNGS::Bam;

# split a single-end or paired-end BAM file by strand
@result = split_bam($bam_in,$rev,$want_uniq,$want_bed,$destdir,$logfile);

# extract unique and multi mappers from a BAM file
@result = uniquify_bam($bam_in,$outdir,$logfile);

DESCRIPTION

Bio::ViennaNGS::Bam provides high-level access to BAM files. Building on Bio::DB::Sam, it provides code to extract specific portions from BAM files. It comes with routines for splitting BAM files by strand (which is often required for visualization of NGS data) and for extracting uniquely and multiply aligned reads from BAM files.

ROUTINES

split_bam($bam,$reverse,$want_uniq,$want_bed,$dest_dir,$log)

Splits BAM file $bam according to [+] and [-] strand. $reverse, $want_uniq and $want_bed are switches with values of 0 or 1. They trigger forced reversal of the strand mapping (due to RNA-seq protocol constraints), filtering of unique mappers (identified via the NH:i:1 SAM attribute), and forced output of a BED file corresponding to the strand-specific mapping, respectively. $dest_dir is the output folder and $log holds the name and path of the log file.

Strand-splitting is done such that in paired-end alignments, FIRST and SECOND mates (reads) are treated as _one_ fragment, i.e. FIRST_MATE reads determine the strand, while SECOND_MATE reads are assigned the opposite strand by definition. This also holds if the reads are not mapped in proper pairs, and even if there is no mapping partner at all.

Sometimes the library preparation protocol causes inversion of the read assignment (with respect to the underlying annotation). In those cases, the natural mapping of the reads can be obtained by setting the $reverse flag to 1.
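The strand assignment described above can be illustrated with a minimal sketch that works on the SAM FLAG field alone. The helper name fragment_strand is hypothetical and is not part of Bio::ViennaNGS::Bam; it only mirrors the rules stated in the text (reads count for the strand they map to, SECOND mates count for the opposite strand, and $reverse flips the result) and is not the module's actual implementation.

```perl
use strict;
use warnings;

# SAM FLAG bits as defined in the SAM specification
use constant {
    FLAG_PAIRED  => 0x1,     # read is part of a pair
    FLAG_REVERSE => 0x10,    # read is mapped to the reverse strand
    FLAG_SECOND  => 0x80,    # read is the second mate of a pair
};

# Hypothetical helper (not part of Bio::ViennaNGS::Bam):
# returns '+' or '-' for the fragment a read belongs to.
sub fragment_strand {
    my ($flag, $reverse) = @_;

    # start from the strand the read itself is mapped to
    my $minus = ($flag & FLAG_REVERSE) ? 1 : 0;

    # second mates are assigned the opposite strand
    $minus = !$minus if ($flag & FLAG_PAIRED) && ($flag & FLAG_SECOND);

    # optional inversion for protocols that flip the read assignment
    $minus = !$minus if $reverse;

    return $minus ? '-' : '+';
}

# Both mates of a typical proper pair (FLAG 99 and 147) end up on one strand
printf "%s %s\n", fragment_strand(99, 0), fragment_strand(147, 0);
```

Because only the FLAG bits of the read itself are inspected, the same rule applies to improperly paired reads and to reads without a mapped partner.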

This routine returns an array whose first two elements are the file names of the newly generated BAM files with reads mapped to the positive and negative strand, respectively. Elements three and four are the number of fragments mapped to the positive and negative strand. If the $want_bed option was given, elements five and six are the file names of the output BED files for positive and negative strand, respectively.
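Since the return value is a positional array, it can be convenient to give its elements names. The following is a minimal sketch; the helper name_split_result and the hash keys are hypothetical and not part of the module, and the order of elements is taken from the description above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ViennaNGS::Bam):
# turns the positional return list of split_bam into a hash reference.
sub name_split_result {
    my @r = @_;
    my %res = (
        bam_pos   => $r[0],    # BAM file, positive strand
        bam_neg   => $r[1],    # BAM file, negative strand
        count_pos => $r[2],    # fragments on the positive strand
        count_neg => $r[3],    # fragments on the negative strand
    );
    # BED file names are only expected if $want_bed was set
    @res{qw(bed_pos bed_neg)} = @r[4, 5] if @r >= 6;
    return \%res;
}
```

With the module loaded, this could be used as my $res = name_split_result(split_bam($bam_in,$rev,$want_uniq,$want_bed,$destdir,$logfile)); the presence of $res->{bed_pos} then indicates whether BED output was produced.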

NOTE: Filtering of unique mappers is only safe for single-end experiments. In paired-end experiments, read and mate are treated separately, thus allowing for scenarios where e.g. one read is a multi mapper, whereas its associated mate is a unique mapper, resulting in an ambiguous alignment of the entire fragment.

uniquify_bam($bam,$dest,$log)

Extracts uniquely and multiply aligned reads from BAM file $bam by means of the NH:i: SAM attribute. New BAM files for unique and multi mappers, named basename.uniq.bam and basename.mult.bam, respectively, are created in the output folder $dest. If defined, a logfile named $log is created in the output folder.

This routine returns an array holding file names of the newly created BAM files for unique and multi mappers, respectively.

NOTE: Not all short read mappers use the NH:i: SAM attribute to decorate unique and multi mappers. As such, this routine will not work unless your BAM file has these attributes.
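To illustrate the role of the NH:i: attribute, the following minimal sketch classifies a single tab-separated SAM record. The helper classify_by_nh is hypothetical and not part of Bio::ViennaNGS::Bam, which reads BAM files via Bio::DB::Sam; the sketch only assumes the standard SAM layout, where optional attributes follow the eleven mandatory columns.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ViennaNGS::Bam):
# returns 'uniq', 'mult', or undef if the record has no NH attribute.
sub classify_by_nh {
    my ($sam_line) = @_;
    chomp $sam_line;
    my @fields = split /\t/, $sam_line;

    # optional attributes start at column 12 (index 11)
    for my $tag (@fields[11 .. $#fields]) {
        if ($tag =~ /^NH:i:(\d+)$/) {
            return $1 == 1 ? 'uniq' : 'mult';
        }
    }
    return;    # no NH attribute: the record cannot be classified
}
```

Running such a check on a few records (for example from samtools view output) before calling uniquify_bam is one way to find out whether a given mapper writes the attribute at all.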

DEPENDENCIES

Bio::Perl >= 1.00690001
Bio::DB::Sam >= 1.39
File::Basename
File::Temp
Path::Class
Carp

AUTHORS

Michael T. Wolfinger <michael@wolfinger.eu>

COPYRIGHT AND LICENSE

Copyright (C) 2014 Michael T. Wolfinger <michael@wolfinger.eu>

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.12.4 or, at your option, any later version of Perl 5 you may have available.

This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
