NAME
Bio::ViennaNGS::Bam - High-level access to BAM files
SYNOPSIS
use Bio::ViennaNGS::Bam;
# split a single-end or paired-end BAM file by strands
@result = split_bam($bam_in,$rev,$want_uniq,$want_bed,$destdir,$logfile);
# extract unique and multi mappers from a BAM file
@result = uniquify_bam($bam_in,$outdir,$logfile);
DESCRIPTION
Bio::ViennaNGS::Bam provides high-level access to BAM files. Building on Bio::DB::Sam, it provides code to extract specific portions from BAM files. It comes with routines for splitting BAM files by strand (which is often required for visualization of NGS data) and for extracting uniquely and multiply aligned reads from BAM files.
ROUTINES
- split_bam($bam,$reverse,$want_uniq,$want_bed,$dest_dir,$log)
Splits BAM file $bam according to [+] and [-] strand. $reverse, $want_uniq and $want_bed are switches with values of 0 or 1, triggering forced reversal of strand mapping (due to RNA-seq protocol constraints), filtering of unique mappers (identified via the NH:i:1 SAM attribute), and forced output of a BED file corresponding to strand-specific mapping, respectively. $log holds name and path of the log file.
Strand-splitting is done in a way that in paired-end alignments, FIRST and SECOND mates (reads) are treated as _one_ fragment, i.e. FIRST_MATE reads determine the strand, while SECOND_MATE reads are assigned the opposite strand by definition. This also holds if the reads are not mapped in proper pairs and even if there is no mapping partner at all.
Sometimes the library preparation protocol causes inversion of the read assignment (with respect to the underlying annotation). In those cases, the natural mapping of the reads can be obtained by setting the $reverse flag.
This routine returns an array whose first two elements are the file names of the newly generated BAM files with reads mapped to the positive and negative strand, respectively. Elements three and four are the numbers of fragments mapped to the positive and negative strand. If the $want_bed option was given, elements five and six are the file names of the output BED files for positive and negative strand, respectively.
NOTE: Filtering of unique mappers is only safe for single-end experiments. In paired-end experiments, read and mate are treated separately, thus allowing for scenarios where e.g. one read is a multi-mapper whereas its associated mate is a unique mapper, resulting in an ambiguous alignment of the entire fragment.
As mentioned above, the NH:i: SAM attribute is used for discriminating unique and multi mappers, thus requiring this attribute to be present in every SAM record. If this attribute is not found in all SAM entries, a warning will be issued and the log file will contain a note indicating that there were issues with the NH attribute.
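The strand-assignment rule described for split_bam can be illustrated with a small stand-alone sketch. It is not the module's actual code: fragment_strand is a hypothetical helper, and the only assumption is that strand information is read from the standard SAM FLAG bits (0x1 paired, 0x10 reverse strand, 0x80 second in pair).

```perl
use strict;
use warnings;

# SAM FLAG bits as defined in the SAM specification
use constant {
    FLAG_PAIRED  => 0x1,
    FLAG_REVERSE => 0x10,
    FLAG_SECOND  => 0x80,
};

# Hypothetical helper (not part of Bio::ViennaNGS::Bam): returns the strand
# of the fragment a read belongs to, given its SAM FLAG and a 0/1 switch
# that mimics the $reverse argument of split_bam.
sub fragment_strand {
    my ($flag, $reverse) = @_;

    # strand of the alignment itself
    my $strand = ($flag & FLAG_REVERSE) ? '-' : '+';

    # SECOND mates are assigned the opposite strand, so that both mates
    # of a fragment end up on the strand given by the FIRST mate
    if (($flag & FLAG_PAIRED) && ($flag & FLAG_SECOND)) {
        $strand = $strand eq '+' ? '-' : '+';
    }

    # protocol-dependent inversion of the read assignment
    $strand = $strand eq '+' ? '-' : '+' if $reverse;

    return $strand;
}

for my $case ([99, 0], [147, 0], [83, 0], [163, 0], [16, 0], [99, 1]) {
    my ($flag, $reverse) = @$case;
    printf "flag=%-3d reverse=%d strand=%s\n",
        $flag, $reverse, fragment_strand($flag, $reverse);
}
```

Flags 99 and 147 describe the two mates of a properly paired fragment. Both are reported on the same strand because the second mate is flipped, and the rule does not depend on the proper-pair bit.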
- uniquify_bam($bam,$dest,$log)
Extracts uniquely and multiply aligned reads from BAM file $bam by means of the NH:i: SAM attribute. New BAM files for unique and multi mappers are created in the output folder $dest and named basename.uniq.bam and basename.mult.bam, respectively. If defined, a logfile named $log is created in the output folder.
This routine returns an array holding file names of the newly created BAM files for unique and multi mappers, respectively.
NOTE: Not all short read mappers use the NH:i: SAM attribute to decorate unique and multi mappers. As such, this routine will not work unless your BAM file has these attributes.
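To make the role of the NH:i: attribute concrete, the following sketch classifies plain-text SAM records by their NH value. It is an illustration only: nh_class is a hypothetical helper operating on tab-separated SAM lines, whereas the module itself works on BAM files via Bio::DB::Sam. Returning 'missing' for records without an NH tag is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ViennaNGS::Bam): classify one SAM
# record as 'uniq' (NH:i:1), 'mult' (NH > 1) or 'missing' (no NH tag).
sub nh_class {
    my ($sam_line) = @_;
    chomp $sam_line;
    my @fields = split /\t/, $sam_line;

    # optional TAG:TYPE:VALUE fields start after the 11 mandatory columns
    for my $tag (@fields[11 .. $#fields]) {
        if ($tag =~ /^NH:i:(\d+)$/) {
            return $1 == 1 ? 'uniq' : 'mult';
        }
    }
    return 'missing';
}

# made-up example records
my @sam = (
    join("\t", 'r1', 0,  'chr1', 100, 255, '4M', '*', 0, 0, 'ACGT', 'IIII', 'NH:i:1'),
    join("\t", 'r2', 16, 'chr1', 200, 1,   '4M', '*', 0, 0, 'ACGT', 'IIII', 'NM:i:0', 'NH:i:3'),
    join("\t", 'r3', 0,  'chr2', 300, 255, '4M', '*', 0, 0, 'ACGT', 'IIII'),
);

for my $line (@sam) {
    my ($qname) = split /\t/, $line;
    print "$qname\t", nh_class($line), "\n";
}
```

Running a check of this kind on a few records before calling uniquify_bam shows whether the mapper wrote NH tags at all.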
- uniquify_bam2($bam,$dest,$log)
Extracts uniquely and multiply aligned reads from BAM file $bam by means of the NH:i: SAM attribute, like the original uniquify_bam routine. In contrast to that routine, this one expects a name-sorted BAM file and processes the reads in bands of (supposedly paired-end) reads sharing the same id/query name. If all reads in a band are unique mappers, they go to the basename.uniq.band.bam file, otherwise all reads of the band go to the basename.mult.band.bam file.
This routine returns an array holding file names of the newly created BAM files for unique and multi mappers, respectively.
NOTE: Not all short read mappers use the NH:i: SAM attribute to decorate unique and multi mappers. As such, this routine will not work unless your BAM file has these attributes.
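The band-wise decision described for uniquify_bam2 can be sketched on simplified records. This is not the module's implementation: classify_bands is a hypothetical helper, and each record is reduced to a [query name, NH value] pair, assumed to be name-sorted so that reads sharing a query name are adjacent.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ViennaNGS::Bam): split name-sorted
# records into unique and multi mappers, deciding per band of reads that
# share the same query name. Each record is [ $qname, $nh ].
sub classify_bands {
    my @records = @_;
    my (@uniq, @mult, @band);

    # assign the current band as a whole, then start a new one
    my $flush = sub {
        return unless @band;
        my $all_unique = !grep { $_->[1] != 1 } @band;
        push @{ $all_unique ? \@uniq : \@mult }, @band;
        @band = ();
    };

    for my $rec (@records) {
        # a change of query name closes the current band
        $flush->() if @band && $band[0][0] ne $rec->[0];
        push @band, $rec;
    }
    $flush->();    # do not forget the last band

    return (\@uniq, \@mult);
}

# made-up example: fragment 'b' has one unique and one multi-mapping mate
my ($uniq, $mult) = classify_bands(
    ['a', 1], ['a', 1],
    ['b', 1], ['b', 3],
    ['c', 2],
);

print "uniq: ", join(' ', map { "$_->[0]/NH=$_->[1]" } @$uniq), "\n";
print "mult: ", join(' ', map { "$_->[0]/NH=$_->[1]" } @$mult), "\n";
```

Deciding per band keeps both mates of a fragment in the same output file, which avoids the ambiguous case in which one mate is a unique mapper and the other a multi mapper.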
DEPENDENCIES
AUTHORS
COPYRIGHT AND LICENSE
Copyright (C) 2013-2017 Michael T. Wolfinger <michael@wolfinger.eu>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.