NAME
Bio::ViennaNGS - A Perl distribution for Next-Generation Sequencing (NGS) data analysis
DESCRIPTION
Bio::ViennaNGS is a distribution of Perl modules and utilities for building efficient Next-Generation Sequencing (NGS) analysis pipelines. It covers various aspects of NGS data analysis, including (but not limited to) conversion of sequence annotation, evaluation of mapped data, expression quantification and visualization.
The main Bio::ViennaNGS module is shipped with a complementary set of (sub)modules:
- Bio::ViennaNGS::AnnoC: A Moose interface for storage and conversion of sequence annotation data.
- Bio::ViennaNGS::Bam: Routines for high-level manipulation of BAM files.
- Bio::ViennaNGS::BamStat: A Moose based class for collecting mapping statistics.
- Bio::ViennaNGS::BamStatSummary: A Moose interface for processing Bio::ViennaNGS::BamStat objects on a set of BAM files.
- Bio::ViennaNGS::Bed: A Moose interface for manipulation of genomic interval data in BED format.
- Bio::ViennaNGS::Expression: An object oriented interface for read-count based gene expression analysis.
- Bio::ViennaNGS::Fasta: Routines for accessing genomic sequences implemented through a Moose interface to Bio::DB::Fasta.
- Bio::ViennaNGS::Feature: A Moose based BED6 wrapper.
- Bio::ViennaNGS::FeatureChain: Yet another Moose class for chaining gene annotation features.
- Bio::ViennaNGS::FeatureLine: An abstract Moose class for combining several Bio::ViennaNGS::FeatureChain objects.
- Bio::ViennaNGS::MinimalFeature: A Moose interface for handling elementary gene annotation.
- Bio::ViennaNGS::SpliceJunc: A collection of routines for alternative splicing analysis.
- Bio::ViennaNGS::Tutorial: A comprehensive tutorial of the Bio::ViennaNGS core routines with real-world NGS data.
- Bio::ViennaNGS::UCSC: Routines for visualization of genomics data with the UCSC genome browser.
- Bio::ViennaNGS::Util: A collection of wrapper routines for commonly used third-party NGS utilities, code for normalization of gene expression values based on read count data and a set of utility functions.
UTILITIES
Bio::ViennaNGS comes with a collection of command line utilities for accomplishing routine tasks often required in NGS data processing. These utilities serve as reference implementations of the routines implemented throughout the modules and can readily be used for atomic tasks in NGS data processing:
- assembly_hub_constructor.pl: The UCSC genome browser offers the possibility to visualize any organism (including organisms that are not included in the standard UCSC browser bundle) through so-called 'Assembly Hubs'. This script constructs Assembly Hubs from genomic sequence and annotation data.
- bam_split.pl: Split (paired-end and single-end) BAM alignment files by strand and compute statistics. Optionally create BED output, as well as normalized bedGraph and bigWig files for coverage visualization in genome browsers (see dependencies on third-party tools below).
- bam_to_bigWig.pl: Produce bigWig coverage profiles from (aligned) BAM files, explicitly considering strandedness. The most natural use case of this tool is to create strand-aware coverage profiles in bigWig format for genome browser visualization.
- bam_uniq.pl: Extract unique and multi mapping reads from BAM alignment files and create a separate BAM file for each of the unique (.uniq.) and multi (.mult.) mappers.
- bed2bedGraph.pl: Convert BED files to (strand specific) bedGraph files, allowing additional annotation and automatic generation of bedGraph files which can easily be converted to big-type files for easy UCSC visualization.
- extend_bed.pl: Extend genomic features in BED files by a certain number of nucleotides, either on both sides or specifically at the 5' or 3' end, respectively.
- gff2bed.pl: Convert RefSeq GFF3 annotation files to BED12 format. Individual BED12 files are created for each feature type (CDS/tRNA/rRNA/etc.). Tested with RefSeq bacterial GFF3 annotation.
- kmer_analysis.pl: Count k-mers of predefined length in FastQ and Fasta files.
- MEME_XML_motif_extractor.pl: Compute simple statistics from MEME XML output and return a list of found motifs with the number of sequences containing those motifs as well as nice ggplot graphs.
- newUCSCdb.pl: Add a new genome database to a locally installed instance of the UCSC genome browser in order to visualize a novel organism. Based on this Genomewiki article.
- normalize_multicov.pl: Compute normalized expression data in TPM from (raw) read counts in bedtools multicov format. TPM reference: Wagner et al., Theory Biosci. 131(4), pp 281-85 (2012).
- sj_visualizer.pl: Convert splice junctions from mapped RNA-seq data in segemehl BED6 splice junction format to BED12 for easy visualization in genome browsers.
- splice_site_summary.pl: Identify and characterize splice junctions from RNA-seq data by intersecting them with annotated splice junctions.
- trim_fastq.pl: Trim sequence and quality string fields in a Fastq file by user defined length.
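The TPM normalization mentioned for normalize_multicov.pl can be illustrated with a minimal sketch. It is not the code shipped with Bio::ViennaNGS; the helper name tpm_from_counts and its array-reference arguments are assumptions made for this example. It applies the usual TPM definition: divide each read count by its feature length, then scale the resulting rates so that they sum to one million.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bio::ViennaNGS): compute TPM values
# from raw read counts and feature lengths given as array references.
sub tpm_from_counts {
    my ($counts, $lengths) = @_;
    die "counts and lengths differ in size\n"
        unless @$counts == @$lengths;

    # Length-normalized rate per feature: reads per nucleotide.
    my @rate;
    for my $i (0 .. $#$counts) {
        die "feature length must be positive\n"
            unless $lengths->[$i] > 0;
        push @rate, $counts->[$i] / $lengths->[$i];
    }

    # Sum of all rates; TPM rescales the rates to sum to 1e6.
    my $total = 0;
    $total += $_ for @rate;
    return [ map { 0 } @rate ] if $total == 0;

    return [ map { $_ / $total * 1e6 } @rate ];
}

# Three features: read counts and lengths in nucleotides.
my $tpm = tpm_from_counts([100, 200, 300], [1000, 1000, 3000]);
printf "%.1f\n", $_ for @$tpm;
```

Because the values are rescaled to a fixed total, TPM values of one sample always sum to one million, which makes them comparable across samples with different sequencing depth.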
DEPENDENCIES
The Bio::ViennaNGS modules and classes depend on a set of Perl modules, some of which are part of the Perl core distribution:
- Bio::Perl >= 1.00690001
- Bio::DB::Sam >= 1.37
- Bio::DB::Fasta
- Bio::Tools::GFF
- File::Basename
- File::Temp
- Path::Class
- IPC::Cmd
- Carp
- Template
- Moose
- Moose::Util::TypeConstraints
- namespace::autoclean
- MooseX::Clone
- MooseX::InstanceTracking
- Tie::Hash::Indexed
In addition the following modules are required by the Bio::ViennaNGS utilities:
Bio::ViennaNGS uses third-party tools for computing intersections of BED files: bedtools intersect from the BEDtools suite is used to compute overlaps and bedtools sort is used to sort BED output files. Make sure that those third-party utilities are available on your system, and that they can be found and executed by the Perl interpreter. We recommend installing the latest version of BEDtools on your system.
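One way to check up front that a third-party executable can be found is the can_run function from IPC::Cmd, which is among the dependencies listed above. The following is a minimal sketch, not code from the distribution; the helper name require_tools is an assumption made for this example.

```perl
use strict;
use warnings;
use IPC::Cmd qw(can_run);

# Hypothetical helper (not part of Bio::ViennaNGS): look up each tool
# and return a hash reference mapping tool name to its full path.
# Dies with a list of all tools that could not be found.
sub require_tools {
    my @tools = @_;
    my (%path, @missing);
    for my $tool (@tools) {
        # In scalar context can_run returns the full path or undef.
        my $full = can_run($tool);
        if   ($full) { $path{$tool} = $full }
        else         { push @missing, $tool }
    }
    die "Missing third-party tools: @missing\n" if @missing;
    return \%path;
}

# Report whether bedtools is available without aborting the script.
my $found = eval { require_tools('bedtools') };
if ($found) {
    print "bedtools found at $found->{bedtools}\n";
}
else {
    warn $@;
}
```

Running such a check before starting a pipeline reports a missing tool immediately rather than partway through a long computation.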
SOURCE AVAILABILITY
Source code for this distribution is available from the ViennaNGS Github repository.
PAPERS
If the Bio::ViennaNGS suite is useful for your work or if you use any component of the distribution in a custom pipeline, please cite the following publication:
"ViennaNGS - A toolbox for building efficient next-generation sequencing analysis pipelines"
Michael T. Wolfinger, Joerg Fallmann, Florian Eggenhofer and Fabian Amman
bioRxiv doi:10.1101/013011.
NOTES
The Bio::ViennaNGS suite is actively developed and tested on different flavours of Linux and Mac OS X. We have taken care to write platform-independent code that should run out of the box on most UNIX-based systems. However, we do not have access to machines running Microsoft Windows, so we have not tested and will not test Windows compatibility.
SEE ALSO
- Bio::ViennaNGS::AnnoC
- Bio::ViennaNGS::Bam
- Bio::ViennaNGS::BamStat
- Bio::ViennaNGS::BamStatSummary
- Bio::ViennaNGS::Bed
- Bio::ViennaNGS::Expression
- Bio::ViennaNGS::Fasta
- Bio::ViennaNGS::Feature
- Bio::ViennaNGS::FeatureChain
- Bio::ViennaNGS::FeatureLine
- Bio::ViennaNGS::MinimalFeature
- Bio::ViennaNGS::SpliceJunc
- Bio::ViennaNGS::Tutorial
- Bio::ViennaNGS::UCSC
- Bio::ViennaNGS::Util
AUTHORS
- Michael T. Wolfinger <michael@wolfinger.eu>
- Jörg Fallmann <fall@tbi.univie.ac.at>
- Florian Eggenhofer <florian.eggenhofer@tbi.univie.ac.at>
- Fabian Amman <fabian@tbi.univie.ac.at>
COPYRIGHT AND LICENSE
Copyright (C) 2014-2015 Michael T. Wolfinger <michael@wolfinger.eu>
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself, either Perl version 5.10.0 or, at your option, any later version of Perl 5 you may have available.
This software is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.