NAME
BioX::Wrapper::Annovar - A wrapper around the annovar annotation pipeline
VERSION
Version 0.06
SYNOPSIS
annovar-wrapper.pl --vcfs file1.vcf,file2.vcf --annovardb_path /path/to/annovar/dbs
This module is a wrapper around the popular annotation tool annovar (http://www.openbioinformatics.org/annovar/). The commands generated are taken straight from the annovar documentation. In addition, there is an option to reannotate the vcf file using vcf-annotate from vcftools.
It takes as its input a list or directory of vcf files, bgzipped and tabixed or not, and uses annovar to create annotation files. These multianno table files can be optionally reannotated into the vcf file. This script does not actually execute any commands, only writes them to STDOUT for the user to run as they wish.
It comes with an executable script, annovar-wrapper.pl. This should be sufficient for most of your needs, but if you wish to override methods you can always do so in the usual Moose fashion.
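#!/usr/bin/env perl
package Main;
use Moose;
extends 'BioX::Wrapper::Annovar';
#replace a parent method entirely
sub method_to_override {
    my $self = shift;
    #dostuff
}
#run extra code before a parent method
before 'method' => sub {
    my $self = shift;
    #dostuff
};
#extend an inherited variable declaration
has '+variable' => (
    #things to add to variable declaration
);
#or
has 'variable' => (
    #override variable declaration
);
#instantiate the subclass, after the modifiers are declared, so the overrides take effect
Main->new_with_options->run;
1;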
Please see Moose::Manual::MethodModifiers for more information.
Prerequisites
This module requires the annovar download. The easiest thing to do is to put the annovar scripts in your PATH, but if you choose not to do this you can also pass in their locations with:
annovar-wrapper.pl --tableannovar_path /path/to/table_annovar.pl --convert2annovar_path /path/to/convert2annovar.pl
It requires Vcf.pm, which comes with vcftools.
Vcftools is publicly available for download. http://vcftools.sourceforge.net/.
export PERL5LIB=$PERL5LIB:path_to_vcftools/perl
If you wish to reannotate the vcf file, you need to have bgzip and tabix installed, and have the vcftools executables in your path.
export PATH=$PATH:path_to_vcftools
Generate an Example
To generate an example you can run the following commands
tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:39967768-40000000 > test.vcf
bgzip test.vcf
tabix test.vcf.gz
vcf-subset -c HG00098,HG00100,HG00106,HG00112,HG00114 test.vcf.gz | bgzip -c > out.vcf.gz
tabix out.vcf.gz
rm test.vcf.gz
rm test.vcf.gz.tbi
annovar-wrapper.pl --vcfs out.vcf.gz --annovar_dbs refGene --annovar_fun g --outdir annovar_out --annovardb_path /path/to/annovar/dbs > my_cmds.sh
There is more detail on the example in the pod files.
Variables
Annovar Options
tableannovar_path
Path to table_annovar.pl. If the annovar scripts are in your PATH, the default is fine. Otherwise, please supply the location.
convert2annovar_path
Path to convert2annovar.pl. If the annovar scripts are in your PATH, the default is fine. Otherwise, please supply the location.
annovardb_path
Path to your annovar databases
buildver
The genome build version. It's probably hg19 or hg18.
convert2annovar_opts
Assumes vcf version 4 and that you want to convert all samples
Not using --allsample on a multisample vcf is untested and will probably break the whole pipeline
annovar_dbs
These are pretty much all the databases listed on http://www.openbioinformatics.org/annovar/annovar_download.html for hg19 that I tested as working.
#Download databases with
cd path_to_annovar_dir
./annotate_variation.pl --buildver hg19 -downdb -webfrom annovar esp6500si_aa hg19/
#Option is an ArrayRef, and can be given as either
--annovar_dbs cg46,cg69,nci60
#or
--annovar_dbs cg46 --annovar_dbs cg69 --annovar_dbs nci60
annovar_fun
Functions of the individual databases can be found at
The function for your database may already be listed. Otherwise, it is probably listed in the URLs under Annotation: Gene-Based, Region-Based, or Filter-Based.
Functions must be given in the same order as the corresponding annovar_dbs.
#Option is an ArrayRef, and can be given as either
--annovar_fun f,f,g
#or
--annovar_fun f --annovar_fun f --annovar_fun g
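The ordering requirement can be illustrated with a minimal sketch. It assumes the two lists end up in the -protocol and -operation options of table_annovar.pl, which pair databases and functions by position. The helper name build_protocol_args is hypothetical and is not part of this module.

```perl
use strict;
use warnings;

# Hypothetical helper: pair each database with its function by position
# and build the -protocol/-operation arguments for table_annovar.pl.
sub build_protocol_args {
    my ( $dbs, $funs ) = @_;

    # The two lists must line up one-to-one, in the same order.
    die "Need exactly one function per database\n" unless @$dbs == @$funs;

    return sprintf( '-protocol %s -operation %s',
        join( ',', @$dbs ), join( ',', @$funs ) );
}

# cg46 and cg69 are filter-based (f), refGene is gene-based (g)
print build_protocol_args( [qw(cg46 cg69 refGene)], [qw(f f g)] ), "\n";
```

Checking the list lengths up front catches a missing function before any command is written out.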
annovar_cols
Some database annotations generate multiple columns. For reannotating the vcf we need to know what these columns are. Below are the columns generated for the databases given in annovar_dbs
To add columns, give a hashref of arrays.
vcfs
VCF files can be given individually as well.
#Option is an ArrayRef and can be given as either
--vcfs 1.vcf,2.vcf,3.vcf
#or
--vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf
Don't mix the two methods.
annotate_vcf
Use vcf-annotate from VCF tools to annotate the VCF file
This does not overwrite the original VCF file, but instead creates a new one
To turn this off
annovar-wrapper.pl --annotate_vcf 0
Internal Variables
You shouldn't need to change these
SUBROUTINES/METHODS
run
Subroutine that starts everything off
check_files
Check to make sure either an indir or vcfs are supplied
find_vcfs
Use File::Find::Rule to find the vcfs
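A minimal sketch of the directory search follows. The module itself uses File::Find::Rule. This sketch swaps in the core File::Find module so that it runs without extra installs. The helper name find_vcf_files, the recursive search, and the file name pattern are all assumptions.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: collect *.vcf and *.vcf.gz files under a directory.
# Index files such as *.vcf.gz.tbi are skipped by anchoring the pattern.
sub find_vcf_files {
    my ($indir) = @_;
    my @vcfs;
    find(
        sub {
            # $_ is the file name, $File::Find::name includes the directory
            push @vcfs, $File::Find::name if -f $_ && /\.vcf(?:\.gz)?$/;
        },
        $indir
    );
    my @sorted = sort @vcfs;
    return @sorted;
}

# Only search when a directory is given on the command line
my $dir = shift @ARGV;
if ( defined $dir ) {
    print "$_\n" for find_vcf_files($dir);
}
```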
parse_commands
Allow for giving ArrayRef either in the usual fashion or with commas
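A minimal sketch of this normalization, assuming each value is simply split on commas. The helper name split_on_commas is hypothetical, and the module's own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: accept values given as repeated options or as one
# comma-separated string, and return a flat ArrayRef either way.
sub split_on_commas {
    my @values = @_;
    return [ map { split /,/, $_ } @values ];
}

my $repeated = split_on_commas( 'cg46', 'cg69', 'nci60' );    # --opt a --opt b --opt c
my $joined   = split_on_commas('cg46,cg69,nci60');            # --opt a,b,c

print join( ' ', @$repeated ), "\n";
print join( ' ', @$joined ),   "\n";
```

Both calls yield the same three-element list.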
write_annovar
Write the commands that:
1. Convert the vcf file to annovar input
2. Do the annotations
3. Reannotate the vcf, if you want
iter_vcfs
Iterate over the vcfs with some changes for lookups
get_samples
Using VCF tools get the samples listed per vcf file
Supports files that are bgzipped or not
Sample names are stripped of all non alphanumeric characters.
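The sample name cleanup can be sketched as below. It assumes "alphanumeric" means ASCII letters and digits. The helper name clean_sample_name is hypothetical, and the exact pattern used by the module may differ.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every character that is not an ASCII letter or digit.
sub clean_sample_name {
    my ($name) = @_;
    ( my $clean = $name ) =~ s/[^A-Za-z0-9]//g;
    return $clean;
}

print clean_sample_name('HG-001_a.1'), "\n";
```

Keep in mind that two distinct sample names can collapse to the same cleaned name, for example NA-1 and NA1.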
convert_annovar
Print out the convert2annovar commands.
table_annovar
Print out the commands to generate the annotation using table_annovar.pl command.
vcf_annotate
Generate the commands to annotate the vcf file using vcf-annotate
gen_descr
Bgzip, tabix, all of vcftools, and sort must be in your PATH for these to work.
There are two parts to this command.
The first prepares the annotation file.
1. The annotation file is backed up, just in case.
2. The annotation file is sorted, because I had some problems with sorting.
3. The annotation file is bgzipped, as required by vcf-annotate.
4. The annotation file is tabix indexed using the special options -s 1 -b 2 -e 3.
The second writes out the vcf-annotate commands
Example with RefGene
zcat ../../variants.vcf.gz | vcf-annotate -a sorted.annotation.gz \
-d key=INFO,ID=SAMPLEID_Func_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Func_refGene' \
-d key=INFO,ID=SAMPLEID_Gene_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Gene_refGene' \
-d key=INFO,ID=SAMPLEID_ExonicFun_refGene,Number=0,Type=String,Description='SAMPLEID Annovar ExonicFun_refGene' \
-d key=INFO,ID=SAMPLEID_AAChange_refGene,Number=0,Type=String,Description='SAMPLEID Annovar AAChange_refGene' \
-c CHROM,FROM,TO,-,-,INFO/SAMPLEID_Func_refGene,INFO/SAMPLEID_Gene_refGene,INFO/SAMPLEID_ExonicFun_refGene,INFO/SAMPLEID_AAChange_refGene > SAMPLEID.annotated.vcf
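The -d descriptions in the RefGene example follow a fixed pattern, one per annotation column. A minimal sketch of generating them is shown here. The helper name build_descriptions is hypothetical, and the module's own gen_descr may be organized differently.

```perl
use strict;
use warnings;

# Hypothetical helper: build one -d INFO description per annotation column,
# following the pattern of the RefGene example.
sub build_descriptions {
    my ( $sample, @cols ) = @_;
    my @descr;
    for my $col (@cols) {
        push @descr,
            "-d key=INFO,ID=${sample}_${col},Number=0,Type=String,"
          . "Description='${sample} Annovar ${col}'";
    }
    return @descr;
}

# Print the descriptions as continuation lines of a shell command
print join( " \\\n", build_descriptions( 'SAMPLEID', qw(Func_refGene Gene_refGene) ) ), "\n";
```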
gen_annot
gen_cols
Generate the -c portion of the vcf-annotate command
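A minimal sketch of building the -c portion, modeled on the RefGene example in gen_descr. It assumes the annotation table starts with five fixed columns, of which the fourth and fifth are skipped with "-". The helper name build_cols is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: build the -c argument for vcf-annotate.
# The leading CHROM,FROM,TO,-,- columns mirror the RefGene example.
sub build_cols {
    my ( $sample, @cols ) = @_;
    my @fields = ( 'CHROM', 'FROM', 'TO', '-', '-' );
    for my $col (@cols) {
        push @fields, "INFO/${sample}_${col}";
    }
    return '-c ' . join( ',', @fields );
}

print build_cols( 'SAMPLEID', qw(Func_refGene Gene_refGene) ), "\n";
```

The INFO names produced here must match the IDs declared in the -d descriptions, otherwise vcf-annotate has no header definition for them.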
merge_vcfs
There is one vcf-annotated file per sample, so merge those at the end to get a multisample file using vcf-merge.
subset_vcfs
vcf-merge used in this fashion will create a lot of redundant columns, because it wants to assume all sample names are unique
Straight from the vcftools documentation
vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz
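A minimal sketch of writing such a vcf-subset command for a list of samples. The helper name subset_command and the file names are hypothetical. Like the rest of the wrapper, it only builds the command string and does not run it.

```perl
use strict;
use warnings;

# Hypothetical helper: build the vcf-subset command for the given samples.
sub subset_command {
    my ( $vcf, $out, @samples ) = @_;
    return sprintf( 'vcf-subset -c %s %s | bgzip -c > %s',
        join( ',', @samples ), $vcf, $out );
}

print subset_command( 'file.vcf.gz', 'out.vcf.gz', qw(NA0001 NA0002) ), "\n";
```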
AUTHOR
Jillian Rowe, <jillian.e.rowe at gmail.com>
BUGS
Please report any bugs or feature requests to bug-annovar-wrapper at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Annovar-Wrapper. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
perldoc BioX::Wrapper::Annovar
You can also look for information at:
RT: CPAN's request tracker (report bugs here)
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
ACKNOWLEDGEMENTS
This module is a wrapper around the well-developed annovar pipeline. The commands come straight from the documentation.
This module was originally developed at and for Weill Cornell Medical College in Qatar within the ITS Advanced Computing Team, with scientific input from Khalid Fahkro. With approval from WCMC-Q, this information was generalized and put on github, for which the authors would like to express their gratitude.
LICENSE AND COPYRIGHT
Copyright 2014 Weill Cornell Medical College Qatar.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:
http://www.perlfoundation.org/artistic_license_2_0
Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.
If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.
This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.
This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.
Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.