NAME

BioX::Wrapper::Annovar - A wrapper around the annovar annotation pipeline

VERSION

Version 0.06

SYNOPSIS

annovar-wrapper.pl --vcfs file1.vcf,file2.vcf --annovardb_path /path/to/annovar/dbs

This module is a wrapper around the popular annotation tool annovar (http://www.openbioinformatics.org/annovar/). The commands generated are taken straight from the annovar documentation. In addition, there is an option to reannotate using vcf-annotate from vcftools.

It takes as its input a list or directory of vcf files, bgzipped and tabixed or not, and uses annovar to create annotation files. These multianno table files can be optionally reannotated into the vcf file. This script does not actually execute any commands, only writes them to STDOUT for the user to run as they wish.

It comes with an executable script, annovar-wrapper.pl. This should be sufficient for most of your needs, but if you wish to override methods you can always do so in the usual Moose fashion.

#!/usr/bin/env perl

package Main;

use Moose;
extends 'BioX::Wrapper::Annovar';

sub method_to_override {
    my $self = shift;

    #dostuff
}

before 'method' => sub  {
    my $self = shift;

    #dostuff
};

has '+variable' => (
    #things to add to variable declaration
);

#or

has 'variable' => (
    #override variable declaration
);

#Run the subclass, not the parent, and only after the modifiers above are declared
Main->new_with_options->run;

1;

Please see Moose::Manual::MethodModifiers for more information.

Prerequisites

This module requires the annovar download. The easiest thing to do is to put the annovar scripts in your PATH, but if you choose not to do this you can also pass in their locations with

annovar-wrapper.pl --tableannovar_path /path/to/table_annovar.pl --convert2annovar_path /path/to/convert2annovar.pl

It requires Vcf.pm, which comes with vcftools.

Vcftools is publicly available for download at http://vcftools.sourceforge.net/. Add its perl directory to PERL5LIB so that Vcf.pm can be found:

export PERL5LIB=$PERL5LIB:path_to_vcftools/perl

If you wish to reannotate the vcf file you need to have bgzip and tabix installed, and have the vcftools executables in your PATH.

export PATH=$PATH:path_to_vcftools

Generate an Example

To generate an example you can run the following commands:

tabix -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/release/20100804/ALL.2of4intersection.20100804.genotypes.vcf.gz 2:39967768-40000000 > test.vcf
bgzip test.vcf
tabix test.vcf.gz
vcf-subset -c HG00098,HG00100,HG00106,HG00112,HG00114 test.vcf.gz | bgzip -c > out.vcf.gz
tabix out.vcf.gz
rm test.vcf.gz
rm test.vcf.gz.tbi

annovar-wrapper.pl --vcfs out.vcf.gz --annovar_dbs refGene --annovar_fun g --outdir annovar_out --annovardb_path /path/to/annovar/dbs > my_cmds.sh

There is more detail on the example in the pod files.

Variables

Annovar Options

tableannovar_path

If the annovar scripts are in your PATH, the default is fine. If annovar is not in your PATH, please supply the location of table_annovar.pl.

convert2annovar_path

If the annovar scripts are in your PATH, the default is fine. If annovar is not in your PATH, please supply the location of convert2annovar.pl.

annovardb_path

Path to your annovar databases

buildver

The genome build version. It's probably hg19 or hg18.

convert2annovar_opts

The default assumes vcf version 4 and that you want to convert all samples.

Not using --allsample on a multisample vcf is untested and will probably break the whole pipeline.

annovar_dbs

These are pretty much all the databases listed on http://www.openbioinformatics.org/annovar/annovar_download.html for hg19 that I tested as working.

#Download databases with

cd path_to_annovar_dir
./annotate_variation.pl --buildver hg19 -downdb -webfrom annovar esp6500si_aa hg19/

#Option is an ArrayRef, and can be given as either

--annovar_dbs cg46,cg69,nci60

#or

--annovar_dbs cg46 --annovar_dbs cg69 --annovar_dbs nci60

annovar_fun

Functions of the individual databases can be found at

The function for your DB may already be listed. Otherwise, it is probably listed in the annovar documentation under Annotation: Gene-Based (g), Region-Based (r), or Filter-Based (f).

Functions must be given in the corresponding order of your annovar_dbs

#Option is an ArrayRef, and can be given as either

--annovar_fun f,f,g

#or

--annovar_fun f --annovar_fun f --annovar_fun g
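
Because the two options are matched by position, a mismatch in length or order silently pairs a database with the wrong function. The following is a minimal sketch of that pairing, not the module's own code: protocol_args is a hypothetical helper, and it assumes the lists end up as the -protocol and -operation arguments of table_annovar.pl.

```perl
use strict;
use warnings;

# Hypothetical helper: turn parallel lists of databases and functions
# into the -protocol and -operation arguments of table_annovar.pl.
sub protocol_args {
    my ($dbs, $funs) = @_;

    # The lists are matched by position, so they must be the same length.
    die "annovar_dbs and annovar_fun differ in length\n"
        unless @$dbs == @$funs;

    return sprintf '-protocol %s -operation %s',
        join(',', @$dbs), join(',', @$funs);
}

# cg46 and cg69 are filter-based (f); refGene is gene-based (g).
print protocol_args([qw(cg46 cg69 refGene)], [qw(f f g)]), "\n";
```

Checking the lengths up front catches the most common mistake, which is adding a database without adding its function.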

annovar_cols

Some database annotations generate multiple columns. For reannotating the vcf we need to know what these columns are. Below are the columns generated for the databases given in annovar_dbs.

To add columns for another database, give a hashref of arrays.
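
As a rough illustration, a hashref of arrays maps each database name to the list of columns it generates. The refGene column names below are taken from the RefGene example under gen_descr; the cg46 entry and the exact spelling the module expects are assumptions, and example_cols is a hypothetical name.

```perl
use strict;
use warnings;

# Assumed shape: database name => list of the columns it generates.
sub example_cols {
    return {
        refGene => [qw(Func Gene ExonicFun AAChange)],
        cg46    => [qw(cg46)],    # assumption: a single-column database
    };
}

my $cols = example_cols();
for my $db (sort keys %$cols) {
    printf "%s: %s\n", $db, join(', ', @{ $cols->{$db} });
}
```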

vcfs

VCF files can be given individually as well.

#Option is an ArrayRef and can be given as either

--vcfs 1.vcf,2.vcf,3.vcf

#or

--vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf

Don't mix the methods

annotate_vcf

Use vcf-annotate from VCF tools to annotate the VCF file

This does not overwrite the original VCF file, but instead creates a new one

To turn this off

annovar-wrapper.pl --annotate_vcf 0

Internal Variables

You shouldn't need to change these

SUBROUTINES/METHODS

run

Subroutine that starts everything off

check_files

Check to make sure either an indir or vcfs are supplied

find_vcfs

Use File::Find::Rule to find the vcfs
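
For readers who want to see the idea without installing File::Find::Rule, here is a sketch that uses the core File::Find module instead. It is a stand-in, not the module's implementation; find_vcfs_sketch is a hypothetical name, and the assumption is that both plain and bgzipped vcf files should be picked up.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: return all .vcf and .vcf.gz files under $indir.
sub find_vcfs_sketch {
    my ($indir) = @_;
    my @vcfs;

    find(
        sub {
            # $_ is the file name; $File::Find::name is the full path.
            push @vcfs, $File::Find::name if -f $_ && /\.vcf(?:\.gz)?$/;
        },
        $indir,
    );

    my @sorted = sort @vcfs;
    return @sorted;
}

print "$_\n" for find_vcfs_sketch('.');
```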

parse_commands

Allow for giving ArrayRef either in the usual fashion or with commas
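
The idea can be sketched as flattening each option value on commas, so that both command-line styles produce the same list. This is an illustration only; split_commas is a hypothetical helper and not the module's code.

```perl
use strict;
use warnings;

# Hypothetical helper: flatten a list whose elements may each hold
# several comma-separated values.
sub split_commas {
    my @values = @_;
    return map { split(/,/, $_) } @values;
}

# Both command-line styles end up as the same list.
my @from_commas   = split_commas('cg46,cg69,nci60');
my @from_repeated = split_commas('cg46', 'cg69', 'nci60');

print join(' ', @from_commas),   "\n";
print join(' ', @from_repeated), "\n";
```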

write_annovar

Write the commands that

1. Convert the vcf file to annovar input
2. Do the annotations
3. Reannotate the vcf, if you want

iter_vcfs

Iterate over the vcfs with some changes for lookups

get_samples

Using VCF tools get the samples listed per vcf file

Supports files that are bgzipped or not

Sample names are stripped of all non alphanumeric characters.
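
A minimal sketch of the stripping step, assuming a plain ASCII character class; clean_sample_name is a hypothetical helper, not the module's method.

```perl
use strict;
use warnings;

# Hypothetical helper: drop every character that is not a letter or digit.
sub clean_sample_name {
    my ($name) = @_;
    (my $clean = $name) =~ s/[^A-Za-z0-9]//g;
    return $clean;
}

print clean_sample_name('HG-00098.1'), "\n";    # prints HG000981
```

One consequence worth keeping in mind: sample names that differ only in punctuation, such as NA_0001 and NA-0001, become identical after stripping.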

convert_annovar

Print out the convert2annovar commands.

table_annovar

Print out the commands to generate the annotation using table_annovar.pl.

vcf_annotate

Generate the commands to annotate the vcf file using vcf-annotate

gen_descr

Bgzip, tabix, all of vcftools, and sort must be in your PATH for these to work.

There are two parts to this command.

The first prepares the annotation file.

1. The annotation file is backed up, just in case.
2. The annotation file is sorted, because I had some problems with sorting.
3. The annotation file is bgzipped, as required by vcf-annotate.
4. The annotation file is tabix indexed using the special commands -s 1 -b 2 -e 3.
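
The four preparation steps could be emitted along these lines. This is a sketch under assumptions: prep_annotation_cmds is a hypothetical helper, the file suffixes and the sort keys (chromosome, then numeric start position) are illustrative, and only the tabix flags come from this document. Like the module, it prints commands rather than running them.

```perl
use strict;
use warnings;

# Hypothetical helper: return the shell commands that prepare one
# annotation file for vcf-annotate.
sub prep_annotation_cmds {
    my ($annot) = @_;
    return (
        "cp $annot $annot.bak",                         # 1. back up
        "sort -k1,1 -k2,2n $annot > $annot.sorted",     # 2. sort (assumed keys)
        "bgzip $annot.sorted",                          # 3. compress
        "tabix -s 1 -b 2 -e 3 $annot.sorted.gz",        # 4. index
    );
}

print "$_\n" for prep_annotation_cmds('SAMPLEID.annotation');
```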

The second writes out the vcf-annotate commands

Example with RefGene

zcat ../../variants.vcf.gz | vcf-annotate -a sorted.annotation.gz \
    -d key=INFO,ID=SAMPLEID_Func_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Func_refGene' \
    -d key=INFO,ID=SAMPLEID_Gene_refGene,Number=0,Type=String,Description='SAMPLEID Annovar Gene_refGene' \
    -d key=INFO,ID=SAMPLEID_ExonicFun_refGene,Number=0,Type=String,Description='SAMPLEID Annovar ExonicFun_refGene' \
    -d key=INFO,ID=SAMPLEID_AAChange_refGene,Number=0,Type=String,Description='SAMPLEID Annovar AAChange_refGene' \
    -c CHROM,FROM,TO,-,-,INFO/SAMPLEID_Func_refGene,INFO/SAMPLEID_Gene_refGene,INFO/SAMPLEID_ExonicFun_refGene,INFO/SAMPLEID_AAChange_refGene > SAMPLEID.annotated.vcf

gen_annot

gen_cols

Generate the -c portion of the vcf-annotate command
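
A sketch of how the -c string in the RefGene example could be assembled from a sample id, a database name, and its columns. gen_cols_sketch is a hypothetical name and the argument layout is an assumption; only the resulting string is taken from the example in gen_descr.

```perl
use strict;
use warnings;

# Hypothetical helper: build the column list passed to vcf-annotate -c.
sub gen_cols_sketch {
    my ($sample, $db, @cols) = @_;

    # Chromosome, start, end, then two annotation columns skipped with "-".
    my @spec = ('CHROM', 'FROM', 'TO', '-', '-');

    # One INFO tag per annotation column, named SAMPLE_Column_database.
    for my $col (@cols) {
        push @spec, sprintf('INFO/%s_%s_%s', $sample, $col, $db);
    }
    return join ',', @spec;
}

print '-c ', gen_cols_sketch('SAMPLEID', 'refGene', qw(Func Gene ExonicFun AAChange)), "\n";
```

The INFO tag names must match the ID values declared with -d, which is why both are derived from the same sample id and column names.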

merge_vcfs

There is one vcf-annotated file per sample, so merge those at the end using vcf-merge to get a multisample file.

subset_vcfs

vcf-merge used in this fashion will create a lot of redundant columns, because it wants to assume all sample names are unique

Straight from the vcftools documentation

vcf-subset -c NA0001,NA0002 file.vcf.gz | bgzip -c > out.vcf.gz

AUTHOR

Jillian Rowe, <jillian.e.rowe at gmail.com>

BUGS

Please report any bugs or feature requests to bug-annovar-wrapper at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Annovar-Wrapper. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc BioX::Wrapper::Annovar

You can also look for information at:

ACKNOWLEDGEMENTS

This module is a wrapper around the well developed annovar pipeline. The commands come straight from the documentation.

This module was originally developed at and for Weill Cornell Medical College in Qatar within the ITS Advanced Computing Team, with scientific input from Khalid Fahkro. With approval from WCMC-Q, this information was generalized and put on github, for which the authors would like to express their gratitude.

LICENSE AND COPYRIGHT

Copyright 2014 Weill Cornell Medical College Qatar.

This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at:

http://www.perlfoundation.org/artistic_license_2_0

Any use, modification, and distribution of the Standard or Modified Versions is governed by this Artistic License. By using, modifying or distributing the Package, you accept this license. Do not use, modify, or distribute the Package, if you do not accept this license.

If your Modified Version has been derived from a Modified Version made by someone other than you, you are nevertheless required to ensure that your Modified Version complies with the requirements of this license.

This license does not grant you the right to use any trademark, service mark, tradename, or logo of the Copyright Holder.

This license includes the non-exclusive, worldwide, free-of-charge patent license to make, have made, use, offer to sell, sell, import and otherwise transfer the Package with respect to any patent claims licensable by the Copyright Holder that are necessarily infringed by the Package. If you institute patent litigation (including a cross-claim or counterclaim) against any party alleging that the Package constitutes direct or contributory patent infringement, then this Artistic License to you shall terminate on the date that such litigation is filed.

Disclaimer of Warranty: THE PACKAGE IS PROVIDED BY THE COPYRIGHT HOLDER AND CONTRIBUTORS "AS IS" AND WITHOUT ANY EXPRESS OR IMPLIED WARRANTIES. THE IMPLIED WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT ARE DISCLAIMED TO THE EXTENT PERMITTED BY YOUR LOCAL LAW. UNLESS REQUIRED BY LAW, NO COPYRIGHT HOLDER OR CONTRIBUTOR WILL BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, OR CONSEQUENTIAL DAMAGES ARISING IN ANY WAY OUT OF THE USE OF THE PACKAGE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.