NAME
BioX::Wrapper::Gemini - A simple wrapper around the Python Gemini library for annotating VCF files.
SYNOPSIS
Basic Usage
gemini_wrapper.pl --indir /path/to/vcfs --outdir /location/we/can/write/to > commands.in
Customized workflow
For more involved usage, please see BioX::Wrapper::Gemini::Example.
Using the API
BioX::Wrapper::Gemini is written using Moose and can be extended in all the usual fashions.
    package My::Gemini;
    use Moose;
    extends 'BioX::Wrapper::Gemini';

    # Method modifier: runs after the parent's db_load
    after 'db_load' => sub {
        my $self = shift;
        # Run some commands
        # SCIENCE!
    };
Description
A wrapper around Gemini for processing VCF files.
Read more about Gemini here: http://gemini.readthedocs.org/en/latest/
The workflow is taken directly from the documentation written by the author of Gemini.
For further customization, please see the Attributes section of the docs.
Attributes
Moose Attributes
vcfs
VCF files can be given individually as well.
#Option is an ArrayRef and can be given as either
--vcfs 1.vcf,2.vcf,3.vcf
#or
--vcfs 1.vcf --vcfs 2.vcf --vcfs 3.vcf
Don't mix the two methods.
If these VCFs are uncompressed, they will be compressed in place. Please make sure either that this location has read/write access, or create a symbolic link to someplace that does.
Every time you leave genomics data uncompressed a kitten dies!
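The two ways of passing --vcfs shown above can be made concrete with a generic Getopt::Long sketch. This is not the module's own option handling; parse_vcfs is a hypothetical helper that only illustrates how a comma-separated value and a repeated option can end up as the same list.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper (not part of BioX::Wrapper::Gemini):
# collect --vcfs values from an argument list.
sub parse_vcfs {
    my @args = @_;
    my @vcfs;
    # 'vcfs=s@' lets the option be repeated, pushing each value onto @vcfs
    GetOptionsFromArray( \@args, 'vcfs=s@' => \@vcfs )
        or die "Could not parse options\n";
    # Split any comma-separated value into individual file names
    return map { split /,/, $_ } @vcfs;
}

my @from_commas   = parse_vcfs( '--vcfs', '1.vcf,2.vcf,3.vcf' );
my @from_repeated = parse_vcfs( '--vcfs', '1.vcf', '--vcfs', '2.vcf', '--vcfs', '3.vcf' );

print "comma form:    @from_commas\n";
print "repeated form: @from_repeated\n";
```

Both calls yield the same three file names, which is why either form works as long as one form is used consistently.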
uncomvcfs
VCFs that are uncompressed.
ref
Supply a path to a reference genome.
The default assumes there is an environment variable $REFGENOME.
snpeff
Base directory of snpEff.
The default assumes there is an environment variable $SNPEFF, set to the base directory of the snpEff installation.
snpeff_opt
Options to run snpEff with.
Default is -c \$SNPEFF/snpEff.config -formatEff -classic GRCh37.75
ped
If all VCF files are being loaded into the Gemini database with the same pedigree file, simply change --db_load_opts to correspond to your file.
If each VCF file has its own pedigree, make sure the name of the pedigree file matches the basename of the VCF.
Basenames are captured like so:
my @gzipbase = map { basename($_, ".vcf.gz") } @gzipped ;
my @notgzipbase = map { basename($_, ".vcf") } @notgzipped ;
The extension stripped is .vcf.gz or .vcf.
Invoke this with --ped.
The exact specification can be found here:
http://gemini.readthedocs.org/en/latest/content/preprocessing.html#describing-samples-with-a-ped-file
ped_dir
If using the --ped option, you must specify this if your pedigree files are not in the same directory as the one given to --indir.
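The basename matching described for ped and ped_dir can be sketched as follows. The helper name expected_ped_files is hypothetical, and the ".ped" file extension is an assumption made for illustration; the text above only states that the pedigree file must match the basename of the VCF.

```perl
use strict;
use warnings;
use File::Basename qw(basename);
use File::Spec;

# Hypothetical helper (not part of BioX::Wrapper::Gemini):
# map each VCF to the pedigree file expected for it.
# ASSUMPTION: pedigree files are named "<basename>.ped".
sub expected_ped_files {
    my ( $ped_dir, @vcfs ) = @_;
    my %ped_for;
    for my $vcf (@vcfs) {
        # Strip the directory and a trailing .vcf.gz or .vcf
        my $base = basename( $vcf, '.vcf.gz', '.vcf' );
        $ped_for{$vcf} = File::Spec->catfile( $ped_dir, "$base.ped" );
    }
    return %ped_for;
}

my %ped_for = expected_ped_files(
    '/data/peds',
    '/data/vcfs/fam1.vcf.gz',
    '/data/vcfs/fam2.vcf',
);

print "$_ => $ped_for{$_}\n" for sort keys %ped_for;
```

A check like this could be run before loading to confirm that each expected pedigree file exists.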
db_load_opts
Options for loading a VCF file into the Gemini SQLite database.
Default is -t snpEff
This used to be --skip_cadd -t snpeff, but by popular demand is now just -t snpEff
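As a rough illustration of where these options end up, the sketch below assembles a "gemini load" command line. The helper gemini_load_command and the database file name are hypothetical; the command the wrapper actually writes out may differ. Only the -v, -p and -t flags of "gemini load" are relied on here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Wrapper::Gemini):
# build a "gemini load" command string.
# ASSUMPTION: the caller chooses the database file name.
sub gemini_load_command {
    my (%arg) = @_;
    my @cmd = ( 'gemini', 'load', '-v', $arg{vcf} );
    # Optional per-VCF pedigree file
    push @cmd, '-p', $arg{ped} if defined $arg{ped};
    # Fall back to the documented default when no options are given
    my $opts = defined $arg{db_load_opts} ? $arg{db_load_opts} : '-t snpEff';
    push @cmd, split( ' ', $opts );
    push @cmd, $arg{db};
    return join ' ', @cmd;
}

my $plain = gemini_load_command( vcf => 'fam1.vcf.gz', db => 'fam1.db' );
my $with_ped = gemini_load_command(
    vcf => 'fam1.vcf.gz',
    ped => 'fam1.ped',
    db  => 'fam1.db',
);

print "$plain\n";
print "$with_ped\n";
```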
Subroutines
check_files
Check to make sure that either an indir or vcfs are supplied.
find_vcfs
Use File::Find::Rule to find the VCFs.
Make sure they are all gzipped first. If there are any .vcf$ files without a corresponding .vcf.gz$, bgzip those.
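The selection rule above (compress only those .vcf files that lack a .vcf.gz counterpart) can be sketched on a plain list of file names. The helper vcfs_needing_bgzip is hypothetical and does not touch the file system; the module itself collects the names with File::Find::Rule.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioX::Wrapper::Gemini):
# return the .vcf files that have no .vcf.gz counterpart in the same list.
sub vcfs_needing_bgzip {
    my @files = @_;
    # Index the already-compressed files for quick lookup
    my %have_gz = map { $_ => 1 } grep { /\.vcf\.gz$/ } @files;
    # Keep uncompressed files whose "<name>.gz" is not present
    return grep { /\.vcf$/ && !$have_gz{"$_.gz"} } @files;
}

my @todo = vcfs_needing_bgzip(qw(a.vcf a.vcf.gz b.vcf c.vcf.gz));

print "bgzip $_\n" for @todo;
```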
bgzip
Run the bgzip command on the uncompressed files found in find_vcfs.
norml
Normalize the VCFs using vt and annotate them using snpEff.
db_load
Load the VCF files into the Gemini database.
run
Subroutine that starts everything off.
AUTHOR
Jillian Rowe <jillian.e.rowe@gmail.com>
ACKNOWLEDGEMENTS
This module was originally developed at and for Weill Cornell Medical College in Qatar within the ITS Advanced Computing Team. With approval from WCMC-Q, this information was generalized and put on GitHub, for which the authors would like to express their gratitude.
COPYRIGHT
Copyright 2015- Weill Cornell Medical College in Qatar
LICENSE
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.