NAME
ShatterProof - a script for analyzing next-generation sequencing data
SYNOPSIS
use ShatterProof;
See shatterproof.pl in the scripts directory for a simple Perl script that calls the ShatterProof module.
Call ShatterProof via:
ShatterProof::run(\@ARGV);
DESCRIPTION
ShatterProof is a tool for analyzing next-generation sequencing data for signs of chromothripsis. It is implemented as a Perl module that processes input files and produces output files in both tab-delimited and YAML formats. Perl version 5.0 or greater is required to run ShatterProof. A link to the publication will be posted soon.
README
Installing ShatterProof
To install this module type the following:
perl Makefile.PL
make
make test
make install
Make sure you have administrator permissions when running these commands (in particular make install, which writes to system directories).
Input File Types
ShatterProof bases its analysis of genomic data on calls of translocations, copy number variations (CNV), loss of heterozygosity (LOH) and insertions. ShatterProof takes four types of input files. See the scripts/conversion_scripts directory for Perl scripts that convert the output of some common tools to the required input formats.
Translocation Input Files (.spt)
Tab-delimited columns. The first line is a header line:
#chr1 start end chr2 start end quality
Example data entry line:
1 1000 2000 4 4000 5000 78
If no value is available for quality, use "." instead, e.g.:
1 1000 2000 4 4000 5000 .
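As a minimal sketch of reading the translocation format, the hypothetical helper parse_spt_line below skips the "#" header line, splits a data line on tabs, and maps a "." quality to undef. The helper name and the hash keys are illustrative assumptions, not part of the ShatterProof API.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one .spt data line into a hash reference.
# Returns undef for the "#" header line; dies on a wrong column count.
sub parse_spt_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^#/;    # skip the header line
    my @f = split /\t/, $line;
    die "expected 7 tab-delimited columns, got " . scalar(@f) . "\n"
        unless @f == 7;
    my %rec;
    @rec{qw(chr1 start1 end1 chr2 start2 end2 quality)} = @f;
    # A "." in the quality column means no value is available
    $rec{quality} = undef if $rec{quality} eq '.';
    return \%rec;
}

my $rec = parse_spt_line("1\t1000\t2000\t4\t4000\t5000\t78\n");
print "$rec->{chr1}:$rec->{start1}-$rec->{end1} -> $rec->{chr2}:$rec->{start2}-$rec->{end2}\n";
```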
Copy-Number Input Files (.spc)
Tab-delimited columns. The first line is a header line:
#chr start end number quality
Example data entry line:
12 2000 3000 2 63
If no value is available for quality, use "." instead, e.g.:
12 2000 3000 2 .
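Conversion scripts ultimately have to emit this layout. A minimal sketch, assuming each CNV call is held as a hash with chr, start, end, copy_number and an optional quality: the hypothetical write_spc helper prints the header line and writes "." whenever the quality is missing. The example rows are the two data lines shown above.

```perl
use strict;
use warnings;

# Hypothetical helper: write CNV calls in the .spc layout.
# Each call is a hash ref with chr, start, end, copy_number and an optional
# quality; a missing quality is written as ".".
sub write_spc {
    my ($fh, @calls) = @_;
    print {$fh} join("\t", '#chr', qw(start end number quality)), "\n";
    for my $c (@calls) {
        my $q = defined $c->{quality} ? $c->{quality} : '.';
        print {$fh} join("\t", @{$c}{qw(chr start end copy_number)}, $q), "\n";
    }
    return;
}

write_spc(\*STDOUT,
    { chr => 12, start => 2000, end => 3000, copy_number => 2, quality => 63 },
    { chr => 12, start => 2000, end => 3000, copy_number => 2 },    # no quality
);
```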
Loss of Heterozygosity Input Files (.spl)
Tab-delimited columns. The first line is a header line:
#chr start end quality
Example data entry line:
12 2000 3000 63
If no value is available for quality, use "." instead, e.g.:
12 2000 3000 .
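Before running an analysis it can help to sanity-check input lines. The hypothetical check_spl_line below is a sketch for the LOH format; the rules it enforces (integer coordinates, start not greater than end, a numeric or "." quality) are assumptions about what valid input looks like, not checks documented for ShatterProof.

```perl
use strict;
use warnings;

# Hypothetical check for one .spl data line: chr, start, end, quality.
# Returns an error message, or the empty string if the line looks valid.
sub check_spl_line {
    my ($line) = @_;
    chomp $line;
    my @f = split /\t/, $line;
    return "expected 4 columns, got " . scalar(@f) unless @f == 4;
    my (undef, $start, $end, $quality) = @f;    # chr is not checked here
    return "start and end must be non-negative integers"
        unless $start =~ /^\d+$/ && $end =~ /^\d+$/;
    return "start ($start) is greater than end ($end)" if $start > $end;
    return "quality must be a number or '.'"
        unless $quality eq '.' || $quality =~ /^-?\d+(?:\.\d+)?$/;
    return '';
}

for my $line ("12\t2000\t3000\t63", "12\t2000\t3000\t.", "12\t3000\t2000\t63") {
    my $err = check_spl_line($line);
    print $err eq '' ? "ok\n" : "invalid: $err\n";
}
```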
Insertion Input Files (.vcf)
Additionally, ShatterProof accepts insertion calls in VCF files as input. See http://www.1000genomes.org/node/101 for details on the VCF file format. ShatterProof analyzes the CHROM and POS fields of these files.
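In a VCF file, lines starting with "##" hold meta-information, the "#CHROM" line is the column header, and CHROM and POS are the first two tab-separated fields of every data line. A minimal sketch of collecting those two fields is shown below; the vcf_positions helper and the single example record are hypothetical.

```perl
use strict;
use warnings;

# Sketch: collect [CHROM, POS] pairs from the data lines of a VCF file handle.
sub vcf_positions {
    my ($fh) = @_;
    my @positions;
    while (my $line = <$fh>) {
        next if $line =~ /^#/;    # skip "##" meta lines and the "#CHROM" header
        chomp $line;
        my ($chrom, $pos) = split /\t/, $line, 3;    # only the first two fields matter
        push @positions, [$chrom, $pos];
    }
    return @positions;
}

# Hypothetical in-memory VCF with one insertion record
my $vcf = join '', "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "12\t2500\t.\tA\tAT\t50\tPASS\t.\n";
open my $fh, '<', \$vcf or die "cannot open in-memory VCF: $!";
print "$_->[0]:$_->[1]\n" for vcf_positions($fh);
```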
Configuring ShatterProof
See the config.pl file in the scripts directory for a sample ShatterProof configuration file.
$bin_size: number (integer) of base pairs to include in each bin of the sliding window analysis
$localization_window_size: number (integer) of bins to include in each window of the sliding window analysis
$expected_mutation_density: a reference value (double) used in determining if the concentration of translocation events on a particular chromosome is higher than expected.
$collapse_regions: flag variable
value 1: merge overlapping CNV regions that have the same copy number
value 0: do not merge overlapping CNV regions that have the same copy number; if such regions are found, an error is thrown
$outlier_deviations: the number of standard deviations away from the mean a value has to be in order to be considered an outlier. Used to identify highly mutated regions.
$translocation_cut_off_count: the maximum number of translocation chromosomes to tolerate before the translocation score for a region is set to 0.
$genome_localization_weight: weight given to the hallmark of mutations localized to one chromosome
$chromosome_localization_weight: weight given to the hallmark of mutations localized to one area of a particular chromosome
$cnv_weight: weight given to the concentrated CNV hallmark
$translocation_weight: weight given to the concentrated translocations hallmark
$insertion_breakpoint_weight: weight given to the short breakpoint insertions hallmark
$loh_weight: weight given to the loss/retention of heterozygosity hallmark
$tp53_mutated_weight: weight given to the TP53 mutation hallmark
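The $collapse_regions flag can be illustrated with a small sketch. It assumes the regions of one chromosome are given as [start, end, copy_number] with inclusive coordinates, groups them by copy number, and merges overlaps within each group; the collapse_cnv_regions name is hypothetical and this is not ShatterProof's own code.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical sketch of the $collapse_regions behaviour for one chromosome.
# With $collapse = 1, overlapping regions with the same copy number are merged;
# with $collapse = 0, such an overlap is reported as an error.
sub collapse_cnv_regions {
    my ($collapse, @regions) = @_;
    my %by_cn;
    push @{ $by_cn{ $_->[2] } }, $_ for @regions;    # group by copy number
    my @out;
    for my $cn (sort { $a <=> $b } keys %by_cn) {
        my @merged;
        for my $r (sort { $a->[0] <=> $b->[0] } @{ $by_cn{$cn} }) {
            if (@merged && $r->[0] <= $merged[-1][1]) {
                croak "overlapping CNV regions with copy number $cn" unless $collapse;
                # extend the current merged region if this one reaches further
                $merged[-1][1] = $r->[1] if $r->[1] > $merged[-1][1];
            }
            else {
                push @merged, [@$r];    # copy so the caller's data is untouched
            }
        }
        push @out, @merged;
    }
    return sort { $a->[0] <=> $b->[0] } @out;
}

my @merged = collapse_cnv_regions(1, [2000, 3000, 2], [2500, 4000, 2], [3500, 5000, 3]);
print join(' ', map { "$_->[0]-$_->[1]:$_->[2]" } @merged), "\n";    # prints "2000-4000:2 3500-5000:3"
```

Grouping by copy number first keeps the merge correct even when a region with a different copy number sits between two overlapping same-copy-number regions.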
Running ShatterProof
From the scripts directory, run the shatterproof.pl script with Perl.
Main Usage:
perl -w shatterproof.pl --cnv <dir> --trans <dir> [--insrt <dir>] [--loh <dir>] [--tp53] --config <path> --output <dir>
Arguments:
--cnv Define the path to the directory containing the CNV input files
--trans Define the path to the directory containing the Translocation input files
--insrt Define the path to the directory containing the insertion VCF input files
--loh Define the path to the directory containing the LOH input files
--tp53 Indicate that TP53 should be considered mutated, regardless of data
--config Define the path to the ShatterProof config file
--output Define the path to the directory where output should be placed
dir Path to a directory
path Path to a file
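To show how the documented options fit together, here is a hypothetical sketch that parses them with the core Getopt::Long module and checks for the required ones. It mirrors the usage line above but is not ShatterProof's own argument handling; parse_args and the directory names are illustrative assumptions.

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical parser mirroring the documented usage line.
# --insrt and --loh are optional; --tp53 is a boolean flag.
sub parse_args {
    my @argv = @_;
    my %opt = (tp53 => 0);
    GetOptionsFromArray(\@argv, \%opt,
        'cnv=s', 'trans=s', 'insrt=s', 'loh=s', 'tp53', 'config=s', 'output=s',
    ) or die "invalid options\n";
    for my $required (qw(cnv trans config output)) {
        die "missing required option --$required\n" unless defined $opt{$required};
    }
    return \%opt;
}

my $opt = parse_args(qw(--cnv cnv_dir --trans trans_dir --tp53 --config config.pl --output out));
print "TP53 considered mutated\n" if $opt->{tp53};
```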
PREREQUISITES
strict; warnings; Carp; Switch; File::Basename; List::Util qw[min max]; Statistics::Distributions; POSIX