NAME

ShatterProof - a script for analyzing next-generation sequencing data

SYNOPSIS

use Shatterproof

See "shatterproof.pl" in the scripts directory for a simple Perl script that calls the ShatterProof module.

Call ShatterProof via:

ShatterProof::run(\@ARGV);

DESCRIPTION

ShatterProof is a tool for analyzing next-generation sequencing data for signs of chromothripsis. It is implemented as a Perl module that processes input files and produces output files in both tab-delimited and YAML format. Perl version 5.0 or greater is required to run ShatterProof. Link to publication will be posted soon.

README

Installing ShatterProof

To install this module, type the following:

perl Makefile.PL
make
make test
make install

Input File Types

ShatterProof bases its analysis of genomic data on calls of translocations, copy number variations (CNV), loss of heterozygosity (LOH) and insertions. It accepts four types of input files, described below. See the scripts/conversion_scripts directory for Perl scripts that convert the output of some common tools to the required input formats.

Translocation Input Files (.spt)

Tab-delimited columns. The first line is a header line:

#chr1 start end chr2 start end quality

Example data entry line:

1       1000    2000    4       4000    5000    78

If no value is available for quality, use a ".", e.g.:

1       1000    2000    4       4000    5000    .
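The column layout above can be read with a few lines of Perl. The following is a minimal sketch, not ShatterProof's own parser: the function name parse_spt_line and the hash keys are hypothetical, and it assumes the file is strictly tab-delimited with exactly seven columns.

```perl
use strict;
use warnings;

# Parse one data line of a translocation (.spt) file into a hash reference.
# Returns undef for the header line (starting with "#") and for blank lines.
# NOTE: the function name and hash keys are illustrative assumptions.
sub parse_spt_line {
    my ($line) = @_;
    chomp $line;
    return if $line =~ /^\s*$/ || $line =~ /^#/;

    my @fields = split /\t/, $line;
    die "expected 7 tab-delimited columns, got " . scalar(@fields) . "\n"
        unless @fields == 7;

    my ($chr1, $start1, $end1, $chr2, $start2, $end2, $quality) = @fields;
    return {
        chr1 => $chr1, start1 => $start1, end1 => $end1,
        chr2 => $chr2, start2 => $start2, end2 => $end2,
        # "." marks a missing quality value; store it as undef.
        quality => ($quality eq '.' ? undef : $quality),
    };
}

my $record = parse_spt_line("1\t1000\t2000\t4\t4000\t5000\t.\n");
printf "chr%s:%d-%d -> chr%s:%d-%d\n",
    @{$record}{qw(chr1 start1 end1 chr2 start2 end2)};
```

Storing a missing quality as undef rather than as the string "." keeps later numeric comparisons from silently treating the placeholder as a number.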

Copy-Number Input Files (.spc)

Tab-delimited columns. The first line is a header line:

#chr start end number quality

Example data entry line:

12      2000    3000    2       63

If no value is available for quality, use a ".", e.g.:

12      2000    3000    2       .
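When CNV calls come from a tool that has no ready-made conversion script, an .spc file can be written by hand. The sketch below is one possible way to do this; format_spc_line and the @calls records are hypothetical names and data introduced only for illustration.

```perl
use strict;
use warnings;

# Format one CNV call as a tab-delimited .spc data line.
# A missing (undef) quality is written as ".".
# NOTE: format_spc_line is an illustrative helper, not part of ShatterProof.
sub format_spc_line {
    my ($chr, $start, $end, $copy_number, $quality) = @_;
    $quality = '.' unless defined $quality;
    return join("\t", $chr, $start, $end, $copy_number, $quality) . "\n";
}

# Hypothetical CNV calls: [chr, start, end, copy number, quality]
my @calls = (
    [12, 2000, 3000, 2, 63],
    [12, 5000, 9000, 1, undef],
);

# Header line first, then one line per call.
my $spc = "#chr\tstart\tend\tnumber\tquality\n";
$spc .= format_spc_line(@$_) for @calls;
print $spc;
```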

Loss of Heterozygosity Input Files (.spl)

Tab-delimited columns. The first line is a header line:

#chr start end quality

Example data entry line:

12      2000    3000	63

If no value is available for quality, use a ".", e.g.:

12      2000    3000	.

Insertion Input Files (.vcf)

Additionally, ShatterProof accepts insertion calls in VCF files as input. See http://www.1000genomes.org/node/101 for details on the VCF file format. ShatterProof analyzes the CHROM and POS fields of these files.
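Because only the CHROM and POS fields are analyzed, the relevant part of a VCF file is its first two tab-delimited columns. The following sketch shows one way to pull those out; vcf_positions is a hypothetical helper and the sample lines are made-up data, not output from any real caller.

```perl
use strict;
use warnings;

# Return [CHROM, POS] pairs from the lines of a VCF file.
# Meta-information ("##...") and the column header ("#CHROM...") are skipped.
# NOTE: vcf_positions is an illustrative helper, not part of ShatterProof.
sub vcf_positions {
    my (@lines) = @_;
    my @positions;
    for my $line (@lines) {
        chomp $line;
        next if $line =~ /^#/ || $line =~ /^\s*$/;
        # CHROM and POS are the first two columns of every VCF data line.
        my ($chrom, $pos) = split /\t/, $line;
        push @positions, [$chrom, $pos];
    }
    return @positions;
}

# Made-up example input.
my @vcf = (
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "12\t2500\t.\tA\tACGT\t50\tPASS\t.\n",
);
print join(":", @$_), "\n" for vcf_positions(@vcf);
```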

Configuring ShatterProof

See the config.pl file in the scripts directory for a sample ShatterProof configuration file.

$bin_size: number (integer) of base pairs to include in each bin of the sliding window analysis

$localization_window_size: number (integer) of bins to include in each window of the sliding window analysis

$expected_mutation_density: a reference value (double) used in determining if the concentration of translocation events on a particular chromosome is higher than expected.

$collapse_regions:

flag variable

value 1: merge overlapping CNV regions that have the same copy number

value 0: do not merge overlapping CNV regions that have the same copy number. If such regions are found, an error is thrown.

$outlier_deviations: the number of standard deviations away from the mean a value has to be in order to be considered non-significant. Used to identify highly mutated regions.

$translocation_cut_off_count: the maximum number of translocation chromosomes to tolerate before the translocation score for a region is set to 0.

$genome_localization_weight: weight given to the localization of mutations to one chromosome hallmark

$chromosome_localization_weight: weight given to the localization of mutations to one area of a particular chromosome hallmark

$cnv_weight: weight given to the concentrated CNV hallmark

$translocation_weight: weight given to the concentrated translocations hallmark

$insertion_breakpoint_weight: weight given to the short breakpoint insertions hallmark

$loh_weight: weight given to the loss/retention of heterozygosity hallmark

$tp53_mutated_weight: weight given to the TP53 mutation hallmark
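Put together, the variables listed above might appear in a configuration file roughly as follows. This is only a sketch: every value is a placeholder chosen for illustration, not a default or recommended setting, and the variables are declared with "our" here so that the fragment compiles under strict, which may differ from how the sample config.pl declares them.

```perl
use strict;
use warnings;

# Illustrative configuration sketch. Variable names come from the list above;
# ALL VALUES ARE PLACEHOLDERS, not defaults or recommendations.
our $bin_size                       = 10000;  # base pairs per bin (integer)
our $localization_window_size       = 50;     # bins per window (integer)
our $expected_mutation_density      = 0.001;  # reference value (double)
our $collapse_regions               = 1;      # 1 = merge, 0 = throw an error
our $outlier_deviations             = 2;      # standard deviations from mean
our $translocation_cut_off_count    = 30;

# Hallmark weights (placeholder values).
our $genome_localization_weight     = 1;
our $chromosome_localization_weight = 1;
our $cnv_weight                     = 1;
our $translocation_weight           = 1;
our $insertion_breakpoint_weight    = 1;
our $loh_weight                     = 1;
our $tp53_mutated_weight            = 1;

# Each window spans $localization_window_size bins of $bin_size base pairs.
my $window_span = $bin_size * $localization_window_size;
print "window span: $window_span bp\n";

1;  # config files loaded with do/require should end with a true value
```

With these placeholder values, each window of the sliding window analysis covers 500000 base pairs, which illustrates how the two size parameters interact.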

Running ShatterProof

From the scripts directory, execute the shatterproof.pl file using Perl.

Main Usage:

perl -w shatterproof.pl --cnv <dir> --trans <dir> [--insrt <dir>] [--loh <dir>] [--tp53] --config <path> --output <dir>

Arguments:

--cnv Define the path to the directory containing the CNV input files

--trans Define the path to the directory containing the Translocation input files

--insrt Define the path to the directory containing the insertion VCF input files

--loh Define the path to the directory containing the LOH input files

--tp53 Indicate that TP53 should be considered mutated, regardless of data

--config Define the path to the ShatterProof config file

--output Define the path to the directory where output should be placed

dir Path to a directory

path Path to a file
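When ShatterProof is driven from another Perl script, the usage line above can be assembled programmatically. The following is a minimal sketch; build_command and all of the paths are hypothetical, and only the flags documented above are used.

```perl
use strict;
use warnings;

# Build the argument list for shatterproof.pl from a hash of options.
# Required: cnv, trans, config, output. Optional: insrt, loh, tp53 (flag).
# NOTE: build_command is an illustrative helper, not part of ShatterProof.
sub build_command {
    my (%opt) = @_;
    for my $required (qw(cnv trans config output)) {
        die "missing required option --$required\n"
            unless defined $opt{$required};
    }
    my @cmd = ('perl', '-w', 'shatterproof.pl',
               '--cnv', $opt{cnv}, '--trans', $opt{trans});
    push @cmd, '--insrt', $opt{insrt} if defined $opt{insrt};
    push @cmd, '--loh',   $opt{loh}   if defined $opt{loh};
    push @cmd, '--tp53'               if $opt{tp53};
    push @cmd, '--config', $opt{config}, '--output', $opt{output};
    return @cmd;
}

# Hypothetical paths, for illustration only.
my @cmd = build_command(
    cnv    => 'input/cnv/',
    trans  => 'input/trans/',
    loh    => 'input/loh/',
    tp53   => 1,
    config => 'config.pl',
    output => 'results/',
);
print join(' ', @cmd), "\n";
```

Passing the returned list to Perl's system function in list form, from within the scripts directory, would run the command without going through a shell, which avoids quoting problems with paths that contain spaces.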

PREREQUISITES

strict; warnings; Carp; Switch; File::Basename; List::Util qw[min max]; Statistics::Distributions; POSIX

any

CPAN