NAME
VCF.pm - Module for validating, parsing and creating VCF files. Supported versions: 3.2, 3.3, 4.0, 4.1, 4.2
SYNOPSIS
From the command line:
perl -MVCF -e validate example.vcf
perl -I/path/to/the/module/ -MVCF -e validate_v32 example.vcf
From a script:
use VCF;
my $vcf = VCF->new(file=>'example.vcf.gz',region=>'1:1000-2000');
$vcf->parse_header();
# Do some simple parsing. This is the most thorough but slowest way to get the data.
while (my $x=$vcf->next_data_hash())
{
    for my $gt (keys %{$$x{gtypes}})
    {
        my ($al1,$sep,$al2) = $vcf->parse_alleles($x,$gt);
        print "\t$gt: $al1$sep$al2\n";
    }
    print "\n";
}
# This will split the fields and print a list of CHR:POS
while (my $x=$vcf->next_data_array())
{
    print "$$x[0]:$$x[1]\n";
}
# This will return the lines as they were read, including the newline at the end
while (my $x=$vcf->next_line())
{
    print $x;
}
# Only the columns NA00001, NA00002 and NA00003 will be printed.
my @columns = qw(NA00001 NA00002 NA00003);
print $vcf->format_header(\@columns);
while (my $x=$vcf->next_data_array())
{
    # this will recalculate AC and AN counts, unless $vcf->recalc_ac_an was set to 0
    print $vcf->format_line($x,\@columns);
}
$vcf->close();
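The synopsis notes that format_line recalculates the AC and AN counts when only a subset of columns is printed. The following is a minimal, self-contained sketch of what such a recount involves. It is not the module's implementation: the helper name count_alleles is hypothetical, and it assumes each input is a bare GT string such as "0/1" or "1|1", with "." marking a missing allele.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VCF.pm): count alleles over a list of
# GT strings such as "0/1" or "1|1". Missing alleles (".") are skipped.
sub count_alleles
{
    my (@gts) = @_;
    my $an = 0;      # AN: total number of called alleles
    my %ac;          # AC: counts of non-reference alleles, keyed by allele index
    for my $gt (@gts)
    {
        # Alleles are separated by "/" (unphased) or "|" (phased).
        for my $al (split(m{[/|]}, $gt))
        {
            next if $al eq '.';
            $an++;
            $ac{$al}++ if $al > 0;
        }
    }
    return ($an, \%ac);
}

# Genotypes of three selected columns at one site.
my ($an, $ac) = count_alleles('0/1', '1|1', './.');
print "AN=$an;AC=", ($$ac{1} || 0), "\n";    # prints AN=4;AC=3
```

Dropping a sample column changes both totals, which is why the counts stored in the INFO field of the input line cannot simply be copied to the output.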
validate
About : Validates the VCF file.
Usage : perl -MVCF -e validate example.vcf.gz # (from the command line)
validate('example.vcf.gz'); # (from a script)
validate(\*STDIN);
Args : File name or file handle. When no argument is given, the first command line
argument is interpreted as the file name.
validate_v32
About : Same as validate, but assumes VCF version 3.2.
Usage : perl -MVCF -e validate_v32 example.vcf.gz # (from the command line)
Args : File name or file handle. When no argument is given, the first command line
argument is interpreted as the file name.
new
About : Creates a new VCF reader/writer.
Usage : my $vcf = VCF->new(file=>'my.vcf', version=>'3.2');
Args :
fh .. Open file handle. If neither file nor fh is given, open in write mode.
file .. The file name. If neither file nor fh is given, open in write mode.
region .. Optional region to parse (requires tabix indexed VCF file)
silent .. Unless set to 1, warning messages may be printed.
strict .. Unless set to 0, the reader will die when the file violates the specification.
version .. If not given, '4.0' is assumed. The header information overrides this setting.
open
About : (Re)opens the file. There is no need to call this explicitly unless reading from a
different region is required.
Usage : $vcf->open(); # Read from the start
$vcf->open(region=>'1:12345-92345');
Args : region .. Supported only for tabix indexed files
close
About : Close the filehandle
Usage : $vcf->close();
Args : none
Returns : close exit status
next_line
About : Reads next VCF line.
Usage : my $vcf = VCF->new(file=>'example.vcf.gz');
my $x = $vcf->next_line();
Args : none
next_data_array
About : Reads next VCF line and splits it into an array. The last element is chomped.
Usage : my $vcf = VCF->new(file=>'example.vcf.gz');
$vcf->parse_header();
my $x = $vcf->next_data_array();
Args : Optional line to parse
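To make the shape of the returned array concrete, here is a minimal sketch of splitting one data line on tabs and chomping the last element. The helper split_vcf_line is hypothetical and only mirrors the behaviour described above; it is not how the module itself is implemented.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VCF.pm): split one tab-delimited VCF
# data line into fields and strip the trailing newline from the last one.
sub split_vcf_line
{
    my ($line) = @_;
    return unless defined $line;
    # The -1 limit keeps trailing empty fields instead of dropping them.
    my @fields = split(/\t/, $line, -1);
    chomp($fields[-1]) if @fields;
    return \@fields;
}

my $line = "1\t1000\trs1\tA\tG\t50\tPASS\tAN=4;AC=1\tGT\t0/1\t0/0\n";
my $x = split_vcf_line($line);
print "$$x[0]:$$x[1]\n";    # CHR:POS, prints 1:1000
```

The result is an array reference, so fields are accessed as $$x[0] (CHROM), $$x[1] (POS) and so on, in the same style as the synopsis.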
set_samples
About : Parsing big VCF files with many sample columns is slow; skipping unwanted samples may speed things up a bit.
Usage : my $vcf = VCF->new(file=>'example.vcf.gz');
$vcf->set_samples(include=>['NA0001']); # Exclude all but this sample. When the array is empty, all samples will be excluded.
$vcf->set_samples(exclude=>['NA0003']); # Include all but this sample. When the array is empty, all samples will be included.
my $x = $vcf->next_data_hash();
Args : include .. Reference to an array of sample names to keep
exclude .. Reference to an array of sample names to skip
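The include and exclude options are easy to confuse, so the sketch below spells out which sample columns remain in each case, including the empty-array cases. The helper select_samples and the sample names are hypothetical; the code illustrates the selection rule only and is not taken from the module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VCF.pm): given all sample column names,
# return those that remain after applying an include or an exclude list.
sub select_samples
{
    my ($all, %args) = @_;
    if ( exists $args{include} )
    {
        my %keep = map { $_ => 1 } @{$args{include}};
        return grep { $keep{$_} } @$all;     # an empty list keeps nothing
    }
    if ( exists $args{exclude} )
    {
        my %drop = map { $_ => 1 } @{$args{exclude}};
        return grep { !$drop{$_} } @$all;    # an empty list keeps everything
    }
    return @$all;
}

my @all = qw(NA0001 NA0002 NA0003);
print join(',', select_samples(\@all, include=>['NA0001'])), "\n";   # NA0001
print join(',', select_samples(\@all, exclude=>['NA0003'])), "\n";   # NA0001,NA0002
```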