NAME
Compress::BGZF::Writer - Performs blocked GZIP (BGZF) compression
VERSION
version 0.001
SYNOPSIS
use Compress::BGZF::Writer;
# Use as filehandle
my $fh_bgz = Compress::BGZF::Writer->new_filehandle( $bgz_filename );
print ref($fh_bgz), "\n"; # prints 'GLOB'
while ( my $chunk = generate_data() ) {
print {$fh_bgz} $chunk;
}
close $fh_bgz;
# Use as object
my $writer = Compress::BGZF::Writer->new( $bgz_filename );
print ref($writer), "\n"; # prints 'Compress::BGZF::Writer'
while ( my ($id,$content) = generate_record() ) {
my $virt_offset = $writer->add_data( $content );
my $content_len = length $content;
print {$idx_file} "$id\t$virt_offset\t$content_len\n";
}
$writer->finalize(); # flush remaining buffer
DESCRIPTION
Compress::BGZF::Writer is a module for writing blocked GZIP (BGZF) files from any input. There are two main modes of construction: as an object (using new()) and as a filehandle glob (using new_filehandle()). The filehandle mode is straightforward for general use. The object mode is useful for tracking the virtual offsets of data chunks as they are added (for instance, for generation of a custom index).
METHODS
Filehandle Functions
- new_filehandle
-
my $fh_out = Compress::BGZF::Writer->new_filehandle();
my $fh_out = Compress::BGZF::Writer->new_filehandle( $output_fn );
Create a new Compress::BGZF::Writer engine and tie it to an IO::File handle, which is returned. Takes an optional single argument for the filename to be written to (defaults to STDOUT).
- close
-
print {$fh_out} $some_data;
close $fh_out;
When used on the returned handle, print and close emulate the standard Perl functions of the same name.
Object-oriented Methods
- new
-
my $writer = Compress::BGZF::Writer->new();
my $writer = Compress::BGZF::Writer->new( $output_fn );
Create a new Compress::BGZF::Writer engine. Takes an optional single argument for the filename to be written to (defaults to STDOUT).
- set_level
-
$writer->set_level( $compression_level );
Set the DEFLATE compression level to use (0-9). Available constants include Z_NO_COMPRESSION, Z_BEST_SPEED, Z_DEFAULT_COMPRESSION, Z_BEST_COMPRESSION (defaults to Z_DEFAULT_COMPRESSION). The author's observations suggest that the default is reasonable unless speed is of the essence, in which case setting a level of 1-2 can sometimes halve the compression time.
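The effect of the level can be explored outside this module. The following is a minimal sketch that assumes the Z_* constants are the ones exported by Compress::Zlib and uses that module's own compress() and uncompress() on a hypothetical sample payload. It compares output sizes only and does not exercise Compress::BGZF::Writer itself.

```perl
use strict;
use warnings;
use Compress::Zlib;    # exports compress(), uncompress() and the Z_* level constants

# Hypothetical sample payload: repetitive text, so DEFLATE has patterns to find.
my $data = join '', map { "record_$_\tACGTACGTACGTACGT\n" } 1 .. 20_000;

my %size;
for my $level ( Z_NO_COMPRESSION(), Z_BEST_SPEED(), Z_BEST_COMPRESSION() ) {
    my $packed = compress( $data, $level );
    die "compression failed at level $level" if !defined $packed;

    # Round-trip check: every level must restore the input exactly.
    die "round trip failed at level $level" if uncompress($packed) ne $data;

    $size{$level} = length $packed;
    printf "level %d: %d -> %d bytes\n", $level, length($data), $size{$level};
}
```

Wrapping the loop body with a timer such as Time::HiRes would show the speed side of the trade-off on your own data.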
- add_data
-
$writer->add_data( $content );
Adds a block of content to the write buffer. Actual compression and writes take place as the buffer reaches the target size (64k minus header/footer space). Returns the virtual offset to the start of the data added.
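For readers building a custom index, it can help to see how a virtual offset breaks down. The sketch below assumes the returned value follows the usual BGZF convention (upper 48 bits: file offset of the compressed block; lower 16 bits: position within that block's uncompressed data). The helper names and the sample numbers are hypothetical and are not part of this module.

```perl
use strict;
use warnings;

# Assumption: BGZF-style virtual offsets, i.e. (block_start << 16) | within_block.
# Needs a Perl built with 64-bit integers for large files.

# Split a virtual offset into its two components.
sub split_virtual_offset {
    my ($virt) = @_;
    my $block_start  = $virt >> 16;       # byte offset of the block in the .bgz file
    my $within_block = $virt & 0xFFFF;    # offset inside the uncompressed block
    return ( $block_start, $within_block );
}

# Combine the two components back into a virtual offset.
sub join_virtual_offset {
    my ( $block_start, $within_block ) = @_;
    return ( $block_start << 16 ) | $within_block;
}

# Hypothetical values for illustration only.
my $virt = join_virtual_offset( 18_446, 1_234 );
my ( $block, $within ) = split_virtual_offset($virt);
print "virtual offset $virt => block at byte $block, offset $within within block\n";
```

A reader would seek to the block start in the compressed file, inflate that block, and then skip the within-block offset to reach the record.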
- finalize
-
$writer->finalize();
Write any remaining buffer contents. While this method should be automatically called during cleanup of the Compress::BGZF::Writer object, it is probably safer to call it explicitly to avoid unexpected behavior. Keep in mind that if both you and the object destruction process fail to call this, you will almost certainly generate an incomplete file (and probably won't notice since it will still be valid BGZF).
- write_index
-
$writer->write_index( $index_fn );
Write offset index to the specified file. Index format (as defined by htslib) consists of little-endian int64-coded values. The first value is the number of offsets in the index. The rest of the values consist of pairs of block offsets relative to the compressed and uncompressed data. The first offset (always 0,0) is not included.
Note that calling write_index() will also call finalize(), and so it should always be called after all data has been queued for write (it is hard to imagine a case where this would not be the desirable behavior).
For small(er) files (up to a few hundred MB), on-the-fly index generation with Compress::BGZF::Reader is relatively fast and an on-disk index is probably not necessary. For larger files, storing a paired index file can significantly decrease initialization times for Compress::BGZF::Reader objects.
These index files should be fully compatible with the htslib bgzip tool.
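The index layout described for write_index() can be illustrated with plain pack and unpack. This is a sketch under stated assumptions, not the module's implementation: the block offsets are made-up values, and the 'q<' template (signed little-endian 64-bit) is chosen to match the "int64" wording above. For non-negative offsets the signed and unsigned encodings are byte-identical.

```perl
use strict;
use warnings;

# Hypothetical block boundaries as [ compressed_offset, uncompressed_offset ].
# The first block (0,0) is left out, as described for the index format.
my @offsets = ( [ 18_446, 65_280 ], [ 36_912, 130_560 ] );

# Serialize: entry count first, then each pair, all little-endian 64-bit.
# 'q<' requires a Perl built with 64-bit integer support.
my $index = pack 'q<', scalar @offsets;
$index .= pack 'q<2', @$_ for @offsets;

# Parse the buffer back into a count and a list of pairs.
my @values = unpack 'q<*', $index;
my $count  = shift @values;
die "truncated index" if @values != 2 * $count;

my @parsed;
while ( my ( $comp, $uncomp ) = splice @values, 0, 2 ) {
    push @parsed, [ $comp, $uncomp ];
}

printf "%d entries, %d bytes\n", $count, length $index;
printf "  compressed %d <-> uncompressed %d\n", @$_ for @parsed;
```

With a count of N, the buffer is 8 * (1 + 2N) bytes long, which gives a quick sanity check when reading an index file from disk.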
CAVEATS AND BUGS
This code is in the alpha testing stage. The filehandle behavior should not change in the future, but the object-oriented API is not guaranteed to be stable.
Please report bugs to the author.
AUTHOR
Jeremy Volkening <jeremy *at* base2bio.com>
COPYRIGHT AND LICENSE
Copyright 2015 Jeremy Volkening
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see <http://www.gnu.org/licenses/>.