Bio::Kmer - Helper module for Kmer Analysis.


A module for helping with kmer analysis.

use strict;
use warnings;
use Bio::Kmer;

my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});
my $kmerHash=$kmer->kmers();
my $countOfCounts=$kmer->histogram();

my $minimizers = $kmer->minimizers();
my $minimizerCluster = $kmer->minimizerCluster();

The BioPerl way

use strict;
use warnings;
use Bio::SeqIO;
use Bio::Kmer;

# Load up any Bio::SeqIO object. Quality values will be
# faked internally to help with compatibility even if
# a fastq file is given.
my $seqin = Bio::SeqIO->new(-file=>"input.fasta");
my $kmer=Bio::Kmer->new($seqin);
my $kmerHash=$kmer->kmers();
my $countOfCounts=$kmer->histogram();


A module for helping with kmer analysis. The basic methods count kmers and can produce a count of counts, i.e., a histogram of kmer frequencies. Currently this module only supports fastq format. Although this module can count kmers in pure Perl, it is recommended to select a faster kmer counter such as Jellyfish with the kmercounter option.


* BioPerl
* Jellyfish >=2
* Perl threads
* Perl >=5.10



Boolean describing whether the module instance is using threads


Bio::Kmer->new($filename, \%options)

Create a new instance of the kmer counter. One object per file.

Filename can be either a file path or a Bio::SeqIO object.

Applicable arguments for \%options:
Argument     Default    Description
kmercounter  perl       What kmer counter software to use.
                        Choices: Perl, Jellyfish.
kmerlength|k 21         Kmer length
numcpus      1          Number of CPUs. Used for perl 
                        multithreading with the pure perl 
                        counter, or passed to other 
                        software like jellyfish.
gt           1          If the count of a kmer is less 
                        than this, ignore the kmer. This 
                        might help speed analysis if you 
                        do not care about low-count kmers.
sample       1          Retain only a fraction of kmers.
                        1 is 100%; 0 is 0%.
                        Only works with the perl kmer counter.
verbose      0          Print more messages.

my $kmer=Bio::Kmer->new("file.fastq.gz",{kmercounter=>"jellyfish",numcpus=>4});

Returns the number of base pairs counted. In some cases, such as when counting with Jellyfish, that number is not calculated directly; instead the length is calculated from the total length of kmers. Internally, this number is stored as $kmer->{_ntcount}.

Note: internally runs $kmer->histogram() if $kmer->{_ntcount} is not initially found.

Arguments: None
Returns:   integer

Counts kmers. This method is called as soon as new() is called, so you should never have to run it yourself. Internally caches the kmer counts in RAM.

Arguments: None
Returns:   None
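
As a rough illustration of what counting kmers means, here is a minimal pure-Perl sketch. The helper count_kmers is hypothetical and is not part of Bio::Kmer; it looks only at the forward strand and ignores options such as gt and sample, so the module's own counter may behave differently.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Kmer: count every kmer of
# length $k across a list of sequences (forward strand only).
sub count_kmers {
    my ($k, @seqs) = @_;
    my %count;
    for my $seq (@seqs) {
        my $s = uc $seq;
        # Slide a window of width $k along the sequence.
        # The range is empty when the sequence is shorter than $k.
        for my $i (0 .. length($s) - $k) {
            $count{ substr($s, $i, $k) }++;
        }
    }
    return \%count;   # hash ref of kmer => count
}

my $kmer_counts = count_kmers(3, "ATGATGA", "TGA");
for my $kmer (sort keys %$kmer_counts) {
    print "$kmer\t$kmer_counts->{$kmer}\n";
}
```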

Clears kmer counts and histogram counts. You should probably never use this method.

Arguments: None
Returns:   None

Queries the set of kmers for the count of a single kmer

Arguments: query (string)
Returns:   Count of kmers. 
            0 indicates that the kmer was not found.
           -1 indicates an invalid kmer (e.g., invalid length)
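
The return convention above can be sketched in pure Perl against a plain hash of kmer counts. The function query_kmer and the sample data are assumptions for illustration only; the sketch treats a wrong length as the only kind of invalid kmer.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a table of counted kmers (k = 3).
my %counts = (ATG => 2, TGA => 3, GAT => 1);

# Hypothetical helper, not part of Bio::Kmer. Mirrors the documented
# convention: the count if found, 0 if absent, -1 if the length is wrong.
sub query_kmer {
    my ($kmers, $k, $query) = @_;
    return -1 if length($query) != $k;
    return $kmers->{$query} // 0;
}

for my $q (qw(ATG CCC AT)) {
    print "$q\t", query_kmer(\%counts, 3, $q), "\n";
}
```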

Counts the frequency of kmers. Internally caches the histogram in RAM.

Arguments: none
Returns:   Reference to an array of counts. The index of 
           the array is the frequency.
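
To make the shape of this return value concrete, the following pure-Perl sketch builds a count of counts from a hash of kmer counts. The function count_of_counts and the sample hash are hypothetical; filling unused frequencies with 0 is a choice made for this sketch and may not match the module.

```perl
use strict;
use warnings;

# Hypothetical kmer counts, shaped like a hash of kmer => count.
my %counts = (ATG => 2, TGA => 3, GAT => 1, CCC => 1);

# Hypothetical helper, not part of Bio::Kmer: element $i of the
# returned array is the number of distinct kmers seen exactly $i times.
sub count_of_counts {
    my ($kmers) = @_;
    my @hist;
    $hist[$_]++ for values %$kmers;
    # Frequencies that never occur are set to 0 rather than left undefined.
    for my $i (0 .. $#hist) {
        $hist[$i] //= 0;
    }
    return \@hist;
}

my $hist = count_of_counts(\%counts);
for my $freq (0 .. $#$hist) {
    print "$freq\t$hist->[$freq]\n";
}
```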

Returns the actual kmers and their counts

Arguments: None
Returns:   Reference to a hash of kmers and their counts

Finds the minimizer of each kmer

Arguments: length of minimizer (default: 5)
Returns:   hash ref, e.g., $hash = {AAAAA=>AAA, TAGGGT=>AGG,...}
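
One common definition of a minimizer is the lexicographically smallest substring of a given length within a kmer. The sketch below assumes that definition and considers only the forward strand; the helper name minimizer_of is hypothetical, and the module's own rule (for example, whether it also considers the reverse complement) may differ.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Bio::Kmer: return the
# lexicographically smallest substring of length $l in $kmer.
sub minimizer_of {
    my ($kmer, $l) = @_;
    my $min;
    for my $i (0 .. length($kmer) - $l) {
        my $lmer = substr($kmer, $i, $l);
        $min = $lmer if !defined($min) || $lmer lt $min;
    }
    return $min;   # undef if $kmer is shorter than $l
}

# Build a kmer => minimizer hash for two example kmers.
my %minimizer = map { ($_ => minimizer_of($_, 3)) } qw(AAAAA TAGGGT);
print "$_ => $minimizer{$_}\n" for sort keys %minimizer;
```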

Groups kmers by their minimizer

Arguments: length of minimizer (default: 5). 
  Internally, calls $kmer->minimizers($l) 
  If $kmer->minimizers has already been called, this parameter will be ignored.
Returns:   hash ref, e.g., $hash = {AAA=>[TAAAT, AAAGG,...], ATT=>[GATTC,...]}
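
Conceptually, the cluster structure is the kmer-to-minimizer hash turned inside out. A minimal sketch, assuming a hash of kmer => minimizer as input; cluster_by_minimizer is a hypothetical name, and the ordering of kmers within each cluster is a choice made here, not something the module is known to guarantee.

```perl
use strict;
use warnings;

# Hypothetical kmer => minimizer mapping (minimizer length 3).
my %minimizer_of = (TAAAT => 'AAA', AAAGG => 'AAA', GATTC => 'ATT');

# Hypothetical helper, not part of Bio::Kmer: invert the mapping so
# each minimizer points to the list of kmers that share it.
sub cluster_by_minimizer {
    my ($map) = @_;
    my %cluster;
    for my $kmer (sort keys %$map) {
        push @{ $cluster{ $map->{$kmer} } }, $kmer;
    }
    return \%cluster;
}

my $cluster = cluster_by_minimizer(\%minimizer_of);
for my $m (sort keys %$cluster) {
    print "$m => [", join(", ", @{ $cluster->{$m} }), "]\n";
}
```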

Finds the union between two sets of kmers

Arguments: Another Bio::Kmer object
Returns:   List of kmers

Finds the intersection between two sets of kmers

Arguments: Another Bio::Kmer object
Returns:   List of kmers

Finds the set of kmers unique to this Bio::Kmer object.

Arguments: Another Bio::Kmer object
Returns:   List of kmers
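
The three set operations described above can be pictured on plain hashes of kmer counts. This is a sketch only: the function names kmer_union, kmer_intersection and kmer_subtract are hypothetical, they take hash refs rather than Bio::Kmer objects, and the sorted output order is a choice made for this example.

```perl
use strict;
use warnings;

# Hypothetical kmer counts from two inputs.
my %this  = (AAA => 1, AAT => 2, ATG => 1);
my %other = (AAT => 5, TTT => 1);

# Kmers present in either set.
sub kmer_union {
    my ($x, $y) = @_;
    my %seen;
    $seen{$_} = 1 for keys(%$x), keys(%$y);
    return sort keys %seen;
}

# Kmers present in both sets.
sub kmer_intersection {
    my ($x, $y) = @_;
    return grep { exists $y->{$_} } sort keys %$x;
}

# Kmers present in the first set but not in the second.
sub kmer_subtract {
    my ($x, $y) = @_;
    return grep { !exists $y->{$_} } sort keys %$x;
}

my @union        = kmer_union(\%this, \%other);
my @intersection = kmer_intersection(\%this, \%other);
my @unique       = kmer_subtract(\%this, \%other);

print "union:        @union\n";
print "intersection: @intersection\n";
print "unique:       @unique\n";
```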

Cleans the temporary directory and removes this object from RAM. Good for when you might be counting kmers for many things but want to keep your overhead low.

Arguments: None
Returns:   1


MIT license. Go nuts.


Author: Lee Katz

