NAME

Search::Tools::HeatMap - locate the best matches in a snippet extract

SYNOPSIS

use Search::Tools::Tokenizer;
use Search::Tools::HeatMap;
    
my $tokenizer = Search::Tools::Tokenizer->new();
my $tokens = $tokenizer->tokenize( $my_string, qr/^(interesting)$/ );
my $heatmap = Search::Tools::HeatMap->new(
    tokens         => $tokens,
    window_size    => 20,  # default
    as_sentences   => 0,   # default
);

if ( $heatmap->has_spans ) {

    my $tokens_arr = $tokens->as_array;

    # stringify positions
    my @snips;
    for my $span ( @{ $heatmap->spans } ) {
        push( @snips, $span->{str} );
    }
    my $occur       = 5;    # example value: maximum number of snippets to keep
    my $occur_index = $occur - 1;
    if ( $#snips > $occur_index ) {
        @snips = @snips[ 0 .. $occur_index ];
    }
    printf("%s\n", join( ' ... ', @snips ));
    
}

DESCRIPTION

Search::Tools::HeatMap implements a simple algorithm for locating the densest clusters of unique, hot terms in a TokenList.
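The general idea of scoring windows by their count of unique hot terms can be sketched in plain Perl. This is only an illustration under stated assumptions, not the module's actual implementation: densest_window() is a hypothetical helper, tokens are plain strings rather than Search::Tools token objects, and "hot" is decided by a caller-supplied code ref.

```perl
use strict;
use warnings;

# Hypothetical helper, NOT part of Search::Tools. It slides a window of
# $window_size tokens across the list and keeps the first window that
# contains the most distinct hot terms.
sub densest_window {
    my ( $tokens, $is_hot, $window_size ) = @_;
    my $best       = { start => 0, end => -1, unique => 0 };
    my $last_start = @$tokens > $window_size ? @$tokens - $window_size : 0;
    for my $start ( 0 .. $last_start ) {
        my $end = $start + $window_size - 1;
        $end = $#$tokens if $end > $#$tokens;

        # Count each hot term once per window, case-insensitively.
        my %seen;
        for my $tok ( @{$tokens}[ $start .. $end ] ) {
            $seen{ lc $tok } = 1 if $is_hot->($tok);
        }
        my $unique = scalar keys %seen;
        if ( $unique > $best->{unique} ) {
            $best = { start => $start, end => $end, unique => $unique };
        }
    }
    return $best;
}

my @tokens = qw(the quick brown fox saw a lazy dog then the fox and the dog slept);
my $best   = densest_window( \@tokens, sub { $_[0] =~ /^(fox|dog)$/i }, 4 );
my $window = join( ' ', @tokens[ $best->{start} .. $best->{end} ] );
print "$window\n";
```

With a window of 4 tokens, the first window holding both hot terms wins over earlier windows that hold only one.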

HeatMap is used internally by Snipper but documented here in case someone wants to abuse and/or improve it.

METHODS

new( tokens => TokenList )

Create a new HeatMap. The TokenList object may be either a Search::Tools::TokenList or Search::Tools::TokenListPP object.

BUILD

Builds the HeatMap object. Called internally by new().

window_size

The maximum width of a span, in tokens. Defaults to 20, including the matched tokens.

Set this in new(). Access it later if you need to, but the spans will have already been created by new().

as_sentences

Try to match clusters at sentence boundaries. Default is false.

Set this in new().

spans

Returns an array ref of matching clusters. Each span in the array is a hash ref with the following keys:

cluster
pos
heat
str
str_w_pos (available only if debug() is true)
unique
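As a rough sketch of how the span hashes might be consumed, the following ranks spans by heat and joins the top few into a snippet. The data is mocked rather than taken from a real HeatMap, and the code assumes that heat is numeric with larger values meaning better matches; $max_snips is a hypothetical limit playing the role of the occurrence count in the SYNOPSIS.

```perl
use strict;
use warnings;

# Mock data standing in for $heatmap->spans. The values are invented,
# and only the str and heat keys are shown.
my $spans = [
    { str => 'first interesting phrase',  heat => 2 },
    { str => 'second interesting phrase', heat => 5 },
    { str => 'third interesting phrase',  heat => 3 },
];

my $max_snips = 2;    # hypothetical limit on the number of snippets

# Assumption: heat is a number and higher means a better match.
my @hottest = sort { $b->{heat} <=> $a->{heat} } @$spans;
splice( @hottest, $max_snips ) if @hottest > $max_snips;

my $snippet = join( ' ... ', map { $_->{str} } @hottest );
print "$snippet\n";
```

Sorting a copy leaves the original array ref untouched, which matters if the spans are needed again in document order.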

has_spans

Returns the number of spans found.

AUTHOR

Peter Karman <karman at cpan dot org>

ACKNOWLEDGEMENTS

The idea of the HeatMap comes from KinoSearch, though the implementation here is original.

BUGS

Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Search::Tools

COPYRIGHT

Copyright 2009 by Peter Karman.

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.

SEE ALSO

KinoSearch