NAME
Search::Tools::HeatMap - locate the best matches in a snippet extract
SYNOPSIS
    use Search::Tools::Tokenizer;
    use Search::Tools::HeatMap;

    # $self is the calling object, as in Snipper; it is assumed
    # to provide the tokenizer() and occur() accessors.
    my $tokens = $self->tokenizer->tokenize( $my_string, qr/^(interesting)$/ );
    my $heatmap = Search::Tools::HeatMap->new(
        tokens       => $tokens,
        window_size  => 20,    # default
        as_sentences => 0,     # default
    );

    if ( $heatmap->has_spans ) {

        my $tokens_arr = $tokens->as_array;

        # stringify positions
        my @snips;
        for my $span ( @{ $heatmap->spans } ) {
            push( @snips, $span->{str} );
        }

        # keep at most occur() snippets
        my $occur_index = $self->occur - 1;
        if ( $#snips > $occur_index ) {
            @snips = @snips[ 0 .. $occur_index ];
        }
        printf( "%s\n", join( ' ... ', @snips ) );

    }
DESCRIPTION
Search::Tools::HeatMap implements a simple algorithm for locating the densest clusters of unique, hot terms in a TokenList.
HeatMap is used internally by Snipper but documented here in case someone wants to abuse and/or improve it.
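The idea of a "densest cluster of unique, hot terms" can be illustrated with a small pure-Perl sketch. This is not the HeatMap implementation: best_window() is a hypothetical helper, the tokens are plain strings rather than a TokenList, and the scoring (count of distinct hot terms per fixed-width window) is an assumption chosen only to show the concept.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Search::Tools.
# Slides a window of $window_size tokens across @$tokens and returns the
# first window containing the largest number of distinct hot terms.
sub best_window {
    my ( $tokens, $is_hot, $window_size ) = @_;

    my $last_start = @$tokens - $window_size;
    $last_start = 0 if $last_start < 0;

    my $best = { start => 0, end => -1, heat => 0 };
    for my $start ( 0 .. $last_start ) {
        my $end = $start + $window_size - 1;
        $end = $#$tokens if $end > $#$tokens;

        # count each hot term once per window, ignoring case
        my %seen;
        for my $tok ( @{$tokens}[ $start .. $end ] ) {
            $seen{ lc $tok } = 1 if $is_hot->($tok);
        }
        my $heat = keys %seen;

        # strictly greater, so the earliest of equally hot windows wins
        if ( $heat > $best->{heat} ) {
            $best = { start => $start, end => $end, heat => $heat };
        }
    }
    return $best;
}

my @tokens = qw(the quick brown fox jumps over the lazy dog while the fox naps);
my $best   = best_window( \@tokens, sub { $_[0] =~ /^(fox|dog)$/i }, 4 );

printf "%s (heat %d)\n",
    join( ' ', @tokens[ $best->{start} .. $best->{end} ] ),
    $best->{heat};
```

With a window of 4 tokens, the only window that holds both hot terms begins at "dog", so that window is the one returned.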
METHODS
new( tokens => TokenList )
Create a new HeatMap. The TokenList object may be either a Search::Tools::TokenList or Search::Tools::TokenListPP object.
BUILD
Builds the HeatMap object. Called internally by new().
window_size
The max width of a span. Defaults to 20 tokens, including the matches.
Set this in new(). Access it later if you need to, but the spans will have already been created by new().
as_sentences
Try to match clusters at sentence boundaries. Default is false.
Set this in new().
spans
Returns an array ref of matching clusters. Each span in the array is a hash ref with the following keys:
- cluster
- pos
- heat
- str
- str_w_pos (available only if debug() is true)
- unique
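As a rough illustration of consuming the array ref, the sketch below joins the str value of each span into one snippet, as the SYNOPSIS does. The span data here is hand-written stand-in data, not real HeatMap output, and $max_snips is a hypothetical limit playing the role of occur in the SYNOPSIS.

```perl
use strict;
use warnings;

# Stand-in data: hand-written hash refs, NOT real HeatMap output.
# Only the str key is modelled here.
my $spans = [
    { str => 'the interesting part' },
    { str => 'another interesting bit' },
    { str => 'a third interesting one' },
];

my $max_snips = 2;    # hypothetical cap on the number of snippets

# collect the string form of each span, then drop any beyond the cap
my @snips = map { $_->{str} } @$spans;
splice( @snips, $max_snips ) if @snips > $max_snips;

my $snippet = join( ' ... ', @snips );
print "$snippet\n";
```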
has_spans
Returns the number of spans found.
AUTHOR
Peter Karman <karman at cpan dot org>
ACKNOWLEDGEMENTS
The idea of the HeatMap comes from KinoSearch, though the implementation here is original.
BUGS
Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
    perldoc Search::Tools
You can also look for information at:
- RT: CPAN's request tracker
- AnnoCPAN: Annotated CPAN documentation
- CPAN Ratings
- Search CPAN
COPYRIGHT
Copyright 2009 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
SEE ALSO
KinoSearch