NAME

Search::Tools::TokenPP - a token object returned from a TokenList

SYNOPSIS

use Search::Tools::Tokenizer;
my $tokenizer = Search::Tools::Tokenizer->new();
my $tokens = $tokenizer->tokenize_pp('quick brown red dog');
while ( my $token = $tokens->next ) {
    # token isa Search::Tools::TokenPP
    print "token = $token\n";
    printf("str: %s, len = %d, u8len = %d, pos = %d, is_match = %d, is_hot = %d\n",
       $token->str,
       $token->len, 
       $token->u8len, 
       $token->pos, 
       $token->is_match, 
       $token->is_hot
    );
}

DESCRIPTION

A TokenPP represents one or more characters culled from a string by a Tokenizer.

METHODS

TokenPP is a pure-Perl version of Search::Tools::Token. See the Search::Tools::Token documentation for more details.

This class inherits from Search::Tools::Object. Only new or overridden methods are documented here.

str

The characters in the token. The object is overloaded so that it stringifies to the str() value.
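
As an illustration of string overloading in general, the following is a minimal sketch using Perl's core overload pragma. The class My::DemoToken is hypothetical and is not part of Search::Tools; the actual TokenPP implementation may differ.

```perl
use strict;
use warnings;

package My::DemoToken;    # hypothetical class, for illustration only

# Map string context ("") to the str() accessor.
use overload
    '""'     => sub { $_[0]->str },
    fallback => 1;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub str { return $_[0]->{str} }

package main;

my $token = My::DemoToken->new( str => 'quick' );
print "token = $token\n";    # interpolation calls str() via overload
```

With this in place, interpolating the object in a string, as the SYNOPSIS does with "token = $token", yields the same text as calling str() explicitly.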

len

The byte length of str().

u8len

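The character length of str(). For pure-ASCII tokens, len() == u8len(). For tokens containing non-ASCII characters encoded as UTF-8, u8len() < len(), because each such character occupies more than one byte.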
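
The difference between byte length and character length can be sketched with core Perl alone. The helper byte_and_char_len() below is hypothetical and is not part of Search::Tools; it only mirrors the relationship between len() and u8len() described above.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: returns (byte length, character length) of a
# decoded Perl string, assuming it would be stored as UTF-8.
sub byte_and_char_len {
    my ($str) = @_;
    my $chars = length($str);                   # characters
    my $bytes = length( encode_utf8($str) );    # bytes when UTF-8 encoded
    return ( $bytes, $chars );
}

my @ascii  = byte_and_char_len('dog');
my @accent = byte_and_char_len("caf\x{e9}");    # "cafe" with e-acute

printf "ascii:  len = %d, u8len = %d\n", @ascii;     # len = 3, u8len = 3
printf "accent: len = %d, u8len = %d\n", @accent;    # len = 5, u8len = 4
```

The e-acute character (U+00E9) takes two bytes in UTF-8, so the second string has one more byte than it has characters.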

pos

The zero-based position in the original string.

is_match

Returns true if the token matched the re() in the Tokenizer.

is_hot

Returns true if the token matched the heat_seeker in the Tokenizer.

is_sentence_start

is_sentence_end

Returns a true value if the token matches common sentence-ending punctuation.

set_hot

Set the is_hot() value.

set_match

Set the is_match() value.

AUTHOR

Peter Karman <karman@cpan.org>

BUGS

Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Search::Tools

COPYRIGHT

Copyright 2009 by Peter Karman.

This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.