NAME
Search::Tools::TokenPP - a token object returned from a TokenList
SYNOPSIS
    use Search::Tools::Tokenizer;
    my $tokenizer = Search::Tools::Tokenizer->new();
    my $tokens = $tokenizer->tokenize_pp('quick brown red dog');
    while ( my $token = $tokens->next ) {
        # token isa Search::Tools::TokenPP
        print "token = $token\n";
        printf("str: %s, len = %d, u8len = %d, pos = %d, is_match = %d, is_hot = %d\n",
            $token->str,
            $token->len,
            $token->u8len,
            $token->pos,
            $token->is_match,
            $token->is_hot
        );
    }
DESCRIPTION
A TokenPP represents one or more characters culled from a string by a Tokenizer.
METHODS
TokenPP is a pure-Perl version of Search::Tools::Token. See the Search::Tools::Token documentation for more details.
This class inherits from Search::Tools::Object. Only new or overridden methods are documented here.
str
The characters in the token. Stringifies to the str() value with overloading.
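As a minimal sketch of how this kind of stringification can be set up in Perl, the example below uses the core overload pragma on a hypothetical My::Token class. The class name and its constructor are assumptions for illustration only; this is not the TokenPP source.

```perl
use strict;
use warnings;

package My::Token;    # hypothetical class, not part of Search::Tools

# Map string context to the str() accessor; fallback lets Perl
# derive related operators (eq, ., etc.) from the string form.
use overload
    '""'     => sub { $_[0]->str },
    fallback => 1;

sub new {
    my ( $class, %args ) = @_;
    return bless {%args}, $class;
}

sub str { return $_[0]->{str} }

package main;

my $token = My::Token->new( str => 'quick' );
print "token = $token\n";    # interpolation calls the overloaded '""'
```

With an overload like this in place, interpolating the object in a string and calling str() explicitly yield the same text.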
len
The byte length of str().
u8len
The character length of str(). For ASCII, len() == u8len(). For non-ASCII UTF-8, u8len() < len().
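The difference between the two lengths can be illustrated with core Perl alone. The sketch below does not use Search::Tools; the lengths() helper is a hypothetical name, and it assumes the string is encoded as UTF-8 when bytes are counted.

```perl
use strict;
use warnings;
use Encode qw(encode_utf8);

# Hypothetical helper: returns (character count, UTF-8 byte count).
sub lengths {
    my ($str) = @_;
    my $chars = length $str;                 # characters in the string
    my $bytes = length encode_utf8($str);    # bytes once encoded as UTF-8
    return ( $chars, $bytes );
}

# "dog" is pure ASCII; "caf\x{e9}" ends in U+00E9 (e with acute accent),
# which takes two bytes in UTF-8.
for my $str ( 'dog', "caf\x{e9}" ) {
    my ( $chars, $bytes ) = lengths($str);
    printf "chars = %d, bytes = %d\n", $chars, $bytes;
}
```

For the ASCII string both counts are 3; for the accented string the character count is 4 and the byte count is 5, matching the u8len() < len() relationship described above.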
pos
The zero-based position in the original string.
is_match
Returns true if the token matched the re() in the Tokenizer.
is_hot
Returns true if the token matched the heat_seeker in the Tokenizer.
is_sentence_start
is_sentence_end
Returns a true value if the token matches common sentence-ending punctuation.
set_hot
Set the is_hot() value.
set_match
Set the is_match() value.
AUTHOR
Peter Karman <karman@cpan.org>
BUGS
Please report any bugs or feature requests to bug-search-tools at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Search-Tools. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.
SUPPORT
You can find documentation for this module with the perldoc command.
    perldoc Search::Tools
You can also look for information at:
RT: CPAN's request tracker
AnnoCPAN: Annotated CPAN documentation
CPAN Ratings
Search CPAN
COPYRIGHT
Copyright 2009 by Peter Karman.
This package is free software; you can redistribute it and/or modify it under the same terms as Perl itself.