NAME
Text::TokenStream - lexer to break text up into user-defined tokens
SYNOPSIS
    use Text::TokenStream;
    use Text::TokenStream::Lexer;

    my $lexer = Text::TokenStream::Lexer->new(
        whitespace => [qr/\s+/],
        rules => [
            word => qr/\w+/,
            sym  => qr/[^\w\s]+/,
        ],
    );

    my $stream = Text::TokenStream->new(
        lexer => $lexer,
        input => "foo *",
    );

    my $tok1 = $stream->next;    # --> "word" token containing "foo"
    my $tok2 = $stream->next;    # --> "sym" token containing "*"
DESCRIPTION
This class is part of a collection of classes that act together to lex (aka scan) an input text into a stream of tokens.
This token stream class provides the stream interface, along with a notion of the "current position" in the input text, and position-aware error reporting. It composes Text::TokenStream::Role::Stream; that role lists the methods this class provides, so that you can easily write a parser class that has a token stream attribute which handles the tokenizer methods (in Moo terms, an attribute declared with has whose handles option delegates those methods to the stream).
The basic lexer machinery is found in Text::TokenStream::Lexer; it is separated out from the token stream so that it can be reused across many inputs.
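To make the division of labour concrete, here is a minimal standalone sketch, in core Perl only, of the kind of rule-based lexing the lexer performs. It is not Text::TokenStream::Lexer's implementation: the lex_all helper is hypothetical, tokens are plain hashrefs rather than token objects, and it assumes that rules are tried in list order with the first match winning.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Text::TokenStream: split $input into
# tokens using a list of whitespace patterns and an ordered list of
# type => regex rules. Each token is a hashref holding the rule name
# (type), the matched text, and its 0-based position in the input.
sub lex_all {
    my ($input, $whitespace, $rules) = @_;
    my @tokens;
    pos($input) = 0;
    TOKEN: while (pos($input) < length $input) {
        # Skip whitespace; /gc keeps pos() unchanged when a match fails.
        for my $ws (@$whitespace) {
            next TOKEN if $input =~ /\G(?:$ws)/gc;
        }
        my $start = pos($input);
        # Try each rule in order; the first one matching here wins
        # (an assumption of this sketch).
        for (my $i = 0; $i < @$rules; $i += 2) {
            my ($type, $re) = @{$rules}[$i, $i + 1];
            if ($input =~ /\G($re)/gc) {
                push @tokens, { type => $type, text => $1, position => $start };
                next TOKEN;
            }
        }
        die "No rule matches at position $start\n";
    }
    return @tokens;
}

my @tokens = lex_all(
    "foo *",
    [qr/\s+/],
    [word => qr/\w+/, sym => qr/[^\w\s]+/],
);
printf "%s %s %d\n", @{$_}{qw(type text position)} for @tokens;
# word foo 0
# sym * 4
```

Because the rules live in a separate value from the input, the same rule set can be applied to any number of inputs, which is the reuse the separation is meant to allow.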
Tokens are instances of a class, Text::TokenStream::Token by default.
CONSTRUCTOR
This class uses Moo, and inherits the standard new constructor.
ATTRIBUTES
lexer
An instance of Text::TokenStream::Lexer; required; read-only. Will be used to find tokens in the input.
input
Str; required; read-only. The text that will be lexed into a stream of tokens.
input_name
A Maybe[Path]; read-only. Can be coerced from a string. If a defined value is present, it should contain the name of the file that the input was read from, and that name will be used in any error messages.
token_class
The name of a class that inherits from Text::TokenStream::Token; defaults to Text::TokenStream::Token itself; read-only. Tokens found in the input will be constructed as instances of this class.
OTHER METHODS
collect_all
Takes no arguments. Returns a list of all remaining tokens found in the input.
In the current implementation, this method is provided by Text::TokenStream::Role::Stream.
collect_upto
Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches. Scans through the input until it finds a token that matches the argument, and returns a list of all tokens before the matching one. If no remaining token in the input matches the argument, behaves like "collect_all".
In the current implementation, this method is provided by Text::TokenStream::Role::Stream.
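The relationship between collect_upto and collect_all can be modelled with a toy array-backed stream. The sketch below is hypothetical throughout: ToyStream and tok are made-up names, tokens are plain hashrefs, the matches helper stands in for Text::TokenStream::Token#matches by simply comparing a string against the token's type or text, and it assumes the matching token is left in the stream (the description above does not say whether it is consumed).

```perl
use strict;
use warnings;

# Toy stream over a fixed list of tokens (hypothetical, for illustration).
package ToyStream {
    sub new  { my ($class, @tokens) = @_; return bless { buf => [@tokens] }, $class }
    sub peek { return $_[0]{buf}[0] }
    sub next { return shift @{ $_[0]{buf} } }

    # Stand-in for Token#matches: compare a string against type or text.
    sub matches {
        my ($tok, $target) = @_;
        return $tok->{type} eq $target || $tok->{text} eq $target;
    }

    # Drain every remaining token.
    sub collect_all {
        my ($self) = @_;
        my @out;
        while (defined(my $tok = $self->next)) { push @out, $tok }
        return @out;
    }

    # Return the tokens before the first match, leaving the match itself
    # unconsumed (an assumption); with no match, this drains the stream
    # just like collect_all.
    sub collect_upto {
        my ($self, $target) = @_;
        my @out;
        while (defined(my $tok = $self->peek)) {
            last if matches($tok, $target);
            push @out, $self->next;
        }
        return @out;
    }
}

sub tok { my ($type, $text) = @_; return { type => $type, text => $text } }

my $s = ToyStream->new(tok(word => 'f'), tok(sym => '('), tok(word => 'x'), tok(sym => ')'));
my @head = $s->collect_upto('(');    # just the "f" token
my @rest = $s->collect_all;          # "(", "x" and ")"
print join(' ', map { $_->{text} } @head), ' | ', join(' ', map { $_->{text} } @rest), "\n";
# f | ( x )
```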
create_token
Takes a listified hash of token attributes, and creates a token instance. The token object is created by calling:
$self->token_class->new(%data);
If you have particularly complex needs, you may wish to override this method in a subclass.
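Because create_token looks the class up by name at run time, a subclass can change what gets constructed without touching the lexer. The following core-Perl sketch shows that override pattern with hypothetical stand-in classes (My::Token, My::BaseStream and My::LowercasingStream); since Text::TokenStream itself is built with Moo, a real subclass would more likely use Moo too, which this sketch avoids so that it runs on its own.

```perl
use strict;
use warnings;

# Hypothetical minimal token class.
package My::Token {
    sub new  { my ($class, %data) = @_; return bless {%data}, $class }
    sub text { return $_[0]{text} }
}

# Hypothetical stand-in for the stream: create_token dispatches through
# whatever class name token_class returns.
package My::BaseStream {
    sub new         { my ($class, %args) = @_; return bless {%args}, $class }
    sub token_class { return $_[0]{token_class} // 'My::Token' }
    sub create_token {
        my ($self, %data) = @_;
        return $self->token_class->new(%data);
    }
}

# A subclass that adjusts the token data before delegating to the parent.
package My::LowercasingStream {
    our @ISA = ('My::BaseStream');
    sub create_token {
        my ($self, %data) = @_;
        $data{text} = lc $data{text};
        return $self->SUPER::create_token(%data);
    }
}

my $tok = My::LowercasingStream->new->create_token(type => 'word', text => 'FOO');
print ref($tok), ' ', $tok->text, "\n";    # My::Token foo
```

Overriding create_token and calling the parent method keeps the class lookup in one place, so a separately configured token class still takes effect.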
current_position
Takes no arguments. Returns the 0-based position of the first input character that hasn't yet been returned by "next".
err
Takes any number of arguments, which are concatenated into an error message. (If no arguments are supplied, it acts as if you had supplied the string "Something's wrong".) Throws an exception, reporting the locus of the error as the current input position (using 1-based line and column numbers).
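Reporting a 1-based line and column means converting a 0-based character offset, like the one current_position returns. A minimal sketch of that conversion, together with an err-like helper, follows; line_and_column and err_at are hypothetical names, and the exact message format is an assumption that need not match what Text::TokenStream actually produces.

```perl
use strict;
use warnings;

# Convert a 0-based character offset into 1-based line and column numbers.
sub line_and_column {
    my ($input, $pos) = @_;
    my $before = substr($input, 0, $pos);
    my $line   = 1 + ($before =~ tr/\n//);      # newlines before $pos
    my $column = $pos - rindex($before, "\n");  # rindex is -1 on line 1
    return ($line, $column);
}

# Hypothetical err-like helper: concatenate the message parts (with a
# default), append the locus and optional input name, and throw.
sub err_at {
    my ($input, $input_name, $pos, @parts) = @_;
    my $message = @parts ? join('', @parts) : "Something's wrong";
    my ($line, $column) = line_and_column($input, $pos);
    my $file = defined $input_name ? "$input_name " : '';
    die "$message at ${file}line $line, column $column\n";
}

eval { err_at("foo\nbar *", 'example.txt', 8, 'Unexpected ', '"*"') };
print $@;    # Unexpected "*" at example.txt line 2, column 5
```

The same conversion applies to token_err, with the token's own position used in place of the current one.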
fill
Takes a single positive-integer argument. Attempts to fill an internal buffer of already-lexed tokens so that it contains that many tokens. Returns a boolean that is true iff there were enough tokens to do that.
looking_at
Takes zero or more arguments, each of which indicates a token to match, as with Text::TokenStream::Token#matches. Returns a boolean that is true iff there's at least one more token in the input, and it matches the argument.
next
Takes no arguments. Returns the next token found in the input, and advances the current position past it; if no tokens remain, returns undef. The token instance is created by "create_token".
next_of
Takes an argument indicating a token to match, as with Text::TokenStream::Token#matches, and an optional string argument describing the current position (for example, "in expression" or "after keyword"). If there are no more tokens in the input, reports an error at the current position, using "err". Otherwise, if the next token doesn't match the token argument, reports an error at the position of that token, using "token_err". Otherwise, the next token matches what is being looked for, so that token is returned.
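The three outcomes of next_of (end of input, mismatch, success) can be sketched over a toy array-backed stream. Everything here is hypothetical: ToyStream is a made-up class, tokens are hashrefs, matches compares a string target against the token's type or text in place of Text::TokenStream::Token#matches, and plain die calls with invented messages stand in for err and token_err.

```perl
use strict;
use warnings;

package ToyStream {
    sub new  { my ($class, @tokens) = @_; return bless { buf => [@tokens] }, $class }
    sub peek { return $_[0]{buf}[0] }
    sub next { return shift @{ $_[0]{buf} } }

    # Stand-in for Token#matches: compare a string against type or text.
    sub matches {
        my ($tok, $target) = @_;
        return $tok->{type} eq $target || $tok->{text} eq $target;
    }

    sub next_of {
        my ($self, $target, $where) = @_;
        my $context = defined $where ? " $where" : '';
        my $tok = $self->peek;
        # No tokens left: the documented method reports this via err.
        die "Missing $target$context at end of input\n" if !defined $tok;
        # Wrong token: the documented method reports this via token_err.
        die "Expected $target$context but found '$tok->{text}' at $tok->{position}\n"
            if !matches($tok, $target);
        return $self->next;
    }
}

my $s = ToyStream->new({ type => 'sym', text => '(', position => 3 });
my $open = $s->next_of('(', 'after keyword');    # returns the "(" token
eval { $s->next_of(')', 'in expression') };
print $@;    # Missing ) in expression at end of input
```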
peek
Takes no arguments. Returns the next token that would be returned by "next", but doesn't advance the current input position, and a subsequent "next" call will return the same token.
An internal buffer is used to ensure that every token is lexed only once.
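The buffering that peek and fill rely on can be sketched as a queue of already-lexed tokens in front of a token source. In this hypothetical core-Perl version, BufferedStream is a made-up class, the source is a closure returning one token per call (undef at end of input), tokens are plain strings, and the $lexed counter exists only to show that peeking and then calling next lexes each token once.

```perl
use strict;
use warnings;

package BufferedStream {
    # $source is a code ref returning the next lexed token, or undef at the end.
    sub new { my ($class, $source) = @_; return bless { source => $source, buf => [] }, $class }

    # Lex ahead until the buffer holds $n tokens; true iff that succeeded.
    sub fill {
        my ($self, $n) = @_;
        my $buf = $self->{buf};
        while (@$buf < $n) {
            my $tok = $self->{source}->();
            return 0 if !defined $tok;
            push @$buf, $tok;
        }
        return 1;
    }

    # Look at the next token without consuming it.
    sub peek {
        my ($self) = @_;
        return $self->fill(1) ? $self->{buf}[0] : undef;
    }

    # Consume and return the next token, reusing a buffered one if present.
    sub next {
        my ($self) = @_;
        return $self->fill(1) ? shift @{ $self->{buf} } : undef;
    }
}

my @words = qw(alpha beta);
my $lexed = 0;
my $stream = BufferedStream->new(sub { return if !@words; $lexed++; return shift @words });
print $stream->peek, ' ', $stream->next, ' ', $stream->next, "\n";    # alpha alpha beta
print "lexed $lexed times\n";                                         # lexed 2 times
```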
skip_optional
Takes a single argument indicating a token to match, as with Text::TokenStream::Token#matches. If there are no more tokens in the input, or the next token doesn't match the argument, returns false; otherwise, advances past the next token, and returns true.
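skip_optional is convenient for optional punctuation such as separators. The hypothetical sketch below uses a toy array-backed stream (ToyStream and parse_list are made-up names; tokens are hashrefs, and matches compares a string against the token's type or text in place of Text::TokenStream::Token#matches) to parse a comma-separated list.

```perl
use strict;
use warnings;

package ToyStream {
    sub new  { my ($class, @tokens) = @_; return bless { buf => [@tokens] }, $class }
    sub peek { return $_[0]{buf}[0] }
    sub next { return shift @{ $_[0]{buf} } }

    # Stand-in for Token#matches: compare a string against type or text.
    sub matches {
        my ($tok, $target) = @_;
        return $tok->{type} eq $target || $tok->{text} eq $target;
    }

    # Consume the next token only if it matches; report whether it did.
    sub skip_optional {
        my ($self, $target) = @_;
        my $tok = $self->peek;
        return 0 if !defined $tok || !matches($tok, $target);
        $self->next;
        return 1;
    }
}

# Parse "a , b , c": one item (assumed present), then another item each
# time a comma is skipped.
sub parse_list {
    my ($stream) = @_;
    my @items = ($stream->next->{text});
    push @items, $stream->next->{text} while $stream->skip_optional(',');
    return @items;
}

my $s = ToyStream->new(map +{ type => (/\w/ ? 'word' : 'sym'), text => $_ }, split ' ', 'a , b , c');
print join('|', parse_list($s)), "\n";    # a|b|c
```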
token_err
Takes a token as an argument, followed by any number of arguments that are concatenated into an error message. (If no non-token arguments are supplied, it acts as if you had supplied the string "Something's wrong".) Throws an exception, reporting the locus of the error as the position of the token (using 1-based line and column numbers).
AUTHOR
Aaron Crane, <arc@cpan.org>
COPYRIGHT
Copyright 2021 Aaron Crane.
LICENCE
This library is free software and may be distributed under the same terms as perl itself. See http://dev.perl.org/licenses/.