NAME
MarpaX::Repa::Lexer - simplify lexing for Marpa parser
DESCRIPTION
Most details are in MarpaX::Repa.
METHODS
new
Returns a new lexer instance. Takes named arguments.
    my $lexer = MyLexer->new(
        tokens => {
            word => qr{\b\w+\b},
        },
        store => 'array',
        recognizer => $recognizer,
        debug => 1,
    );
Possible arguments:
- tokens
Hash with names of terminals as keys and one of the following as values:
- string
Just a string to match.
    'a token' => "matches this long string",
- regular expression
A qr{} compiled regexp.
    'a token' => qr{"[^"]+"},
Note that the regexp MUST match at least one character. Lookbehind that inspects characters before the current position is not supported at the moment.
- hash
With a hash you can define token-specific options. At the moment only the 'store' option is supported (see below). Use the match key to set what to match (a string or a regular expression).
    'a token' => { match => "a string", store => 'hash', },
- store
What to store (pass to Marpa's recognizer). The following variants are supported:
- hash (default)
    { token => 'a token', value => 'a value' }
- array
    [ 'a token', 'a value' ]
- scalar
    'a value'
- undef
undef is stored, so Repa's actions will later skip it.
- a callback
A function that is called with the token name and a reference to its value. It should return a reference, or undef, which is passed to the recognizer.
- recognizer
A Marpa::R2::Recognizer object or an instance of a subclass.
- debug
If true, the lexer prints a debug log to STDERR.
- min_buffer
Minimum size of the buffer (4*1024 by default).
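The store variants listed above can be sketched in plain Perl. This is a minimal illustration, not the module's code: shape_value is a hypothetical helper, the variant names are assumed to be given as the strings 'hash', 'array', 'scalar' and 'undef', and the sketch returns the shaped value directly rather than whatever wrapping the lexer applies before handing it to the recognizer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of MarpaX::Repa::Lexer): shows the shape
# each documented 'store' variant gives to a matched token.
sub shape_value {
    my ($store, $name, $value) = @_;
    $store = 'hash' unless defined $store;    # documented default

    # A callback receives the token name and a reference to the value.
    return $store->($name, \$value) if ref $store eq 'CODE';

    return { token => $name, value => $value } if $store eq 'hash';
    return [ $name, $value ]                   if $store eq 'array';
    return $value                              if $store eq 'scalar';
    return undef                               if $store eq 'undef';

    die "unknown store variant: $store";
}

my $shaped = shape_value('array', 'word', 'foo');
print join(', ', @$shaped), "\n";    # prints: word, foo
```

Keeping the variants behind one function mirrors the idea that a per-token 'store' option can override the lexer-wide one: the caller only has to pick which variant to pass.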
init
Sets up the instance and returns $self. Called from the constructor.
recognize
Takes a file handle and parses its contents. Dies on critical errors, but not when the parser loses its way. Returns the recognizer that was passed to "new".
buffer
Returns reference to the current buffer.
grow_buffer
Called with a file handle as its argument when "buffer" needs a refill. Returns true if there is still data to come from the handle.
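The refill behaviour described above can be sketched roughly as follows. This is an assumption-laden illustration rather than the module's implementation: grow is a hypothetical function, the buffer is passed as a plain scalar reference, and "data still to come" is interpreted here as the handle not being at end of file.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a buffer refill: appends up to $min_buffer
# characters from $fh to the buffer and reports whether more data remains.
sub grow {
    my ($buffer_ref, $fh, $min_buffer) = @_;
    $min_buffer ||= 4 * 1024;    # documented default size

    # The fourth argument to read() is the offset, so new data is appended.
    my $read = read($fh, $$buffer_ref, $min_buffer, length $$buffer_ref);
    die "read failed: $!" unless defined $read;

    return !eof($fh);
}

my $text = "abcdefghij";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my $buffer = '';
while ( grow(\$buffer, $fh, 4) ) { }    # refill four characters at a time
print "$buffer\n";                      # prints: abcdefghij
```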
dump_buffer
Returns the first 20 characters of the buffer, with everything except ASCII encoded as \x{####}. Pass an argument to control the size; zero means the whole buffer.
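As a rough sketch of the escaping described above, assuming four-digit lowercase hex padding and that only non-ASCII characters are rewritten: dump_text is a hypothetical function that works on a plain string, and the module's actual output format may differ in details.

```perl
use strict;
use warnings;

# Hypothetical equivalent of a buffer dump: returns the first $size
# characters (20 by default, 0 for everything) with non-ASCII characters
# rewritten as \x{####} escapes.
sub dump_text {
    my ($text, $size) = @_;
    $size = 20 unless defined $size;

    my $chunk = $size ? substr($text, 0, $size) : $text;
    $chunk =~ s/([^\x00-\x7F])/sprintf '\\x{%04x}', ord $1/ge;
    return $chunk;
}

print dump_text("caf\x{e9}!"), "\n";    # prints: caf\x{00e9}!
```

Escaping after truncation keeps the size argument counting characters of the buffer, not characters of the escaped output.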