NAME

MarpaX::ESLIF::Recognizer - MarpaX::ESLIF's recognizer

VERSION

version 3.0.29

SYNOPSIS

my $eslifRecognizer = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface);

The recognizer interface is used to read chunks of data, which the recognizer keeps in its internal buffers until they are consumed. The internal buffer may not be an exact duplicate of the external data that was read: in the case of a character stream, the external data is systematically converted to a UTF-8 sequence of bytes. A user pushing alternatives must know how many bytes they represent: the native number of bytes, or the number of bytes in the UTF-8 encoding in the case of a character stream.

DESCRIPTION

MarpaX::ESLIF::Recognizer is a possible step after a MarpaX::ESLIF::Grammar instance is created.

METHODS

MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface)

my $eslifRecognizer = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface);

Returns a recognizer instance, referred to as $eslifRecognizer below. Parameters are:

$eslifGrammar

MarpaX::ESLIF::Grammar object instance. Required.

$recognizerInterface

An object implementing MarpaX::ESLIF::Recognizer::Interface methods. Required.

$eslifRecognizer->newFrom($eslifGrammar)

my $eslifRecognizerNewFrom = $eslifRecognizer->newFrom($eslifGrammar);

Returns a recognizer instance that is sharing the stream of $eslifRecognizer, but applied to the other grammar $eslifGrammar.

$eslifRecognizer->set_exhausted_flag($flag)

$eslifRecognizer->set_exhausted_flag($flag);

Changes the isWithExhaustion() flag associated with the $eslifRecognizer recognizer instance.

$eslifRecognizer->share($eslifRecognizerShared)

$eslifRecognizer->share($eslifRecognizerShared);

Shares the stream of $eslifRecognizerShared recognizer instance with the $eslifRecognizer instance.

$eslifRecognizer->isCanContinue()

Returns a true value if recognizing can continue.

$eslifRecognizer->isExhausted()

Returns a true value if the parse is exhausted. This flag is always set, even if there is no exhaustion event.

$eslifRecognizer->scan($initialEvents)

Starts the recognizer scanning. This call is allowed only once in a recognizer's lifetime. If specified, $initialEvents must be a scalar. The default value is 0.

This method can generate events. Initial events are those that happen at the very first step, and they can only be prediction events. Most applications do not want them, but some use them to get control before the first data read.

Returns a boolean indicating if the call was successful or not.

$eslifRecognizer->resume($deltaLength)

This method tells the recognizer to continue. Events can be generated after resume completes.

$deltaLength is optional: it is a number of bytes to skip forward before resume goes on, and must be zero or positive. In the case of a character stream, the user has to compute the number of bytes as if the input were in the UTF-8 encoding. The default value is 0.

Returns a boolean indicating if the call was successful or not.
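The scan() and resume() descriptions above suggest a simple driving loop. The following is a minimal sketch, not the module's prescribed usage: My::FakeRecognizer and drive() are hypothetical names, and the fake class only stands in for a real recognizer so that the example runs without MarpaX::ESLIF installed.

```perl
use strict;
use warnings;

# Stand-in for a real recognizer so that the sketch runs on its own:
# it pretends that three resume() calls are needed.
package My::FakeRecognizer {
    sub new           { my ($class) = @_; return bless { remaining => 3 }, $class }
    sub scan          { return 1 }
    sub resume        { my ($self) = @_; $self->{remaining}--; return 1 }
    sub isCanContinue { my ($self) = @_; return $self->{remaining} > 0 }
}

# Hypothetical driver: scan() once, then resume() while recognizing can continue.
sub drive {
    my ($recognizer) = @_;
    $recognizer->scan() or die "scan() failed";
    my $resumes = 0;
    while ($recognizer->isCanContinue()) {
        $recognizer->resume() or die "resume() failed";
        $resumes++;
    }
    return $resumes;
}

print drive(My::FakeRecognizer->new), " resume() call(s)\n";
```

With a real recognizer, the body of the loop is typically where events would be inspected before calling resume() again.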

$eslifRecognizer->events()

When control is given back to the end-user, they can always ask for the current events.

Returns a reference to an array of hash references, possibly empty if there are none. Each array element is a reference to a hash containing these keys:

type

The type of event, that is, one of the values listed in MarpaX::ESLIF::Event::Type.

symbol

The name of the symbol that triggered the event. Can be undef in the case of an exhaustion event.

event

The name of the event. Can be undef in the case of an exhaustion event.
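As an illustration of the structure described above, here is a small sketch that walks such an array. The describe_events() helper and the mock data are assumptions made for the example; in real code the array reference would come from $eslifRecognizer->events(), and the type values would be those of MarpaX::ESLIF::Event::Type.

```perl
use strict;
use warnings;

# Hypothetical helper: turn the documented array of hashes into printable lines.
sub describe_events {
    my ($events) = @_;
    return map {
        sprintf('type=%s symbol=%s event=%s',
                $_->{type},
                $_->{symbol} // '(undef)',    # undef is possible on exhaustion
                $_->{event}  // '(undef)')
    } @{$events};
}

# Mock data in the documented shape; the type values are placeholders only.
my $events = [
    { type => 1, symbol => 'number', event => 'number$' },
    { type => 2, symbol => undef,    event => undef     },
];
print "$_\n" for describe_events($events);
```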

$eslifRecognizer->eventOnOff($symbol, $eventTypes, $onOff)

Events can be switched on or off. For performance reasons, if you know that you do not need an event, it can be a good idea to switch it off. Required parameters are:

$symbol

The symbol name to which the event is associated.

$eventTypes

A reference to an array of event types, as per MarpaX::ESLIF::Event::Type.

$onOff

A flag that sets the event on or off.

Note that trying to change the state of an event that was not pre-declared in the grammar is a no-op.

$eslifRecognizer->lexemeAlternative($name, $anything, $grammarLength)

A lexeme is a terminal in legacy parsing terminology. The word lexeme means that, in the grammar, it is associated with a sub-grammar. Pushing an alternative means that the end-user is instructing the recognizer that, at this precise moment of lexing, there is a given lexeme associated with the $name parameter, with a given opaque value $anything. The grammar length parameter $grammarLength is optional and defaults to 1, i.e. one lexeme (which is a symbol in the grammar) corresponds to one token. Nevertheless, it is possible to say that an alternative spans more than one symbol.

Returns a boolean indicating if the call was successful or not.

$eslifRecognizer->lexemeComplete($length)

Tells the recognizer that alternatives are complete at this precise moment of parsing, and that the recognizer must move forward by $length bytes, which can be zero (end-user's responsibility). This method can generate events.

Returns a boolean indicating if the call was successful or not.
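Because lengths are counted in bytes, and character streams are held internally as UTF-8, a character count is not always the right value for $length. A minimal sketch of the byte computation, using the core Encode module; utf8_byte_length() is a hypothetical helper name, not part of MarpaX::ESLIF:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes a character string occupies
# once encoded as UTF-8.
sub utf8_byte_length {
    my ($chars) = @_;
    return length(encode('UTF-8', $chars));
}

my $matched = "h\x{E9}llo";                  # 5 characters, U+00E9 takes 2 bytes
my $length  = utf8_byte_length($matched);    # 6 bytes
print "$length\n";
```

The same computation applies to the $deltaLength parameter of resume() when the input is a character stream.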

$eslifRecognizer->lexemeRead($name, $anything, $length, $grammarLength)

A short-hand version of lexemeAlternative() followed by lexemeComplete(), with the same meaning for all parameters. This method can generate events.

Returns a boolean indicating if the call was successful or not.
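The shorthand can be pictured as the two calls it replaces. The sketch below assumes only the signatures documented here; My::RecordingRecognizer and read_in_two_steps() are hypothetical, and the recording class exists solely so that the example runs without MarpaX::ESLIF installed.

```perl
use strict;
use warnings;

# Recording stand-in for a recognizer: it only logs the calls it receives.
package My::RecordingRecognizer {
    sub new { return bless { calls => [] }, shift }
    sub lexemeAlternative {
        my $self = shift;
        push @{ $self->{calls} }, [ lexemeAlternative => @_ ];
        return 1;
    }
    sub lexemeComplete {
        my $self = shift;
        push @{ $self->{calls} }, [ lexemeComplete => @_ ];
        return 1;
    }
}

# Hypothetical two-step equivalent of
# lexemeRead($name, $anything, $length, $grammarLength).
sub read_in_two_steps {
    my ($recognizer, $name, $anything, $length, $grammarLength) = @_;
    $grammarLength //= 1;    # documented default
    return $recognizer->lexemeAlternative($name, $anything, $grammarLength)
        && $recognizer->lexemeComplete($length);
}

my $recognizer = My::RecordingRecognizer->new;
read_in_two_steps($recognizer, 'NUMBER', 42, 2) or die "push failed";
print join(' ', @{$_}), "\n" for @{ $recognizer->{calls} };
```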

$eslifRecognizer->lexemeTry($name)

The end-user can ask the recognizer if a lexeme $name may match.

Returns a boolean indicating if the lexeme is recognized.

$eslifRecognizer->discardTry()

The end-user can ask the recognizer if a :discard rule may match.

Returns a boolean indicating if :discard is recognized.

$eslifRecognizer->lexemeExpected()

Asks the recognizer for the list of expected lexemes.

Returns a reference to an array of names, possibly empty.

$eslifRecognizer->lexemeLastPause($name)

Asks the recognizer for the end-user data associated with the last lexeme pause after event. A pause after event occurs when the recognizer itself was responsible for lexeme recognition, after a call to the scan() or resume() methods. This data is an exact copy of the last bytes that matched for the given lexeme, in the internal representation of end-user data, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.

Returns the associated bytes, or undef.

$eslifRecognizer->lexemeLastTry($name)

Asks the recognizer for the end-user data associated with the last successful lexeme try. This data is an exact copy of the last bytes that matched for the given lexeme, in the internal representation of end-user data, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.

Returns the associated bytes, or undef.

$eslifRecognizer->discardLastTry()

Asks the recognizer for the end-user data associated with the last successful discard try. This data is an exact copy of the last bytes that matched :discard, in the internal representation of end-user data, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.

Returns the associated bytes, or undef.

$eslifRecognizer->discardLast()

Asks the recognizer for the end-user data associated with the last successful discard. This data is an exact copy of the last bytes that matched for the latest :discard rule, meaning that it may be a UTF-8 sequence of bytes in the case of a character stream.

For performance reasons, last discard data is available only if the recognizer interface returned a true value for isWithTrack() method and if there is a discard event for the :discard rule that matched.

Returns the associated bytes, or undef.

$eslifRecognizer->isEof()

This method is similar to the recognizer interface's isEof() method, except that it asks the question directly of the recognizer's internal state, which maintains a copy of this flag.

Returns a boolean indicating if end-of-user-data is reached.

$eslifRecognizer->isExhausted()

This method returns a true value if the underlying grammar is exhausted, a false value otherwise, and croaks on failure.

Returns a boolean indicating if the parse is exhausted.

$eslifRecognizer->read()

Forces the recognizer to read more data. Usually, the recognizer interface is called automatically whenever needed.

Returns a boolean value indicating success or not.

$eslifRecognizer->input()

Gets a copy of the current internal recognizer buffer, starting at the exact byte where resume() would start. An undefined output does not mean there is an error, but that the internal buffers are completely consumed. ESLIF will automatically require more data unless the EOF flag is set. The internal buffer is always UTF-8 encoded for every chunk of data that was declared to be a character stream.

Returns the associated input bytes, or undef.

$eslifRecognizer->progressLog($start, $end, $loggerLevel)

Asks for a logging representation of the current parse progress. The format is fixed by the underlying libraries. The $start and $end parameters follow the Perl convention for indices, i.e. when they are negative, they count from the end. For example, -1 means the last index, -2 means the one before the last, etc. $loggerLevel is a level as per MarpaX::ESLIF::Logger::Level.

Nothing is returned.
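The negative-index convention mentioned above is the one Perl uses for arrays. As a small illustration, assuming a sequence of $count entries, a negative index maps to an absolute one as follows; resolve_index() is a hypothetical helper, not part of MarpaX::ESLIF:

```perl
use strict;
use warnings;

# Hypothetical helper illustrating Perl's index convention:
# negative indices count backwards from the end.
sub resolve_index {
    my ($index, $count) = @_;
    return $index < 0 ? $count + $index : $index;
}

my @items = qw(a b c d);
print resolve_index(-1, scalar @items), "\n";    # last index
print resolve_index(-2, scalar @items), "\n";    # one before the last
```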

$eslifRecognizer->lastCompletedOffset($name)

The recognizer is tentatively keeping an absolute offset every time a lexeme is complete. We say tentatively in the sense that no overflow checking is done, thus this number is not reliable in case the user data spanned over a very large number of bytes. In addition, the unit is in bytes. $name can be any symbol in the grammar.

Returns the absolute offset in bytes.

$eslifRecognizer->lastCompletedLength($name)

The recognizer is tentatively computing the length of every symbol completion. Since this value depends internally on the absolute previous offset, it is not guaranteed to be exact, in the sense that no overflow check is done. $name can be any symbol in the grammar.

Returns the absolute length in bytes.

$eslifRecognizer->lastCompletedLocation($name)

Returns an array containing at indices 0 and 1 the values of $eslifRecognizer->lastCompletedOffset($name) and $eslifRecognizer->lastCompletedLength($name), respectively.

$eslifRecognizer->line()

If, at creation, the recognizer interface returned a true value for the $recognizerInterface->isWithNewline() method, then the recognizer will track the number of lines for every character-oriented chunk of data.

Returns the line number, or 0.

$eslifRecognizer->column()

If, at creation, the recognizer interface returned a true value for the $recognizerInterface->isWithNewline() method, then the recognizer will track the number of columns for every character-oriented chunk of data.

Returns the column number, or 0.

$eslifRecognizer->location()

Returns an array containing at indices 0 and 1 the values of $eslifRecognizer->line() and $eslifRecognizer->column(), respectively.

$eslifRecognizer->hookDiscard($discardOnOff)

Hook the recognizer to enable or disable the use of :discard if it exists. Default mode is on. This is a permanent setting.

$eslifRecognizer->hookDiscardSwitch()

Hook the recognizer to switch the use of :discard if it exists. This is a permanent setting.

SEE ALSO

MarpaX::ESLIF::Recognizer::Interface, MarpaX::ESLIF::Event::Type, MarpaX::ESLIF::Logger::Level

AUTHOR

Jean-Damien Durand <jeandamiendurand@free.fr>

COPYRIGHT AND LICENSE

This software is copyright (c) 2017 by Jean-Damien Durand.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.