NAME

MarpaX::ESLIF::Recognizer - MarpaX::ESLIF's recognizer

VERSION

version 6.0.29

SYNOPSIS

my $eslifRecognizer = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface);

The recognizer interface is used to read chunks of data, which the recognizer keeps in its internal buffers until they are consumed. The internal buffer may not be an exact copy of the external data that was read: in case of a character stream, the external data is systematically converted to a UTF-8 sequence of bytes. A user pushing alternatives must therefore express lengths in native bytes of this internal buffer, i.e. in UTF-8 bytes for a character stream.
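For example, with a character stream the length to give to the recognizer is the size of the UTF-8 encoding of the matched text, not its number of characters. Below is a minimal sketch using the core Encode module. The helper name utf8ByteLength is hypothetical and not part of MarpaX::ESLIF:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: number of bytes a character string occupies once
# converted to UTF-8, i.e. the unit the recognizer uses internally for a
# character stream.
sub utf8ByteLength {
    my ($string) = @_;
    return length(encode('UTF-8', $string));
}

my $text = "caf\x{e9}";    # "é" is U+00E9
printf "%d characters, %d bytes\n", length($text), utf8ByteLength($text);
```

Here "caf\x{e9}" has 4 characters but occupies 5 bytes in the recognizer's buffer, so 5 is the length to use.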

DESCRIPTION

MarpaX::ESLIF::Recognizer is a possible step after a MarpaX::ESLIF::Grammar instance is created.

METHODS

MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface)

my $eslifRecognizer = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $recognizerInterface);

Returns a recognizer instance, referred to as $eslifRecognizer below. Parameters are:

$eslifGrammar

MarpaX::ESLIF::Grammar object instance. Required.

$recognizerInterface

An object implementing MarpaX::ESLIF::Recognizer::Interface methods. Required.
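Below is a minimal sketch of such an object that serves a fixed list of character chunks. The class name MyChunkInterface is hypothetical, and the method list is an assumption based on MarpaX::ESLIF::Recognizer::Interface. Check that page for the exact set expected by your ESLIF version:

```perl
use strict;
use warnings;

# Hypothetical interface class serving a fixed list of chunks.
# Method names are assumed from MarpaX::ESLIF::Recognizer::Interface.
package MyChunkInterface;

sub new {
    my ($class, @chunks) = @_;
    return bless { chunks => [@chunks], data => undef, eof => 0 }, $class;
}

# Loads the next chunk; the recognizer then fetches it through data().
sub read {
    my ($self) = @_;
    $self->{data} = shift @{ $self->{chunks} };
    $self->{eof}  = @{ $self->{chunks} } ? 0 : 1;
    return 1;
}

sub isEof                  { $_[0]->{eof} }
sub data                   { $_[0]->{data} }
sub isCharacterStream      { 1 }      # chunks are Perl character strings
sub encoding               { undef }  # no explicit encoding given
sub isWithDisableThreshold { 0 }
sub isWithExhaustion       { 0 }
sub isWithNewline          { 1 }      # needed for line() and column()
sub isWithTrack            { 1 }      # needed for discardLast() data

package main;

my $interface = MyChunkInterface->new("1 + ", "2");
```

Such an object would then be passed as the second argument of MarpaX::ESLIF::Recognizer->new($eslifGrammar, $interface).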

$eslifRecognizer->newFrom($eslifGrammar)

my $eslifRecognizerNewFrom = $eslifRecognizer->newFrom($eslifGrammar);

Returns a recognizer instance that is sharing the interface of $eslifRecognizer, but applied to the other grammar $eslifGrammar. It is functionally equivalent to:

#
# $eslifRecognizerInterface is the interface instance used to create $eslifRecognizer
#
my $eslifRecognizerNewFrom = MarpaX::ESLIF::Recognizer->new($eslifGrammar, $eslifRecognizerInterface);
#
# Tell ESLIF that $eslifRecognizerNewFrom shares the stream of $eslifRecognizer
#
$eslifRecognizerNewFrom->share($eslifRecognizer);
#
# Do some work
# ...
#
# Tell ESLIF that sharing is finished
#
$eslifRecognizerNewFrom->unshare();
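Every share() should eventually be matched by an unshare(). Below is a minimal sketch that wraps this pairing in a helper that unshares even when the work dies. The helper name withSharedStream is hypothetical. Only share() and unshare() come from this page:

```perl
use strict;
use warnings;

# Hypothetical helper: run $work while $recognizer shares the stream of
# $sharedWith, and always unshare afterwards, even if $work dies.
sub withSharedStream {
    my ($recognizer, $sharedWith, $work) = @_;
    $recognizer->share($sharedWith);
    my @result;
    my $ok    = eval { @result = $work->(); 1 };
    my $error = $@;
    $recognizer->unshare();
    die $error unless $ok;    # re-throw after cleanup
    return @result;
}
```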

$eslifRecognizer->set_exhausted_flag($flag)

$eslifRecognizer->set_exhausted_flag($flag);

Changes the isWithExhaustion() flag associated with the $eslifRecognizer recognizer instance.

$eslifRecognizer->share($eslifRecognizerShared)

$eslifRecognizer->share($eslifRecognizerShared);

Shares the stream of $eslifRecognizerShared recognizer instance with the $eslifRecognizer instance.

$eslifRecognizer->unshare()

$eslifRecognizer->unshare();

Unshares the stream of $eslifRecognizer instance. This is equivalent to:

$eslifRecognizer->share(undef);

$eslifRecognizer->peek($eslifRecognizerPeeked)

$eslifRecognizer->peek($eslifRecognizerPeeked);

Peeks the stream of the $eslifRecognizerPeeked recognizer instance with the $eslifRecognizer instance. This means that the internal buffers of both recognizers will grow until the stream is unpeeked, as if ESLIF were processing a lexeme.

$eslifRecognizer->unpeek()

$eslifRecognizer->unpeek();

Unpeeks the stream of $eslifRecognizer instance. This is equivalent to:

$eslifRecognizer->peek(undef);

$eslifRecognizer->isCanContinue()

Returns a true value if recognizing can continue.

$eslifRecognizer->isExhausted()

Returns a true value if the parse is exhausted. This flag is always maintained, even when there is no exhaustion event.

$eslifRecognizer->scan($initialEvents)

Starts scanning. This method may be called only once in the recognizer's lifetime. $initialEvents is an optional scalar flag, and its default value is 0.

This method can generate events. Initial events are those happening at the very first step, and can only be prediction events. Most applications do not want them, but some use them to get control before the first data is read.

Returns a boolean indicating if the call was successful or not.

$eslifRecognizer->resume($deltaLength)

Tells the recognizer to continue. Events can be generated when resume completes.

$deltaLength is optional and is a number of bytes to skip forward before resuming. It must be greater than or equal to 0. In case of a character stream, the user has to compute this number of bytes as if the input was encoded in UTF-8. Default value is 0.

Returns a boolean indicating if the call was successful or not.
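scan() and resume() are usually combined in a loop that gives control back to the application at each pause. The following is a minimal sketch of that loop. The sub name drive and the $onPause callback are hypothetical. The loop stops when a call fails or when isCanContinue() returns false:

```perl
use strict;
use warnings;

# Minimal driving loop. $onPause is a hypothetical callback invoked each
# time the recognizer gives control back (e.g. to inspect events()).
sub drive {
    my ($eslifRecognizer, $onPause) = @_;
    my $ok = $eslifRecognizer->scan(0);    # no initial events
    while ($ok && $eslifRecognizer->isCanContinue()) {
        $onPause->($eslifRecognizer);
        $ok = $eslifRecognizer->resume(0); # no bytes skipped
    }
    return $ok;
}
```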

$eslifRecognizer->events()

Whenever control is given back to the end-user, the current events can be queried.

Returns a reference to an array of hash references, possibly empty. Each array element is a reference to a hash containing these keys:

type

The type of event, which is one of the values listed in MarpaX::ESLIF::Event::Type.

symbol

The name of the symbol that triggered the event. Can be undef in case of exhaustion event.

event

The name of the event. Can be undef in case of an exhaustion event.
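A minimal sketch that reads the structure returned by events() and collects the event names, with a placeholder when the name is undef, e.g. for an exhaustion event. The helper name currentEventNames and the "(none)" placeholder are hypothetical:

```perl
use strict;
use warnings;

# Hypothetical helper: list the names of the current events, using
# "(none)" when the event name is undef (e.g. exhaustion event).
sub currentEventNames {
    my ($eslifRecognizer) = @_;
    return map { defined $_->{event} ? $_->{event} : '(none)' }
           @{ $eslifRecognizer->events() };
}
```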

$eslifRecognizer->eventOnOff($symbol, $eventTypes, $onOff)

Events can be switched on or off. For performance reasons, if you know that you do not need an event, it can be a good idea to switch it off. Required parameters are:

$symbol

The symbol name to which the event is associated.

$eventTypes

A reference to an array of event types, as per MarpaX::ESLIF::Event::Type.

$onOff

A flag that sets the event on or off.

Note that trying to change the state of an event that was not pre-declared in the grammar is a no-op.

Returns a reference to an array of hash references, eventually empty if there is none. Each array element is a reference to a hash containing these keys:

$eslifRecognizer->alternative($name, $anything, $grammarLength)

Pushes an alternative: the end-user instructs the recognizer that, at this precise point of lexing, the symbol named $name is present, with the opaque value $anything. The $grammarLength parameter is optional and defaults to 1, i.e. one grammar token. A larger value declares that the alternative spans more than one symbol.

Returns a boolean indicating if the call was successful or not.

$eslifRecognizer->alternativeComplete($length)

Tells the recognizer that alternatives are complete at this precise point of parsing, and that it must move forward by $length bytes, which can be zero. Giving the correct length is the end-user's responsibility. This method can generate events.

Returns a boolean indicating if the call was successful or not.

$eslifRecognizer->alternativeRead($name, $anything, $length, $grammarLength)

A short-hand version of alternative() followed by alternativeComplete(), with the same meaning for all parameters. This method can generate events.

Returns a boolean indicating if the call was successful or not.
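Several alternatives can be pushed at the same position before they are completed together. This is how an ambiguous token is handed to the recognizer. Below is a minimal sketch. The helper name pushAlternatives is hypothetical. $grammarLength is left at its default of 1, and $length must be in bytes of the internal buffer:

```perl
use strict;
use warnings;

# Hypothetical helper: push one or more [name, value] alternatives at the
# current position, then complete them by moving forward $length bytes.
sub pushAlternatives {
    my ($eslifRecognizer, $length, @candidates) = @_;
    for my $candidate (@candidates) {
        my ($name, $value) = @{$candidate};
        $eslifRecognizer->alternative($name, $value) or return 0;
    }
    return $eslifRecognizer->alternativeComplete($length) ? 1 : 0;
}
```

With a single candidate, this performs the same steps that alternativeRead() combines into one call.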

$eslifRecognizer->nameTry($name)

The end-user can ask the recognizer if a symbol identified by $name may match.

Returns a boolean indicating if the lexeme is recognized.

$eslifRecognizer->discard()

Ask the recognizer to apply :discard.

Returns the number of bytes discarded.

$eslifRecognizer->discardTry()

The end-user can ask the recognizer if :discard rule may match.

Returns a boolean indicating if :discard is recognized.
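discardTry() and discard() can be combined to skip everything that :discard matches at the current position. Below is a minimal sketch. The helper name skipDiscard is hypothetical. This page does not say whether one discard() call consumes one or several matches. The loop works either way, and it stops if discard() reports 0 bytes:

```perl
use strict;
use warnings;

# Hypothetical helper: apply :discard while it matches and return the
# total number of bytes discarded.
sub skipDiscard {
    my ($eslifRecognizer) = @_;
    my $total = 0;
    while ($eslifRecognizer->discardTry()) {
        my $bytes = $eslifRecognizer->discard();
        last unless $bytes;    # guard against an empty match
        $total += $bytes;
    }
    return $total;
}
```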

$eslifRecognizer->nameExpected()

Asks the recognizer for the list of expected symbol names.

Returns a reference to an array of names, possibly empty.
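nameExpected() and nameTry() can be combined to find which expected symbol matches at the current position. Below is a minimal sketch. The helper name firstMatchingExpected is hypothetical, and it returns the first match in the order given by nameExpected():

```perl
use strict;
use warnings;

# Hypothetical helper: first expected symbol name that nameTry() reports
# as matching at the current position, or undef if none does.
sub firstMatchingExpected {
    my ($eslifRecognizer) = @_;
    for my $name (@{ $eslifRecognizer->nameExpected() }) {
        return $name if $eslifRecognizer->nameTry($name);
    }
    return;
}
```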

$eslifRecognizer->nameLastPause($name)

Asks the recognizer for the end-user data associated with the last pause event on the symbol $name. A pause event happens when the recognizer itself was responsible for recognizing the symbol, during a call to the scan() or resume() methods. This data is an exact copy of the last bytes that matched for the symbol, in the internal representation of end-user data, which is a UTF-8 sequence of bytes in case of a character stream.

Returns the associated bytes, or undef.

$eslifRecognizer->nameLastTry($name)

Asks the recognizer for the end-user data associated with the last successful try of the symbol $name. This data is an exact copy of the last bytes that matched for the symbol, in the internal representation of end-user data, which is a UTF-8 sequence of bytes in case of a character stream.

Returns the associated bytes, or undef.
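For a character stream, the bytes returned by nameLastTry() or nameLastPause() are UTF-8 encoded. They can be turned back into a Perl character string with the core Encode module. Below is a minimal sketch. The helper name lastTryText is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical helper: decode the UTF-8 bytes of the last successful try
# of $name into a character string; undef is passed through.
sub lastTryText {
    my ($eslifRecognizer, $name) = @_;
    my $bytes = $eslifRecognizer->nameLastTry($name);
    return defined $bytes ? decode('UTF-8', $bytes) : undef;
}
```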

$eslifRecognizer->discardLastTry()

Asks the recognizer for the end-user data associated with the last successful discard try. This data is an exact copy of the last bytes that matched, in the internal representation of end-user data, which is a UTF-8 sequence of bytes in case of a character stream.

Returns the associated bytes, or undef.

$eslifRecognizer->discardLast()

Asks the recognizer for the end-user data associated with the last successful discard. This data is an exact copy of the last bytes that matched the latest :discard rule, which is a UTF-8 sequence of bytes in case of a character stream.

For performance reasons, last discard data is available only if the recognizer interface returned a true value for isWithTrack() method and if there is a discard event for the :discard rule that matched.

Returns the associated bytes, or undef.

$eslifRecognizer->isEof()

This method is similar to the recognizer interface's isEof() method, except that it asks the recognizer's internal state, which maintains a copy of this flag.

Returns a boolean indicating if the end of user data is reached.

$eslifRecognizer->isStartComplete()

Returns a boolean indicating if start symbol completion is reached. Note that this does not mean that the grammar is exhausted.

$eslifRecognizer->read()

Forces the recognizer to read more data. Usually, the recognizer interface is called automatically whenever needed.

Returns a boolean value indicating success or not.

$eslifRecognizer->input([offset[, length]])

Gets a copy of the current internal recognizer buffer. offset and length are in bytes and follow the semantics of the builtin substr function, except that a length of 0 is ignored. An undefined result does not necessarily mean there is an error: it can also mean that the internal buffer is completely consumed. Setting length to a reasonable value is recommended, to avoid an internal copy of a potentially large number of bytes.

Default values for offset and length are 0.

Returns the requested bytes, or undef.

$eslifRecognizer->inputLength()

Returns the length of current internal recognizer buffer, in bytes.
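read(), isEof(), inputLength() and input() can be combined to peek at a bounded amount of lookahead. Below is a minimal sketch. The helper name lookahead is hypothetical. $wanted must be greater than 0, because a length of 0 is ignored by input():

```perl
use strict;
use warnings;

# Hypothetical helper: buffer at least $wanted bytes (or stop at EOF), then
# return a copy of at most $wanted bytes from the start of the buffer.
sub lookahead {
    my ($eslifRecognizer, $wanted) = @_;
    while ($eslifRecognizer->inputLength() < $wanted
           && !$eslifRecognizer->isEof()) {
        $eslifRecognizer->read() or last;
    }
    return $eslifRecognizer->input(0, $wanted);
}
```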

$eslifRecognizer->error()

Generates an error report for $eslifRecognizer.

$eslifRecognizer->progressLog($start, $end, $loggerLevel)

Asks for a logging representation of the current parse progress. The format is fixed by the underlying libraries. The $start and $end parameters follow the Perl convention for indices: negative values count from the end. For example, -1 means the last index, -2 the one before it, and so on. $loggerLevel is a level as per MarpaX::ESLIF::Logger::Level.

Nothing is returned.

$eslifRecognizer->progress($start, $end)

Asks for the internal progress in terms of Earley parsing. The $start and $end parameters follow the Perl convention for indices: negative values count from the end. For example, -1 means the last Earley Set Id, -2 the one before it, and so on.

Returns a reference to an array of hash references, possibly empty. Each array element is a reference to a hash containing these keys:

earleySetId

The Earley Set Id.

earleySetOrigId

The origin Earley Set Id.

rule

The rule number.

position

The position in the rule: a negative number, or a number bigger than the length of the rule, means the rule is completed; 0 means the rule is predicted; otherwise the rule is in progress.

earleme

The Earleme Id corresponding to the Earley Set Id.

earlemeOrig

The origin Earleme Id corresponding to the origin Earley Set Id.
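A minimal sketch that turns the entries of the last Earley set into readable lines, using only the hash keys listed above. The helper name describeLastSet and the output format are hypothetical. progressLog() remains the built-in way to log this information:

```perl
use strict;
use warnings;

# Hypothetical helper: one line per progress item of the last Earley set.
sub describeLastSet {
    my ($eslifRecognizer) = @_;
    return map {
        sprintf('set %d: rule %d at position %d, origin set %d',
                $_->{earleySetId}, $_->{rule}, $_->{position},
                $_->{earleySetOrigId})
    } @{ $eslifRecognizer->progress(-1, -1) };
}
```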

$eslifRecognizer->lastCompletedOffset($name)

The recognizer tentatively records an absolute offset every time a symbol is completed. "Tentatively" means that no overflow check is done, so this number is not reliable if the user data spans a very large number of bytes. The unit is bytes. $name can be any symbol in the grammar.

Returns the absolute offset in bytes.

$eslifRecognizer->lastCompletedLength($name)

The recognizer tentatively computes the length of every symbol completion. Since this value depends internally on the previous absolute offset, it is not guaranteed to be exact, in the sense that no overflow check is done. $name can be any symbol in the grammar.

Returns the absolute length in bytes.

$eslifRecognizer->lastCompletedLocation($name)

Returns an array containing at indices 0 and 1 the values of $eslifRecognizer->lastCompletedOffset($name) and $eslifRecognizer->lastCompletedLength($name), respectively.
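A minimal sketch that uses lastCompletedLocation() to recover the text of the last completion of a symbol. It assumes the application still holds the whole character input in $input, that offsets count from the start of its UTF-8 encoding, and that no overflow occurred. The helper name lastCompletedText is hypothetical:

```perl
use strict;
use warnings;
use Encode qw(encode decode);

# Hypothetical helper: slice the UTF-8 bytes of the whole input at the
# byte offset/length of the last completion of $name, then decode them.
sub lastCompletedText {
    my ($eslifRecognizer, $name, $input) = @_;
    my ($offset, $length) = $eslifRecognizer->lastCompletedLocation($name);
    my $bytes = encode('UTF-8', $input);
    return decode('UTF-8', substr($bytes, $offset, $length));
}
```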

$eslifRecognizer->line()

If, at creation, the recognizer interface returned a true value for the $recognizerInterface->isWithNewline() method, then the recognizer tracks the number of lines for every character-oriented chunk of data.

Returns the line number, or 0.

$eslifRecognizer->column()

If, at creation, the recognizer interface returned a true value for the $recognizerInterface->isWithNewline() method, then the recognizer tracks the number of columns for every character-oriented chunk of data.

Returns the column number, or 0.

$eslifRecognizer->location()

Returns an array containing at indices 0 and 1 the values of $eslifRecognizer->line() and $eslifRecognizer->column(), respectively.
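location() is convenient for error messages. Below is a minimal sketch. The helper name errorAt and the message format are hypothetical. The position is omitted when the line is 0, which is the case when newline tracking was not enabled:

```perl
use strict;
use warnings;

# Hypothetical helper: append the current line/column to a message when
# they are available (non-zero line).
sub errorAt {
    my ($eslifRecognizer, $message) = @_;
    my ($line, $column) = $eslifRecognizer->location();
    return $line ? "$message at line $line, column $column" : $message;
}
```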

$eslifRecognizer->hookDiscard($discardOnOff)

Hook the recognizer to enable or disable the use of :discard if it exists. Default mode is on. This is a permanent setting.

$eslifRecognizer->hookDiscardSwitch()

Hook the recognizer to switch the use of :discard if it exists. This is a permanent setting.

$eslifRecognizer->symbolTry($symbol)

Tries to match the external symbol $symbol, which is an instance of MarpaX::ESLIF::Symbol. Returns the match or undef.

DEPRECATED METHODS

$eslifRecognizer->lexemeAlternative($name, $anything, $grammarLength)

Alias to alternative.

$eslifRecognizer->lexemeComplete($length)

Alias to alternativeComplete.

$eslifRecognizer->lexemeRead($name, $anything, $length, $grammarLength)

Alias to alternativeRead.

$eslifRecognizer->lexemeTry($name)

Alias to nameTry.

$eslifRecognizer->lexemeExpected()

Alias to nameExpected.

$eslifRecognizer->lexemeLastPause($name)

Alias to nameLastPause.

$eslifRecognizer->lexemeLastTry($name)

Alias to nameLastTry.

SEE ALSO

MarpaX::ESLIF::Recognizer::Interface, MarpaX::ESLIF::Event::Type, MarpaX::ESLIF::Logger::Level, MarpaX::ESLIF::Symbol

NOTES

MarpaX::ESLIF::Recognizer cannot be reused across threads.

AUTHOR

Jean-Damien Durand <jeandamiendurand@free.fr>

COPYRIGHT AND LICENSE

This software is copyright (c) 2017 by Jean-Damien Durand.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.