The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

Marpa::R3::External::Model - Details on input models

About this document

The alternative input models described in this document are an advanced technique. If you are starting out with Marpa, you probably want to ignore this document. If you are an experienced Marpa user, it is still safe to ignore this document, but you might find the possibilities it discusses interesting.

This document contains a taxonomy of alternative input models, detailing their behavior in terms of G1 location and earlemes. The primary purpose is to aid a programmer in designing new alternative input models. A secondary purpose is to clarify Marpa's behavior for all input models.

Understanding the settings of G1 location and of current, closest and furthest earleme is crucial to working with advanced input models. For this reason the next sections will go through the possibilities carefully. The presentation will start with the most traditional and restrictive models. It will proceed to less restrictive models.

This is not the document to which to turn first. Before reading this document, you should be familiar with Marpa::R3::External::Basic and Marpa::R3::External::Low.

For brevity, this document speaks only of the lexeme_alternative() method. Note that, unless otherwise stated, everything said in this document applies to the lexeme_alternative_literal() method as well.

This document describes only the behavior of successful method calls. For the behavior on failure, see Marpa::R3::External::Low.

The initial earleme settings

All input models have the same initial values. Initially the current Earley set ID (G1 location) in always 0. Initially, the current, closest, and furthest earleme are always earleme 0.

The standard model of input

An application will rarely implement the standard model of input using the low level external scanning methods. The high level methods are much simpler and handier.

But if an application does implement the standard model of input using the low level methods, it does so by making calls to lexeme_alternative() and lexeme_complete() in pairs. It will make, first, exactly one call to lexeme_alternative(), and for a lexeme with length 1. Following the call to lexeme_alternative() must come the second call of the pair -- a call to lexeme_complete(). For an input of length n, there will be exactly n such paired calls.

Suppose, in the standard model that, for a successful call to lexeme_alternative(), the following is true:

  • The current earleme before the call is c.

Because this is the standard model, this means that we also know that

  • The current Earley set before the call is c.

  • The closest earleme before the call is c.

  • The furthest earleme before the call is c.

In that case, after the successful call to lexeme_alternative(), the following will be true:

  • The current earleme will still be c.

  • The current Earley set ID will still be c.

  • The closest earleme will be c+1.

  • The furthest earleme will be c+1.

Suppose, in the standard model that, before a successful call to lexeme_complete(), the following is true:

  • The current earleme before the call is c.

Because this is the standard model, this means that we also know that

  • The current Earley set ID before the call is c.

  • The pending lexeme queue contains exactly one pending lexeme set, a singleton set at earleme c+1.

  • The closest earleme before the call is c+1.

  • The furthest earleme before the call is c+1.

In that case, after the successful call to lexeme_complete(), the current, closest, and furthest earlemes will be the same. Specifically, the following will be true:

  • The current earleme will be advanced to c+1.

  • The current Earley set ID will be advanced to c+1.

  • The pending lexeme queue will be empty.

  • The closest earleme will remain at c+1.

  • The furthest earleme will remain at c+1.

Ambiguous input

As a first loosening of the standard model, we no longer require calls to lexeme_alternative() to be paired with calls to lexeme_complete(). Instead, we allow a series of one or more calls to lexeme_alternative() before each call to lexeme_complete().

We still require that there be at least one call to lexeme_alternative() before each call to lexeme_complete(). We also still require that all lexemes have a length of exactly 1.

In the ambiguous input model, the behavior of the current Earley set ID and of the current, closest, and furthest earlemes are exactly as described for the standard model, with these exceptions:

  • Where the current earleme is c, the value of the closest earleme before the second and subsequent successful calls of lexeme_alternative() will be c+1.

  • Where the current earleme is c, the value of the furthest earleme before the second and subsequent successful calls of lexeme_alternative() will be c+1.

  • The pending lexeme queue will still always contains exactly one pending lexeme set, and it will still be at earleme c+1. But after the n'th successful call to lexeme_alternative() it will contain n lexemes.

Variable length lexemes

Our final loosening of the restrictions is to allow variable length lexemes. That is, instead of requiring that all lexemes be of length 1, we allow lexemes to be of length 1 or longer.

General rule for pending lexemes

In this subsection we state a general rule for the pending lexeme queue, which applies at all times in all input models;

If the pending lexeme queue is empty the closest and furthest earlemes are equal to the current earleme.

If the pending lexeme queue is non-empty, let it be

lexset[1] ... lexset[n],

where n >= 1. Then the end earleme of the earlemes in lexset[1] is the closest earleme; and the end earleme of the earlemes in lexset[n] is the furthest earleme.

Call of lexeme_alternative()

Suppose, in the variable length lexeme model that, for a successful call to lexeme_alternative(), the following is true:

  • The current Earley set ID before the call is esid.

  • The current earleme before the call is c.

  • The length of the lexeme is length.

In that case, after a successful call to lexeme_alternative(), the following will be true:

  • The current Earley set ID will be esid. The current Earley set is never changed by a call to lexeme_alternative().

  • The current earleme will still be c. The current earleme is never changed by a call to lexeme_alternative().

  • The pending lexeme queue will be non-empty, and the closest and furthest earleme will obey the general rule.

  • The furthest earleme will be greater than or equal to c+length.

  • The closest earleme will be less than or equal to c+length.

Call of lexeme_complete()

Suppose, in the variable length lexeme model that, before a successful call to lexeme_complete(), the following is true:

  • The current Earley set ID before the call is esid.

  • The current earleme before the call is c.

  • The closest earleme before the call is old_closest.

In that case, after the successful call to lexeme_complete(), the following will be true:

  • The current Earley set ID will be esid + 1.

  • The current earleme will be old_closest.

  • The pending lexeme queue may or may not be empty, and the closest and furthest earleme will obey the general rule.

General rules for the earleme variables

The following will always be the case, no matter which of the input models is in use.

  • The current earleme is always greater than or equal to the current Earley set ID.

  • The closest earleme is always greater than or equal to the current earleme.

  • The furthest earleme is always greater than or equal to the closest earleme.

Copyright and License

Copyright 2018 Jeffrey Kegler
This file is part of Marpa::R3.  Marpa::R3 is free software: you can
redistribute it and/or modify it under the terms of the GNU Lesser
General Public License as published by the Free Software Foundation,
either version 3 of the License, or (at your option) any later version.

Marpa::R3 is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
Lesser General Public License for more details.

You should have received a copy of the GNU Lesser
General Public License along with Marpa::R3.  If not, see
http://www.gnu.org/licenses/.