NAME
docs/pdds/pdd20_lexical_vars.pod - Lexical variables
VERSION
$Revision: 9764 $
ABSTRACT
This document defines the requirements and implementation strategy for lexically scoped variables.
SYNOPSIS
.sub foo
.lex "$a", P0
P1 = new Integer
P1 = 13013
store_lex "$a", P1
print P0 # prints 13013
.end
.sub bar :outer(foo)
P0 = find_lex "$a" # may succeed; depends on closure creation
.end
.sub baz
P0 = find_lex "$a" # guaranteed to fail: no .lex, no :outer()
.end
.sub corge
print "hi"
.end # no .lex and no :lex, thus: no LexInfo, no LexPad
# Lexical behavior varies by HLL. For example,
# Tcl's lexicals are not declared at compile time.
.HLL "Tcl", "tcl_group"
.sub grault :lex # without ":lex", Tcl subs have no lexicals
P0 = find_lex "x" # FAILS
P0 = new Integer # really TclInteger
P0 = 42
store_lex "x", P0 # creates lexical "x"
P0 = find_lex "x" # SUCCEEDS
.end
DESCRIPTION
"Lexical scoping" (a.k.a. "static scoping") is a term with many explanations and examples across computer science. (And I'm sure I've only seen a fraction of them.)
For Parrot purposes, "lexical variables" are those stored in a hash (or hash-like) PMC associated with a subroutine invocation, a.k.a. call frame.
CONCEPTUAL MODEL
LexPad PMC
Lexicals are stored in PMCs called "LexPads". The LexPad interface is a slight extension of the Hash interface. Lexical variables are conceptually key/value pairs, with string keys and PMC values.
Each call frame may contain a LexPad, through which the names and values of that call's lexical variables can be accessed.
Call frames are not PMCs (yet?) and are therefore not visible to user code (yet?). Instead, users can access the LexPad for a given call frame through any Continuation object that was created for that frame.
However, normal lexical variable usage does not use the LexPad directly. Instead, specialized opcodes implement the common use cases. LexPads contain references to their associated LexInfos.
(Specialized opcodes are particularly a Good Idea because most lexical usage involves searching more than one LexPad, so a single LexPad reference is not as useful as it might seem. And, of course, opcodes can cheat ... er, can be written in optimized C. :-))
LexPad keys are unique. Therefore, in each subroutine, there can be only one lexical variable with a given name.
TODO: Describe how lexical naming system interacts with non-ASCII character sets.
LexInfo PMC
At compile time, each newly created Subroutine structure (or Subroutine derivative, e.g. Closure) is populated with a PMC of HLL-mapped type LexInfo. (Note that this type may actually be Null in some HLLs, e.g. Tcl.) LexInfo PMCs are the interface through which the PIR compiler communicates compile-time information about lexical variables.
A LexInfo represents what is known about lexicals at compile time (e.g. variable names, perhaps variable types, etc.), while a LexPad represents what becomes known at run time (values).
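As a rough illustration of this split, the following Python sketch models a LexInfo as the compile-time record of declared names and a LexPad as the per-call storage for values. The class names mirror the PMCs, but the declare() helper and the dict-based storage are assumptions made for illustration; they are not Parrot's API.

```python
class LexInfo:
    """Compile-time knowledge: which lexical names a sub declares."""

    def __init__(self, sub_name):
        self.sub_name = sub_name
        self.names = set()

    def declare(self, name):
        # Hypothetical helper, not a Parrot method.
        self.names.add(name)


class LexPad(dict):
    """Run-time knowledge: the values of one call's lexicals."""

    def __init__(self, lexinfo):
        super().__init__()
        self.lexinfo = lexinfo  # a LexPad keeps a reference to its LexInfo


# One LexInfo per subroutine, built at compile time.
info = LexInfo("foo")
info.declare("$a")

# One LexPad per call frame, built at run time.
first_call = LexPad(info)
second_call = LexPad(info)
first_call["$a"] = 13013
second_call["$a"] = 42
```

Two calls of the same subroutine share one LexInfo but hold independent values, which is the distinction the paragraph above draws.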
Lookup strategy
If Parrot is asked to access a lexical variable named $var, it uses the following strategy. Note that fetch and store use exactly the same approach.
Parrot starts with the currently executing subroutine $sub, then loops through these steps:
1. Starting at the current call frame, walk back until an active
frame is found that is executing $sub. Call it $frame.
(NOTE: The first time through, $sub is the current subroutine
and $frame is the currently live frame.)
2. Look for $var in $frame.lexicals using standard Hash methods.
3. If the given pad contains $var, fetch/store it and
REPORT SUCCESS.
4. Set $sub to $sub.outer. (That is, the textually enclosing
subroutine.) But if $sub has no outer sub, REPORT FAILURE.
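The steps above can be sketched in Python as follows. This is a model under stated assumptions, not Parrot's implementation: "walk back" is taken to mean following a chain of caller frames, a sub with no active frame is simply skipped in favor of its outer sub, and the Sub and Frame classes and the find_pad() helper are hypothetical names.

```python
class Sub:
    def __init__(self, name, outer=None):
        self.name = name
        self.outer = outer  # textually enclosing Sub, or None


class Frame:
    def __init__(self, sub, caller=None, lexicals=None):
        self.sub = sub
        self.caller = caller      # the frame that invoked this one
        self.lexicals = lexicals  # dict standing in for a LexPad, or None


def find_pad(current_frame, var):
    """Return the pad (dict) that holds var, or raise LookupError."""
    sub = current_frame.sub
    while sub is not None:
        # Step 1: walk back to an active frame that is executing sub.
        frame = current_frame
        while frame is not None and frame.sub is not sub:
            frame = frame.caller
        # Steps 2 and 3: look for var in that frame's lexicals.
        if frame is not None and frame.lexicals is not None:
            if var in frame.lexicals:
                return frame.lexicals
        # Step 4: move on to the textually enclosing sub.
        sub = sub.outer
    raise LookupError(f"lexical {var!r} not found")


def find_lex(frame, var):
    return find_pad(frame, var)[var]


def store_lex(frame, var, value):
    # Store uses the same search as fetch.
    find_pad(frame, var)[var] = value


# Mirror of the SYNOPSIS: bar is :outer(foo); baz has no outer sub.
foo = Sub("foo")
bar = Sub("bar", outer=foo)
baz = Sub("baz")
foo_frame = Frame(foo, lexicals={"$a": 13013})
bar_frame = Frame(bar, caller=foo_frame, lexicals={})
baz_frame = Frame(baz, caller=foo_frame)

value = find_lex(bar_frame, "$a")  # found in foo's pad via bar.outer
```

In this model a lookup from baz_frame raises LookupError, because baz has neither its own pad nor an outer sub, even though foo is its caller.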
LexPad and LexInfo are optional; the ":lex" attribute
Parrot does not assume that every subroutine needs lexical variables. Therefore, Parrot defaults to not creating LexInfo or LexPad PMCs for a given subroutine. It only creates them when it first encounters a ".lex" directive in the subroutine. If no such directive is found, Parrot does not create a LexInfo for it at compile time, nor a LexPad for it at run time.
However, an absence of ".lex" directives is normal for some languages (e.g. Tcl) which lack compile-time knowledge of lexicals. For these languages, the additional Subroutine attribute ":lex" should be specified. It tells Parrot to create LexInfo and LexPads even though no lexicals are declared.
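The rule described in this section reduces to a single condition. The Python sketch below states it explicitly; the function name and its parameters are hypothetical and exist only to make the rule concrete.

```python
def needs_lexicals(lex_directive_count, has_lex_attribute):
    """Decide whether a sub gets a LexInfo (compile time) and a LexPad (run time).

    lex_directive_count: number of ".lex" directives seen in the sub.
    has_lex_attribute:   True if the sub was declared with ":lex".
    """
    return lex_directive_count > 0 or has_lex_attribute


# The three cases from the SYNOPSIS.
foo_needs = needs_lexicals(1, False)     # has a .lex directive
corge_needs = needs_lexicals(0, False)   # no .lex and no :lex
grault_needs = needs_lexicals(0, True)   # Tcl sub marked :lex
```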
Closures
FIXME: Describe the current closure mechanism
HLL Type Mapping
The implementation of lexical variables in the PIR compiler depends on two new PMCs: LexPad and LexInfo. However, the default Parrot LexPad and LexInfo PMCs will not meet the needs of all languages. They should suit Perl 6, for example, but not Tcl.
Therefore, it is expected that HLLs will map the LexPad and LexInfo types to something more appropriate (e.g. TclLexPad and TclLexInfo). That mapping will automatically occur when the appropriate ".HLL" directive is in force.
Using Tcl as an extreme example: TclLexPad will likely be a thin veneer on PMCHash. TclLexInfo will likely map to Null. Tcl provides no reliable compile-time information about lexicals; without any compile-time information to store, there's no need for TclLexInfo to do anything interesting.
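One way to picture the mapping is as a per-HLL table of overrides consulted before the defaults. The Python sketch below is only a model: the table layout and the mapped_type() function are assumptions, the "Perl6" key is a hypothetical HLL name, and None stands in for Parrot's Null.

```python
class LexPad(dict):
    """Stand-in for the default Parrot LexPad."""


class LexInfo(dict):
    """Stand-in for the default Parrot LexInfo."""


class TclLexPad(dict):
    """Stand-in for a Tcl-specific LexPad: a thin veneer on a hash."""


DEFAULT_TYPES = {"LexPad": LexPad, "LexInfo": LexInfo}

# Overrides that apply while a given ".HLL" directive is in force.
HLL_TYPE_MAP = {
    "Tcl": {"LexPad": TclLexPad, "LexInfo": None},  # None models Null
}


def mapped_type(hll, base_name):
    """Return the type an HLL uses in place of a core type."""
    overrides = HLL_TYPE_MAP.get(hll, {})
    if base_name in overrides:
        return overrides[base_name]
    return DEFAULT_TYPES[base_name]


tcl_pad_type = mapped_type("Tcl", "LexPad")
tcl_info_type = mapped_type("Tcl", "LexInfo")
perl6_pad_type = mapped_type("Perl6", "LexPad")  # no override: default applies
```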
Nested Subroutines Have Outies; the ":outer" attribute
For HLLs that support nested subroutines, Parrot provides a way to denote that a given subroutine is conceptually "inside" another. Lookup for lexical variables starts at the current call frame and proceeds through call frames that invoke "outer" subroutines. The specific meaning of "outer" is defined below, but it's designed to support the common linguistic structure of nested subroutines where inner subs refer to lexical variables contained in outer blocks.
Note that "outer" and "caller" are very different concepts! For example, given the Perl 6 code:
sub foo {
my $a = 1;
my sub a { eval '$a' }
return &a;
}
The &foo subroutine is the outer subroutine of &a, but it is not the caller of &a.
In the above example, the definition of the Parrot subroutine implementing &a must include a notation that it is textually enclosed within &foo. This is a static attribute of a Subroutine, set at compile time and never changed thereafter. (Unless you're evil, or Damian. But I repeat myself.) This information is given through a ":outer()" subroutine attribute, e.g.:
.sub a :outer(foo)
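The same distinction exists in any lexically scoped language. As a loose analogy, written in Python rather than Perl 6 or PIR (and Python's closure mechanism is not Parrot's), the inner function below resolves its variable through the function that textually encloses it, not through whoever happens to call it. All names here are illustrative.

```python
def foo():
    a = 1

    def inner():
        # "a" resolves to foo's variable because foo textually encloses inner.
        return a

    return inner


def unrelated_caller(fn):
    a = 999  # local to the caller; never consulted by fn
    return fn()


inner = foo()                     # foo has already returned at this point
result = unrelated_caller(inner)  # the caller's "a" plays no part
```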
LEXPAD AND LEXINFO REQUIRED INTERFACES
LexInfo
Below are the standard LexInfo methods that all HLL LexInfo PMCs may support. Each LexInfo PMC should only define the methods that it can usefully implement, so the compiler can use method lookup failure to generate useful diagnostics (e.g. "register aliasing not supported by Tcl lexicals").
Each language's LexInfo will implement methods that are helpful to that language's LexPad. In the extreme case, LexInfo can be Null -- but if it is, the given HLL should not generate any ".lex*" directives.
- void init_pmc(PMC *sub)
Called exactly once.
- PMC *sub()
Return the associated Subroutine.
- void declare_lex_preg(STRING *name, INTVAL preg)
Declare a lexical variable that is an alias for a PMC register. The PIR compiler calls this method in response to a
.lex STRING, PREG
directive. For example, given this preamble:
.lex "$a", $P0
$P1 = new Integer
These two opcodes have an identical effect:
$P0 = $P1
store_lex "$a", $P1
And these two opcodes also have an identical effect:
$P1 = $P0
$P1 = find_lex "$a"
LexPad
LexPads start by implementing the Hash interface: variable names are string keys, and variable values are PMCs.
In addition, LexPads must implement the following methods:
- init_pmc(PMC *lexinfo)
Called exactly once. Note that Parrot guarantees that this method will be called after the new Context object is made current. It is recommended that any LexPad that aliases registers take a pointer to the current Context at init_pmc() time.
- PMC *lexinfo()
Return the associated LexInfo.
DEFAULT PARROT LEXPAD AND LEXINFO
The default LexInfo supports lexicals only as aliases for PMC registers. It therefore implements declare_lex_preg(). (Internally, it could be a Hash of some kind, where keys are String variable names and values are integer register numbers.)
The default LexPad (like all LexPads) implements the Hash interface. When asked to look up a variable, it finds the corresponding register number by querying its associated LexInfo. It then gets or sets the given numbered register in its associated Parrot Context structure.
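The interplay between the default LexInfo, the default LexPad, and the register file can be modeled in a few lines of Python. This is a sketch under assumptions: the Context class, the register_for() helper, and the use of Python's item-access protocol in place of the Hash interface are all illustrative; only declare_lex_preg() and lexinfo() correspond to methods named in this document.

```python
class Context:
    """Stand-in for a Parrot Context: holds one call's PMC registers."""

    def __init__(self, num_pregs):
        self.pregs = [None] * num_pregs


class DefaultLexInfo:
    """Maps variable names to PMC register numbers."""

    def __init__(self):
        self._reg_by_name = {}

    def declare_lex_preg(self, name, preg):
        self._reg_by_name[name] = preg

    def register_for(self, name):
        # Hypothetical query helper; raises KeyError for undeclared names.
        return self._reg_by_name[name]


class DefaultLexPad:
    """Hash-like view of the registers that were declared as lexicals."""

    def __init__(self, lexinfo, context):
        self._lexinfo = lexinfo
        self._context = context  # captured once, as recommended for init_pmc()

    def lexinfo(self):
        return self._lexinfo

    def __getitem__(self, name):
        return self._context.pregs[self._lexinfo.register_for(name)]

    def __setitem__(self, name, value):
        self._context.pregs[self._lexinfo.register_for(name)] = value


# Compile time: .lex "$a", P0
info = DefaultLexInfo()
info.declare_lex_preg("$a", 0)

# Run time: a fresh Context and LexPad for one call.
ctx = Context(num_pregs=2)
pad = DefaultLexPad(info, ctx)

pad["$a"] = 13013             # like store_lex "$a", P1
seen_in_register = ctx.pregs[0]

ctx.pregs[0] = 5              # like assigning to P0 directly
seen_through_pad = pad["$a"]  # like find_lex "$a"
```

Because the pad stores nothing itself, a write through either the name or the register is immediately visible through the other, which is what makes the variable an alias rather than a copy.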
ATTACHMENTS
None.
FOOTNOTES
None.
REFERENCES
None.