Subroutines
A calculation like "the factorial of a number" may be used several times in a large program. Subroutines allow this kind of functionality to be abstracted into a unit. It's a benefit for code reuse and maintainability. Even though PASM is just an assembly language for a virtual processor, it has a number of features to support high-level subroutine calls. PIR offers a smoother interface to those features.
PIR provides several different sets of syntax for subroutine calls. This is a language designed to implement other languages, and every language does subroutine calls a little differently. What's needed is a set of building blocks and tools, not a single prepackaged solution.
Parrot Calling Conventions
As we mentioned in the previous chapter, Parrot defines a set of calling conventions for externally visible subroutines. In these calls, the caller is responsible for preserving its own registers, and arguments and return values are passed in a predefined set of Parrot registers. The calling conventions use the Continuation Passing Style to pass control to subroutines and back again.
Subroutines
The fact that the Parrot calling conventions are clearly defined also makes it possible to provide some higher-level syntax for them. Manually setting up all the registers for each subroutine call isn't just tedious, it's also prone to bugs introduced by typos. PIR's simplest subroutine call syntax looks much like a high-level language. This example calls the subroutine _fact with two arguments and assigns the results to $I0 and $I1:
($I0, $I1) = _fact(count, product)
This simple statement hides a great deal of complexity. It generates a subroutine object and stores it in P0. It assigns the arguments to the appropriate registers, assigning any extra arguments to the overflow array in P3. It also sets up the other registers to mark whether this is a prototyped call and how many arguments it passes of each type. It calls the subroutine stored in P0, saving and restoring the top half of all register frames around the call. And finally, it assigns the results of the call to the given temporary register variables (for a single result you can drop the parentheses). If the one line above were written out in basic PIR it would be something like:
newsub P0, .Sub, _fact
I5 = count
I6 = product
I0 = 1
I1 = 2
I2 = 0
I3 = 0
I4 = 0
savetop
invokecc
restoretop
$I0 = I5
$I1 = I6
The PIR code actually generates an invokecc opcode internally. It not only invokes the subroutine in P0, but also generates a new return continuation in P1. The called subroutine invokes this continuation to return control to the caller.
Expanded Subroutine Syntax
The single line subroutine call is incredibly convenient, but it isn't always flexible enough. So PIR also has a more verbose call syntax that is still more convenient than manual calls. This example pulls the subroutine _fact out of the global symbol table and calls it:
find_global $P1, "_fact"
.begin_call
.arg count
.arg product
.call $P1
.result $I0
.end_call
The whole chunk of code from .begin_call to .end_call acts as a single unit. The .begin_call directive can be marked as prototyped or unprototyped, which corresponds to the flag I0 in the calling conventions. The .arg directive sets up arguments to the call. The .call directive saves top register frames, calls the subroutine, and restores the top registers. The .result directive retrieves return values from the call.
Subroutine Declarations
In addition to syntax for subroutine calls, PIR provides syntax for subroutine definitions. The .param directive pulls parameters out of the registers and creates local named variables for them:
.param int c
The .begin_return and .end_return directives act as a unit much like the .begin_call and .end_call directives:
.begin_return
.return p
.end_return
The .return directive sets up return values in the appropriate registers. After all the registers are set up, the unit invokes the return continuation in P1 to return control to the caller.
Here's a complete code example that reimplements the factorial code from the previous section as an independent subroutine. The subroutine _fact is a separate compilation unit, assembled and processed after the _main function. Parrot resolves global symbols like the _fact label between different units.
# factorial.pir
.sub _main
.local int count
.local int product
count = 5
product = 1
$I0 = _fact(count, product)
print $I0
print "\n"
end
.end
.sub _fact
.param int c
.param int p
loop:
if c <= 1 goto fin
p = c * p
dec c
branch loop
fin:
.begin_return
.return p
.end_return
.end
This example defines two local named variables, count and product, and assigns them the values 5 and 1. It calls the _fact subroutine, passing the two variables as arguments. In the call, the two arguments are assigned to consecutive integer registers, because they're stored in typed integer variables. The _fact subroutine uses .param and the return directives for retrieving parameters and returning results. The final printed result is 120.
You may want to generate a PASM source file for the above example to look at the details of how the PIR code translates to PASM:
$ parrot -o- factorial.pir
Continuation Passing Style
Continuations are snapshots, frozen images of the current execution state of the VM. Once we have a continuation, we can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program, if they want. There's actually no magic involved, just a lot of interesting ideas and involved code.
Continuations are not a new concept; they've been boggling the minds of Lisp and Scheme programmers for many years. However, despite all their power and flexibility, they haven't been well-utilized in most modern programming languages or in their underlying libraries and virtual machines. Parrot aims to change that: in Parrot, almost every control flow manipulation, including all subroutine, method, and coroutine calls, is performed using continuations. This mechanism is mostly hidden from developers who build applications on top of Parrot. The power and flexibility is available if people want to use it, but it's hidden behind more familiar constructs if not.
Doing all sorts of flow control using continuations is called Continuation Passing Style (CPS). CPS allows Parrot to offer all sorts of neat features, such as tail-call optimizations and lexical subroutines.
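As a rough illustration of the idea, here is a small Python sketch (an analogy, not Parrot code; the names fact_cps and ret are hypothetical). A function written in continuation passing style does not hand its result back in the usual way. It receives an extra argument, the continuation, and passes its result to that:

```python
# A Python analogy for continuation passing style; this is not PIR.
# fact_cps and ret are illustrative names, not part of Parrot.

def fact_cps(n, ret):
    """Compute n! and pass the result to the continuation `ret`."""
    if n <= 1:
        return ret(1)                 # base case: hand 1 to the continuation
    # Recurse with a new continuation that multiplies by n,
    # then passes the product on to the original continuation.
    return fact_cps(n - 1, lambda product: ret(n * product))

results = []
fact_cps(5, results.append)           # the caller decides where control goes next
print(results[0])                     # 120
```

The caller chooses what happens after the computation by choosing the continuation, which is loosely the role the return continuation plays in a Parrot subroutine call.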
Creating and Using Continuations
Lexical Subroutines
As we've mentioned above, Parrot offers support for lexical subroutines. What this means is that we can define a subroutine by name inside a larger subroutine, and our "inner" subroutine is only visible and callable from the "outer" subroutine. Plus, the "inner" subroutine inherits all the lexical variables from the outer subroutine, but is able to define its own lexical variables that cannot be seen or modified by the outer subroutine.
Scope and HLLs
Let us digress for a minute and look ahead to High Level Languages (HLLs) such as Perl, Python, and Ruby. All of these languages allow nested scopes, or blocks within blocks that can have their own lexical variables. Let's look back at the C programming language, where this kind of construct is not uncommon:
{
int x = 0;
int y = 1;
{
int z = 2;
// x, y, and z are all visible here
}
// only x and y are visible here
}
The code above illustrates this idea perfectly without having to get into a detailed and convoluted example: in the inner block, we define the variable z, which is only visible inside that block. The outer block has no knowledge of z at all. However, the inner block does have access to the variables x and y.
PIR Scoping
In PIR, there is only one structure that supports scoping like this: the subroutine, along with objects that inherit from subroutines, such as methods, coroutines, and multisubs, which we will discuss later. There are no blocks in PIR that have their own scope; we have only subroutines. Fortunately, we can use lexical subroutines to simulate the behavior that HLLs require:
.sub 'MyOuter'
.lex int x
.lex int y
'MyInner'()
# only x and y are visible here
.end
.sub 'MyInner' :outer('MyOuter')
.lex int z
#x, y, and z are all visible here
.end
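For comparison, the same visibility rules can be sketched with a nested function in Python. This is only an analogy for the scoping behavior, not PIR, and my_outer and my_inner are hypothetical names:

```python
# Python analogy for lexical (nested) subroutines; not PIR.
# my_outer and my_inner are illustrative names.

def my_outer():
    x = 0
    y = 1

    def my_inner():
        z = 2                      # local to my_inner only
        return x + y + z           # x and y come from the enclosing scope

    total = my_inner()
    # z is not visible here; only x, y, my_inner, and total are
    return total, "z" in locals()

print(my_outer())                  # (3, False)
```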
Declaring and Using Nested Subroutines
As we have seen above, we can declare a new subroutine to be a nested inner subroutine of an existing outer subroutine using the :outer flag. The :outer flag specifies the name of the outer subroutine. Where there may be multiple subroutines with the same name, as is the case with multisubs, which we will discuss soon, we can use the :lexid flag on the outer subroutine to give it a different, unique name that the lexical subroutines can reference in their :outer declarations. Within lexical subroutines, the .lex directive defines a local variable that follows these scoping rules.
Sound confusing? It's not so bad. The basics are that we use :outer to define a lexically scoped subroutine, and we use .lex to define lexically scoped variables in those subroutines. It's only when things start getting crazy with multisubs that we need to worry about any more details than that.
Compilation Units Revisited
The example above could have been written using simple labels instead of separate compilation units:
.sub _main
$I1 = 5 # counter
call fact # same as bsr fact
print $I0
print "\n"
$I1 = 6 # counter
call fact
print $I0
print "\n"
end
fact:
$I0 = 1 # product
L1:
$I0 = $I0 * $I1
dec $I1
if $I1 > 0 goto L1
ret
.end
The unit of code from the fact label definition to ret is a reusable routine. There are several problems with this simple approach. First, the caller has to know to pass the argument to fact in $I1 and to get the result from $I0. Second, neither the caller nor the function itself preserves any registers. This is fine for the example above, because very few registers are used. But if this same bit of code were buried deeply in a math routine package, you would have a high risk of clobbering the caller's register values.
Another disadvantage of this approach is that _main and fact share the same compilation unit, so they're parsed and processed as one piece of code. When Parrot does register allocation, it calculates the data flow graph (DFG) of all symbols, looks at their usage, calculates the interference between all possible combinations of symbols, and then assigns a Parrot register to each symbol. (The operation to calculate the DFG has a quadratic cost or better; it depends on n_lines * n_symbols.) This process is less efficient for large compilation units than it is for several small ones, so it's better to keep the code modular. The optimizer will decide whether register usage is light enough to merit combining two compilation units, or even inlining the entire function.
PASM Subroutines
PIR code can include pure PASM compilation units. These are wrapped in the .emit and .eom directives instead of .sub and .end. The .emit directive doesn't take a name; it only acts as a container for the PASM code. These primitive compilation units can be useful for grouping PASM functions or function wrappers. Subroutine entry labels inside .emit blocks have to be global labels:
.emit
_substr:
...
ret
_grep:
...
ret
.eom
Methods
PIR provides syntax to simplify writing methods and method calls for object-oriented programming. These calls follow the Parrot calling conventions as well. First we want to discuss namespaces in Parrot.
Namespaces
Namespaces provide a mechanism by which names can be reused. This may not sound like much, but in large complicated systems, or systems with many included libraries, name clashes can become a big hassle very quickly. Each namespace gets its own area for function names and global variables. This way, you can have multiple functions named create or new or convert, for instance, without having to use Multi-Method Dispatch (MMD), which we will describe later.
Namespaces are specified with the .namespace [] directive. The brackets themselves are not optional, but the keys inside them are. Here are some examples:
.namespace [ ] # The root namespace
.namespace [ "Foo" ] # The namespace "Foo"
.namespace [ "Foo" ; "Bar" ] # Namespace Foo::Bar
.namespace # WRONG! The [] are needed
Using semicolons, namespaces can be nested to any arbitrary depth. Namespaces are special types of PMC, so we can access them and manipulate them just like other data objects. We can get the PMC for the root namespace using the get_root_namespace opcode:
$P0 = get_root_namespace
The current namespace, which might be different from the root namespace, can be retrieved with the get_namespace opcode:
$P0 = get_namespace # get current namespace
$P0 = get_namespace [ "Foo" ] # get PMC for namespace "Foo"
Once we have a namespace PMC, we can call functions in it, or retrieve global variables from it using the following functions:
$P1 = get_global $S0 # Get global in current namespace
$P1 = get_global [ "Foo" ], $S0 # Get global in namespace "Foo"
$P1 = get_global $P0, $S0 # Get global in $P0 namespace PMC
In the examples above, of course, $S0 contains the string name of the global variable or function from the namespace to find.
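The way nested namespaces keep identical names apart can be sketched in Python with nested dictionaries. This is an analogy only, not Parrot's namespace API; the Namespace class, its get_namespace method, and the sample entries are all hypothetical:

```python
# Python analogy for nested namespaces; not PIR and not Parrot's API.
# Namespace, get_namespace and the sample entries are illustrative.

class Namespace(dict):
    """A namespace maps names to globals or to child namespaces."""

    def get_namespace(self, *keys):
        ns = self
        for key in keys:            # walk one level per key, like [ "Foo" ; "Bar" ]
            ns = ns[key]
        return ns

root = Namespace()
root["create"] = lambda: "root create"
root["Foo"] = Namespace(create=lambda: "Foo create")
root["Foo"]["Bar"] = Namespace(create=lambda: "Foo::Bar create")

# The same name resolves to a different function in each namespace.
print(root.get_namespace()["create"]())              # root create
print(root.get_namespace("Foo")["create"]())         # Foo create
print(root.get_namespace("Foo", "Bar")["create"]())  # Foo::Bar create
```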
Method Syntax
Now that we've discussed namespaces, we can start to discuss object-oriented programming and method calls. The basic syntax is similar to the single line subroutine call above, but instead of a subroutine label name it takes a variable for the invocant PMC and a string with the name of the method:
object."methodname"(arguments)
The invocant can be a variable or register, and the method name can be a literal string, string variable, or method object register. This tiny bit of code sets up all the registers for a method call and makes the call, saving and restoring the top half of the register frames around the call. Internally, the call is a callmethodcc opcode, so it also generates a return continuation.
This example defines two methods in the Foo class. It calls one from the main body of the subroutine and the other from within the first method:
.sub _main
.local pmc class
.local pmc obj
newclass class, "Foo" # create a new Foo class
new obj, "Foo" # instantiate a Foo object
obj."_meth"() # call obj."_meth" which is actually
print "done\n" # "_meth" in the "Foo" namespace
end
.end
.namespace [ "Foo" ] # start namespace "Foo"
.sub _meth :method # define Foo::_meth global
print "in meth\n"
$S0 = "_other_meth" # method names can be in a register too
self.$S0() # self is the invocant
.end
.sub _other_meth :method # define another method
print "in other_meth\n" # as above Parrot provides a return
.end # statement
Each method call looks up the method name in the symbol table of the object's class. Like .pccsub in PASM, .sub makes a symbol table entry for the subroutine in the current namespace.
When a .sub is declared as a method, it automatically creates a local variable named self and assigns it the object passed in P2.
You can pass multiple arguments to a method and retrieve multiple return values just like a single line subroutine call:
(res1, res2) = obj."method"(arg1, arg2)
VTABLEs
PMCs all subscribe to a common interface of functions called VTABLEs. Every PMC implements the same set of these interfaces, which perform very specific low-level tasks on the PMC. The term VTABLE was originally a shortened form of the name "virtual function table", although that name isn't used any more by the developers or in any of the documentation. The virtual functions in the VTABLE, called VTABLE interfaces, are similar to ordinary functions and methods in many respects. VTABLE interfaces are occasionally called "VTABLE functions", "VTABLE methods", or even "VTABLE entries" in casual conversation. A quick comparison shows that VTABLE interfaces are not really subroutines or methods in the way that those terms have been used throughout the rest of Parrot. Like methods on an object, VTABLE interfaces are defined for a specific class of PMC, and can be invoked on any member of that class. Likewise, in a VTABLE interface declaration, the self keyword is used to describe the object that it is invoked upon. That's where the similarities end, however. Unlike ordinary subroutines or methods, VTABLE interfaces cannot be invoked directly, and they are not inherited through class hierarchies the way methods are.
VTABLE interfaces are used, for instance, to set and retrieve data from the PMCs, to invoke the PMC (if it's a subroutine), or to perform low-level arithmetic operations on the PMC. VTABLE interfaces are not called directly from PIR code, but are instead called internally by Parrot to implement specific opcodes and behaviors. For instance, the invoke opcode calls the invoke VTABLE interface of the subroutine PMC, while the inc opcode on a PMC calls the increment VTABLE interface on that PMC. In essence, VTABLE interface overrides allow the programmer to change how Parrot accesses PMC data at the most fundamental level, and to change how the opcodes act on that data.
PMCs, as we will look at more closely in later chapters, are typically implemented using PMC Script, a layer of syntax and macros over ordinary C code. A PMC compiler program converts the PMC files into C code for compilation as part of the ordinary build process. However, VTABLE interfaces can be written and overwritten in PIR using the :vtable flag on a subroutine declaration. This technique is used most commonly when subclassing an existing PMC class in PIR code to create a new data type with custom access methods.
VTABLE interfaces are declared with the :vtable flag:
.sub 'set_integer' :vtable
#set the integer value of the PMC here
.end
in which case the subroutine must have the same name as the VTABLE interface it is intended to implement. However, if you would like to name the function something different but still use it as a VTABLE interface, you could add an additional name parameter to the flag:
.sub 'MySetInteger' :vtable('set_integer')
#set the integer value of the PMC here
.end
VTABLE interfaces are often given the :method flag also, so that they can be used directly in PIR code as methods, in addition to being used by Parrot as VTABLE interfaces. This means we can have the following:
.namespace [ "MyClass" ]
.sub 'ToString' :vtable('get_string') :method
$S0 = "hello!"
.return($S0)
.end
.namespace [ "OtherClass" ]
.local pmc myclass = new "MyClass"
say myclass # Say converts to string internally
S0 = myclass # Convert to a string, store in S0
S0 = myclass.'ToString'() # The same
Inside a VTABLE interface definition, the self local variable contains the PMC on which the VTABLE interface is invoked.
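Readers who know Python may find a loose parallel in its special methods, which the runtime also calls implicitly to implement built-in operations. The sketch below is an analogy only, not PIR; MyClass and to_string are hypothetical names. It mirrors the idea of one function serving both as an ordinary method and as the hook used for string conversion:

```python
# Python analogy for overriding a VTABLE interface; not PIR.
# MyClass and to_string are illustrative names.

class MyClass:
    def to_string(self):
        return "hello!"

    # Registering the same function under the special name lets the
    # runtime call it implicitly, much as :vtable('get_string') does.
    __str__ = to_string

obj = MyClass()
print(obj)                 # print converts to a string internally: hello!
s = str(obj)               # explicit conversion goes through __str__
t = obj.to_string()        # the same function, called as an ordinary method
```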
Coroutines
Coroutines are similar to subroutines except that they have an internal notion of state. In addition to performing a normal .return to return control flow back to the caller and destroy the lexical environment of the subroutine, a coroutine may also perform a .yield. A .yield returns a value to the caller like .return can, but it does not destroy the lexical state of the coroutine. The next time the coroutine is called, it continues execution from the point of the last .yield, not at the beginning of the coroutine.
Coroutines, in essence, allow the programmer to manually simulate multiple threads of concurrent execution, something that is normally handled automatically and invisibly at the hardware level.
Defining Coroutines
Coroutines are defined like any ordinary subroutine. They do not require any special flag or any special syntax to mark them as being a coroutine. However, what sets them apart is the use of the .yield directive. .yield plays several roles:
- Identifies coroutines: When Parrot sees a .yield, it knows to create a coroutine PMC object instead of a Subroutine one.
- Creates a continuation: Continuations, as we will see in more detail later, allow us to continue execution at the point of the continuation later. It's like a snapshot of the current execution environment. .yield creates a continuation in the coroutine and stores the continuation object in the coroutine object for later resuming from the point of the .yield.
- Returns a value: .yield can return one value, many values, or no values to the caller. It is basically the same as a .return in this regard.
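Python generators behave in a broadly similar way and can serve as a quick mental model. The sketch below is an analogy, not PIR, and counter is a hypothetical name. Each call resumes after the previous yield with the local state intact:

```python
# Python analogy for a coroutine that yields; not PIR.
# counter is an illustrative name.

def counter(limit):
    n = 0
    while n < limit:
        yield n            # hand a value back, but keep local state alive
        n += 1             # execution resumes here on the next call

gen = counter(3)
first = next(gen)          # runs until the first yield
second = next(gen)         # resumes after the yield, not at the top
third = next(gen)
print(first, second, third)   # 0 1 2
```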
Multiple Dispatch
Multiple dispatch is when there are multiple subroutines in a single namespace with the same name. These functions must differ, however, in their parameter list, or "signature". All subs with the same name get put into a single PMC called a MultiSub. The MultiSub is like a list of subroutines. When the MultiSub is invoked, it searches through its list of subroutines for the one with the closest matching signature. The best match is the sub that gets invoked.
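The lookup described above can be sketched in a few lines of Python. This is a simplified illustration, not Parrot's dispatch algorithm: the MultiSub class here is hypothetical, and it accepts only an exact match on argument count and types, whereas Parrot looks for the closest match.

```python
# Python sketch of signature-based dispatch; not PIR and not Parrot's algorithm.
# MultiSub here is a hypothetical class; it requires an exact type match.

class MultiSub:
    def __init__(self):
        self.candidates = []                 # list of (signature, function)

    def add(self, *signature):
        def register(func):
            self.candidates.append((signature, func))
            return self                      # the shared name stays bound to the MultiSub
        return register

    def __call__(self, *args):
        for signature, func in self.candidates:
            if len(signature) == len(args) and all(
                type(a) is t for a, t in zip(args, signature)
            ):
                return func(*args)
        raise TypeError("no candidate matches the arguments")

describe = MultiSub()

@describe.add(int)
def describe(n):
    return "integer"

@describe.add(str)
def describe(s):
    return "string"

@describe.add(int, int)
def describe(a, b):
    return "two integers"

print(describe(1), describe("x"), describe(1, 2))
```

All three definitions share the name describe, and the arguments at the call site decide which one runs.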
Defining MultiSubs
MultiSubs are subroutines with the :multi flag applied to them. MultiSubs (also called "Multis") must all differ from one another in the number and/or type of arguments passed to the function. Having two multisubs with the same function signature could result in a parsing error, or the later function could overwrite the former one in the multi.
Multisubs are defined like this:
.sub 'MyMulti' :multi
# does whatever a MyMulti does
.end
Multis belong to a specific namespace. Functions in different namespaces with the same name do not conflict with each other (this is one of the reasons for having namespaces in the first place!). It's only when multiple functions in a single namespace need to have the same name that a multi is used.