Subroutines

A calculation like "the factorial of a number" may be used several times in a large program. Subroutines allow this kind of functionality to be abstracted into a unit, which benefits code reuse and maintainability. Even though PASM is just an assembly language for a virtual processor, it has a number of features to support high-level subroutine calls. PIR offers a smoother interface to those features.

PIR provides several different sets of syntax for subroutine calls. This is a language designed to implement other languages, and every language does subroutine calls a little differently. What's needed is a set of building blocks and tools, not a single prepackaged solution.

Parrot Calling Conventions

As we mentioned in the previous chapter, Parrot defines a set of calling conventions for externally visible subroutines. In these calls, the caller is responsible for preserving its own registers, and arguments and return values are passed in a predefined set of Parrot registers. The calling conventions use the Continuation Passing Style to pass control to subroutines and back again.

Subroutines

Because the Parrot calling conventions are clearly defined, it is possible to provide some higher-level syntax for them. Manually setting up all the registers for each subroutine call isn't just tedious, it's also prone to bugs introduced by typos. PIR's simplest subroutine call syntax looks much like a high-level language. This example calls the subroutine _fact with two arguments and assigns the results to $I0 and $I1:

($I0, $I1) = _fact(count, product)

This simple statement hides a great deal of complexity. It generates a subroutine object and stores it in P0. It assigns the arguments to the appropriate registers, assigning any extra arguments to the overflow array in P3. It also sets up the other registers to mark whether this is a prototyped call and how many arguments it passes of each type. It calls the subroutine stored in P0, saving and restoring the top half of all register frames around the call. And finally, it assigns the result of the call to the given temporary register variables (for a single result you can drop the parentheses). If the one line above were written out in basic PIR it would be something like:

newsub P0, .Sub, _fact
I5 = count
I6 = product
I0 = 1
I1 = 2
I2 = 0
I3 = 0
I4 = 0
savetop
invokecc
restoretop
$I0 = I5
$I1 = I6

The PIR code actually generates an invokecc opcode internally. It not only invokes the subroutine in P0, but also generates a new return continuation in P1. The called subroutine invokes this continuation to return control to the caller.

Expanded Subroutine Syntax

The single line subroutine call is incredibly convenient, but it isn't always flexible enough. So PIR also has a more verbose call syntax that is still more convenient than manual calls. This example pulls the subroutine _fact out of the global symbol table and calls it:

find_global $P1, "_fact"

.begin_call
  .arg count
  .arg product
  .call $P1
  .result $I0
.end_call

The whole chunk of code from .begin_call to .end_call acts as a single unit. The .begin_call directive can be marked as prototyped or unprototyped, which corresponds to the flag I0 in the calling conventions. The .arg directive sets up arguments to the call. The .call directive saves top register frames, calls the subroutine, and restores the top registers. The .result directive retrieves return values from the call.

Subroutine Declarations

In addition to syntax for subroutine calls, PIR provides syntax for subroutine definitions. The .param directive pulls parameters out of the registers and creates local named variables for them:

.param int c

The .begin_return and .end_return directives act as a unit much like the .begin_call and .end_call directives:

.begin_return
  .return p
.end_return

The .return directive sets up return values in the appropriate registers. After all the registers are set up the unit invokes the return continuation in P1 to return control to the caller.

Here's a complete code example that reimplements the factorial code from the previous section as an independent subroutine. The subroutine _fact is a separate compilation unit, assembled and processed after the _main function. Parrot resolves global symbols like the _fact label between different units.

# factorial.pir
.sub _main
   .local int count
   .local int product
   count = 5
   product = 1

   $I0 = _fact(count, product)

   print $I0
   print "\n"
   end
.end

.sub _fact
   .param int c
   .param int p

loop:
   if c <= 1 goto fin
   p = c * p
   dec c
   branch loop
fin:
   .begin_return
   .return p
   .end_return
.end

This example defines two local named variables, count and product, and assigns them the values 5 and 1. It calls the _fact subroutine, passing the two variables as arguments. In the call, the two arguments are assigned to consecutive integer registers, because they're stored in typed integer variables. The _fact subroutine uses .param and the return directives for retrieving parameters and returning results. The final printed result is 120.

You may want to generate a PASM source file for the above example to look at the details of how the PIR code translates to PASM:

$ parrot -o- factorial.pir

Continuation Passing Style

Continuations are snapshots: frozen images of the current execution state of the VM. Once we have a continuation, we can invoke it to return to the point where the continuation was first created. It's like a magical timewarp that allows the developer to arbitrarily move control flow back to any previous point in the program. There's actually no magic involved, just a lot of interesting ideas and involved code.

Continuations are not a new concept; they've been boggling the minds of Lisp and Scheme programmers for many years. However, despite all their power and flexibility, they haven't been well-utilized in most modern programming languages or in their underlying libraries and virtual machines. Parrot aims to change that: in Parrot, almost every control flow manipulation, including all subroutine, method, and coroutine calls, is performed using continuations. This mechanism is mostly hidden from developers who build applications on top of Parrot. The power and flexibility are available if people want to use them, but they're hidden behind more familiar constructs if not.

Doing all sorts of flow control using continuations is called Continuation Passing Style (CPS). CPS allows Parrot to offer all sorts of neat features, such as tail-call optimizations and lexical subroutines.
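To make the idea concrete, here is a minimal sketch of continuation passing written in Python rather than PIR, since this chapter does not show the continuation opcodes. The names fact_cps and ret are illustrative assumptions, not Parrot APIs. Instead of returning a value, the function is handed a return continuation and invokes it, which is roughly the role the return continuation in P1 plays in a Parrot subroutine call.

```python
def fact_cps(n, ret):
    """Compute n! and hand the result to the continuation `ret`."""
    if n <= 1:
        # Base case: invoke the continuation instead of returning a value.
        return ret(1)
    # Recursive case: build a new continuation that multiplies by n and
    # then passes the product on to the caller's continuation.
    return fact_cps(n - 1, lambda product: ret(n * product))


results = []
# The "caller" continuation here simply records whatever it is given.
fact_cps(5, results.append)
print(results[0])  # prints 120
```

Every call in this sketch is in tail position, which is the property that makes tail-call optimization possible in a CPS design.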

Creating and Using Continuations

Lexical Subroutines

As we've mentioned above, Parrot offers support for lexical subroutines. What this means is that we can define a subroutine by name inside a larger subroutine, and our "inner" subroutine is only visible and callable from the "outer" subroutine. Plus, the "inner" subroutine inherits all the lexical variables from the outer subroutine, but is able to define its own lexical variables that cannot be seen or modified by the outer subroutine.

Scope and HLLs

Let us diverge for a minute and start looking forward at the idea of High Level Languages (HLLs) such as Perl, Python, and Ruby. All of these languages allow nested scopes, or blocks within blocks that can have their own lexical variables. Let's look back at the C programming language, where this kind of construct is not uncommon:

{
    int x = 0;
    int y = 1;
    {
        int z = 2;
        // x, y, and z are all visible here
    }
    // only x and y are visible here
}

The code above illustrates this idea perfectly without having to get into a detailed and convoluted example: In the inner block, we define the variable z which is only visible inside that block. The outer block has no knowledge of z at all. However, the inner block does have access to the variables x and y.

PIR Scoping

In PIR, there is only one structure that supports scoping like this: the subroutine (and objects that inherit from subroutines, such as methods, coroutines, and multisubs, which we will discuss later). There are no blocks in PIR that have their own scope; we have only subroutines. Fortunately, we can use lexical subroutines to simulate the behavior that HLLs require:

.sub 'MyOuter'
    .lex int x
    .lex int y
    'MyInner'()
    # only x and y are visible here
.end

.sub 'MyInner' :outer('MyOuter')
    .lex int z
    #x, y, and z are all visible here
.end

Declaring and Using Nested Subroutines

As we have seen above, we can declare a new subroutine to be a nested inner subroutine of an existing outer subroutine using the :outer flag. The :outer flag is used to specify the name of the outer subroutine. Where there may be multiple subroutines with the same name, as is the case with multisubs, which we will discuss soon, we can use the :lexid flag on the outer subroutine to give it a different, unique name that the lexical subroutines can reference in their :outer declarations. Within lexical subroutines, the .lex command defines a local variable that follows these scoping rules.

Sound confusing? It's not so bad. The basics are that we use :outer to define a lexically scoped subroutine, and we use .lex to define lexically scoped variables in those subroutines. It's only when things start getting crazy with multisubs that we need to worry about any more details than that.
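The same visibility rules can be sketched with nested functions in Python. This is only an analogy written in another language, not PIR; the names my_outer and my_inner are hypothetical and simply mirror 'MyOuter' and 'MyInner'.

```python
def my_outer():
    x = 0
    y = 1

    def my_inner():
        z = 2
        # x, y, and z are all visible here
        return (x, y, z)

    inner_view = my_inner()
    # only x and y are visible here; z was local to my_inner
    outer_sees_z = "z" in locals()
    return inner_view, outer_sees_z


inner_view, outer_sees_z = my_outer()
print(inner_view)    # prints (0, 1, 2)
print(outer_sees_z)  # prints False
```

The inner function reads the outer function's variables, while the outer function has no access to the variable declared inside the inner one.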

Compilation Units Revisited

The factorial example could have been written using simple labels instead of separate compilation units:

.sub _main
    $I1 = 5         # counter
    call fact       # same as bsr fact
    print $I0
    print "\n"
    $I1 = 6         # counter
    call fact
    print $I0
    print "\n"
    end

fact:
    $I0 = 1           # product
L1:
    $I0 = $I0 * $I1
    dec $I1
    if $I1 > 0 goto L1
    ret
.end

The unit of code from the fact label definition to ret is a reusable routine. There are several problems with this simple approach. First, the caller has to know to pass the argument to fact in $I1 and to get the result from $I0. Second, neither the caller nor the function itself preserves any registers. This is fine for the example above, because very few registers are used. But if this same bit of code were buried deeply in a math routine package, you would have a high risk of clobbering the caller's register values.
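The clobbering problem can be sketched outside PIR. The following Python model is only an illustration: the regs dictionary and the functions fact_shared and fact_preserving are hypothetical stand-ins for a shared register file and a caller that saves its registers; they are not Parrot APIs.

```python
# Hypothetical model: one register file shared by caller and routine.
regs = {"I0": 0, "I1": 0}


def fact_shared():
    """Mimics the labelled routine: reads I1, writes I0, destroys I1."""
    regs["I0"] = 1
    while True:
        regs["I0"] *= regs["I1"]
        regs["I1"] -= 1
        if regs["I1"] <= 0:
            break


def fact_preserving(n):
    """Caller-save wrapper: snapshot the registers, call, then restore."""
    saved = dict(regs)
    regs["I1"] = n
    fact_shared()
    out = regs["I0"]
    regs.update(saved)  # put the caller's values back
    return out


regs["I1"] = 5
fact_shared()
clobbered = (regs["I0"], regs["I1"])  # result is in I0, but I1 is now 0

regs["I1"] = 7
preserved_result = fact_preserving(3)  # I1 still holds 7 afterwards
```

In the first call the caller's counter in I1 is destroyed as a side effect; in the second, saving and restoring around the call keeps the caller's value intact, which is what the calling conventions arrange automatically.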

Another disadvantage of this approach is that _main and fact share the same compilation unit, so they're parsed and processed as one piece of code. When Parrot does register allocation, it calculates the data flow graph (DFG) of all symbols, looks at their usage, calculates the interference between all possible combinations of symbols, and then assigns a Parrot register to each symbol. (The operation to calculate the DFG has a quadratic cost or better; it depends on n_lines * n_symbols.) This process is less efficient for large compilation units than it is for several small ones, so it's better to keep the code modular. The optimizer will decide whether register usage is light enough to merit combining two compilation units, or even inlining the entire function.

PASM Subroutines

PIR code can include pure PASM compilation units. These are wrapped in the .emit and .eom directives instead of .sub and .end. The .emit directive doesn't take a name; it only acts as a container for the PASM code. These primitive compilation units can be useful for grouping PASM functions or function wrappers. Subroutine entry labels inside .emit blocks have to be global labels:

.emit
_substr:
    ...
    ret
_grep:
    ...
    ret
.eom

Methods

PIR provides syntax to simplify writing methods and method calls for object-oriented programming. These calls follow the Parrot calling conventions as well. First we want to discuss namespaces in Parrot.

Namespaces

Namespaces provide a mechanism where names can be reused. This may not sound like much, but in large complicated systems, or systems with many included libraries, name collisions can become a big hassle very quickly. Each namespace gets its own area for function names and global variables. This way, you can have multiple functions named create or new or convert, for instance, without having to use Multi-Method Dispatch (MMD), which we will describe later.

Namespaces are specified with the .namespace [] directive. The brackets are themselves not optional, but the keys inside them are. Here are some examples:

.namespace [ ]               # The root namespace
.namespace [ "Foo" ]         # The namespace "Foo"
.namespace [ "Foo" ; "Bar" ] # Namespace Foo::Bar
.namespace                   # WRONG! The [] are needed

Using semicolons, namespaces can be nested to any arbitrary depth. Namespaces are special types of PMC, so we can access them and manipulate them just like other data objects. We can get the PMC for the root namespace using the get_root_namespace opcode:

$P0 = get_root_namespace

The current namespace, which might be different from the root namespace, can be retrieved with the get_namespace opcode:

$P0 = get_namespace             # get current namespace
$P0 = get_namespace [ "Foo" ]   # get PMC for namespace "Foo"

Once we have a namespace PMC, we can call functions in it or retrieve global variables from it using the get_global opcode:

$P1 = get_global $S0            # Get global in current namespace
$P1 = get_global [ "Foo" ], $S0 # Get global in namespace "Foo"
$P1 = get_global $P0, $S0       # Get global in $P0 namespace PMC

In the examples above, of course, $S0 contains the string name of the global variable or function from the namespace to find.
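One way to picture nested namespaces is as dictionaries of dictionaries. The sketch below is a Python model, not Parrot code: the root structure and the functions get_namespace, set_global, and get_global are hypothetical helpers that only borrow the opcode names for readability.

```python
# Hypothetical model: each namespace holds its own globals and child namespaces.
root = {"globals": {}, "children": {}}


def get_namespace(path):
    """Walk from the root through each key in `path`, creating as needed."""
    ns = root
    for key in path:
        ns = ns["children"].setdefault(key, {"globals": {}, "children": {}})
    return ns


def set_global(path, name, value):
    """Store `value` under `name` in the namespace at `path`."""
    get_namespace(path)["globals"][name] = value


def get_global(path, name):
    """Look up `name` in the namespace at `path`; None if it is absent."""
    return get_namespace(path)["globals"].get(name)


# The same name can live in two namespaces without conflict.
set_global(["Foo"], "create", "Foo's create")
set_global(["Foo", "Bar"], "create", "Foo::Bar's create")

print(get_global(["Foo"], "create"))         # prints Foo's create
print(get_global(["Foo", "Bar"], "create"))  # prints Foo::Bar's create
```

Because each lookup starts from a specific namespace, the two entries named create never collide, and the root namespace has no entry of that name at all.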

Method Syntax

Now that we've discussed namespaces, we can start to discuss object-oriented programming and method calls. The basic syntax is similar to the single line subroutine call above, but instead of a subroutine label name it takes a variable for the invocant PMC and a string with the name of the method:

object."methodname"(arguments)

The invocant can be a variable or register, and the method name can be a literal string, string variable, or method object register. This tiny bit of code sets up all the registers for a method call and makes the call, saving and restoring the top half of the register frames around the call. Internally, the call is a callmethodcc opcode, so it also generates a return continuation.

This example defines two methods in the Foo class. It calls one from the main body of the subroutine and the other from within the first method:

.sub _main
  .local pmc class
  .local pmc obj
  newclass class, "Foo"       # create a new Foo class
  new obj, "Foo"              # instantiate a Foo object
  obj."_meth"()               # call obj."_meth" which is actually
  print "done\n"              # "_meth" in the "Foo" namespace
  end
.end

.namespace [ "Foo" ]          # start namespace "Foo"

.sub _meth :method            # define Foo::_meth global
   print "in meth\n"
   $S0 = "_other_meth"        # method names can be in a register too
   self.$S0()                 # self is the invocant
.end

.sub _other_meth :method      # define another method
   print "in other_meth\n"    # as above Parrot provides a return
.end                          # statement

Each method call looks up the method name in the symbol table of the object's class. Like .pccsub in PASM, .sub makes a symbol table entry for the subroutine in the current namespace.
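Looking a method up by name at call time can be sketched in Python with getattr. This is an analogy in a different language, not a description of Parrot's internals; the class and the log list are illustrative assumptions that mirror the flow of the PIR example above.

```python
class Foo:
    def _meth(self, log):
        log.append("in meth")
        name = "_other_meth"      # the method name is held in a variable
        getattr(self, name)(log)  # look the method up by name, then call it

    def _other_meth(self, log):
        log.append("in other_meth")


log = []
obj = Foo()
obj._meth(log)
log.append("done")
print(log)  # prints ['in meth', 'in other_meth', 'done']
```

As in the PIR version, the second method is found through the invocant's class by its string name rather than through a fixed label.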

When a .sub is declared as a method, it automatically creates a local variable named self and assigns it the object passed in P2.

You can pass multiple arguments to a method and retrieve multiple return values just like a single line subroutine call:

(res1, res2) = obj."method"(arg1, arg2)

VTABLEs

PMCs all subscribe to a common interface of functions called VTABLEs. Every PMC implements the same set of these interfaces, which perform very specific low-level tasks on the PMC. The term VTABLE was originally a shortened form of the name "virtual function table", although that name isn't used any more by the developers or in any of the documentation. The virtual functions in the VTABLE, called VTABLE interfaces, are similar to ordinary functions and methods in many respects. VTABLE interfaces are occasionally called "VTABLE functions", "VTABLE methods", or even "VTABLE entries" in casual conversation.

A quick comparison shows that VTABLE interfaces are not really subroutines or methods in the way that those terms have been used throughout the rest of Parrot. Like methods on an object, VTABLE interfaces are defined for a specific class of PMC, and can be invoked on any member of that class. Likewise, in a VTABLE interface declaration, the self keyword is used to describe the object that it is invoked upon. That's where the similarities end, however. Unlike ordinary subroutines or methods, VTABLE interfaces cannot be invoked directly, and they are not inherited through class hierarchies the way methods are.

VTABLE interfaces are used, for instance, to set and retrieve data from the PMCs, to invoke the PMC (if it's a subroutine), or to perform low-level arithmetic operations on the PMC. VTABLE interfaces are not called directly from PIR code, but are instead called internally by Parrot to implement specific opcodes and behaviors. For instance, the invoke opcode calls the invoke VTABLE interface of the subroutine PMC, while the inc opcode on a PMC calls the increment VTABLE interface on that PMC. What VTABLE interface overrides do, in essence, is allow the programmer to change the way that Parrot accesses PMC data at the most fundamental level, and to change the way that the opcodes act on that data.

PMCs, as we will look at more closely in later chapters, are typically implemented using PMC Script, a layer of syntax and macros over ordinary C code. A PMC compiler program converts the PMC files into C code for compilation as part of the ordinary build process. However, VTABLE interfaces can be written and overwritten in PIR using the :vtable flag on a subroutine declaration. This technique is used most commonly when subclassing an existing PMC class in PIR code to create a new data type with custom access methods.

VTABLE interfaces are declared with the :vtable flag:

.sub 'set_integer' :vtable
    #set the integer value of the PMC here
.end

In this form, the subroutine must have the same name as the VTABLE interface it is intended to implement. However, if you would like to name the function something different but still use it as a VTABLE interface, you can add an additional name parameter to the flag:

.sub 'MySetInteger' :vtable('set_integer')
    #set the integer value of the PMC here
.end

VTABLE interfaces are often given the :method flag also, so that they can be used directly in PIR code as methods, in addition to being used by Parrot as VTABLE interfaces. This means we can have the following:

.namespace [ "MyClass" ]

.sub 'ToString' :vtable('get_string') :method
    $S0 = "hello!"
    .return($S0)
.end

.namespace [ "OtherClass" ]

.local pmc myclass = new "MyClass"
say myclass                # Say converts to string internally
S0 = myclass               # Convert to a string, store in S0
S0 = myclass.'ToString'()  # The same

Inside a VTABLE interface definition, the self local variable contains the PMC on which the VTABLE interface is invoked.
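A loose analogy from Python may help here: a class can register one function both as an ordinary method and under a special name that the runtime calls implicitly during string conversion. This is only an analogy under stated assumptions, not Parrot's mechanism; the class MyClass and the method to_string are hypothetical, and Python's special methods differ from VTABLE interfaces in being directly callable and inherited.

```python
class MyClass:
    def to_string(self):
        # `self` is the object the conversion is invoked on.
        return "hello!"

    # Register the same function under the special name the runtime looks
    # for, much as :vtable('get_string') names the interface to implement.
    __str__ = to_string


obj = MyClass()
implicit = str(obj)          # the runtime calls __str__ internally
explicit = obj.to_string()   # the same function called as a method
print(implicit, explicit)    # prints hello! hello!
```

Both paths reach the same code, which mirrors a PIR subroutine that carries both the :vtable and :method flags.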

Coroutines

Coroutines are similar to subroutines except that they have an internal notion of state. Coroutines, in addition to performing a normal .return to return control flow back to the caller and destroy the lexical environment of the subroutine, may also perform a .yield. .yield returns a value to the caller like .return can, but it does not destroy the lexical state of the coroutine. The next time the coroutine is called, it continues execution from the point of the last .yield, not at the beginning of the coroutine.

Coroutines, in essence, allow the programmer to manually simulate multiple threads of concurrent execution, something that is normally handled automatically and invisibly at the hardware level.

Defining Coroutines

Coroutines are defined like any ordinary subroutine. They do not require any special flag or any special syntax to mark them as being a coroutine. However, what sets them apart is the use of the .yield directive. .yield plays several roles:

    * Identifies coroutines

      When Parrot sees a .yield, it knows to create a coroutine PMC object instead of a subroutine one.

    * Creates a continuation

      Continuations, as we will see in more detail later, allow us to resume execution later at the point where the continuation was created. A continuation is like a snapshot of the current execution environment. .yield creates a continuation in the coroutine and stores the continuation object in the coroutine object for later resuming from the point of the .yield.

    * Returns a value

      .yield can return one value, many values, or no values to the caller. It is basically the same as a .return in this regard.
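The resume-where-you-left-off behavior is easiest to see in a language with a built-in equivalent. The sketch below uses a Python generator as an analogy; it is not PIR, and the function name counter is a hypothetical example. One difference to keep in mind: in Python, calling the function creates a generator object, and each resumption is an explicit next() call on that object.

```python
def counter(limit):
    """Yield 0, 1, ..., limit - 1, keeping local state between resumptions."""
    n = 0
    while n < limit:
        yield n   # hand a value back to the caller, but keep `n` alive
        n += 1    # execution resumes here on the next request


gen = counter(3)
first = next(gen)
second = next(gen)
third = next(gen)
print(first, second, third)  # prints 0 1 2
```

Each resumption continues after the previous yield with the local variable n intact, rather than starting again from the top of the function.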

Multiple Dispatch

Multiple dispatch allows multiple subroutines with the same name to exist in a single namespace. These functions must differ, however, in their parameter list, or "signature". All subs with the same name get put into a single PMC called a MultiSub. The MultiSub is like a list of subroutines. When the multisub is invoked, the MultiSub PMC object searches through the list of subroutines for the one with the closest matching signature. The best match is the sub that gets invoked.
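The list-of-candidates idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions, not Parrot's algorithm: the MultiSub class here, its add decorator, and the candidate functions are all hypothetical, and the sketch picks the first candidate whose types match rather than ranking candidates by closeness.

```python
class MultiSub:
    """Hypothetical model: a callable holding (signature, function) pairs."""

    def __init__(self):
        self.candidates = []

    def add(self, *signature):
        """Decorator that registers a function under a type signature."""
        def register(func):
            self.candidates.append((signature, func))
            return func
        return register

    def __call__(self, *args):
        # Simplification: take the first candidate whose arity and types
        # match, instead of searching for the closest match.
        for signature, func in self.candidates:
            if len(signature) == len(args) and all(
                isinstance(arg, kind) for arg, kind in zip(args, signature)
            ):
                return func(*args)
        raise TypeError("no candidate matches the given arguments")


my_multi = MultiSub()


@my_multi.add(int)
def _(n):
    return "one integer"


@my_multi.add(str)
def _(s):
    return "one string"


@my_multi.add(int, int)
def _(a, b):
    return "two integers"


print(my_multi(1))      # prints one integer
print(my_multi("a"))    # prints one string
print(my_multi(1, 2))   # prints two integers
```

All three functions share one name at the call site, and the argument list alone decides which one runs.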

Defining MultiSubs

MultiSubs are subroutines with the :multi flag applied to them. MultiSubs (also called "Multis") must all differ from one another in the number and/or type of arguments passed to the function. Having two multisubs with the same function signature could result in a parsing error, or the later function could overwrite the former one in the multi.

Multisubs are defined like this:

.sub 'MyMulti' :multi
    # does whatever a MyMulti does
.end

Multis belong to a specific namespace. Functions in different namespaces with the same name do not conflict with each other (this is one of the reasons for having namespaces in the first place!). It's only when multiple functions in a single namespace need to have the same name that a multi is used.
