Parrot Intermediate Representation
The Parrot intermediate representation (PIR) is an overlay on top of Parrot assembly language (PASM) that provides some simplifications and high-level constructs. It has many high-level features that ease the pain of working with PASM code, but it still isn't considered a high-level language by itself. PASM is discussed in more detail in Chapter 5.
Internally Parrot works a little differently with PASM and PIR source code, so each has different restrictions. Parrot's default is to run in a mixed mode that allows PASM code to combine with the higher-level syntax unique to PIR. This gives the programmer flexibility to use aspects of each that are necessary.
A file with a .pasm extension is treated as pure PASM code by Parrot, as is any file run with the -a command-line option. This mode is mainly used for running pure PASM tests. Parrot treats any extension other than .pasm as a PIR file in mixed mode. As a convention, files containing pure PIR code generally have a .pir extension.
PIR is well documented, both in traditional documentation and in instructional code examples. The documentation for the PIR compiler IMCC in imcc/docs/ and the project documentation in docs/ are good sources of information. The test suite in imcc/t shows examples of properly working code. These are all good starting points for digging deeper into PIR syntax and functionality.
Statements
The syntax of statements in PIR is much more flexible than that of PASM. All PASM opcodes are valid PIR code, so the basic syntax is the same. The statement delimiter is a newline (\n), so each statement has to be on its own line. Statements may also start with a label, for use with jumps and branches. Comments are marked by a hash sign (#) and continue until the end of the line. PIR also allows POD blocks for multi-line documentation.
Unlike PASM, PIR has some higher-level constructs, including symbol operators:
I1 = 5 # set I1, 5
named variables:
count = 5
and complex statements built from multiple keywords and symbol operators:
if I1 <= 5 goto LABEL # le I1, 5, LABEL
We will get into all of these in more detail as we go.
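As a rough illustration, the statement forms described in this section can be combined in one small file. This is only a sketch: the label name DONE and the POD text are made up for the example, and the PASM equivalents in the comments are approximate.

```pir
=head1 DESCRIPTION

A POD block like this one can hold multi-line documentation.

=cut

.sub _main
    I1 = 5                    # symbol operator; same as: set I1, 5
    if I1 <= 5 goto DONE      # same as: le I1, 5, DONE
    print "not reached\n"     # skipped, because I1 is 5
DONE:                         # a label on its own line
    print "done\n"
    end
.end
```

Each statement sits on its own line, and everything after a # is ignored up to the end of that line.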
Variables and Constants
Literal constants in PIR are the same as constants in PASM. Integers and floating-point numbers are numeric literals and strings are enclosed in quotes. PIR strings use the same escape sequences as PASM.
Parrot Registers
PIR code has a variety of ways to store values while you work with them. The most basic way is to use Parrot registers directly. PASM register names always start with a single character that shows whether it is an integer (I), numeric (N), string (S), or PMC (P) register, and end with the number of the register:
S0 = "Hello, Polly.\n"
print S0
You can have as many registers of each type as you need; Parrot will allocate new registers if you need more. Parrot registers are allocated in a linear array, and register numbers are indices into this array. Having more registers means Parrot must allocate more storage space for them, which can decrease memory efficiency and the performance of register allocation and fetches. In general, it's better to keep the number of registers small, and to use registers with contiguous numbers so that the pool of allocated registers doesn't grow too large. As with any memory management situation, fewer allocations translate directly to improved performance.
Temporary Registers
PIR provides an easier way to work with Parrot registers. The temporary register variables are named like the PASM registers (with a single character for the type of register and a number), but they start with a $ character:
set $S42, "Hello, Polly.\n"
print $S42
The most obvious difference between Parrot registers ("P7") and temporary register variables ("$P7") is that you have an unlimited number of temporaries. Parrot handles register allocation automatically, and performs register reuse optimizations for you when it finds such situations.
The previous example used the $S42 temporary. When the code is compiled, that temporary is allocated to a Parrot register. As long as the temporary is needed, it is stored in the same register. When it's no longer needed, the Parrot register is re-allocated to some other value. This example uses two temporary string registers:
$S42 = "Hello, " # allocated to S16
print $S42
$S43 = "Polly.\n" # allocated to S16 again
print $S43
Since they don't overlap, Parrot can allocate both to a single register. If you change the order a little so both temporaries are needed at the same time, Parrot will allocate them to different registers instead:
$S42 = "Hello, " # allocated to S17
$S43 = "Polly.\n" # allocated to S16
print $S42
print $S43
In this case, $S42 is allocated to S17 and $S43 is allocated to S16. These numbers are hypothetical, of course. Which registers Parrot actually uses in these situations depends on a large number of factors.
Parrot allocates temporary registers (as well as named variables, which we'll talk about next) to Parrot registers in ascending order based on their score. The score is used to determine whether a register is being actively used, and whether it can be reused for another purpose. Variables used in a loop have a higher score than variables outside a loop. Variables that span a long range have a lower score than ones that are used only briefly. Variables with a low score (and thus less use) are shuffled and reused for new temporaries.
If you want to peek behind the curtain and see how Parrot is allocating registers, you can run it with the -d switch to turn on IMCC debugging output:
$ parrot -d1000 hello.pir
If hello.pir contains this code from the second example above (wrapped in a subroutine definition so it will compile):
.sub _main
$S42 = "Hello, " # allocated to S17
$S43 = "Polly.\n" # allocated to S16
print $S42
print $S43
end
.end
it produces this output:
code_size(ops) 11 oldsize 0
0 set_s_sc 17 1 set S17, "Hello, "
3 set_s_sc 16 0 set S16, "Polly.\n"
6 print_s 17 print S17
8 print_s 16 print S16
10 end end
Hello, Polly.
That's probably a lot more information than you wanted if you're just starting out. You can also generate a PASM file with the -o switch and have a look at how the PIR code translates:
$ parrot -o hello.pasm hello.pir
or just
$ parrot -o- hello.pir
to see the resulting PASM on stdout.
You'll find more details on these options and many others in "Parrot Command-Line Options" in Chapter 11.
Named Variables
Named variables can be used anywhere a register or temporary register is used. They're declared with the .local statement or the equivalent .sym statement, both of which require a variable type and a name:
.local string hello
set hello, "Hello, Polly.\n"
print hello
This snippet defines a string variable named hello, assigns it the value "Hello, Polly.\n", and then prints the value.
The valid types are int, num, string, and pmc, or any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. Named variables are valid from the point of their definition to the end of the compilation unit.
The name of a variable must be a valid PIR identifier. It can contain letters, digits, and underscores, but the first character has to be a letter or underscore. Identifiers don't have any limit on length yet, but it's a safe bet they will before the production release. Parrot opcode names are normally not allowed as variable names, though there are some exceptions.
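The following sketch declares one named variable of each basic type, using only the declaration and assignment forms described above. The variable names (count, ratio, bird_name) are hypothetical, chosen to follow the identifier rules and to avoid opcode names.

```pir
.sub _main
    .local int count            # allocated to an integer (I) register
    .local num ratio            # allocated to a numeric (N) register
    .local string bird_name     # allocated to a string (S) register

    count = 3
    ratio = 1.5
    bird_name = "Polly"

    print bird_name
    print "\n"
    print count
    print "\n"
    print ratio
    print "\n"
    end
.end
```

All three variables remain valid from their declaration to the .end of this compilation unit.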
PMC Variables
PMC registers and variables act much like any integer, floating-point number, or string register or variable, but you have to instantiate a new PMC object before you use it. The new instruction creates a new PMC. Unlike PASM, PIR doesn't use a dot in front of the class name.
P0 = new PerlString # same as new P0, .PerlString
P0 = "Hello, Polly.\n"
print P0
This example creates a PerlString object, stores it in the PMC register P0, assigns the value "Hello, Polly.\n" to it, and prints it. The syntax is exactly the same for temporary register variables:
$P4711 = new PerlString
$P4711 = "Hello, Polly.\n"
print $P4711
With named variables, the type passed to the .local directive is either the generic pmc or a type compatible with the type passed to new:
.local PerlString hello # or .local pmc hello
hello = new PerlString
hello = "Hello, Polly.\n"
print hello
Named Constants
The .const directive declares a named constant. It's very similar to .local, and requires a type and a name. The value of a constant must be assigned in the declaration statement. As with named variables, named constants are visible only within the compilation unit where they're declared. This example declares a named string constant hello and prints the value:
.const string hello = "Hello, Polly.\n"
print hello
Named constants function in all the same places as literal constants, but have to be declared beforehand:
.const int the_answer = 42 # integer constant
.const string mouse = "Mouse" # string constant
.const num pi = 3.14159 # floating point constant
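To illustrate constants standing in for literals, here is a small sketch that uses two of the declarations above inside a compilation unit. The variable guess and the label RIGHT are hypothetical names introduced only for this example.

```pir
.sub _main
    .const int the_answer = 42         # value must be given in the declaration
    .const string mouse = "Mouse"
    .local int guess

    guess = 42
    if guess == the_answer goto RIGHT  # constant used where a literal 42 could go
    print "wrong\n"
    end
RIGHT:
    print mouse                        # constant used as a print operand
    print " knows the answer\n"
    end
.end
```

Because guess equals the_answer, the branch to RIGHT is taken and the "wrong" line is skipped.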
Register Spilling
As we mentioned earlier, Parrot allocates all temporary register variables and named variables to Parrot registers. When Parrot runs out of registers to allocate, it has to store some of the variables elsewhere. This is known as spilling. Parrot spills the variables with the lowest score and stores them in a PerlArray object while they aren't used, then restores them to a register the next time they're needed. Consider an example that creates 33 integer variables, all containing values that are used later:
set $I1, 1
set $I2, 2
...
set $I33, 33
...
print $I1
print $I2
...
print $I33
Parrot allocates the 32 available integer registers to variables with a higher score and spills the variables with a lower score. In this example it picks $I1 and $I2. Behind the scenes, Parrot generates code to store the values:
new P31, "PerlArray"
...
set I0, 1 # I0 allocated to $I1
set P31[0], I0 # spill $I1
set I0, 2 # I0 reallocated to $I2
set P31[1], I0 # spill $I2
It creates a PerlArray object and stores it in register P31. P31 is reserved for register spilling in PIR code, so generally it shouldn't be accessed directly. The set instruction is the last time $I1 is used for a while, so immediately after that, Parrot stores its value in the spill array and frees up I0 to be reallocated.
Just before $I1 and $I2 are accessed to be printed, Parrot generates code to fetch the values from the spill array:
...
set I0, P31[0] # fetch $I1
print I0
You cannot rely on any particular register assignment for temporary variables or named variables. The register allocator does follow a set of precedence rules for allocation, but these rules may change. Also, if two variables have the same score Parrot may assign registers based on the hashed value of the variable name. Parrot randomizes the seed to the hash function to guarantee you never get a consistent order.
Symbol Operators
You probably noticed the = assignment operator in some of the earlier examples:
$S2000 = "Hello, Polly.\n"
print $S2000
Standing alone, it's the same as the PASM set opcode. In fact, if you run parrot in bytecode debugging mode (as in "Assembler Options" in Chapter 11), you'll see it really is just a set opcode underneath.
PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. Many of these combine with assignment to produce the equivalent of a PASM opcode:
.local int sum
sum = $I42 + 5
print sum
print "\n"
The statement sum = $I42 + 5 translates to something like add I16, I17, 5.
PIR also provides +=, -=, >>=, and similar operators that map to the two-argument forms like add I16, I17.
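A short sketch of the assignment-combining operators named above follows. The variable name total is hypothetical, and the PASM forms in the comments are approximations of what the compiler might emit, not verified output.

```pir
.sub _main
    .local int total
    total = 10
    total += 5        # two-argument form, roughly: add total, 5
    total -= 3        # roughly: sub total, 3
    total >>= 1       # roughly: shr total, 1
    print total
    print "\n"
    end
.end
```

Working through the arithmetic, total goes from 10 to 15, then to 12, and the right shift by one bit halves it to 6.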
Many PASM opcodes that return a single value also have an alternate syntax in PIR with the assignment operator:
$I0 = length str # length $I0, str
$I0 = isa PerlInt, "scalar" # isa $I0, PerlInt, "scalar"
$I0 = exists hash["key"] # exists $I0, hash["key"]
$N0 = sin $N1
$N0 = atan $N1, $N2
$S0 = repeat "x", 20
$P0 = newclass "Foo"
...
A complete list of PIR operators is available in Chapter 11. We'll discuss the comparison operators in the "Flow Control" section later in this chapter.
Labels
Like PASM, any line can start with a label definition like LABEL:, but label definitions can also stand on their own line.
PIR code has both local and global labels. Global labels start with an underscore. The name of a global label has to be unique, since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the compilation unit where it's defined (we'll discuss compilation units in the next section). The name has to be unique there, but it can be reused in a different compilation unit.
branch L1 # local label
bsr _L2 # global label
Labels are most often used in branching instructions and in subroutine calls.
Compilation Units
Compilation units in PIR are roughly equivalent to the subroutines or methods of a high-level language. Though they will be explained in more detail later, we introduce them here because all code in a PIR source file must be defined in a compilation unit. The simplest syntax for a PIR compilation unit starts with the .sub directive and ends with the .end directive:
.sub _main
print "Hello, Polly.\n"
end
.end
This example defines a compilation unit named _main that prints a string. The name is actually a global label for this piece of code. If you generate a PASM file from the PIR code (see the end of the "Temporary Registers" section earlier in this chapter), you'll see that the name translates to an ordinary label:
_main:
print "Hello, Polly.\n"
end
The first compilation unit in a file is normally executed first, but as in PASM you can flag any compilation unit as the first one to execute with the @MAIN marker. The convention is to name the first compilation unit _main, but the name isn't critical.
.sub _first
print "Polly want a cracker?\n"
end
.end
.sub _main @MAIN
print "Hello, Polly.\n"
end
.end
This code prints out "Hello, Polly." but not "Polly want a cracker?".
The "Subroutines" section later in this chapter goes into much more detail about compilation units and their uses.
Flow Control
As in PASM, flow control in PIR is done entirely with conditional and unconditional branches. This may seem simplistic, but remember that PIR is a thin overlay on the assembly language of a virtual processor. For the average assembly language, jumps are the fundamental unit of flow control.
Any PASM branch instruction is valid, but PIR has some high-level constructs of its own. The most basic is the unconditional branch: goto.
.sub _main
goto L1
print "never printed"
L1:
print "after branch\n"
end
.end
The first print statement never runs because the goto always skips over it to the label L1.
The conditional branches combine if or unless with goto.
.sub _main
$I0 = 42
if $I0 goto L1
print "never printed"
L1: print "after branch\n"
end
.end
In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is quite similar, but branches when the tested value is false. An undefined value, 0, or an empty string are all false values. The if ... goto statement translates directly to the PASM if, and unless translates to the PASM unless.
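For comparison, here is the same sketch written with unless, assuming the unless ... goto form mirrors the if ... goto form as described. The tested value is 0, which is false, so the branch is taken.

```pir
.sub _main
    $I0 = 0
    unless $I0 goto L1          # branches because 0 is a false value
    print "never printed"
L1:
    print "after branch\n"
    end
.end
```

Changing the first statement to $I0 = 42 would make the test fail to branch, so both print statements would run.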
The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:
.sub _main
$I0 = 42
$I1 = 43
if $I0 < $I1 goto L1
print "never printed"
L1:
print "after branch\n"
end
.end
This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the PASM lt branch operation.
The rest of the comparison operators are summarized in "PIR Instructions" in Chapter 11.
PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:
.sub _main
$I0 = 1 # product
$I1 = 5 # counter
REDO: # start of loop
$I0 = $I0 * $I1
dec $I1
if $I1 > 0 goto REDO # end of loop
print $I0
print "\n"
end
.end
This example calculates 5! (the factorial of 5). Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and then branches to the start of the loop. The loop ends when $I1 counts down to 0, so that the if doesn't branch to REDO. This is a do-while-style loop with the condition test at the end, so the code always runs the first time through.
For a while-style loop with the condition test at the start, use a conditional branch together with an unconditional branch:
.sub _main
$I0 = 1 # product
$I1 = 5 # counter
REDO: # start of loop
if $I1 <= 0 goto LAST
$I0 = $I0 * $I1
dec $I1
goto REDO
LAST: # end of loop
print $I0
print "\n"
end
.end
This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop never executes.
Any high-level flow control construct can be built from conditional and unconditional branches.
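As one more sketch of that idea, an if/else can be assembled from a conditional branch and an unconditional branch. The label names BIG and DONE and the threshold 5 are arbitrary choices for this example.

```pir
.sub _main
    $I0 = 7
    if $I0 > 5 goto BIG     # condition true: jump to the "then" part
    print "small\n"         # "else" part, reached only when the test fails
    goto DONE               # skip over the "then" part
BIG:
    print "big\n"           # "then" part
DONE:
    end
.end
```

With $I0 set to 7 the comparison is true, so only the "big" line is printed. The goto DONE after the else part is what keeps the two arms from running one after the other.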