The Parrot intermediate representation (PIR) is an overlay on top of Parrot assembly language, designed to make the developer's life easier. It has many high-level features that ease the pain of working with PASM code, but it still isn't a high-level language.
Internally, Parrot works a little differently with PASM and PIR source code, so each has different restrictions. The default is to run in a mixed mode that allows PASM code to combine with the higher-level syntax unique to PIR.
A file with a .pasm extension is treated as pure PASM code, as is any file run with the -a command-line option. This mode is mainly used for running pure PASM tests. Parrot treats a file with any extension other than .pasm as PIR. By convention, files containing PIR code have a .pir extension.
The documentation in imcc/docs/ or docs/ and the test suite in imcc/t are good starting points for digging deeper into the PIR syntax and functionality.
Statements
The syntax of statements in PIR is much more flexible than in PASM. All PASM opcodes are valid PIR code, so the basic syntax is the same. The statement delimiter is a newline (\n), so each statement has to be on its own line. Any statement can start with a label. Comments are marked by a hash sign (#), and PIR allows POD blocks.
But unlike PASM, PIR has some higher-level constructs, including symbol operators:
I1 = 5 # set I1, 5
named variables:
count = 5
and complex statements built from multiple keywords and symbol operators:
if I1 <= 5 goto LABEL # le I1, 5, LABEL
We'll get into these in more detail as we go.
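As a rough sketch, the fragments above can be assembled into one small program. It is wrapped in a .sub block because PIR code has to live in a compilation unit, as the "Compilation Units" section later in this chapter explains. The label name LOOP is a hypothetical name chosen for illustration, and the syntax follows the examples in this chapter rather than any particular Parrot release.

```pir
.sub _main
    .local int count          # named variable
    count = 5                 # symbol operator; same as: set count, 5
  LOOP:                       # a label standing on its own line
    print count               # plain PASM opcodes are still valid PIR
    print "\n"
    dec count
    if count > 0 goto LOOP    # complex statement: keywords plus an operator
    end
.end
```

If it behaves as intended, the loop prints the numbers from 5 down to 1, one per line.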
Variables and Constants
Literal constants in PIR are the same as constants in PASM. Integers and floating-point numbers are numeric literals and strings are enclosed in quotes. PIR strings use the same escape sequences as PASM.
Parrot Registers
PIR code has a variety of ways to store values while you work with them. The most basic way is to use Parrot registers directly. PASM register names always start with a single character that shows whether it is an integer, numeric, string, or PMC register, and end with the number of the register (between 0 and 31):
S0 = "Hello, Polly.\n"
print S0
When you work directly with Parrot registers, you can only have 32 registers of any one type at a time (only 31 for PMC registers, because P31 is reserved for spilling). If you have more than that, you have to start shuffling stored values on and off the user stack. You also have to manually track when it's safe to reuse a register. This kind of low-level access to the Parrot registers is handy when you need it, but it's pretty unwieldy for large sections of code.
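To make the naming scheme concrete, here is a minimal sketch that uses one register of each of the first three types together with the matching kind of literal constant. The register numbers are arbitrary choices for illustration. PMC registers follow the same pattern, but they need an object created first, which the "PMC variables" section covers.

```pir
.sub _main
    I0 = 42                   # integer register, integer literal
    N0 = 3.14159              # numeric register, floating-point literal
    S0 = "Hello, Polly.\n"    # string register, quoted string literal
    print I0
    print "\n"
    print N0
    print "\n"
    print S0
    end
.end
```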
Temporary Registers
PIR provides an easier way to work with Parrot registers. The temporary register variables are named like the PASM registers, with a single character for the type of register and a number, but they start with a $ character:
set $S42, "Hello, Polly.\n"
print $S42
The most obvious difference between Parrot registers and temporary register variables is that you have an unlimited number of temporaries. Parrot handles register allocation for you. It keeps track of how long a value in a Parrot register is needed and when that register can be reused.
The previous example used the $S42 temporary. When the code is compiled, that temporary is allocated to a Parrot register. As long as the temporary is needed, it is stored in the same register. When it's no longer needed, the Parrot register is re-allocated to some other value. This example uses two temporary string registers:
$S42 = "Hello, "
print $S42
$S43 = "Polly.\n"
print $S43
Since they don't overlap, Parrot allocates both to the S16 register. If you change the order a little so both temporaries are needed at the same time, they're allocated to different registers:
$S42 = "Hello, " # allocated to S17
$S43 = "Polly.\n" # allocated to S16
print $S42
print $S43
In this case, $S42 is allocated to S17 and $S43 is allocated to S16.
Parrot allocates temporary variables (as well as named variables, which we'll talk about next) to Parrot registers in ascending order of their score. The score is based on a number of factors related to variable usage. Variables used in a loop have a higher score than variables outside a loop. Variables that span a long range have a lower score than ones that are used only briefly.
If you want to peek behind the curtain and see how Parrot is allocating registers, you can run it with the -d switch to turn on debugging output:
$ parrot -d1000 hello.pir
If hello.pir contains this code from the second example above (wrapped in a subroutine definition so it will compile):
.sub _main
$S42 = "Hello, " # allocated to S17
$S43 = "Polly.\n" # allocated to S16
print $S42
print $S43
end
.end
it produces this output:
code_size(ops) 11 oldsize 0
0 set_s_sc 17 1 set S17, "Hello, "
3 set_s_sc 16 0 set S16, "Polly.\n"
6 print_s 17 print S17
8 print_s 16 print S16
10 end end
Hello, Polly.
That's probably a lot more information than you wanted if you're just starting out. You can also generate a PASM file with the -o switch and have a look at how the PIR code translates:
$ parrot -o hello.pasm hello.pir
or just
$ parrot -o- hello.pir
to see the resulting PASM on stdout.
You'll find more details on these options and many others in "Parrot Command-Line Options" in Chapter 11.
Named Variables
Named variables can be used anywhere a register or temporary register is used. They're declared with the .local statement or the equivalent .sym statement, both of which require a variable type and a name:
.local string hello
set hello, "Hello, Polly.\n"
print hello
This snippet defines a string variable named hello, assigns it the value "Hello, Polly.\n", and then prints the value.
The valid types are int, num, string, and pmc, or any Parrot class name (like PerlInt or PerlString). It should come as no surprise that these are the same divisions as Parrot's four register types. Named variables are valid from the point of their definition to the end of the compilation unit.
The name of a variable must be a valid PIR identifier. It can contain letters, digits, and underscores, but the first character has to be a letter or underscore. Identifiers don't have any limit on length yet, but it's a safe bet they will before the production release. Parrot opcode names are normally not allowed as variable names, though there are some exceptions.
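The following sketch declares one named variable of each basic type, using .local for two of them and the equivalent .sym for the third. The variable names (crackers, weight, bird) are hypothetical and were picked only because they are unlikely to collide with opcode names.

```pir
.sub _main
    .local int    crackers    # allocated to an integer register
    .local num    weight      # allocated to a numeric register
    .sym   string bird        # .sym is equivalent to .local
    crackers = 3
    weight   = 0.4
    bird     = "Polly"
    print bird
    print " wants "
    print crackers
    print " crackers and weighs "
    print weight
    print "\n"
    end
.end
```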
PMC variables
PMC registers and variables act much like any integer, floating-point number, or string register or variable, but you have to instantiate a new PMC object before you use it. The new instruction creates a new PMC. Unlike PASM, PIR doesn't use a dot in front of the class name.
P0 = new PerlString # same as new P0, .PerlString
P0 = "Hello, Polly.\n"
print P0
This example creates a PerlString object, stores it in the PMC register P0, assigns the value "Hello, Polly.\n" to it, and prints it. The syntax is exactly the same for temporary register variables:
$P4711 = new PerlString
$P4711 = "Hello, Polly.\n"
print $P4711
With named variables, the type passed to the .local directive is either the generic pmc or a type compatible with the type passed to new:
.local PerlString hello # or .local pmc hello
hello = new PerlString
hello = "Hello, Polly.\n"
print hello
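The same pattern should work for other PMC classes. As a sketch, this version declares the variable with the generic pmc type and instantiates a PerlInt, one of the class names mentioned earlier. The variable name answer is hypothetical.

```pir
.sub _main
    .local pmc answer         # generic PMC type, compatible with any class
    answer = new PerlInt      # instantiate the object before using it
    answer = 42               # assign an integer value to the PMC
    print answer
    print "\n"
    end
.end
```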
Named Constants
The .const directive declares a named constant. It's very similar to .local, and requires a type and a name. The value of a constant must be assigned in the declaration statement. As with named variables, named constants are visible only within the compilation unit where they're declared. This example declares a named string constant hello and prints the value:
.const string hello = "Hello, Polly.\n"
print hello
Named constants function in all the same places as literal constants, but have to be declared beforehand:
.const int the_answer = 42 # integer constant
.const string mouse = "Mouse" # string constant
.const num pi = 3.14159 # floating point constant
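To illustrate that a named constant can stand wherever a literal could, this sketch uses two of the constants above as operands. The variable name doubled is a hypothetical addition.

```pir
.sub _main
    .const int the_answer = 42
    .const string mouse = "Mouse"
    .local int doubled
    doubled = the_answer * 2  # the constant is used like the literal 42
    print mouse               # and here like the literal "Mouse"
    print ": "
    print doubled
    print "\n"
    end
.end
```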
Register Spilling
As we mentioned earlier, Parrot allocates all temporary register variables and named variables to Parrot registers. When Parrot runs out of registers to allocate, it has to store some of the variables elsewhere. This is known as spilling. Parrot spills the variables with the lowest score and stores them in a PerlArray object while they aren't used, then restores them to a register the next time they're needed. Consider an example that creates 33 integer variables, all containing values that are used later:
set $I1, 1
set $I2, 2
...
set $I33, 33
...
print $I1
print $I2
...
print $I33
Parrot allocates the 32 available integer registers to variables with a higher score and spills the variables with a lower score. In this example it picks $I1 and $I2. Behind the scenes, Parrot generates code to store the values:
new P31, "PerlArray"
...
set I0, 1 # I0 allocated to $I1
set P31[0], I0 # spill $I1
set I0, 2 # I0 reallocated to $I2
set P31[1], I0 # spill $I2
It creates a PerlArray object and stores it in register P31. P31 is reserved for register spilling in PIR code, so generally it shouldn't be accessed directly. The set instruction is the last time $I1 is used for a while, so immediately after that, Parrot stores its value in the spill array and frees up I0 to be reallocated.
Just before $I1 and $I2 are accessed to be printed, Parrot generates code to fetch the values from the spill array:
...
set I0, P31[0] # fetch $I1
print I0
You cannot rely on any particular register assignment for temporary variables or named variables. The register allocator does follow a set of precedence rules for allocation, but these rules may change. Also, if two variables have the same score Parrot may assign registers based on the hashed value of the variable name. Parrot randomizes the seed to the hash function to guarantee you never get a consistent order.
Symbol Operators
You probably noticed the = assignment operator in some of the earlier examples:
$S2000 = "Hello, Polly.\n"
print $S2000
Standing alone, it's the same as the PASM set opcode. In fact, if you run parrot in bytecode debugging mode (as in "Assembler Options" in Chapter 11), you'll see it really is just a set opcode underneath.
PIR has many other symbol operators: arithmetic, concatenation, comparison, bitwise, and logical. Many of these combine with assignment to produce the equivalent of a PASM opcode:
.local int sum
sum = $I42 + 5
print sum
print "\n"
The statement sum = $I42 + 5 translates to something like add I16, I17, 5.
PIR also provides +=, -=, >>=, and so on, which map to the two-argument forms like add I16, I17.
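Here is a short sketch of the assignment forms named above applied to a single named variable. The variable name total is hypothetical, and the PASM equivalents in the comments are the two-argument forms the text describes.

```pir
.sub _main
    .local int total
    total = 10
    total += 5                # two-argument add: add total, 5
    total -= 3                # two-argument sub: sub total, 3
    total >>= 1               # shift right by one bit
    print total
    print "\n"
    end
.end
```

Working through the arithmetic, 10 + 5 is 15, minus 3 is 12, and shifting right by one bit halves that, so the program should print 6.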
Many PASM opcodes that return a single value also have an alternate syntax in PIR with the assignment operator:
$I0 = length str # length $I0, str
$I0 = isa PerlInt, "scalar" # isa $I0, PerlInt, "scalar"
$I0 = exists hash["key"] # exists $I0, hash["key"]
$N0 = sin $N1
$N0 = atan $N1, $N2
$S0 = repeat "x", 20
$P0 = newclass "Foo"
...
A complete list of PIR operators is available in Chapter 11. We'll discuss the comparison operators in "Flow Control" later in this chapter.
Labels
As in PASM, any line can start with a label definition like LABEL:, but label definitions can also stand on their own line.
PIR code has both local and global labels. Global labels start with an underscore. The name of a global label has to be unique, since it can be called at any point in the program. Local labels start with a letter. A local label is accessible only in the compilation unit where it's defined (we'll discuss compilation units in the next section). The name has to be unique there, but it can be reused in a different compilation unit.
branch L1 # local label
bsr _L2 # global label
Labels are most often used in branching instructions and in subroutine calls.
Compilation Units
Compilation units in PIR are roughly equivalent to the subroutines or methods of a high-level language. Though they will be explained in more detail later, we introduce them here because all code in a PIR source file must be defined in a compilation unit. The simplest syntax for a PIR compilation unit starts with the .sub directive and ends with the .end directive:
.sub _main
print "Hello, Polly.\n"
end
.end
This example defines a compilation unit named _main that prints a string. The name is actually a global label for this piece of code. If you generate a PASM file from the PIR code (see the end of the "Temporary Registers" section earlier in this chapter), you'll see that the name translates to an ordinary label:
_main:
print "Hello, Polly.\n"
end
The first compilation unit in a file is normally executed first, but as in PASM you can flag any compilation unit as the first one to execute with the @MAIN marker. The convention is to name the first compilation unit _main, but the name isn't critical.
.sub _first
print "Polly want a cracker?\n"
end
.end
.sub _main @MAIN
print "Hello, Polly.\n"
end
.end
This code prints out "Hello, Polly." but not "Polly want a cracker?".
The "Subroutines" section later in this chapter goes into much more detail about compilation units and their uses.
Flow Control
As in PASM, flow control in PIR is done entirely with conditional and unconditional branches. This may seem simplistic, but remember that PIR is a thin overlay on the assembly language of a virtual processor. For the average assembly language, jumps are the fundamental unit of flow control.
Any PASM branch instruction is valid, but PIR has some high-level constructs of its own. The most basic is the unconditional branch, goto:
.sub _main
goto L1
print "never printed"
L1:
print "after branch\n"
end
.end
The first print statement never runs because the goto always skips over it to the label L1.
The conditional branches combine if or unless with goto:
.sub _main
$I0 = 42
if $I0 goto L1
print "never printed"
L1: print "after branch\n"
end
.end
In this example, the goto branches to the label L1 only if the value stored in $I0 is true. The unless statement is quite similar, but branches when the tested value is false. An undefined value, 0, or an empty string are all false values. The if ... goto statement translates directly to the PASM if, and unless translates to the PASM unless.
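For comparison, here is a sketch of the unless form, testing two of the false values listed above. It mirrors the if example; the label L2 and the message strings are hypothetical additions.

```pir
.sub _main
    $I0 = 0                   # 0 is a false value
    unless $I0 goto L1        # branches, because $I0 is false
    print "never printed"
  L1:
    $S0 = ""                  # an empty string is false too
    unless $S0 goto L2        # branches again
    print "never printed either"
  L2:
    print "after both branches\n"
    end
.end
```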
The comparison operators (<, <=, ==, !=, >, >=) can combine with if ... goto. These branch when the comparison is true:
.sub _main
$I0 = 42
$I1 = 43
if $I0 < $I1 goto L1
print "never printed"
L1:
print "after branch\n"
end
.end
This example compares $I0 to $I1 and branches to the label L1 if $I0 is less than $I1. The if $I0 < $I1 goto L1 statement translates directly to the PASM lt branch operation.
The rest of the comparison operators are summarized in "PIR Instructions" in Chapter 11.
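The other comparison operators are used the same way. As a sketch, this program uses >= to pick the larger of two values. The label DONE and the choice of $I2 to hold the result are hypothetical.

```pir
.sub _main
    $I0 = 42
    $I1 = 43
    $I2 = $I0                     # start by assuming $I0 is the larger value
    if $I0 >= $I1 goto DONE       # assumption holds: skip the correction
    $I2 = $I1                     # otherwise take $I1 instead
  DONE:
    print $I2
    print "\n"
    end
.end
```

With the values shown, the comparison is false, so the branch is not taken and the program should print 43.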
PIR has no special loop constructs. A combination of conditional and unconditional branches handles iteration:
.sub _main
$I0 = 1 # product
$I1 = 5 # counter
REDO: # start of loop
$I0 = $I0 * $I1
dec $I1
if $I1 > 0 goto REDO # end of loop
print $I0
print "\n"
end
.end
This example calculates the factorial 5!. Each time through the loop it multiplies $I0 by the current value of the counter $I1, decrements the counter, and then branches to the start of the loop. The loop ends when $I1 counts down to 0 so that the if doesn't branch to REDO. This is a do-while-style loop with the condition test at the end, so the code always runs the first time through.
For a while-style loop with the condition test at the start, use a conditional branch together with an unconditional branch:
.sub _main
$I0 = 1 # product
$I1 = 5 # counter
REDO: # start of loop
if $I1 <= 0 goto LAST
$I0 = $I0 * $I1
dec $I1
goto REDO
LAST: # end of loop
print $I0
print "\n"
end
.end
This example tests the counter $I1 at the start of the loop. At the end of the loop, it unconditionally branches back to the start of the loop and tests the condition again. The loop ends when the counter $I1 reaches 0 and the if branches to the LAST label. If the counter isn't a positive number before the loop, the loop never executes.
Any high-level flow control construct can be built from conditional and unconditional branches.
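An if/else construct, for instance, takes one conditional branch and one unconditional branch. The following is only a sketch of one possible layout; the label names BIG and ENDIF and the threshold of 5 are hypothetical.

```pir
.sub _main
    $I0 = 7
    if $I0 > 5 goto BIG       # condition true: jump to the "then" part
    print "small\n"           # "else" part, reached when the test fails
    goto ENDIF                # skip over the "then" part
  BIG:
    print "big\n"             # "then" part
  ENDIF:
    end
.end
```

The unconditional goto after the "else" part is what keeps execution from falling through into the "then" part.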