TITLE

Parrot Assembly Language

VERSION

CURRENT

Maintainer: Dan Sugalski
Class: Internals
PDD Number: 6
Version: 1.6
Status: Developing
Last Modified: 05 November 2001
PDD Format: 1
Language: English

HISTORY

Version 1.6

November 05, 2001

Version 1.5

October 12, 2001

Version 1.4

September 24, 2001

Version 1.3

September 12, 2001

Version 1.2

August 25, 2001

Version 1.1

August 8, 2001

Version 1.0

None. First version

CHANGES

Version 1.6
  • Added GC opcodes

Version 1.5
  • Now have a bsr in addition to a jsr

  • return is now ret

  • Added save and restore ops for saving and restoring individual registers

Version 1.4
  • Conditional branches have just a true destination now

  • Added the I/O ops

  • Added in the threading ops

  • Added in the interpreter ops

Version 1.3
  • Added in the low-level module loading ops

  • Added in transcendental functions and modulo

  • Finished the pad/global variable fetching bits

Version 1.2

We have an interpreter now! Yay! (Okay, a simple one, but still...) Changes made to reflect that.

Version 1.1
  • Added in object

  • Changed remnants of "perl" to "Parrot"

  • Branch destination may be integer constant

  • Added "Assembly Syntax" section

Version 1.0

None. First version

ABSTRACT

This PDD describes the format of Parrot's bytecode assembly language.

DESCRIPTION

Parrot's bytecode can be thought of as a form of machine language for a virtual super CISC machine. It makes sense, then, to define an assembly language for it, for those who need to generate bytecode directly rather than indirectly via Perl (or any other language).

IMPLEMENTATION

Parrot opcodes take the format of:

code destination[dest_key], source1[source1_key], source2[source2_key]

The brackets do not denote optional arguments as such; they are literal brackets. A key, brackets included, may be left out entirely, however. If any argument has a key, the assembler substitutes the null key for the arguments that lack one.

Conditional branches take the format:

code boolean[bool_key], true_dest

The key parameters are optional, and may be either an integer or a string. If either is passed they are associated with the parameter to their left, and are assumed to be either an array/list entry number, or a hash key. Any time a source or destination can be a PMC register, there may be a key.

Destinations for conditional branches are an integer offset from the current PC.

All registers have a type prefix of P, S, I, or N, for PMC, string, integer, and number respectively.
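
As a rough illustration of the operand syntax described above, the following Python sketch splits an operand such as P1["name"] or I3 into its register type, register number, and optional key. The name parse_operand and the regular expression are assumptions made for illustration; they are not part of the Parrot assembler, and the sketch does not enforce the rule that only PMC operands may carry a key.

```python
import re

# Hypothetical helper, not part of the Parrot assembler: splits an operand
# of the form  <prefix><number>[<key>]  into its parts.
REGISTER_TYPES = {"P": "PMC", "S": "string", "I": "integer", "N": "number"}

_OPERAND = re.compile(r'([PSIN])(\d+)(?:\[(.+)\])?')

def parse_operand(text):
    """Return (register_type, register_number, key); key is None if absent."""
    match = _OPERAND.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"not a register operand: {text!r}")
    prefix, number, raw_key = match.groups()
    key = None
    if raw_key is not None:
        if raw_key.lstrip("-").isdigit():
            key = int(raw_key)          # treated as an array/list entry number
        else:
            key = raw_key.strip('"')    # treated as a hash key
    return REGISTER_TYPES[prefix], int(number), key
```

A missing key comes back as None here, which stands in for the null key the assembler is said to substitute.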

Assembly Syntax

All assembly opcodes contain only ASCII lowercase letters and the underscore.

Upper case names are reserved for assembler directives.

Labels all end with a colon. They may have ASCII letters, numbers, and underscores in them. Labels that begin with a dollar sign (the only valid spot in a label a dollar sign can appear) are private to the subroutine they appear in.
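
The label rules can be sketched as a small validity check. This is one reading of the rules above, written in Python for illustration only; classify_label is a hypothetical name, and the pattern assumes a label needs at least one character before the colon.

```python
import re

# Assumed reading of the label rules: ASCII letters, digits, and underscores,
# an optional leading '$' for subroutine-private labels, and a trailing colon.
_LABEL = re.compile(r'(\$?)([A-Za-z0-9_]+):')

def classify_label(text):
    """Return 'private', 'public', or None if text is not a valid label."""
    match = _LABEL.fullmatch(text)
    if match is None:
        return None
    return "private" if match.group(1) else "public"
```

Under this reading, a dollar sign anywhere other than the first position makes the label invalid.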

Namespaces are noted with the NAMESPACE directive. It takes a single parameter, the name of the namespace. Multilevel namespaces are supported, and the namespaces should be double-colon separated.

Subroutine names are noted with the SUB directive. It takes a single parameter, the name of the subroutine, which is added to the namespace's symbol table. Sub names may be any valid Unicode alphanumeric character and the underscore.

String and integer constants don't need to be put in a separate

OPCODE LIST

In the following list, there may be multiple (but unlisted) versions of an opcode. If an opcode takes a register that might be keyed, the keyed version of the opcode has a _k suffix. If an opcode might take multiple types of registers for a single parameter, the opcode function really has a _x suffix, where x is either P, S, I, or N, depending on whether a PMC, string, integer, or numeric register is involved. The suffix isn't necessary (though not an error) as the assembler can intuit the information from the code.

In those cases where an opcode can take several types of registers, and more than one of the sources or destinations are of variable type, then the register is passed in extended format. An extended format register number is of the form:

register_number | register_type

where register_type is 0x100, 0x200, 0x400, or 0x800 for PMC, string, integer, or number respectively. So N19 would be 0x813 (0x800 | 0x13).
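
The extended format is a plain bitwise OR of the register number and the type flag. The Python sketch below shows the arithmetic; the function names are hypothetical, and the decoder assumes the register number fits in the low byte, which the text does not state.

```python
# Type flags as given in the text above.
REGISTER_TYPE_FLAGS = {"P": 0x100, "S": 0x200, "I": 0x400, "N": 0x800}

def encode_extended(name):
    """Encode a register name such as 'I19' as register_number | register_type."""
    prefix, number = name[0], int(name[1:])
    return number | REGISTER_TYPE_FLAGS[prefix]

def decode_extended(value):
    """Inverse of encode_extended. Assumes the number occupies the low byte."""
    for prefix, flag in REGISTER_TYPE_FLAGS.items():
        if value & flag:
            return f"{prefix}{value & 0xFF}"
    raise ValueError(f"no register type flag set in {value:#x}")
```

For example, encode_extended("I19") gives 0x413, since 19 is 0x13 and the integer flag is 0x400.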

Note: Instructions tagged with a * will call a vtable method to handle the instruction if used on PMC registers.

In all cases, the letters x, y, and z refer to register numbers. The letter t refers to a generic register (P, S, I, or N). A lowercase p, s, i, or n means either a register or constant of the appropriate type (PMC, string, integer, or number).

Control flow

The control flow opcodes check conditions and manage program flow.

if tx, X

Check register tx (Px, Sx, Ix, or Nx). If true, branch by X.

jump tx

Jump to the address held in register x (Px, Sx, or Ix).

branch tx

Branch forward or backward by the amount in register x (Ix, Nx, or Px). The branch offset may also be an integer constant.

jsr tx

Jump to the location specified by register X. Push the current location onto the call stack for later returning.

bsr ix

Branch to the location specified by X (either register or label). Push the current location onto the call stack for later returning.

ret

Pop the location off the top of the stack and go there.
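
To make the interplay of bsr and ret concrete, here is a toy model in Python. It is not Parrot code: the instruction tuples, the mark and end pseudo-ops, and the choice to push the index of the following instruction (rather than the current one) are all assumptions made so the example terminates.

```python
def run(program):
    """Run a list of (op, arg) tuples; return the arguments of executed 'mark' ops.

    Toy model only: 'bsr' takes a relative offset, 'ret' pops the call stack,
    'mark' records its argument, and 'end' stops execution.
    """
    pc, call_stack, trace = 0, [], []
    while True:
        op, arg = program[pc]
        if op == "bsr":
            call_stack.append(pc + 1)   # remember where to resume
            pc += arg                   # branch relative to the current PC
        elif op == "ret":
            pc = call_stack.pop()       # go back to the saved location
        elif op == "mark":
            trace.append(arg)
            pc += 1
        elif op == "end":
            return trace
        else:
            raise ValueError(f"unknown op: {op}")

program = [
    ("mark", "before"),   # index 0
    ("bsr", 3),           # index 1: branch to index 4
    ("mark", "after"),    # index 2
    ("end", None),        # index 3
    ("mark", "in sub"),   # index 4
    ("ret", None),        # index 5
]
```

Running run(program) visits the subroutine body between the two marks of the main sequence, because ret resumes at the location that bsr pushed.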

Data manipulation

These ops handle manipulating the data in registers.

new Px, Iy

Create a new PMC of class y stored in PMC register x.

set tx, ty

Copies y into x. Note that strings and PMCs are referred to by pointer, so if you do something like:

set S0, S1

this will copy the pointer in S1 into S0, leaving both registers pointing at the same string.

clone Px, Py
clone Sx, Sy

Performs a "deeper" copy of y into x, using the vtable appropriate to the class of Py if cloning a PMC.
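
The difference between set and clone is the difference between sharing a pointer and making a copy. The Python sketch below models it with a dictionary of registers and a mutable list standing in for a string buffer; both stand-ins are assumptions chosen for illustration, and copy.deepcopy is only an analogy for whatever the class's vtable actually does.

```python
import copy

# Registers modeled as a dict; a mutable list stands in for a string buffer or PMC.
registers = {"S1": ["h", "i"]}

# set S0, S1: copy the pointer, so both registers share one underlying object.
registers["S0"] = registers["S1"]

# clone S2, S1: make an independent copy of the underlying object.
registers["S2"] = copy.deepcopy(registers["S1"])

# Modify the object that S1 points at, in place.
registers["S1"].append("!")
```

After the in-place change, S0 sees the new contents because it aliases S1, while S2 still holds the original contents.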

tostring Sx, ty, Iz

Take the value in register y and convert it to a string of type z, storing the result in string register x.

add tx, ty, tz *

Add registers y and z and store the result in register x. (x = y + z) The registers must all be the same type, PMC, integer, or number.

sub tx, ty, tz *

Subtract register z from register y and store the result in register x. (x = y - z) The registers must all be the same type, PMC, integer, or number.

mul tx, ty, tz *

Multiply register y by register z and store the results in register x. The registers must be the same type.

div tx, ty, tz *

Divide register y by register z, and store the result in register x.

inc tx, nn *

Increment register x by nn. nn is an integer constant. If nn is omitted, increment by 1.

dec tx, nn *

Decrement register x by nn. nn is an integer constant. If nn is omitted, decrement by 1.

length Ix, Sy

Put the length of string y into integer register x.

concat Sx, Sy

Add string y to the end of string x.

repeat Sx, Sy, Iz

Copies string y z times into string x.

Transcendental operations

These opcodes handle the transcendental math functions. The destination register here must always be either a numeric or a PMC register.

sin nx, ty

Return the sine of the number in Y

cos nx, ty

Return the cosine of the number in Y

tan nx, ty

Return the tangent of the number in Y

sec nx, ty

Return the secant of the number in Y

atan nx, ty

Return the arctangent of Y

atan2 nx, ty

Return the result of atan2 of Y

asin nx, ty

Return the arcsine of y

acos nx, ty

Return the arccosine of y

asec nx, ty

Return the arcsecant of y

cosh nx, ty

Return the hyperbolic cosine of y

sinh nx, ty

Return the hyperbolic sine of y

tanh nx, ty

Return the hyperbolic tangent of y

sech nx, ty

Return the hyperbolic secant of y

log2 nx, ty

Return the base 2 log of y

log10 nx, ty

Return the base 10 log of y

ln Nx, ty

Return the base e log of y

log nx, ty, tz

Return the base Z log of Y

pow nx, ty, tz

Return Y to the Z power

exp nx, ty

Return e to the Y power

Register and stack ops

These opcodes deal with registers and stacks

push_p

Push the current frame of PMC registers onto their stack and start a new frame. The new registers are not initialized.

push_p_c

Push the current frame of PMC registers onto their stack and start a new frame. The new registers are copies of the previous frame.

pop_p

Pop the current frame of PMC registers off the stack.

push_i

The same as push_p, for the integer register set.

push_i_c

The same as push_p_c, for the integer register set.

pop_i

The same as pop_p, for the integer register set.

push_s

The same as push_p, for the string register set.

push_s_c

The same as push_p_c, for the string register set.

pop_s

The same as pop_p, for the string register set.

push_n

The same as push_p, for the floating-point register set.

push_n_c

The same as push_p_c, for the floating-point register set.

pop_n

The same as pop_p, for the floating-point register set.
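
The frame operations above behave the same way for each register set, so one model covers all four. The Python class below is a sketch under assumptions: the class and method names are hypothetical, the frame size of 32 registers is assumed rather than taken from this document, and None stands in for an uninitialized register.

```python
class RegisterSet:
    """Toy model of one register set (for example the integer registers)."""

    def __init__(self, size=32):       # frame size is an assumption
        self.size = size
        self.current = [None] * size   # the active frame
        self.saved = []                # frames pushed earlier

    def push(self):
        """Like push_i: save the frame and start a fresh, uninitialized one."""
        self.saved.append(self.current)
        self.current = [None] * self.size

    def push_copy(self):
        """Like push_i_c: save the frame and start a new one holding copies."""
        self.saved.append(self.current)
        self.current = list(self.current)

    def pop(self):
        """Like pop_i: discard the active frame and return to the previous one."""
        self.current = self.saved.pop()
```

Changes made in a pushed frame never leak into the saved one, which is what makes the pair usable for preserving registers across a subroutine call.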

save_i Ix

Push register X onto the generic stack

save_s Sx

Push register X onto the generic stack

save_p Px

Push register X onto the generic stack

save_n Nx

Push register X onto the generic stack

restore_i Ix

Restore register X from the generic stack

restore_s Sx

Restore register X from the generic stack

restore_p Px

Restore register X from the generic stack

restore_n Nx

Restore register X from the generic stack

entrytype Ix, iy

Put the type of stack entry Y into integer register X
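
The save, restore, and entrytype ops can be pictured as operations on a single stack of typed entries. The Python sketch below is a guess at the behavior, not a description of the implementation: it assumes restore checks the entry type, and that entrytype counts entries from the top of the stack with 0 as the topmost.

```python
# Toy model of the generic stack: each entry remembers the type of the
# register it came from so that a later entrytype query can report it.
generic_stack = []

def save(register_type, value):
    """Like save_i / save_s / save_p / save_n: push a typed entry."""
    generic_stack.append((register_type, value))

def restore(register_type):
    """Like restore_i / restore_s / ...: pop the top entry (type check assumed)."""
    entry_type, value = generic_stack.pop()
    if entry_type != register_type:
        raise TypeError(f"top of stack holds {entry_type}, not {register_type}")
    return value

def entrytype(depth):
    """Like entrytype: type of the entry `depth` slots from the top (assumed)."""
    return generic_stack[-1 - depth][0]
```

Because the stack is shared by all register types, values must be restored in the reverse of the order in which they were saved.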

set_warp string

Sets a named marker for the stacks for later use.

warp [string]

Reset the current register stacks to the state they were in when the warp was set. Resets only the frame pointers, doesn't guarantee the contents of the registers. Be very careful modifying the frame pointers by, for example, pushing register frames.

If a name is passed, warp back to the named point.

unwarp

Reset the current register stacks to the state they were in before the last warp.

Names, pads, and globals

These operations are responsible for finding names in lexical or global scopes, as well as storing data into those slots and checking constraints on those slots. They also allocate and deallocate scratchpads and entries in those pads.

Pad descriptors are templates for a particular pad. They are specified in the constant area of a bytecode file, and contain the names, types, and attributes for the variables referenced in the scope the pad is for.

The pad 0 is special, and represents the empty pad.

find_lex Px, sy

Find the lexical of name sy and store the PMC pointer in register Px.

find_global Px, sy, sz

Find the PMC for the global variable sy from the table sz and store it in register X

find_global Px, sy

Find the PMC for the global in the default table and put it in X.

find_global_table Px, sy

Find the global symbol table Y and store its PMC in X

find_global_slot ix, Py, sz

Find the slot in the global table Y for the global named Z, and store its slot in register X.

fetch_lex Px, iy, iz

Fetch the lexical in slot y of scratchpad z. If z is negative, search out from the current pad, if positive search inwards from the outermost pad. Put the resulting PMC pointer in register x
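
The sign convention for the pad argument can be read in more than one way. The Python sketch below shows one possible reading, in which -1 names the current pad and 1 names the outermost pad; this interpretation, the list-of-lists representation, and the treatment of 0 as an error are all assumptions.

```python
def fetch_lex(pads, slot, pad_index):
    """Look up a slot in a nest of scratchpads.

    pads[0] is the outermost pad and pads[-1] the current one.
    Assumed reading: -1 is the current pad, -2 its enclosing pad, and so on;
    1 is the outermost pad, 2 the next one in.
    """
    if pad_index == 0:
        raise ValueError("pad index 0 is not a searchable pad in this sketch")
    if pad_index < 0:
        pad = pads[pad_index]          # count outward from the current pad
    else:
        pad = pads[pad_index - 1]      # count inward from the outermost pad
    return pad[slot]
```

With three nested pads, fetch_lex(pads, 0, -2) and fetch_lex(pads, 0, 2) name the same middle pad under this reading.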

fetch_global Px, Py, iz

Fetch the global in slot Z of the symbol table pointed to by Y

store_global Px, sy

Store X in the default global symbol table with a name of Y.

newpad pad_descriptor

Create a new scratchpad using pad_descriptor as a template.

Exceptions

These opcodes deal with exception handling at the lowest level. Exception handlers are dynamically scoped, and any exception handler set in a scope will be removed when that scope is exited.

set_eh Px

Sets an exception handler in place. The code referred to by register Px will get called if an exception is thrown while the exception handler is in scope.

clear_eh

Clear out the most recently placed exception handler.

throw Px

Throw an exception represented by the object in PMC register x.

rethrow Px

Only valid inside an exception handler. Rethrow the exception represented by the object in PMC register x. This object may have been altered by the exception handler.
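
Dynamic scoping of handlers amounts to a stack: the most recently set handler sees an exception first, and a rethrow passes it outward. The Python sketch below models that idea only. Handlers are plain callables, and the choice to remove a handler from the stack while it runs is an assumption, as is modeling rethrow as a second call to throw.

```python
handlers = []   # innermost (most recently set) handler is last

def set_eh(handler):
    """Like set_eh: put a handler in place for the current dynamic scope."""
    handlers.append(handler)

def clear_eh():
    """Like clear_eh: remove the most recently placed handler."""
    handlers.pop()

def throw(exception):
    """Hand the exception to the innermost handler still in scope."""
    if not handlers:
        raise RuntimeError(f"unhandled exception: {exception}")
    handler = handlers.pop()   # assumption: a handler is out of scope while it runs
    return handler(exception)
```

A handler that calls throw again from inside its body reaches the next handler out, which mirrors rethrow with a possibly altered exception object.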

Object things

These opcodes deal with PMCs as objects, rather than as opaque data items.

make_object Px, ty

Make the variable in PMC x an object of type ty. The type can be a string, in which case we treat it as a package name.

find_method Px, Py, tz

Find the method Z for object Y, and return a PMC for it in X.

call_method Px, ty
find_attribute Px, sy
set_attribute Px, ty, tz
can Px, ty
isa Px, ty

Module handling

These opcodes deal with loading in bytecode or executable code libraries, and fetching info about those libraries. This is all dealing with precompiled bytecode or shared libraries.

load_bytecode sx

Load in the bytecode in file X. Search the library path if need be.

load_opcode_lib sx, iy

Load in the opcode library X, starting at opcode number Y. Search the path if needed.

load_string_lib sx

Load in the string handling library named X

get_op_count sx

Return the number of opcodes in opcode library X

get_string_name sx

Get the name of the string encoding that the library X handles

find_string_lib sx, sy

Find the string library that handles strings of type Y. Return its name in X.

I/O operations

Reads and writes read and write records, for some value of record.

new_fh px

Create a new filehandle px

open px, sy

Open the file Y on filehandle X

read px, py, pz

Issue a read on the filehandle in y, and put the result in PMC X. PMC Z is the sync object.

write px, sy, pz

Write the string Y to filehandle X. PMC Z is the sync object.

wait px

Wait for the I/O operation represented by sync object X to finish

readw px, py

Read from filehandle Y and put the results in PMC X. Blocks until the read completes.

writew px, sy

Write string Y to filehandle X, waiting for the write to complete.

seek px, ty

Seek filehandle X to position Y.

tell tx, py

Return the current position of filehandle Y and put it in X. Returns -1 for filehandles where this can't be determined (such as stream connections).

status px, py, tz

Get informational item Z for filehandle Y and put the result in X. This fetches things like the number of entries in the IO pipe, number of outstanding I/O ops, number of ops on the filehandle, and so forth.

Threading ops

lock Px

Take out a high-level lock on the PMC in register X

unlock Px

Unlock the PMC in register X

pushunlock Px

Push an unlock request on the stack

Interpreter ops

newinterp Px, flags

Create a new interpreter in X, using the passed flags.

runinterp Px, iy

Jump into interpreter X and run the code starting at offset Y from the current location. (This is temporary until we get something better)

callout Pw, Px, sy, pz

Call routine Y in interpreter X, passing it the list of parameters Z. W is the synchronization object returned. It can be waited on like the sync objects returned from async I/O routines.

interpinfo Ix, iy

Get information item Y and put it in register X. Currently defined are:

1 TOTAL_MEM_ALLOC

The total amount of system memory allocated for later parceling out to Buffers. Doesn't include any housekeeping memory, memory for Buffer or PMC structs, or things of that nature.

2 DOD_RUNS

The total number of dead object detection runs that have been made.

3 COLLECT_RUNS

The total number of memory collection runs that have been made.

4 ACTIVE_PMCS

The number of PMCs considered active. This means the DOD scan hasn't noted them as dead.

5 ACTIVE_BUFFERS

The number of Buffers (usually STRINGs but could be other things) considered active.

6 TOTAL_PMCS

The total number of PMCs the interpreter has available. Includes both active and free PMCs

7 TOTAL_BUFFERS

The total number of Buffer structs the interpreter has available.

8 HEADERS_ALLOC_SINCE_COLLECT

The number of new Buffer header block allocations that have been made since the last DOD run. (Buffers, when allocated, are allocated in chunks)

9 MEM_ALLOCS_SINCE_COLLECT

The number of times we've requested a block of memory from the system for allocation to Buffers since the last time we compacted the memory heap.
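
Code that generates interpinfo calls may prefer symbolic names to bare numbers. The Python enumeration below simply restates the table above; the class name InterpInfo is hypothetical, and nothing here is generated from Parrot's own headers.

```python
from enum import IntEnum

class InterpInfo(IntEnum):
    """Information item numbers accepted by interpinfo, as listed above."""
    TOTAL_MEM_ALLOC = 1
    DOD_RUNS = 2
    COLLECT_RUNS = 3
    ACTIVE_PMCS = 4
    ACTIVE_BUFFERS = 5
    TOTAL_PMCS = 6
    TOTAL_BUFFERS = 7
    HEADERS_ALLOC_SINCE_COLLECT = 8
    MEM_ALLOCS_SINCE_COLLECT = 9
```

An IntEnum member converts to its integer value wherever a plain number is expected, so InterpInfo.DOD_RUNS can stand in for the constant 2.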

Garbage collection

sweep

Fire off a dead object sweep

collect

Fire off a garbage collection sweep

pausecollect

Pause the garbage collector. No collections will be done for this interpreter until the collector is unpaused.

resumecollect

Unpause the collector. This doesn't necessarily do a GC run, merely allows the interpreter to fire one off when it deems it necessary.

Key operations

Keys are used to get access to individual elements of an aggregate variable. This is done to allow for opaque, packed, and multidimensional aggregate types.

A key entry may be an integer, string, or PMC. Integers are used for array lookups, strings for hash lookups, and PMCs for either.

new_key Sx

Create a new key structure and put a pointer to it in register X.

clone_key Sx, ky

Make a copy of the key Y and put a pointer to it in register X. Y may be either an S register or a constant.

size_key Sx, iy

Make the key structure X large enough to hold Y key entries

key_size Ix, ky

Put the number of elements in key Y into integer register X.

toss_key Sx

Nuke key X. Throws the structure away and invalidates the register.

ke_type Ix, ky, iz

Put the type of key Y's entry Z in register X. Current values are 0, 1, and 2 for Integer, String, and PMC, respectively.

ke_value tx, ky, iz

Put the value from key Y, entry Z into register X.

chop_key Sx

Toss the topmost entry from key X.

inc_key Sx, iy

Increment entry Y of key X by one.

set_key Sw, [isp]x, iy[, iz]

Set key W, offset Y, to value X. If X is a PMC, then the fourth operand must be specified. It can have a value of 0, 1, or 2, corresponding to integer, string, or object. Aggregates use this to figure out how to treat the key entry.
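
The key ops describe a small container of typed entries. The Python class below is a sketch of that container under assumptions: the class name, the use of None for unset entries, and the reading of "topmost" as the last entry are not taken from the text. The entry type codes 0, 1, and 2 are the ones given above.

```python
KEY_INTEGER, KEY_STRING, KEY_PMC = 0, 1, 2

class Key:
    """Toy model of a key structure holding typed entries."""

    def __init__(self):
        self.entries = []

    def size_key(self, count):
        """Like size_key: make room for `count` entries."""
        while len(self.entries) < count:
            self.entries.append(None)

    def set_key(self, value, offset, entry_type=None):
        """Like set_key: store value at offset; PMC values need an explicit type."""
        if entry_type is None:
            if isinstance(value, int):
                entry_type = KEY_INTEGER
            elif isinstance(value, str):
                entry_type = KEY_STRING
            else:
                raise ValueError("PMC values need an explicit entry type")
        self.entries[offset] = (entry_type, value)

    def ke_type(self, offset):
        return self.entries[offset][0]

    def ke_value(self, offset):
        return self.entries[offset][1]

    def key_size(self):
        return len(self.entries)

    def chop_key(self):
        """Like chop_key: drop the last entry (assumed to be the topmost)."""
        self.entries.pop()
```

A two-entry key built this way could address, for instance, element 3 of an array and then the "name" slot of the hash stored there.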

Symbolic support for HLLs

setline ix

Sets the 'current line' marker.

setfile sx

Sets the 'current file' marker.

setpackage sx

Sets the 'current package' marker.

getline ix

Fetches the 'current line' marker.

getfile sx

Fetches the 'current file' marker.

getpackage sx

Fetches the 'current package' marker.

ATTACHMENTS

REFERENCES