TITLE
Parrot JIT Subsystem
VERSION
CURRENT
Maintainer: Daniel Grunblatt
Class: Internals
PDD Number: 8
Version: 1.1
Status: Developing
Last Modified: 31 January 2002
PDD Format: 1
Language: English
ABSTRACT
This PDD describes the Parrot Just In Time compilation subsystem.
DESCRIPTION
The Just In Time, or JIT, subsystem converts a bytecode file to native machine code instructions and executes the generated instruction sequence directly.
IMPLEMENTATION
The JIT currently works on Intel x86 and Alpha architectures running BSD or Linux.
The JIT also makes it possible to write Parrot opcodes in assembly.
FILES
- jit/${jitcpuarch}/core.jit
-
Most of the core Parrot opcodes are (or will be) written here in assembly; the syntax is described in the Format of .jit Files section below. When an opcode is not defined here, the code generated by the C compiler is called instead.
- jit/${jitcpuarch}/string.jit
-
The string subsystem.
- include/parrot/jit.h
-
There is an opcode_assembly_t for each Parrot opcode, holding the position independent code (PIC), its size, the number of arguments it needs, and one structure like this per Identifier:
typedef struct { int amount; info_t info[MAX_SUBSTITUTION]; } substitution_t;
where info_t is:
typedef struct { int position; int number; } info_t;
Here amount is the number of substitutions of this type that the current opcode uses. For each substitution, position is the place in the PIC where it goes (relative to the start of this opcode) and number is the argument number that holds the value or address.
- jit.c
-
This file contains build_asm, which loops over the Parrot bytecode and fills two arrays: one with the displacement from the start of the jitted code to each opcode, and one with the absolute address of each opcode. Both values are stored at the same index that the opcode number occupies in the bytecode. For example:
Parrot bytecode: 73        0 3 72        1 0
Relative:        5         0 0 17        0 0
Absolute:        0x3a285f0 0 0 0x3a285fc 0 0
It then concatenates the PIC (Position Independent Code) of each Parrot opcode using the opcode_assembly_t structure; how this is done is described in the Format of .jit Files section below. Finally, it replaces cur_opcode[0] in the Parrot bytecode with the absolute address of the jitted op.
- Parrot/Jit/${jitcpuarch}Generic.pm
-
Should contain the platform-specific implementation of the methods required by jit2h.pl:
Assemble()
Takes assembly in the .jit file format and returns the object code with the Identifiers.
init()
Returns the object code that is needed to start running the jitted code, according to each platform's calling convention.
call()
Takes the number of arguments and the arguments and generates the object code to call a C function. The caller must calculate the position of the address or displacement.
system_call()
Takes the system call number, the number of arguments and the arguments and generates the object code to make a system call.
Fix_normal_call()
Returns the object code which must be placed after each call to code generated by the C compiler, according to each platform's calling convention.
Fix_cpcf_call()
Returns the object code which must be placed after each call to code generated by the C compiler that changes the program control flow. Since every op that changes the program control flow returns the address in the bytecode where execution continues, this code must dereference that value and then jump.
- Parrot/Jit/${jitcpuarch}-${jitosname}.pm
-
Each of these files should use all the methods from Parrot/Jit/${jitcpuarch}Generic.pm that fit the current platform and redefine those that don't. Each must also define some constants:
$Parrot::Jit::OP_ARGUMENT_SIZE
The size of an opcode argument in bytes. If the size in bits is not a multiple of 8, round it down.
$Parrot::Jit::Call_immediate_arg_size
The size of the instruction used to pass an immediate value as an argument in bytes.
$Parrot::Jit::Call_address_arg_size
The size of the instruction used to pass an address as an argument in bytes.
$Parrot::Jit::Call_start
The size of the instruction(s) that precede the position where the address of, or the displacement to, the called function will be placed, before the arguments are taken into account.
$Parrot::Jit::Call_move
Used to correct the position of the call when an argument requires more than one instruction.
$Parrot::Jit::Precompiled_call_position
The position of the call in the precompiled call to a Parrot opcode.
%Parrot::Jit::syscall_number
The key is the system call name and the value is its number.
- jit2h.pl
-
Reads the .jit files and prints the struct opcode_assembly_t.
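The offset bookkeeping that build_asm performs (see the jit.c entry above) can be sketched roughly as follows. This is a minimal illustration and not the actual jit.c code: the op_info_t table, the function name fill_offsets, the prologue size, and the base address are all assumptions, chosen so that the result lines up with the Relative/Absolute example given for jit.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-opcode description (not the real opcode_assembly_t). */
typedef struct {
    size_t native_size; /* bytes of jitted code emitted for this op */
    int    nargs;       /* number of bytecode arguments the op takes */
} op_info_t;

/*
 * Walk the bytecode and record, at the index of each opcode number, the
 * displacement from the start of the jitted code and the absolute address.
 * Slots that hold arguments are left as 0.
 */
void fill_offsets(const long *bytecode, size_t len,
                  const op_info_t *ops, size_t nops,
                  size_t prologue_size, uintptr_t base,
                  size_t *relative, uintptr_t *absolute)
{
    size_t offset = prologue_size; /* jitted ops start after the prologue */
    size_t pc = 0;

    for (size_t i = 0; i < len; i++) {
        relative[i] = 0;
        absolute[i] = 0;
    }

    while (pc < len) {
        long op = bytecode[pc];
        assert(op >= 0 && (size_t)op < nops);
        relative[pc] = offset;
        absolute[pc] = base + offset;
        offset += ops[op].native_size;   /* next op starts after this one */
        pc += 1 + (size_t)ops[op].nargs; /* skip the opcode and its arguments */
    }
}
```

Assuming a 5-byte prologue, a 12-byte body for opcode 73, and a base address of 0x3a285eb, the bytecode 73 0 3 72 1 0 yields the relative offsets 5 and 17 and the absolute addresses 0x3a285f0 and 0x3a285fc from the example.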
Format of .jit Files
Jit files are interpreted as follows:
- op-name { body }
-
Where op-name is the name of the Parrot opcode, and body consists of a sequence of the following forms:
- Assembly instruction.
-
An instruction may take one of these Identifiers as an argument:
INT_REG[n]
Gets replaced by the INTVAL register specified in the nth argument.
NUM_REG[n]
Gets replaced by the FLOATVAL register specified in the nth argument.
STRING_REG[n]
Gets replaced by the STRING register specified in the nth argument.
INT_CONST[n]
Gets replaced by the INTVAL constant specified in the nth argument.
NUM_CONST[n]
Gets replaced by the FLOATVAL constant specified in the nth argument.
STRING_CONST_bufstart[n]
Gets replaced by bufstart of the STRING constant specified in the nth argument.
STRING_CONST_buflen[n]
Gets replaced by buflen of the STRING constant specified in the nth argument.
STRING_CONST_flags[n]
Gets replaced by flags of the STRING constant specified in the nth argument.
STRING_CONST_strlen[n]
Gets replaced by strlen of the STRING constant specified in the nth argument.
STRING_CONST_encoding[n]
Gets replaced by encoding of the STRING constant specified in the nth argument.
STRING_CONST_type[n]
Gets replaced by type of the STRING constant specified in the nth argument.
STRING_CONST_language[n]
Gets replaced by language of the STRING constant specified in the nth argument.
CONST_INT[n]
Gets replaced by the nth integer constant defined in jit.c
CONST_FLOAT[n]
Gets replaced by the nth floatval constant defined in jit.c
CONST_CHAR[n]
Gets replaced by the nth char constant defined in jit.c
TEMP_INT[n]
Gets replaced by the nth temporary integer array.
TEMP_FLOAT[n]
Gets replaced by the nth temporary float array.
TEMP_CHAR[n]
Gets replaced by the nth temporary char array.
You must precede every identifier with & to request the address of that identifier, or with * to request its value. * can be used only with constants, since the replacement is done before the code starts running.
&INTERPRETER[n]
Gets replaced by the address of the interpreter.
*CUR_OPCODE[n]
Gets replaced by the address of the current opcode in the Parrot bytecode.
- FUNC(func-name, arg1, ..., argN)
-
Call a function defined in another .jit file (except from the core).
- SYSTEM_CALL(syscall-name, arg1, ..., argN)
-
Call a system call.
- CALL(func-name, arg1, ..., argN)
-
Call a C function. The idea is to replace all the CALL() with FUNC().
- Arguments to CALL and SYSTEMCALL
-
The arguments to CALL and SYSTEMCALL must be preceded by V to indicate that the value should be taken as an immediate, or by A to indicate that the value should be dereferenced.
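To make the identifier syntax concrete, the following sketch splits a reference such as &INT_REG[1] into its prefix, its name, and its argument index. It is an illustration only and is not taken from jit2h.pl: the ident_ref_t type and the parse_ident function are hypothetical names, and a V or A prefix on a CALL or SYSTEMCALL argument is assumed to have been stripped already.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical parsed form of an identifier such as "&INT_REG[1]". */
typedef struct {
    char mode;     /* '&' = address of the identifier, '*' = its value */
    char name[32]; /* e.g. "INT_REG" or "STRING_CONST_bufstart" */
    int  index;    /* the n in NAME[n] */
} ident_ref_t;

/* Returns 1 and fills *out on success, 0 if text is not a valid reference. */
int parse_ident(const char *text, ident_ref_t *out)
{
    int consumed = -1;

    assert(text != NULL && out != NULL);

    /* One prefix character, a name, then "[n]". */
    if (sscanf(text, "%c%31[A-Za-z_][%d]%n",
               &out->mode, out->name, &out->index, &consumed) != 3)
        return 0;

    /* %n is stored only if the closing ']' matched; reject trailing text. */
    if (consumed < 0 || text[consumed] != '\0')
        return 0;

    return out->mode == '&' || out->mode == '*';
}
```

A caller could then check the rule stated above, for example by rejecting a '*' prefix on anything that is not a constant.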
ALPHA Notes
Access to Parrot registers is done relative to $6, and all other memory access is done relative to $27. Float constants are accessed relative to $7, so you must precede the instruction with ldah $7,0($27).
EXAMPLE
Let's see how this works:
Parrot Assembly:
set I0,8
set I2,I0
print "Big piece of JIT\n"
time I0
end
Parrot Bytecode: (only the bytecode segment is shown)
+-----------------------------------------------+
| 63 | 0 | 8 | 62 | 2 | 0 | 24 | 0 | 48 | 0 | 0 |
+-|------------|------------|--------|--------|-+
  |            |            |        |        |
  |            |            |        |        +-- end (no arguments)
  |            |            |        +----------- time_i (1 argument)
  |            |            +-------------------- print_sc (1 argument)
  |            +--------------------------------- set_i_i (2 arguments)
  +---------------------------------------------- set_i_ic (2 arguments)
Please note that the opcode numbers used might have already changed.
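The layout in the diagram can be reproduced by a simple walk over the bytecode segment: read an opcode number, look up how many arguments it takes, and skip that many words. The sketch below hard-codes the opcode numbers and argument counts of this example only; arg_count and find_op_starts are hypothetical helpers, not functions from the Parrot sources.

```c
#include <assert.h>
#include <stddef.h>

/* Argument counts for the opcode numbers used in this example only.
 * Returns -1 for any opcode the example does not use. */
int arg_count(long opcode)
{
    switch (opcode) {
    case 63: return 2; /* set_i_ic */
    case 62: return 2; /* set_i_i  */
    case 24: return 1; /* print_sc */
    case 48: return 1; /* time_i   */
    case 0:  return 0; /* end      */
    default: return -1;
    }
}

/* Store the index of every opcode word in starts[] and return how many ops
 * were found, or -1 on an unknown opcode, a truncated op, or a full array. */
int find_op_starts(const long *bytecode, size_t len,
                   size_t *starts, size_t max_starts)
{
    size_t pc = 0;
    int count = 0;

    assert(bytecode != NULL && starts != NULL);

    while (pc < len) {
        int nargs = arg_count(bytecode[pc]);
        if (nargs < 0 || pc + (size_t)nargs >= len ||
            (size_t)count >= max_starts)
            return -1;
        starts[count++] = pc;
        pc += 1 + (size_t)nargs; /* skip the opcode and its arguments */
    }
    return count;
}
```

For the eleven words shown in the diagram, this walk finds five ops starting at indices 0, 3, 6, 8 and 10.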
Intel x86 assembly version of the Parrot ops:
Parrot_set_i_ic {
movl *INT_CONST[2],&INT_REG[1]
}
Parrot_set_i_i {
movl &INT_REG[2],%eax
movl %eax,&INT_REG[1]
}
Parrot_print_sc {
movl $1,&TEMP_INT[1]
SYSTEMCALL(WRITE,3, A&TEMP_INT[1] V&STRING_CONST_bufstart[1] V*STRING_CONST_strlen[1])
}
Parrot_end {
leave
ret
}
Note that there is no Parrot_time_i, so the code generated by the C compiler for Parrot_time_i will be called.
Intel x86 object code of the Parrot ops:
Parrot_set_i_ic {
\xc7\x05\x00\x00\x00\x00\x00\x00\x00\x00 # mov $0,0x0
}
Parrot_set_i_i {
\xa1\x00\x00\x00\x00 # mov 0x0,%eax
\xa3\x00\x00\x00\x00 # mov %eax,0x0
}
Parrot_print_sc {
\xc7\x05\x00\x00\x00\x00\x01\x00\x00\x00 # mov $1,0x0
\x68\x00\x00\x00\x00 # push $0x0
\x68\x00\x00\x00\x00 # push $0x0
\xff\x35\x00\x00\x00\x00 # pushl 0x0
\x50 # push %eax
\xb8\x04\x00\x00\x00 # mov $4,%eax
\xcd\x80 # int 80h
\x72\x00 # jb 0
}
Parrot_end {
\xc9 # leave
\xc3 # ret
}
Parrot_time_i {
\x68\x00\x00\x00\x00 # push $0x0
\x68\x00\x00\x00\x00 # push $0x0
\xe8\x00\x00\x00\x00 # call 0x0
\x83\xc4\x08 # add $0x8,%esp
}
The object code for time_i is the same as for any opcode that isn't implemented in core.jit.
Build process:
Memory dump of the JIT code being generated:
+-----------------------------------------+
| 0x55 0x89 0xe5 0xc7 0x05 0x00 0x00 0x00 |
| 0x00 0x00 0x00 0x00 0x00 0x00 0x00 0x00 |
+-----------------------------------------+
That is the state after the code for the first op has been copied. The 0x55 0x89 0xe5 you see before the object code for Parrot_set_i_ic is the output of Parrot::Jit->init()
Fill it with addresses and/or values:
+-----------------------------------------+
| 0x55 0x89 0xe5 0xc7 0x05 0x00 0xa0 0x10 |
| 0x00 0x08 0x00 0x00 0x00 0x00 0x00 0x00 |
+-----------------------------------------+
The address of I0 (&interpreter->int_reg.registers[0]) is 0x10a000 (or whatever), so the first 4 bytes after the x86 opcode bytes (0xc7 0x05) are filled with it in little-endian order, and the next 4 bytes are filled with the constant itself.
The same process is repeated once per opcode.
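The fill step can be sketched as a pair of little-endian stores into a copy of the code template. This is an illustration only: patch_u32 and emit_set_i_ic are hypothetical helpers, and the offsets 2 and 6 are read off the set_i_ic object code shown earlier rather than taken from the generated opcode_assembly_t data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Store a 32-bit value at buf[pos..pos+3], least significant byte first,
 * which is the byte order x86 expects. */
void patch_u32(unsigned char *buf, size_t buf_len, size_t pos, uint32_t value)
{
    assert(pos + 4 <= buf_len);
    for (int i = 0; i < 4; i++)
        buf[pos + i] = (unsigned char)((value >> (8 * i)) & 0xff);
}

/* Copy the prologue and the set_i_ic template into out, then fill in the
 * register address and the constant. Returns the number of bytes written. */
size_t emit_set_i_ic(unsigned char *out, size_t out_len,
                     uint32_t reg_addr, uint32_t constant)
{
    static const unsigned char prologue[] = { 0x55, 0x89, 0xe5 };
    static const unsigned char tmpl[] = {
        0xc7, 0x05, 0, 0, 0, 0, 0, 0, 0, 0 /* opcode, address, immediate */
    };
    size_t start = sizeof prologue;

    assert(out_len >= sizeof prologue + sizeof tmpl);
    memcpy(out, prologue, sizeof prologue);
    memcpy(out + start, tmpl, sizeof tmpl);
    patch_u32(out, out_len, start + 2, reg_addr); /* destination address */
    patch_u32(out, out_len, start + 6, constant); /* immediate value */
    return sizeof prologue + sizeof tmpl;
}
```

With reg_addr set to 0x10a000 and constant set to 8, the first 13 bytes of out match the filled-in memory dump for the first op.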
The final result:
+-----------------------------------------+
| 0x55 0x89 0xe5 0xc7 0x05 0x00 0xa0 0x10 |
| 0x00 0x08 0x00 0x00 0x00 0xa1 0x00 0xa0 |
| 0x10 0x00 0xa3 0x08 0xa0 0x10 0x00 0xc7 |
| 0x05 0x54 0x7a 0x10 0x00 0x01 0x00 0x00 |
| 0x00 0x68 0x11 0x00 0x00 0x00 0x68 0x18 |
| 0xb0 0x10 0x00 0xff 0x35 0x54 0x7a 0x10 |
| 0x00 0x50 0xb8 0x04 0x00 0x00 0x00 0xcd |
| 0x80 0x72 0x00 0x68 0x00 0xa0 0x10 0x00 |
| 0x68 0xe0 0x60 0x12 0x00 0xe8 0xae 0xdb |
| 0xed 0xff 0x83 0xc4 0x08 0xc9 0xc3 0x00 |
+-----------------------------------------+
This code is ready to be called.