TITLE

The Parrot Bytecode (PBC) Format

VERSION

2003.02.04

Format of the Parrot bytecode

0          1          2          3
+----------+----------+----------+----------+
| Wordsize | Byteorder|  Major   |  Minor   |
+----------+----------+----------+----------+

Wordsize must be 4 (32-bit) or 8 (64-bit). The loader is responsible for transforming the file into the VM's native wordsize on the fly. For performance, the pdump utility can convert PBC files on disk if they cannot be recompiled.

Byteorder currently supports two values: 0 (little endian) and 1 (big endian).

4          5
+----------+----------+----------+----------+
|  Flags   | FloatType|  10 Byte  ...       |
+----------+----------+----------+----------+
|           fingerprint for ...             |
+----------+----------+----------+----------+
|           core.ops is here                |
+----------+----------+----------+----------+

FloatType 0 is an IEEE 754 8-byte double; FloatType 1 is an i386 little-endian 12-byte long double.
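The sixteen leading bytes described above can be decoded as in the following minimal sketch. The names PbcHeader and parse_header are illustrative and not part of Parrot; the sketch assumes the fingerprint occupies bytes 6 through 15, as the diagram suggests.

```python
import struct
from typing import NamedTuple


class PbcHeader(NamedTuple):
    """Illustrative container for the single-byte header fields."""
    wordsize: int
    byteorder: int
    major: int
    minor: int
    flags: int
    floattype: int
    fingerprint: bytes  # assumed to be the 10 bytes at offsets 6..15


def parse_header(data: bytes) -> PbcHeader:
    """Decode and sanity-check the first 16 bytes of a PBC file."""
    if len(data) < 16:
        raise ValueError("need at least 16 header bytes")
    wordsize, byteorder, major, minor, flags, floattype = struct.unpack_from(
        "6B", data, 0
    )
    if wordsize not in (4, 8):
        raise ValueError(f"unsupported wordsize {wordsize}")
    if byteorder not in (0, 1):
        raise ValueError(f"unsupported byteorder {byteorder}")
    if floattype not in (0, 1):
        raise ValueError(f"unsupported floattype {floattype}")
    return PbcHeader(
        wordsize, byteorder, major, minor, flags, floattype, bytes(data[6:16])
    )
```

Because every field here is a single byte, no byte-order conversion is needed until the first word-sized field.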

16
+----------+----------+----------+----------+
|         Parrot Magic = 0x13155a1          |
+----------+----------+----------+----------+

The Magic is stored in native byteorder, that is, the byteorder of the machine that wrote the file. The loader uses the Byteorder header field to convert the Magic before verifying it. More generally, ALL words (non-byte values) in the bytecode file are stored in native order unless otherwise specified.
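A loader-side check of the Magic might look like the sketch below. The function name check_magic is hypothetical. The sketch assumes the Magic sits at byte offset 16 and is one word wide (4 bytes by default); the 8-byte case is an assumption, since the offsets in this document are given for 32-bit files only.

```python
import struct

PARROT_MAGIC = 0x13155A1


def check_magic(data: bytes, byteorder: int, wordsize: int = 4,
                offset: int = 16) -> bool:
    """Read the Magic using the file's declared byteorder and compare it."""
    endian = "<" if byteorder == 0 else ">"   # 0 = little endian, 1 = big endian
    code = "I" if wordsize == 4 else "Q"      # unsigned 32-bit or 64-bit word
    (magic,) = struct.unpack_from(endian + code, data, offset)
    return magic == PARROT_MAGIC
```

Reading the same bytes with the wrong byteorder yields a different value, so a mismatch here usually points to a corrupt header or a wrong Byteorder byte.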

20*
+----------+----------+----------+----------+
|         Opcode Type (Perl = 0x5045524c)   |
+----------+----------+----------+----------+

The asterisk on the offset marks the point from which the file consists of opcodes. The given offsets apply to 32-bit opcode types only.

PBC Format 1 and Format 0

As stated below, these header bytes are followed by the fixup segment size, which is always zero in the old format, now called "Format 0".

For the current PBC format this field is 1.

PBC FORMAT 1

All segments are aligned on a 16-byte boundary. All segments share a common header and are kept in directories, each of which is itself a PBC segment. All offsets and sizes are given in native opcodes of the machine that produced the PBC.
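The alignment rule can be expressed either in bytes or in opcodes. The following sketch shows both; the helper names are illustrative and do not come from the Parrot sources.

```python
def align16(byte_offset: int) -> int:
    """Round a byte offset up to the next multiple of 16."""
    return (byte_offset + 15) & ~15


def align16_in_opcodes(op_offset: int, wordsize: int = 4) -> int:
    """Round an offset counted in opcodes up to a 16-byte boundary.

    With 4-byte opcodes a boundary falls every 4 opcodes,
    with 8-byte opcodes every 2 opcodes.
    """
    per_boundary = 16 // wordsize
    return -(-op_offset // per_boundary) * per_boundary
```

For example, a segment that would otherwise begin at opcode 6 of a 32-bit file is moved to opcode 8, which is byte offset 32.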

Format 1 Header

24*
+----------+----------+----------+----------+
|         dir_format      (1)               |
+----------+----------+----------+----------+
|         padding         (0)               |
+----------+----------+----------+----------+

After this header, the first PBC directory follows at offset 32* starting with a:

Format 1 Segment Header

+----------+----------+----------+----------+
| total size in opcodes including this size |
+----------+----------+----------+----------+
|         internal type (itype)             |
+----------+----------+----------+----------+
|         internal id   (id)                |
+----------+----------+----------+----------+
|         size of opcodes following         |
+----------+----------+----------+----------+

The size entry is followed by a stream of that many opcodes, starting at a 16-byte aligned offset. For a size of zero there is no opcode stream at all.

After this common segment header there can be segment-specific data, determined by the segment type. A segment without additional data, such as the bytecode segment, is a default segment; no additional routines are required to unpack it.
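Reading the four-word common segment header could be sketched as follows. SegmentHeader and parse_segment_header are hypothetical names, and the field order is taken from the diagram above.

```python
import struct
from typing import NamedTuple


class SegmentHeader(NamedTuple):
    """The four words shared by every Format 1 segment."""
    total_size: int  # in opcodes, including this size word
    itype: int       # internal type
    seg_id: int      # internal id
    op_count: int    # number of opcodes that follow


def parse_segment_header(data: bytes, offset: int, byteorder: int = 0,
                         wordsize: int = 4) -> SegmentHeader:
    """Unpack the common header found at a segment's byte offset."""
    endian = "<" if byteorder == 0 else ">"
    code = "I" if wordsize == 4 else "Q"
    fields = struct.unpack_from(f"{endian}4{code}", data, offset)
    return SegmentHeader(*fields)
```

With 4-byte words the header is exactly 16 bytes long, so for a default segment that itself starts on a 16-byte boundary the opcode stream begins at offset + 16 without extra padding.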

Directory Segment

+----------+----------+----------+----------+
| number of directory entries               |
+----------+----------+----------+----------+

+----------+----------+----------+----------+
| segment type                              |
+----------+----------+----------+----------+
| segment name ...                          |
| ...        0x00       padding             |
+----------+----------+----------+----------+
| segment offset                            |
+----------+----------+----------+----------+
| segment op_count                          |
+----------+----------+----------+----------+

The op_count stored in the directory entry must match the op_count of the segment found at the given offset; it is used to verify the PBC's integrity.
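A reader for the directory entries might be sketched like this. The document does not state how far the segment name is padded, so the sketch assumes a NUL-terminated name padded with zero bytes to the next word boundary; that rule, along with the helper names, is an assumption rather than something taken from packfile.c.

```python
import struct


def read_padded_name(data: bytes, offset: int, wordsize: int = 4):
    """Read a NUL-terminated name; ASSUMES padding to a word boundary."""
    end = data.index(b"\x00", offset)
    name = data[offset:end].decode("ascii")
    consumed = end + 1 - offset                   # name bytes plus the NUL
    padded = -(-consumed // wordsize) * wordsize  # round up to whole words
    return name, offset + padded


def parse_directory(data: bytes, offset: int = 0, byteorder: int = 0,
                    wordsize: int = 4):
    """Return (type, name, segment offset, op_count) for each entry.

    `offset` must point at the 'number of directory entries' word.
    """
    endian = "<" if byteorder == 0 else ">"
    code = "I" if wordsize == 4 else "Q"
    (count,) = struct.unpack_from(endian + code, data, offset)
    offset += wordsize
    entries = []
    for _ in range(count):
        (seg_type,) = struct.unpack_from(endian + code, data, offset)
        offset += wordsize
        name, offset = read_padded_name(data, offset, wordsize)
        seg_offset, op_count = struct.unpack_from(f"{endian}2{code}", data, offset)
        offset += 2 * wordsize
        entries.append((seg_type, name, seg_offset, op_count))
    return entries
```

A loader would then compare each entry's op_count with the op_count word in the header of the segment at the recorded offset and reject the file on a mismatch.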

Currently these segment types are defined:

0   Directory segment

1   Unknown segment (conforms to a default segment)

2   Fixup segment

3   Constant table segment

4   Bytecode segment

5   Debug segment
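In a tool that inspects PBC files, the segment type numbers listed above could be captured in an enumeration. The Python names below are illustrative only; the numeric values are the ones given in this document.

```python
from enum import IntEnum


class SegmentType(IntEnum):
    """Format 1 segment types, numbered as in the list above."""
    DIRECTORY = 0
    UNKNOWN = 1         # conforms to a default segment
    FIXUP = 2
    CONSTANT_TABLE = 3
    BYTECODE = 4
    DEBUG = 5


def segment_type_name(value: int) -> str:
    """Return a readable name, tolerating values not defined here."""
    try:
        return SegmentType(value).name
    except ValueError:
        return f"UNDEFINED({value})"
```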

Segment Names

This is not determined yet.

Unknown (default) and byte code segments

These have only the common segment header with the opcode stream appended. The opcode stream is an mmap()ed memory region if your operating system supports this (and if the PBC was read from a disk file). You must therefore treat this data as read-only.

Fixup segment

+----------+----------+----------+----------+
| number of fixup entries                   |
+----------+----------+----------+----------+

+----------+----------+----------+----------+
| fixup type   (0)                          |
+----------+----------+----------+----------+
| label   name ...                          |
| ...        0x00       padding             |
+----------+----------+----------+----------+
| label offset                              |
+----------+----------+----------+----------+

Currently only fixup type 0 is defined; it has a label symbol and an offset into the bytecode.

Debug Segment

+----------+----------+----------+----------+
| filename ...                              |
| ...        0x00       padding             |
+----------+----------+----------+----------+

The debug segment has one additional field with the source file name. The opcode stream holds one line number per bytecode instruction.

Constant Table Segment

See the Format 0 description below.

PBC FORMAT 0

For each segment:

4, 4 + (4 + S0), 4 + (4 + S0) + (4 + S1)
+----------+----------+----------+----------+
|       Segment length in bytes (S)         |
+----------+----------+----------+----------+
|                                           |
:        S bytes of segment content         :
|                                           |
+----------+----------+----------+----------+

Currently there are three segment types defined, and they must occur in precisely this order: FIXUP, CONSTANT TABLE, BYTE CODE. Every segment must be present, even if empty.
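Walking the three length-prefixed Format 0 segments can be sketched as follows. The function name is hypothetical; the caller supplies the byte offset of the first length word, and the sketch assumes 4-byte length words as drawn in the diagram.

```python
import struct

FORMAT0_ORDER = ("fixup", "constant_table", "byte_code")


def split_format0_segments(data: bytes, offset: int, byteorder: int = 0) -> dict:
    """Return the raw content of each segment, keyed by its fixed position."""
    endian = "<" if byteorder == 0 else ">"
    segments = {}
    for name in FORMAT0_ORDER:
        (length,) = struct.unpack_from(endian + "I", data, offset)
        offset += 4
        if offset + length > len(data):
            raise ValueError(f"{name} segment runs past end of file")
        segments[name] = data[offset:offset + length]
        offset += length
    return segments
```

Since every segment must be present, an empty fixup segment still contributes its 4-byte length word (holding zero) before the constant table begins.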

FIXUP SEGMENT

<< The format for the FIXUP segment is not yet defined. >>

The fixup segment length is 0.

CONSTANT TABLE SEGMENT

0 (relative)
+----------+----------+----------+----------+
|            Constant Count (N)             |
+----------+----------+----------+----------+

For each constant:

+----------+----------+----------+----------+
|             Constant Type (T)             |
+----------+----------+----------+----------+
|             Constant Size (S)             |
+----------+----------+----------+----------+
|                                           |
|        S bytes of constant content        |
:       appropriate for representing        :
|              a value of type T            |
|                                           |
+----------+----------+----------+----------+

CONSTANTS

For integer constants:

<< integer constants are represented as manifest constants in
   the byte code stream currently, limiting them to 32 bit values. >>

For number constants (S is constant, and is equal to sizeof(FLOATVAL)):

+----------+----------+----------+----------+
|                                           |
|             S' bytes of Data              |
|                                           |
+----------+----------+----------+----------+

where

S' = S + ((S % 4) ? (4 - (S % 4)) : 0)

If S' > S, then the extra bytes are filled with zeros.
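The padding rule rounds S up to the next multiple of 4. A small sketch, with illustrative helper names and assuming FloatType 0 (an 8-byte IEEE 754 double):

```python
import struct


def padded_size(size: int) -> int:
    """Compute S' from S: round up to the next multiple of 4."""
    remainder = size % 4
    return size + (4 - remainder if remainder else 0)


def pack_number_constant(value: float, byteorder: int = 0) -> bytes:
    """Encode a number constant as S' bytes, zero-filled past S."""
    endian = "<" if byteorder == 0 else ">"
    raw = struct.pack(endian + "d", value)  # S = 8 for an IEEE 754 double
    return raw + b"\x00" * (padded_size(len(raw)) - len(raw))
```

For both float types named in the header (8 and 12 bytes) S is already a multiple of 4, so S' equals S and no zero bytes are added.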

For string constants (S varies, and is the size of the particular string):

4, 4 + (16 + S'0), 4 + (16 + S'0) + (16 + S'1)
+----------+----------+----------+----------+
|                   Flags                   |
+----------+----------+----------+----------+
|                  Encoding                 |
+----------+----------+----------+----------+
|                   Type                    |
+----------+----------+----------+----------+
|                  Size (S)                 |
+----------+----------+----------+----------+
|                                           |
:             S' bytes of Data              :
|                                           |
+----------+----------+----------+----------+

where

S' = S + ((S % 4) ? (4 - (S % 4)) : 0)

If S' > S, then the extra bytes are filled with zeros.
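Reading one string constant according to the layout above could look like this sketch. The function name and the returned dictionary keys are illustrative; the sketch assumes 4-byte words and leaves the interpretation of Flags, Encoding, and Type to the caller.

```python
import struct


def parse_string_constant(data: bytes, offset: int, byteorder: int = 0):
    """Return (fields, next_offset) for the string constant at `offset`."""
    endian = "<" if byteorder == 0 else ">"
    flags, encoding, str_type, size = struct.unpack_from(
        endian + "4I", data, offset
    )
    start = offset + 16                  # four 4-byte header words
    payload = data[start:start + size]
    if len(payload) != size:
        raise ValueError("string constant is truncated")
    padded = (size + 3) & ~3             # S': S rounded up to a multiple of 4
    fields = {
        "flags": flags,
        "encoding": encoding,
        "type": str_type,
        "data": payload,
    }
    return fields, start + padded
```

The returned offset is where the next constant starts, which matches the 16 + S' stride shown in the offsets above the diagram.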

BYTE CODE SEGMENT

The pieces that can be found in the byte code segment are as follows:

+----------+----------+----------+----------+
|              Operation Code               |
+----------+----------+----------+----------+

+----------+----------+----------+----------+
|             Register Argument             |
+----------+----------+----------+----------+

+----------+----------+----------+----------+
|    Integer Argument (Manifest Constant)   |
+----------+----------+----------+----------+

+----------+----------+----------+----------+
|   String Argument (Constant Table Index)  |
+----------+----------+----------+----------+

+----------+----------+----------+----------+
|   Number Argument (Constant Table Index)  |
+----------+----------+----------+----------+

The number and types for each argument can be determined by consulting Parrot::Opcode.
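Decoding the byte code stream requires a table that gives the argument kinds for each operation code. The sketch below uses a made-up three-entry table (EXAMPLE_OPS) purely for illustration; the op numbers and names are hypothetical and do not correspond to core.ops or to Parrot::Opcode. It assumes 4-byte words.

```python
import struct

# HYPOTHETICAL op table for illustration only: opcode -> (name, argument kinds).
EXAMPLE_OPS = {
    0: ("example_end", ()),
    1: ("example_set", ("register", "integer")),
    2: ("example_print", ("string",)),
}


def decode(code: bytes, byteorder: int = 0):
    """Split a byte code segment into (name, ((kind, value), ...)) tuples."""
    if len(code) % 4:
        raise ValueError("byte code length must be a multiple of 4")
    endian = "<" if byteorder == 0 else ">"
    # Signed words, so that manifest integer constants may be negative.
    words = struct.unpack(f"{endian}{len(code) // 4}i", code)
    pos, ops = 0, []
    while pos < len(words):
        if words[pos] not in EXAMPLE_OPS:
            raise ValueError(f"unknown opcode {words[pos]} at word {pos}")
        name, kinds = EXAMPLE_OPS[words[pos]]
        args = words[pos + 1:pos + 1 + len(kinds)]
        if len(args) != len(kinds):
            raise ValueError(f"{name} is missing arguments")
        ops.append((name, tuple(zip(kinds, args))))
        pos += 1 + len(kinds)
    return ops
```

String and number arguments come back as constant table indices; resolving them to values is a separate lookup in the constant table segment.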

SOURCE CODE SEGMENT

Currently there are no utilities that use this segment, even though it is mentioned in some of the early Parrot documents.

Eventually there will be a more complete and useful PackFile specification, but this simple format works well enough for now (circa Parrot 0.0.5).

SEE ALSO

packfile.c, packfile.h, packout.c, packdump.c and the pdump utility pdump.c.

AUTHOR

Gregor N. Purdy <gregor@focusresearch.com>

Format 1 description by Leopold Toetsch <lt@toetsch.at>
