TITLE
Indexing Aggregate PMCs
VERSION
CURRENT
Maintainer: Simon Cozens <simon@netthink.co.uk>
Class: Internals
PDD Number: 8
Version: 1.2
Status: Developing
Last Modified: 25 April, 2002
PDD Format: 1
Language: English
HISTORY
- Thu Apr 25 18:30:36 UTC 2002 : Version 1.2
-
Renamed
KEY_PAIR
toKEY_ATOM
, updated to reflect changeover to linked list. - steve@fink.com - Fri Mar 8 18:47:34 GMT 2002 : Version 1.1
-
updated to reflect Dan's comments that non-aggregates also support
_keyed
variant vtable methods.
ABSTRACT
This PDD aims to clear up the confusion regarding the implementation of keyed access to PMCs in Parrot.
DESCRIPTION
First, let's define some terminology. An aggregate PMC is one which allows indexed access to sub-element that it stores or references, by supporting and implementing the _keyed
variants of vtable methods. These variants are indexing operations, as they act on a specific indexed element of an aggregate PMC.
Non-aggregates also support _keyed
variants of the vtable methods, but they may not do anything particularly clever - for instance, PMC types implementing Perl references will merely pass the index on to the referent. These aren't aggregates because they don't directly store or reference sub-elements.
Indexing operations take one or more aggregate-key atoms. At runtime, these operations will index the key into the aggregate returning a value. Here is a well-known indexing operation in Perl 6:
@a[12] = $b;
The key here is the constant integer 12
, and the aggregate is the PerlArray
@a
. In the process of this assignment, Parrot will have to extract the PMC in element 12 of the array, producing a value. $b
is then assigned to this value.
Now, how does this all get implemented?
IMPLEMENTATION
The key structure
The key structure has to have the following features: it must bundle up multiple keys so that multidimensional aggregates can be indexed; keys may be specified as integer, string, number or PMC.
Hence, the following structure was produced. First, the individual keys as we think of them from a language level are stored in a structure called a KEY_ATOM
.
So, for instance, indexing the multidimensional array @foo[$a;12;"hi"]
produces three KEY_ATOMS
, one with a PMC type, one with an integer type and one with a string type. Since KEY_ATOMS
need to store the type as well as the value, they end up looking like this:
typedef struct _key_atom KEY_ATOM;
struct _key_atom {
KEY_TYPE type;
UnionVal val;
};
The next issue is to combine these things into a single key. Hence, the developer-facing KEY
structure looks like this:
typedef struct _key KEY;
struct _key {
KEY_ATOM atom;
KEY* next;
};
So a KEY is a linked list of KEY_ATOMs. A linked list is used so that partial keys can be easily generated as a multidimensional data structure is traversed.
These structures can be found in include/parrot/key.h
Aggregate and non-aggregate PMCs
We've already said that what separates the aggregate PMCs from the non-aggregates is their implementation of the _keyed
vtable methods. So it is Hereby Decreed that the default vtable which everyone inherits from defines the _keyed
forms to throw an exception.
- Todo
-
Need discussion on whether
OUT_OF_BOUNDS
is a good exception for this, or whether something else should be used. It's really a compiler screw-up, since code which indexes a non-aggregate shouldn't be generated.
_keyed
vtable methods
So what of these magical _keyed
vtable methods? They are generated when you add the keyed
tag to the appropriate entry in vtable.tbl. They are constructed by following every PMC
argument with a KEY
argument; the name of the KEY
argument is formed by adding _key
onto the end of the PMC
argument.
The reason why every PMC argument has an associated key is twofold. Firstly, it means that
@a[$b] = $c
and
$a = @b[$c]
use the same vtable method, reducing the multiplicity of methods. Secondly, a three-argument assign
as suggested by the code above would be ambiguous - the code above uses 3 PMCs in different ways.
Also, operations which take an aggregate key for one of their arguments should take aggregate keys for all of their arguments. This is to avoid the following:
void foo_keyed_i(PMC* x, KEY* y, INT a)
void foo_keyed_n(PMC* x, KEY* y, NUM a)
void foo_keyed_p(PMC* x, KEY* y, PMC a)
void foo_keyed_p_keyed(PMC* x, KEY* y, PMC* a, KEY* b)
These are all replaced with the single entry
void foo_keyed(PMC* x, KEY* y, PMC* a, KEY* b)
(Think how much worse it gets when there are three or more PMCs in an entry...)
Yes. This means that you may need to turn some things into PMC
s that you didn't want to. Since the alternative is mega pollution and duplication in the vtable table, and since the majority of things that you'll deal with in a real world situation are expected to be PMC
s anyway, this shouldn't be too much of a problem.
So, if you have a PMC in a _keyed
method which you don't want to index, pass in NULL
instead of a real key. Code implementing these methods should understand PMC* foo, KEY* NULL
as meaning the entirety of foo
in some sense; this is trivial to understand if foo
is non-aggregate, and implementation-defined if foo
is aggregate. If you remember that a KEY*
is really a linked list, you'll notice that after traversing down through the list, you'll reach a NULL
which again means the entirety of whatever object you traversed to.
Similarly, non-_keyed
methods on aggregates are implementation defined; for instance, a set_integer
on a PerlArray
may be understood as setting @array.length
.
Historically, we first implemented keys as two separate keyed methods per applicable method - ..._index
and ..._index_s
for integer and string indexing respectively. However, this didn't give us the flexibility and scalability that key structures give us.
Input to the assembler
There are several different valid specifications of an aggregate key to the assembler. These are:
op arg, P1[1234] # Constant integer key
op arg, P1[12.34] # Constant number key
op arg, P1["foo"] # Constant string key
op arg, P1[S1] # Register key
(Rationale: fits programmer's expectation, easier to understand at a glance than op P1, P2, P3
. Also, is op P1, P2, P3
the same as op P1[P2], P3
or op P1, P2[P3]
, or are these three separate PMCs?)
We shall refer to the first three types collectively as "constant keys" and the the last as "register keys".
What the assembler did next
When the assembler sees an aggregate key, it "detaches" the key to form a separate argument. It then decides whether or not it is a constant key or a register key. If it is a constant key, the data is temporarily munged in the same way as string constants. The constant itself is added to the constant integer/number/string list, much like any other constant, and an entry is added to the list of constant keys.
Next it selects the appropriate op. Register keys have the signature r
and constant keys have the signature kc
. For example:
set P1["hi"], 1234
finds an op named set_p_kc_i
. On the other hand,
set P1[S1], 1234
produces an op named set_p_r_i
.
Todo
-
This is only half implemented. The
r
business may need to be documented in another PDD somewhere. This is also not going to scale to keys likeP1;S1;123
, and I don't know how to do that.
Keys with multiple key-atoms may need to be classed as constant keys.
Bytecode representation
The bytecode representation of these keys are as follows: constant keys are treated just like another constant, and are an index into the "key" section of the packfile's constant table.
Each key in that constant table consists of one byte specifying its length in terms of number of key atoms. For instance, ["hi"]
has length 1; ["hi";P1;S1;123]
has length 4. Next, each key atom is specified using two bytes. The first byte is a type specifier:
0 - Integer
1 - Number
2 - String
3 - PMC
4 - Register
5 - Key
and the second byte is an index into the appropriate constant table, (in the first 4 cases and the last case) or a register specifier. (see below)
Register arguments (The _r
bit) are specified as a single byte. The top two bits are the type specifier:
00 - Integer
01 - Number
10 - String
11 - PMC
and the bottom six bits are the register number.
- Todo
-
The register specification stuff may need to move to another PDD.
The interpreter's interpretation
Delayed until we have bytecode that produces what we want.