Implementing Variable Types with Vtables
This is a guide to creating your own PMC (Parrot Magic Cookie) classes; it tells you what you need to write in order to add new variable types to Parrot.
Overview
The guts of the Parrot interpreter are, by design, ignorant (or, if you want to be less disparaging, agnostic) of the intricacies of variable type behaviour. The standard example is the difference between Perl scalars and Python scalars. In Perl, if you have
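$a = "a1";
$a++;
you end up with $a being a2. (A string such as "1a" would not do here: Perl only increments a value as a string when it is all letters followed by all digits.) This is a consequence of the way Perl scalars increment themselves. In Python, on the other hand, you'd get a runtime error.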
To be perfectly honest, this is a slightly flawed example, since it's unlikely that there will be a distinct "Python scalar" PMC class. The Python compiler could well infer the types of variables such that a would be a PythonString and b would be a PythonNumber. But the point remains: incrementing a PythonString is very different from incrementing a PerlScalar.
Since the behaviour is a function of the "type" of the PMC, it's natural to consider the various different types of PMC as classes in an object-oriented system whereby the interpreter calls methods on the individual PMC objects to manipulate them. So the example above would translate to something like:
Construct a new PMC in the PerlScalar class.
Call a method setting its string value to "a1".
Call a method to tell it to increment itself.
And if you replace PerlScalar with PythonString, you get different behaviour, but to the fundamental guts of the interpreter the instructions are the same. PMCs are an abstract virtual class; the interpreter calls a method, the PMC object does the right thing, and the interpreter shouldn't have to care particularly what that right thing happens to be.
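To make the idea concrete, here is a minimal sketch in C of dispatch through a table of function pointers. Everything in it (SketchPMC, SketchVtable, the helper functions and the -1 error code) is hypothetical and far simpler than Parrot's real structures, and the Perl-like increment only handles a trailing digit below 9.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, simplified types: not Parrot's real PMC or vtable layout. */
typedef struct SketchPMC SketchPMC;

typedef struct SketchVtable {
    const char *class_name;
    /* Returns 0 on success, -1 to signal a runtime error. */
    int (*increment)(SketchPMC *self);
} SketchVtable;

struct SketchPMC {
    const SketchVtable *vtable;
    char str[16];
};

/* Perl-like behaviour, reduced to bumping a trailing digit below 9. */
static int perl_scalar_increment(SketchPMC *self) {
    size_t len = strlen(self->str);
    if (len == 0 || self->str[len - 1] < '0' || self->str[len - 1] >= '9')
        return -1; /* carries and letters are left out of this sketch */
    self->str[len - 1]++;
    return 0;
}

/* Python-like behaviour: incrementing a string is always an error. */
static int python_string_increment(SketchPMC *self) {
    (void)self;
    return -1;
}

static const SketchVtable perl_scalar_vtable = {
    "PerlScalar", perl_scalar_increment
};
static const SketchVtable python_string_vtable = {
    "PythonString", python_string_increment
};

/* Construct a PMC of the given class and set its string value. */
static void sketch_pmc_new(SketchPMC *pmc, const SketchVtable *vt,
                           const char *value) {
    assert(pmc != NULL && vt != NULL && value != NULL);
    pmc->vtable = vt;
    strncpy(pmc->str, value, sizeof pmc->str - 1);
    pmc->str[sizeof pmc->str - 1] = '\0';
}

/* The "interpreter" side: the same instruction whatever the class. */
static int interp_increment(SketchPMC *pmc) {
    return pmc->vtable->increment(pmc);
}
```

Calling interp_increment on a PMC built with perl_scalar_vtable changes its string, while the same call on one built with python_string_vtable reports an error; the caller never inspects the class.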
Hence, adding a new data type to Parrot is a question of providing methods which implement that data type's expected behaviour. Let's now look at how one is supposed to do this.
Starting out
If you're adding data types to the core of Parrot (and you've checked with Dan and/or Simon that you're supposed to be doing so), you should create a file in the classes/ subdirectory; this is where all the built-in PMC classes live. (It is also a good source of examples to plunder even if you're not writing a core data type.)
You should almost always start by running genclass.pl, found in the classes/ subdirectory, to generate a skeleton for the class. Let's generate a number type for the beautifully non-existent Fooby language:
perl -I../lib genclass.pl FoobyNumber > foobynumber.pmc
This will produce a skeleton C file (to be preprocessed by the pmc2c.pl program) with stubs for all the methods you need to fill in; the final function in the file, Parrot_FoobyNumber_init, allows you to set up anything you need to set up, and creates the vtable structure containing all the methods.
Now you'll have to do something a little different depending on whether you're writing a built-in class or an extension class. If you're writing a built-in class, then you'll see a reference to enum_class_FoobyNumber in the init function. This is something that you need to add to the enum of built-in classes located in pmc.h. If you're not writing a built-in class, you need to change the type of the init function to return struct _vtable, and then return temp_base_vtable instead of assigning to the Parrot_base_vtables array.
To finish up adding a built-in class:
Add classes/YOURCLASS.pmc to MANIFEST.
Add classes/YOURCLASS$(O) to $(CLASS_O_FILES) in Makefile.in.
Add YOURCLASS$(O) to $(O_FILES) in classes/Makefile.in.
Add YOURCLASS.c and YOURCLASS$(O) targets to classes/Makefile.in.
Add enum_class_YOURCLASS to the enumeration in pmc.h.
Add a call to Parrot_YOURCLASS_class_init() to init_world() in global_setup.c.
What You Can and Cannot Do
The usual way to continue from the genclass.pl-generated skeleton is to define a structure that will hook onto the PMC's data member, if your data type needs to use that, and then to define some class-specific flags.
Flags are accessed by pmc->flags. Most of the bits in the flag word are reserved for use by Parrot itself, but a number of them have been assigned for general use by individual classes. These are referred to as PMC_private0_FLAG .. PMC_private7_FLAG. (The '7' may change during the early development of Parrot, but will become pretty fixed at some point.)
Normally, you will want to alias these generic bit names to something more meaningful within your class:
enum {
Foobynumber_is_bignum = PMC_private0_FLAG,
Foobynumber_is_bigint = PMC_private1_FLAG,
....
};
You're quite at liberty to declare these in a separate header file, but I find it more convenient to keep everything together in foobynumber.pmc.
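As a rough illustration of how such aliases get used, the sketch below sets, clears and tests them with ordinary bit operations. The bit values given to PMC_private0_FLAG and PMC_private1_FLAG, the cut-down SketchPMC structure and the flag_* helpers are all assumptions made for the example; the real definitions live in Parrot's headers.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed bit positions, for illustration only. */
enum {
    PMC_private0_FLAG = 1 << 0,
    PMC_private1_FLAG = 1 << 1
};

/* Hypothetical stand-in for a PMC: only the flag word is modelled. */
typedef struct SketchPMC {
    unsigned long flags;
} SketchPMC;

/* Class-local aliases for the generic private bits. */
enum {
    Foobynumber_is_bignum = PMC_private0_FLAG,
    Foobynumber_is_bigint = PMC_private1_FLAG
};

/* Turn a flag on without disturbing the other bits. */
static void flag_set(SketchPMC *pmc, unsigned long flag) {
    assert(pmc != NULL);
    pmc->flags |= flag;
}

/* Turn a flag off without disturbing the other bits. */
static void flag_clear(SketchPMC *pmc, unsigned long flag) {
    assert(pmc != NULL);
    pmc->flags &= ~flag;
}

/* Return 1 if the flag is set, 0 otherwise. */
static int flag_test(const SketchPMC *pmc, unsigned long flag) {
    assert(pmc != NULL);
    return (pmc->flags & flag) != 0;
}
```

Going through the aliases rather than the generic names keeps the class readable and means only the enum has to change if the private bits are renumbered.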
You may also use the cache union in the PMC structure to remove some extraneous dereferences in your code if that would help.
Multimethods
One potentially tricky element of implementing vtables is that several of the vtable functions have variant forms depending on the type of data that they're being called with.
For instance, the set_integer method has multiple forms; the default set_integer means that you are being called with a PMC, and you should probably use the get_integer method of the PMC to find its integer value; set_integer_native means you're being passed an INTVAL; and set_integer_bigint is for when you receive a BIGINT structure. The final form is slightly special; if the interpreter calls set_integer_same, you know that the PMC that you are being passed is of the same type as you. Hence, you can break the class abstraction to save a couple of dereferences, if you want to.
Similar shortcuts exist for strings (native, unicode and other) and floating-point numbers (native and bigfloat). Functions which take "generic" numbers, which may be either integer or float, have int, bigint, float and bigfloat variants.
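The three integer forms that need no bignum support might look roughly like the following. This is a sketch under assumptions: INTVAL is taken to be a long, SketchPMC and SketchVtable are hypothetical stand-ins for the real structures, and the bigint form is left out because the BIGINT layout isn't described here.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL; /* assumption: stand-in for Parrot's INTVAL */

typedef struct SketchPMC SketchPMC;

/* Hypothetical vtable holding just the integer accessors. */
typedef struct SketchVtable {
    INTVAL (*get_integer)(SketchPMC *self);
    void (*set_integer)(SketchPMC *self, SketchPMC *value);
    void (*set_integer_native)(SketchPMC *self, INTVAL value);
    void (*set_integer_same)(SketchPMC *self, SketchPMC *value);
} SketchVtable;

struct SketchPMC {
    const SketchVtable *vtable;
    INTVAL int_val;
};

static INTVAL fooby_get_integer(SketchPMC *self) {
    return self->int_val;
}

/* Native form: the value arrives as a plain INTVAL. */
static void fooby_set_integer_native(SketchPMC *self, INTVAL value) {
    self->int_val = value;
}

/* Default form: value may be of any class, so ask it via its vtable. */
static void fooby_set_integer(SketchPMC *self, SketchPMC *value) {
    assert(value != NULL && value->vtable != NULL);
    self->int_val = value->vtable->get_integer(value);
}

/* Same form: value is known to be our own class, so read its storage
   directly and skip the method call. */
static void fooby_set_integer_same(SketchPMC *self, SketchPMC *value) {
    assert(value != NULL);
    self->int_val = value->int_val;
}

static const SketchVtable fooby_number_vtable = {
    fooby_get_integer,
    fooby_set_integer,
    fooby_set_integer_native,
    fooby_set_integer_same
};
```

The default and same forms give the same result here; the only difference is that the same form reaches into the other PMC's storage, which is safe only because both PMCs share a class.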
Methods you need to implement
The master list of vtable methods can be found in vtable.tbl in the root directory of the Parrot source; since that's not exactly verbose, here's a better description of the methods that you need to implement:
type - Return some notion of what 'type' you are; this can be used to communicate state information between PMCs of the same class.
name - Return a string containing your class name.
init - Do any data set-up you need to do.
clone - Copy your data, state and vtable into the passed-in PMC.
morph - Turn yourself into the specified type.
move_to - Move your private data to the given destination in memory.
real_size - Return how much memory you are actually taking up.
destroy - Do any data shut-down and finalization you need to do.
get_integer - Return an integer representation of yourself.
get_number - Return a floating-point representation of yourself.
get_string - Return a string representation of yourself (a STRING* object). This should be a copy of whatever string you are holding, not just a pointer to your own string, so that anything that calls this method can happily modify the value without making a mess of your guts.
get_bool - Return a boolean representation of yourself.
get_value - Return your private data as a raw pointer.
is_same - True if the passed-in PMC refers to exactly the same data as you. (Contrast is_equal.)
set_integer - Set yourself to the passed-in integer value. This is an integer multimethod.
set_number - Set yourself to the passed-in float value. This is a floating-point multimethod.
set_string - Set yourself to the passed-in string. This is a string multimethod.
set_value - Set your private data to the raw pointer passed in. This will only be used in exceptional circumstances.
add - Fetch the number part of value and add your numeric value to it, storing the result in dest (probably by calling its set_integer or set_number method). This is a numeric multimethod.
subtract - Fetch the number part of value and subtract your numeric value from it, storing the result in dest (probably by calling its set_integer or set_number method). This is a numeric multimethod.
multiply, divide, modulus - You get the picture.
concatenate - Fetch the string part of value and concatenate it to yourself, storing the result in dest (probably by calling its set_string method). This is a string multimethod.
is_equal - True if the passed-in PMC has the same value as you. For instance, a Perl integer and a Python integer could have the same value, but could not be the same thing as defined by is_same.
logical_or, logical_and - Perform the given short-circuiting logical operations between your boolean value and the value passed in, storing the result in dest.
logical_not - Set yourself to be a logical negation of the value passed in.
match - Execute the given regular expression on value and store the result.
repeat - Repeat your string representation value times and store the result in dest.
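A few of the descriptions above are easier to follow with code in front of you. The sketch below shows one plausible reading of add, is_same and is_equal for a simple integer class. The types, the INTVAL typedef, the use of a data pointer for private storage and the exact argument lists are assumptions for illustration, not the signatures in vtable.tbl.

```c
#include <assert.h>
#include <stddef.h>

typedef long INTVAL; /* assumption: stand-in for Parrot's INTVAL */

typedef struct SketchPMC SketchPMC;

/* Hypothetical vtable with only the methods this sketch needs. */
typedef struct SketchVtable {
    INTVAL (*get_integer)(SketchPMC *self);
    void (*set_integer_native)(SketchPMC *self, INTVAL value);
    void (*add)(SketchPMC *self, SketchPMC *value, SketchPMC *dest);
    int (*is_same)(SketchPMC *self, SketchPMC *value);
    int (*is_equal)(SketchPMC *self, SketchPMC *value);
} SketchVtable;

struct SketchPMC {
    const SketchVtable *vtable;
    INTVAL *data; /* private storage; two PMCs may share it */
};

static INTVAL num_get_integer(SketchPMC *self) {
    return *self->data;
}

static void num_set_integer_native(SketchPMC *self, INTVAL value) {
    *self->data = value;
}

/* add: fetch the number part of value, add our own value to it, and
   hand the result to dest through dest's own setter. */
static void num_add(SketchPMC *self, SketchPMC *value, SketchPMC *dest) {
    assert(value != NULL && dest != NULL);
    INTVAL sum = value->vtable->get_integer(value) + *self->data;
    dest->vtable->set_integer_native(dest, sum);
}

/* is_same: identity, i.e. both PMCs point at the same storage. */
static int num_is_same(SketchPMC *self, SketchPMC *value) {
    return self->data == value->data;
}

/* is_equal: equality of value, whatever class the other PMC is. */
static int num_is_equal(SketchPMC *self, SketchPMC *value) {
    return *self->data == value->vtable->get_integer(value);
}

static const SketchVtable sketch_number_vtable = {
    num_get_integer,
    num_set_integer_native,
    num_add,
    num_is_same,
    num_is_equal
};
```

Note that add never touches dest's storage itself; it goes through dest's vtable, so dest is free to belong to a different class.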
Parrot will provide a set of default methods, named Parrot_default_..., which you can inherit from if you don't need to do anything special for a given method. If you don't want to implement a specific method, simply say something like
void logical_not (PMC* value) = default;
and a sensible default will be provided for you.
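One way to picture what falling back to a default means at the C level is a vtable slot that ends up pointing at a shared function instead of a class-specific one. The sketch below is only that picture: sketch_default_logical_not and sketch_fill_defaults are hypothetical names, the two-argument logical_not is an assumption, and none of this is meant to show what pmc2c.pl really generates for "= default".

```c
#include <assert.h>
#include <stddef.h>

typedef struct SketchPMC SketchPMC;

/* Hypothetical vtable with one mandatory and one optional method. */
typedef struct SketchVtable {
    int (*get_bool)(SketchPMC *self);
    void (*logical_not)(SketchPMC *self, SketchPMC *value);
} SketchVtable;

struct SketchPMC {
    const SketchVtable *vtable;
    long int_val;
};

/* Shared default, standing in for the Parrot_default_... family:
   it relies only on value's own get_bool method. */
static void sketch_default_logical_not(SketchPMC *self, SketchPMC *value) {
    assert(value != NULL && value->vtable != NULL);
    self->int_val = !value->vtable->get_bool(value);
}

/* The one method this class actually implements itself. */
static int fooby_get_bool(SketchPMC *self) {
    return self->int_val != 0;
}

/* Return a copy of the class's vtable in which any slot the class
   left empty points at the shared default. */
static SketchVtable sketch_fill_defaults(SketchVtable vt) {
    if (vt.logical_not == NULL)
        vt.logical_not = sketch_default_logical_not;
    return vt;
}
```

Because the default is written purely in terms of other vtable methods, it works for any class that supplies get_bool.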