NAME

docs/pdds/draft/pdd29_compiler_tools.pod - Parrot Compiler Tools

VERSION

$Revision: 28231 $

MAINTAINER

Will "Coke" Coleda
Klaas-Jan Stol

DEFINITIONS

Compiler

In this document, a compiler means a PCT-based compiler.

HLL

A High-Level Language. Examples include Perl, Ruby, Python, Lua, and Tcl.

ABSTRACT

This PDD specifies the Parrot Compiler Tools (PCT).

SYNOPSIS

Creating a PCT-based compiler can be done as follows:

.sub 'onload' :anon :load :init
    load_bytecode 'PCT.pbc'
    $P0 = get_hll_global ['PCT'], 'HLLCompiler'
    $P1 = $P0.'new'()
    $P1.'language'('Foo')
    $P1.'parsegrammar'('Foo::Grammar')
    $P1.'parseactions'('Foo::Grammar::Actions')
.end

.sub 'main' :main
    .param pmc args
    $P0 = compreg 'Foo'
    $P1 = $P0.'command_line'(args)
.end

{{ this is the most important part; is this enough? }}

The Parrot distribution contains a Perl script, tools/dev/mk_language_shell.pl, that generates a compiler stub containing all necessary files. The generated compiler can be built out of the box. Using this script is the recommended way to get started with the PCT.

{{ Not sure whether the mk_language_shell.pl script should be mentioned long term. In a sense, this script can also be considered part of the parrot compiler "tools", as it is used to create a compiler. }}

Parser Synopsis

grammar Foo is PCT::Grammar;

rule TOP {
    <statement>*
    {*}
}

rule statement {
    <ident> '=' <expression>
    {*}
}

rule expression is optable { ... }

proto infix:<+> is precedence('1') is pirop('n_add') { ... }

proto 'term:' is tighter(infix:<+>) is parsed(&term) { ... }

rule term {
    | <ident> {*}       #= ident
    | <number> {*}      #= number
}

Actions Synopsis

{{ Is this a good idea? }}

class Foo::Grammar::Actions;

method TOP($/) {
    my $past := PAST::Block.new( :blocktype('declaration'), :node($/) );
    for $<statement> {
        $past.push( $( $_ ) );
    }
    make $past;
}

method statement($/) {
    make PAST::Op.new( $( $<ident> ),
                       $( $<expression> ),
                       :pasttype('bind'),
                       :node($/) );
}

method expression($/, $key) {
    ...
}

method term($/, $key) {
    make $( $/{$key} );
}

Running the compiler

Running the compiler is then done as follows:

$ parrot foo.pbc [--target=[parse|past|post|pir]] <file>

{{ other options? Maybe --target=pbc in the future, once PBC can be generated? }}

DESCRIPTION

The Parrot Compiler Tools are designed to make it easy to create a compiler targeting the Parrot Virtual Machine. The tools themselves run on Parrot, so no external programs are needed.

The PCT is a set of libraries and programs used to:

  • create a parser

  • create an intermediate data structure (Abstract Syntax Tree)

  • generate executable Parrot code

{{ Maybe just say it's used to create parrot-targeting compilers, not list these as =items }}

IMPLEMENTATION

The PCT is made up of the following libraries and programs:

  • Parrot Grammar Engine (PGE)

  • Parrot Abstract Syntax Tree (PAST) classes

  • Parrot Opcode Syntax Tree (POST) classes

  • PCT::HLLCompiler class

  • PCT::Grammar class

Although it is not strictly part of the PCT, the Not Quite Perl (6) (NQP) language is typically used in PCT-based compilers. NQP is a subset of the Perl 6 language and serves as a high-level alternative to PIR.

Compilation phases

By default, a PCT-based compiler has four compilation phases, or transformations. Phases can be added and removed through the API of the HLLCompiler class. The default phases are:

  • source to parse tree

    The source is read, parsed and stored in a parse tree.

  • parse tree to PAST

    The parse tree is converted into a Parrot Abstract Syntax Tree.

  • PAST to POST

    The PAST is converted into a Parrot Opcode Syntax Tree.

  • POST to PIR

    The POST is converted into executable Parrot Intermediate Representation.
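
The staged design described above can be modelled in a few lines. The following Python sketch is not PCT code and does not reproduce the HLLCompiler API. The class Pipeline, its methods add_stage, remove_stage and compile, and the target argument are all names invented for illustration. The sketch only shows how a driver can run an ordered list of named transformations, stop after a requested stage, and have stages added or removed.

```python
from typing import Any, Callable, List, Optional, Tuple

Stage = Tuple[str, Callable[[Any], Any]]


class Pipeline:
    """Toy model of a compiler driver that runs named stages in order."""

    def __init__(self) -> None:
        self.stages: List[Stage] = []

    def add_stage(self, name: str, fn: Callable[[Any], Any],
                  after: Optional[str] = None) -> None:
        # Append by default, or insert right after an existing stage.
        if after is None:
            self.stages.append((name, fn))
            return
        index = [n for n, _ in self.stages].index(after)
        self.stages.insert(index + 1, (name, fn))

    def remove_stage(self, name: str) -> None:
        self.stages = [(n, f) for n, f in self.stages if n != name]

    def compile(self, source: Any, target: Optional[str] = None) -> Any:
        # Feed each stage the previous result; stop early at `target`.
        result = source
        for name, fn in self.stages:
            result = fn(result)
            if name == target:
                break
        return result


# Four placeholder stages named after the default phases.
pipeline = Pipeline()
pipeline.add_stage("parse", lambda src: ("parse", src.split()))
pipeline.add_stage("past", lambda tree: ("past", tree[1]))
pipeline.add_stage("post", lambda past: ("post", past[1]))
pipeline.add_stage("pir", lambda post: "\n".join(post[1]))
```

Stopping at an intermediate stage returns that stage's result instead of the final output, which is the idea behind the --target option shown in the synopsis.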

Source to Parse Tree

The first stage of a PCT-based compiler is performed by the parser. The parser is defined as a set of Perl 6 Rules, which the Perl 6 Rules compiler processes into a generated PIR file that implements the parser.

{{ Doesn't this make the Perl 6 rules compiler part of the PCT? }}

During the first stage, the source (input string) is parsed, resulting in a parse tree.

Parse tree to PAST

The second stage converts the parse tree into a Parrot Abstract Syntax Tree (PAST). PAST is a data structure consisting of PAST nodes, each of which represents a common HLL construct. While languages differ in syntax, many constructs in different HLLs map to the same semantics. This second transformation is executed during the parse stage: parse tree nodes are transformed into PAST nodes by so-called parse actions, which are methods of a class specified through the parseactions attribute of the HLLCompiler. Such classes are typically implemented in NQP.
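
To illustrate how parse actions build the tree while parsing is still in progress, here is a minimal Python sketch. It is an illustration only, not PCT or NQP code. The Actions class, the Bind node, and the regular expression standing in for the statement rule are assumptions made for this example, and the toy grammar accepts only statements of the form name = number.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class Bind:
    """Stand-in for an AST 'bind' node (not a real PAST class)."""
    name: str
    value: int


class Actions:
    """Hypothetical action class: one method per grammar rule."""

    def statement(self, match: "re.Match[str]") -> Bind:
        return Bind(match.group("ident"), int(match.group("number")))

    def top(self, statements: List[Bind]) -> List[Bind]:
        return statements


# Stand-in for the 'statement' rule of the toy grammar.
STATEMENT = re.compile(r"\s*(?P<ident>[A-Za-z_]\w*)\s*=\s*(?P<number>\d+)\s*")


def parse(source: str, actions: Actions) -> List[Bind]:
    # Each time the 'statement' rule matches, its action runs immediately,
    # so AST nodes are created during parsing rather than in a later pass.
    nodes = []
    pos = 0
    while pos < len(source):
        match = STATEMENT.match(source, pos)
        if match is None:
            raise SyntaxError(f"cannot parse at offset {pos}")
        nodes.append(actions.statement(match))
        pos = match.end()
    return actions.top(nodes)


tree = parse("x = 1\ny = 22", Actions())
```

Keeping the actions in a separate class means the grammar stays free of tree-building code, and a different action class could be supplied without touching the parser.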

{{ How do we say that this is not obligatory; you could also use PIR, and in the future maybe other languages. }}

PAST to POST

The third transformation converts the PAST into a Parrot Opcode Syntax Tree (POST). PAST nodes represent HLL constructs, each of which is transformed into a set of POST nodes. A POST node is a low-level node representing a single instruction, a label, or a subroutine. While a PAST is very close to an HLL program, a POST is much closer to PIR code.
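
The lowering step can be sketched as follows. This Python model is an illustration under stated assumptions, not the PAST or POST implementation: Num and Add stand in for high-level tree nodes, each tuple stands in for one low-level instruction node, and the register numbering scheme is invented for the example.

```python
from dataclasses import dataclass
from itertools import count
from typing import Iterator, List, Tuple, Union


@dataclass
class Num:
    value: int


@dataclass
class Add:
    left: "Expr"
    right: "Expr"


Expr = Union[Num, Add]
Instruction = Tuple[str, ...]


def lower(node: Expr, ops: List[Instruction], regs: Iterator[int]) -> str:
    """Append instructions that compute `node`; return the result register."""
    if isinstance(node, Num):
        reg = f"$I{next(regs)}"
        ops.append(("set", reg, str(node.value)))
        return reg
    # Operands are lowered first, so their registers exist before the add.
    left = lower(node.left, ops, regs)
    right = lower(node.right, ops, regs)
    reg = f"$I{next(regs)}"
    ops.append(("add", reg, left, right))
    return reg


ops: List[Instruction] = []
result = lower(Add(Num(1), Add(Num(2), Num(3))), ops, count())
```

One nested expression becomes five flat instructions, which shows why a POST is described as being much closer to PIR than a PAST is.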

POST to PIR

The last transformation generates PIR code from the POST.

The generated PIR is then fed into the Parrot executable, and processed into Parrot Byte Code (PBC) by the PIR compiler.
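
As a rough illustration of this last step, the Python sketch below renders a flat list of instruction and label entries as the text of one PIR subroutine. The function emit_sub and the tuple layout are hypothetical; the real POST classes are not modelled here.

```python
from typing import List, Tuple

Instruction = Tuple[str, ...]


def emit_sub(name: str, body: List[Instruction]) -> str:
    """Render a flat instruction list as the text of one PIR subroutine."""
    lines = [f".sub '{name}'"]
    for opcode, *args in body:
        if opcode == "label":
            # A label entry becomes "name:" on a line of its own.
            lines.append(f"  {args[0]}:")
        elif args:
            lines.append(f"    {opcode} " + ", ".join(args))
        else:
            lines.append(f"    {opcode}")
    lines.append(".end")
    return "\n".join(lines)


pir = emit_sub("main", [
    ("set", "$I0", "1"),
    ("label", "loop"),
    ("add", "$I0", "$I0", "1"),
    ("lt", "$I0", "10", "loop"),
    ("say", "$I0"),
])
```

Because each entry maps to exactly one line of output, this stage needs no knowledge of the source language.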

Parrot Grammar Engine

The Parrot Grammar Engine (PGE) is a component that executes regular expressions. Besides classic regular expressions, it also understands Perl 6 Rules, which are special regular expressions used to define a grammar.

The start symbol in a grammar is named TOP; this is the top-level rule that is executed when the parser is invoked.

Operator precedence parsing

{{ insert stuff about using an operator prec. table here }}

Parrot Abstract Syntax Tree

The PCT includes a set of PAST classes. PAST classes represent common language constructs, such as a while statement. These are described extensively in docs/pdds/pdd26_ast.pod.

Parrot Opcode Syntax Tree

POST::Node

POST::Node is the base class for all other POST classes.

POST::Op

POST::Ops

POST::Label

POST::Sub

PCT::Grammar

The class PCT::Grammar is a built-in grammar class that can be used as a parent class for a custom grammar. This class defines a number of rules and tokens that are inherited by child classes. Note that in this context the concepts of class and grammar are equivalent.

The following rules are predefined:

{{ is this necessary, or just a reference to the file? }}

ident
ws
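
The inheritance relationship can be pictured with ordinary classes. The Python sketch below is a model only: BaseGrammar, FooGrammar, and the regular expressions used for ident and ws are assumptions for illustration and are not the definitions found in PCT::Grammar.

```python
import re
from typing import Optional


class BaseGrammar:
    """Stand-in for a parent grammar that supplies common rules."""

    def ident(self, text: str, pos: int = 0) -> Optional[str]:
        # Assumed shape of an identifier: letter or underscore, then word chars.
        match = re.compile(r"[A-Za-z_]\w*").match(text, pos)
        return match.group() if match else None

    def ws(self, text: str, pos: int = 0) -> int:
        # Return the position just past any run of whitespace.
        return re.compile(r"\s*").match(text, pos).end()


class FooGrammar(BaseGrammar):
    """Child grammar: adds its own rule and reuses the inherited ones."""

    def assignment_target(self, text: str) -> Optional[str]:
        start = self.ws(text)
        name = self.ident(text, start)
        if name is None:
            return None
        after = self.ws(text, start + len(name))
        return name if text.startswith("=", after) else None


grammar = FooGrammar()
```

FooGrammar never defines ident or ws itself; it picks them up from its parent, just as a custom grammar picks up the predefined rules.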

PCT::HLLCompiler

All PCT-based compilers use an HLLCompiler object as a compiler driver. It acts as a facade for the compiler and invokes the different compiler phases.

HLLCompiler API Methods

{{ TODO: complete this }}

language

$P0.'language'('Foo')

parsegrammar

$P0.'parsegrammar'('Foo::Grammar')

parseactions

$P0.'parseactions'('Foo::Grammar::Actions')

commandline_prompt

$P0.'commandline_prompt'($S0)

Sets the string in $S0 as the command-line prompt of the compiler in $P0. The prompt is the text shown before each command is entered when the compiler runs in interactive mode.

commandline_banner

$P0.'commandline_banner'($S0)

Sets the string in $S0 as the command-line banner of the compiler in $P0. The banner is the first text shown when the compiler is started in interactive mode. It can be used for a copyright notice or other information.
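
The difference between the banner and the prompt can be shown with a small model of an interactive session. This Python sketch does not use the HLLCompiler API. The function interactive and its parameters are hypothetical, and input is passed in as a list so that the example is deterministic.

```python
from typing import Callable, Iterable, List


def interactive(banner: str, prompt: str, lines: Iterable[str],
                evaluate: Callable[[str], str]) -> List[str]:
    """Return the text an interactive session would display, in order."""
    shown = [banner]                 # the banner appears once, at startup
    for line in lines:
        shown.append(prompt + line)  # the prompt precedes every command
        shown.append(evaluate(line))
    return shown


transcript = interactive(
    banner="Foo compiler (hypothetical banner text)",
    prompt="foo> ",
    lines=["1 + 2", "x = 3"],
    evaluate=lambda line: f"ok: {line}",
)
```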

ATTACHMENTS

None.

REFERENCES

docs/pdds/pdd26_ast.pod

http://dev.perl.org/perl6/doc/design/syn/S05.html