NAME

PDL::PP - Generate PDL routines from concise descriptions

SYNOPSIS

use PDL::PP qw/Modulename Packagename Prefix/;

addhdr('#include "hdr.h"');

defpdl(
	'Transpose',
	[qw/a(x,y,X) [o]b(y,x,X)/],
	'int c',
	'loop(x,y) %{
		$b() = $a();
	%}'
);

done();

DESCRIPTION

This module defines the routine defpdl that generates xsub code from a short description such as the transpose function above.

The idea is that this concise description states what needs to be done more clearly than C code (which would be difficult to interpret), so it can be compiled to C in many different ways. Also, the resulting C code can easily be made to do the right thing in many situations: for example, in the above code the matrix b is a destination matrix, so the code can check whether b exists and has the right dimensions, and then either die or create a new b if it does not.
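
As a rough illustration of the kind of C such a description could be compiled to, the sketch below performs the transpose together with the check on the destination. The Mat struct and the mat_at and transpose functions are hypothetical names made up for this example; they are not PDL's actual data structures or the code defpdl really emits.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical 2-D array; NOT PDL's actual data structure. */
typedef struct {
    int nx, ny;      /* sizes of the first and second index */
    double *data;    /* nx*ny elements, first index fastest; NULL if not created */
} Mat;

/* Address of element (x, y) of m, with a bounds check on every access. */
static double *mat_at(const Mat *m, int x, int y) {
    assert(x >= 0 && x < m->nx && y >= 0 && y < m->ny);
    return &m->data[x + (size_t)m->nx * y];
}

/* b(y,x) = a(x,y). Returns 0 on success, -1 on a size mismatch or
   allocation failure. If b has no data yet, it is created with the
   right size. */
int transpose(const Mat *a, Mat *b) {
    if (b->data == NULL) {
        b->nx = a->ny;
        b->ny = a->nx;
        b->data = malloc(sizeof(double) * (size_t)a->nx * (size_t)a->ny);
        if (b->data == NULL) return -1;
    } else if (b->nx != a->ny || b->ny != a->nx) {
        return -1;
    }
    for (int y = 0; y < a->ny; y++)
        for (int x = 0; x < a->nx; x++)
            *mat_at(b, y, x) = *mat_at(a, x, y);
    return 0;
}
```

Whether a mismatch should be reported, as here, or should make the routine die is exactly the kind of policy the generator could choose without any change to the description.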

Of course, a human can also write all of this intelligent code by hand, but with tens of different routines it gets very dull after a while. Consider reuse as well: in the above code, the line

$b() = $a();

is interpreted by the routine. At some hypothetical future time, if PDL starts supporting sparse matrices, this might still be made to work. Also, this code could be used in a wildly different environment from PDL, achieving a kind of universality. Alternatively, the compiler could, for debugging, place bounds checking at each access to a and b (because they are stored in memory sequentially, this would be far superior to the usual gcc bounds checking).

PDL variables

The second argument to defpdl is an array of strings of the form

[options]name(indices,X)

Options is a comma-separated list which can at the moment contain

o

This pdl is used only for output and may therefore need to be created at runtime. In that case, all of its indices must have a defined value.

The name is a lowercase alphanumeric name for the variable. One of the names can be preceded by ">", which means that if the function is called like $a = f($b) instead of f($a,$b), then this argument is the output. The indices part is a comma-separated list of lowercase index names, or "...", or an uppercase index name for a "rest" index.

Indices

defpdl uses named indices. In the first example, there were two named indices, x and y, and a "rest" index, X. Each index name is unique, so the x in the definitions of both a and b is interpreted to mean the same number of elements, and this is checked at runtime.

The "rest" index is a special case which may stand for several indices, which currently must appear in the same order in every variable. The idea is that the code will be automatically looped over this set of indices. In the future, it may be possible to have several different "rest" indices for different sets of variables.
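
One way to picture the automatic loop over the "rest" indices is the sketch below. It assumes contiguous storage with the first index varying fastest, so every combination of rest indices selects one consecutive x-by-y slice. The function names and the rest_dims parameter are hypothetical and only serve the illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose one nx-by-ny slice: b(y,x) = a(x,y), first index fastest. */
static void transpose_slice(const double *a, double *b, size_t nx, size_t ny) {
    for (size_t y = 0; y < ny; y++)
        for (size_t x = 0; x < nx; x++)
            b[y + ny * x] = a[x + nx * y];
}

/* Repeat the slice operation over all "rest" indices.
   rest_dims holds the sizes of the indices collected under X. */
void transpose_rest(const double *a, double *b, size_t nx, size_t ny,
                    const size_t *rest_dims, size_t n_rest) {
    assert(a != b);  /* in-place operation is not supported in this sketch */
    size_t n_slices = 1;
    for (size_t i = 0; i < n_rest; i++)
        n_slices *= rest_dims[i];
    for (size_t s = 0; s < n_slices; s++)
        transpose_slice(a + s * nx * ny, b + s * nx * ny, nx, ny);
}
```

With no rest indices at all (n_rest equal to 0), the slice count is 1 and the routine reduces to a plain two-dimensional transpose.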

Loops

In the C code, it is possible to automatically create loops. In the example, the line

loop(x,y) 

makes loops over the indices x and y. If all your dimensions mean different things, this is usually sufficient, but if you have square matrices (for example, a correlation matrix), you need to use the syntax

loop(x0,x1)

which starts two loops, x0 and x1, both running over the size of the index x. Currently, to make it easier to program, the loops use the sequences %{ and %} (like yacc) to start and end. In the future, this may change.
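
For a square result, the two loops could expand to something like the following sketch, which fills a correlation-like matrix g(x0,x1) from data d(x,t). The function name gram, the extra index t, and the storage layout (first index fastest) are assumptions made for the example, not something defpdl prescribes.

```c
#include <assert.h>
#include <stddef.h>

/* g(x0,x1) = sum over t of d(x0,t) * d(x1,t).
   d has sizes (x_size, t_size); g has sizes (x_size, x_size). */
void gram(const double *d, double *g, size_t x_size, size_t t_size) {
    assert(d != g);  /* input and output must be distinct arrays */
    /* Both x0 and x1 run over the same index size, x_size. */
    for (size_t x1 = 0; x1 < x_size; x1++) {
        for (size_t x0 = 0; x0 < x_size; x0++) {
            double sum = 0.0;
            for (size_t t = 0; t < t_size; t++)
                sum += d[x0 + x_size * t] * d[x1 + x_size * t];
            g[x0 + x_size * x1] = sum;
        }
    }
}
```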

As a point of interest, there is an actual parser and context manager with stack and all in the code. Perl makes these things very easy to do.

Array access

defpdl attempts to make the defaults do the right thing in a wide variety of cases without the need to specify the indices explicitly. However, special cases always arise and for those, the syntax

loop(x1,x2) %{ a() = b(x => x1) * c(x => x2) %};

may be used. Here the variables could be declared as [qw/[o]a(x,x) b(x) c(x)/], in which case this sets a to the outer product of b and c.
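
In plain C, the computation this statement describes would look roughly like the sketch below. It assumes that the two x indices of a are bound to x1 and x2 in that order and that the first index varies fastest; the function name outer is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* a(x1,x2) = b(x1) * c(x2), with both loops running over x_size. */
void outer(double *a, const double *b, const double *c, size_t x_size) {
    assert(a != b && a != c);  /* the output must not alias an input */
    for (size_t x2 = 0; x2 < x_size; x2++)
        for (size_t x1 = 0; x1 < x_size; x1++)
            a[x1 + x_size * x2] = b[x1] * c[x2];
}
```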

Naming

For user access, there are some standard naming conventions. Each loop variable has exactly the name given in the loop declaration. Each index size has the name of the index followed by _size. The same name is used if the dimension of an output variable has to be specified as a parameter.

INFLUENCES

The ideas here have been influenced by the language Yorick as well as MATLAB and Scilab.

BUGS

Uncountably many.

When using GCC, it would be much faster to just declare an array with a variable number of indices than to use pdl_malloc. With other compilers, it would also be a lot faster to use a large fixed maximum N_DIM (16, for example, or if you want to be *ABSOLUTELY* certain, 50) and be done with it. The array would then be on the stack, where it is allocated and accessed rapidly.

At the moment, the code does not create nonexistent or invalid-sized pdls. However, the change is fairly trivial.

The run-time error messages the code generates are really awful and uninformative.

An important issue is whether this version puts C too far from us. It is possible to use normal C loops instead of the loop() syntax and so on, but I think the loop() syntax may come in handy pretty often.

The code is not very readable at the moment. It is fairly modular, however.

The generated code is relatively inefficient, especially at access times. The outer loops should update pointers to the data accessed inside to be efficient. However, the comfort of writing code like this is very nice.
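
The difference between the two access styles can be sketched as follows. Both functions compute b(x,y) = a(x,y) * w(x) for contiguous arrays with the first index fastest; the operation and the function names are hypothetical and were chosen only to contrast the two styles, and an optimizing compiler may well reduce the first form to the second on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Index-based access: the offset is recomputed at every element. */
void scale_indexed(const double *a, const double *w, double *b,
                   size_t nx, size_t ny) {
    assert(a && w && b);
    for (size_t y = 0; y < ny; y++)
        for (size_t x = 0; x < nx; x++)
            b[x + nx * y] = a[x + nx * y] * w[x];
}

/* Pointer-based access: the outer loop resets the pointer into w and the
   inner loop only increments pointers. */
void scale_pointers(const double *a, const double *w, double *b,
                    size_t nx, size_t ny) {
    assert(a && w && b);
    const double *pa = a;
    double *pb = b;
    for (size_t y = 0; y < ny; y++) {
        const double *pw = w;  /* back to w(0) for each new y */
        for (size_t x = 0; x < nx; x++)
            *pb++ = *pa++ * *pw++;
    }
}
```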

At the moment,

loop(x1,x2) %{ a() = b(x => x1) * c(x => x2) %};

doesn't work.

The '...' syntax does not yet work for "rest" indices.

AUTHOR

Copyright (C) 1995 Tuomas J. Lukka (Tuomas.Lukka@helsinki.fi)