The London Perl and Raku Workshop takes place on 26th Oct 2024. If your company depends on Perl, please consider sponsoring and/or attending.

NAME

PDL::Dataflow -- description of the dataflow implementation and philosophy

SYNOPSIS

        pdl> $x = zeroes(10);
        pdl> $y = $x->slice("2:4:2");
        pdl> $y ++;
        pdl> print $x;
        [0 0 1 0 1 0 0 0 0 0]

DESCRIPTION

As of 2.079, this is now a description of the current implementation, together with some design thoughts from its original author, Tuomas Lukka.

Two-directional dataflow (which implements ->slice() etc.) is fully functional, as shown in the SYNOPSIS. One-way is implemented, but with restrictions.

TWO-WAY

Just about any function which returns some subset of the values in some ndarray will make a binding. $y has become effectively a window to some sub-elements of $x. You can also define your own routines that do different types of subsets. If you don't want $y to be a window to $x, you must do

        $y = $x->slice("some parts")->sever;

The sever destroys the slice transform, thereby turning off all dataflow between the two ndarrays.

Type conversions

This works, thanks to a two-way flowing transform that implements type-conversions, particularly for supplied outputs of the "wrong" type for the given transform:

  pdl> $a_bad = pdl double, '[1 BAD 3]';
  pdl> $b_float = zeroes float, 3;
  pdl> $a_bad->assgn($b_float); # could be written as $b_float .= $a_bad
  pdl> p $b_float->badflag;
  1
  pdl> p $b_float;
  [1 BAD 3]

ONE-WAY

You need to explicitly turn on one-way dataflow on an ndarray to activate it for non-flowing operations (see "flowing" in PDL::Core), so

        pdl> $x = pdl 2,3,4;
        pdl> $y = $x->flowing * 2;
        pdl> print $y;
        [4 6 8]
        pdl> $x->set(0,5);
        pdl> print $y;
        [10 6 8]

flowing operates by turning on a flag that the core turns off immediately after acting on it (rather like inplace), so it only operates on the next operation (aka transfrom) called on that ndarray, making that operation be a forward-flowing one. That lasts until the output(s) of that operation get severed.

It is not possible to turn on backwards dataflow (such as is used by slice-type operations), because there is no general way for PDL (or maths, in fact) to know how to reverse most operations - consider $z = $x * $y, then adding one to $z.

Consider the following code:

        $u = sequence(3,3);
        $v = ones(3,3);
        $w = $u->flowing + $v->flowing;
        $y = $w->flowing + 1;
        $x = $w->diagonal(0,1);
        $z = $w->flowing + 2;

As of 2.090, $x will not have backward dataflow. This is because PDL turns that off, on detecting an input with forward-only dataflow. This means PDL will only created directed acyclic graphs of dataflow. It also means you can only modify the contents of $x by modifying its "upstream" data sources.

What do $x, $y, and $z contain now?

        pdl> p $x
        [1 5 9]
        pdl> p $y
        [
         [ 2  3  4]
         [ 5  6  7]
         [ 8  9 10]
        ]
        pdl> p $z
        [
         [ 3  4  5]
         [ 6  7  8]
         [ 9 10 11]
        ]

What about when $u is changed and a recalculation is triggered?

        pdl> $u += 7;
        pdl> p $x
        [8 12 16]
        pdl> p $y
        [
         [ 9 10 11]
         [12 13 14]
         [15 16 17]
        ]
        pdl> p $z
        [
         [10 11 12]
         [13 14 15]
         [16 17 18]
        ]

Example of dataflow to implement 3D space calculations

This is a complete, working example that demonstrates the use of enduring flowing relationships to model 3D entities, through a few transformations:

  {package PDL::3Space;
  use PDL;
  sub new { my ($class, $parent) = @_;
    my $self = bless {basis_local=>identity(3), origin_local=>zeroes(3)}, $class;
    if (defined $parent) {
      $self->{parent} = $parent;
      $self->{basis} = $self->{basis_local}->flowing x $parent->{basis}->flowing;
      $self->{origin} = ($self->{origin_local}->flowing x $self->{basis}->flowing)->flowing + $parent->{origin}->flowing;
    } else {
      $self->{basis} = $self->{basis_local};
      $self->{origin} = $self->{origin_local}->flowing x $self->{basis}->flowing;
    }
    $self;
  }
  use overload '""' => sub {$_[0]{basis}->glue(1,$_[0]{origin}).''};
  sub basis_update { $_[0]{basis_local} .= $_[1] x $_[0]{basis_local} }
  sub origin_move { $_[0]{origin_local} += $_[1] }
  sub local { my $local = PDL::3Space->new; $local->{$_} .= $_[0]{$_} for qw(basis_local origin_local); $local}
  }

This is the class, heavily inspired by Math::3Space, and following discussions on interoperation between that and PDL (see https://github.com/nrdvana/perl-Math-3Space/pull/8). The basis and origin members are "subscribed" to both their own local basis and origin, and their parent's if any. The basis_update and origin_move methods only update the local members, and basis_update does so in terms of its previous value.

The demonstrating code has a boat, and bird within its frame of reference. Note that the "local" origin still gets affected by its local basis.

The basis and origin are always in global coordinates, and thanks to dataflow, are only recalculated on demand.

  $rot_90_about_z = PDL->pdl([0,1,0], [-1,0,0], [0,0,1]);

  $boat = PDL::3Space->new;
  print "boat=$boat";
  $bird = PDL::3Space->new($boat);
  print "bird=$bird";
  # boat=
  # [
  #  [1 0 0]
  #  [0 1 0]
  #  [0 0 1]
  #  [0 0 0]
  # ]
  # bird=
  # [
  #  [1 0 0]
  #  [0 1 0]
  #  [0 0 1]
  #  [0 0 0]
  # ]

  $boat->basis_update($rot_90_about_z);
  print "after boat rot:\nboat=$boat";
  print "bird=$bird";
  # after boat rot:
  # boat=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  0  0]
  # ]
  # bird=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  0  0]
  # ]

  $boat->origin_move(PDL->pdl(1,0,0));
  print "after boat move:\nboat=$boat";
  print "bird=$bird";
  print "bird local=".$bird->local;
  # after boat move:
  # boat=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  1  0]
  # ]
  # bird=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  1  0]
  # ]
  # bird local=
  # [
  #  [1 0 0]
  #  [0 1 0]
  #  [0 0 1]
  #  [0 0 0]
  # ]

  $bird->basis_update($rot_90_about_z);
  $bird->origin_move(PDL->pdl(1,0,1));
  print "after bird rot and move:\nbird=$bird";
  print "bird local=".$bird->local;
  # after bird rot and move:
  # bird=
  # [
  #  [-1  0  0]
  #  [ 0 -1  0]
  #  [ 0  0  1]
  #  [-1  1  1]
  # ]
  # bird local=
  # [
  #  [ 0  1  0]
  #  [-1  0  0]
  #  [ 0  0  1]
  #  [ 0  1  1]
  # ]

  $boat->basis_update(PDL::MatrixOps::identity(3) * 2);
  print "after boat expand:\nboat=$boat";
  print "bird=$bird";
  # after boat expand:
  # boat=
  # [
  #  [ 0  2  0]
  #  [-2  0  0]
  #  [ 0  0  2]
  #  [ 0  2  0]
  # ]
  # bird=
  # [
  #  [-2  0  0]
  #  [ 0 -2  0]
  #  [ 0  0  2]
  #  [-2  2  2]
  # ]

LAZY EVALUATION

In one-way flow context like the above, with:

        pdl> $y = $x * 2;

nothing will have been calculated at this point. Even the memory for the contents of $y has not been allocated. Only the command

        pdl> print $y

will actually cause $y to be calculated. This is important to bear in mind when doing performance measurements and benchmarks as well as when tracking errors.

There is an explanation for this behaviour: it may save cycles but more importantly, imagine the following:

        $x = pdl 2,3,4;
        $y = pdl 5,6,7;
        $z = $x->flowing + $y->flowing;
        $x->setdims([4]);
        $y->setdims([4]);
        print $z;

Now, if $z were evaluated between the two resizes, an error condition of incompatible sizes would occur.

What happens in the current version is that resizing $x raises a flag in $z: PDL_PARENTDIMSCHANGED and $y just raises the same flag again. When $z is next evaluated, the flags are checked and it is found that a recalculation is needed.

Of course, lazy evaluation can sometimes make debugging more painful because errors may occur somewhere where you'd not expect them.

FAMILIES

This is one of the more intricate concepts of dataflow. In order to make dataflow work like you'd expect, a rather strange concept must be introduced: families. Let us make a diagram of the one-way flow example - it uses a hypergraph because the transforms (with +) are connectors between ndarrays (with *):

       u*   *v
         \ /
          +(plus)
          |
   1*     *w
     \   /|\
      \ / | \
 (plus)+  |  +(diagonal)
       |  |  |
      y*  |  *x
          |
          | *1
          |/
          +(plus)
          |
         z*

This is what PDL actually has in memory after the first three lines. When $x is changed, $w changes due to diagonal being a two-way operation.

If you want flow from $w, you opt in using $w->flowing (as shown in this scenario). If you didn't, then don't enable it. If you have it but want to stop it, call $ndarray->sever. That will destroy the ndarray's trans_parent (here, a node marked with +), and as you can visually tell, will stop changes flowing thereafter. If you want to leave the flow operating, but get a copy of the ndarray at that point, use $ndarray->copy - it will have the same data at that moment, but have no flow relationships.

EVENTS

There is the start of a mechanism to bind events onto changed data, intended to allow this to work:

        pdl> $x = pdl 2,3,4
        pdl> $y = $x + 1;
        pdl> $c = $y * 2;
        pdl> $c->bind( sub { print "A now: $x, C now: $c\n" } )
        pdl> PDL::dowhenidle();
        A now: [2,3,4], C now: [6 8 10]
        pdl> $x->set(0,1);
        pdl> $x->set(1,1);
        pdl> PDL::dowhenidle();
        A now: [1,1,4], C now: [4 4 10]

This hooks into PDL's magic which resembles Perl's, but does not currently operate.

There would be many kinds of uses for this feature: self-updating charts, for instance. It is not yet fully clear whether it would be most useful to queue up changes (useful for doing asynchronously, e.g. when idle), or to activate things immediately.

In the 2022 era of both GPUs and multiple cores, it is a pity that Perl's dominant model remains single-threaded on CPU, but PDL can use multi-cores for CPU processing (albeit controlled in a single-threaded style) - see PDL::ParallelCPU. It is planned that PDL will gain the ability to use GPUs, and there might be a way to hook that up albeit probably with an event loop to "subscribe" to GPU events.

TRANSFORMATIONS

PDL implements nearly everything (except for XS oddities like set) using transforms which connect ndarrays. This includes data transformations like addition, "slicing" to access/operate on subsets, and data-type conversions (which have two-way dataflow, see "Type conversions").

This does not currently include a resizing transformation, and setdims mutates its input. This is intended to change.

AUTHOR

Copyright(C) 1997 Tuomas J. Lukka (lukka@fas.harvard.edu). Same terms as the rest of PDL.