NAME
Statistics::Distribution::Generator - A way to compose complicated probability functions
VERSION
Version 0.013
SYNOPSIS
use Statistics::Distribution::Generator qw( :all );
my $g = gaussian(3, 1);
say $g; # something almost certainly between -3 and 9, but probably about 2 .. 4-ish
my $cloud = gaussian(0, 1) x gaussian(0, 1) x gaussian(0, 1);
say @$cloud; # a 3D vector almost certainly within (+/- 6, +/- 6, +/- 6) and probably within (+/- 2, +/- 2, +/- 2)
my $combo = gaussian(100, 15) | uniform(0, 200); # one answer with an equal chance of being picked from either distribution
DESCRIPTION
This module allows you to bake together multiple "simple" probability distributions into a more complex random number generator. It does this lazily: when you call one of the PDF-generating functions, it makes an object whose value is not calculated at creation time, but is instead re-calculated each and every time you read the value of the object. If you are familiar with Functional Programming, you can think of the exported functions as returning functors with their "setup" values curried into them.
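The lazy behaviour described above can be pictured with Perl's core overload pragma. This is only a minimal sketch of the general idea, not the module's actual implementation: the LazyRandom class and its "code" slot are hypothetical names invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical class, not part of Statistics::Distribution::Generator.
package LazyRandom;

use overload
    '0+'     => sub { $_[0]->{ code }->() },   # numeric context: draw a fresh value
    '""'     => sub { $_[0]->{ code }->() },   # string context: draw a fresh value
    fallback => 1;

# The "setup" values live in a closure; nothing is drawn at construction time.
sub new {
    my ($class, $min, $max) = @_;
    return bless { code => sub { $min + rand($max - $min) } }, $class;
}

package main;

my $u = LazyRandom->new(10, 20);

# Five reads of the same object give five independent draws.
my @draws;
push @draws, $u + 0 for 1 .. 5;
print "@draws\n";
```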
To this end, two of Perl's operators (x and |) have been overloaded with special semantics.
The x operator composes multiple distributions at once, giving an ARRAYREF of "answers" when interrogated, which is designed primarily to be interpreted as a vector in N-dimensional space (where N is the number of elements in the ARRAYREF).
The | operator composes multiple distributions into a single value, giving a SCALAR "answer" when interrogated. It does this by picking at random between the composed distributions (which may be weighted to give some higher precedence than others).
The first thing to note is that x and | have their normal Perl precedence and associativity (x binds more tightly than |). This means that parens are strongly advised to make your code more readable. This may be fixed in later versions of this module, by messing about with the B modules, but that would still not make parens a bad idea.
The second thing to note is that x and | may be "nested" arbitrarily many levels deep (within the usual memory & CPU limits of your computer, of course). You could, for instance, compose multiple "vectors" of different sizes using x to form each one, and select between them at random with |, e.g.
my $forwards = gaussian(0, 0.5) x gaussian(3, 1) x gaussian(0, 0.5);
my $backwards = gaussian(0, 0.5) x gaussian(-3, 1) x gaussian(0, 0.5);
my $left = gaussian(-3, 1) x gaussian(0, 0.5) x gaussian(0, 0.5);
my $right = gaussian(3, 1) x gaussian(0, 0.5) x gaussian(0, 0.5);
my $up = gaussian(0, 0.5) x gaussian(0, 0.5) x gaussian(3, 1);
my $down = gaussian(0, 0.5) x gaussian(0, 0.5) x gaussian(-3, 1);
my $direction = $forwards | $backwards | $left | $right | $up | $down;
$robot->move(@$direction);
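To make the two kinds of composition concrete without relying on operator overloading, here is a rough analogue built from plain closures. The helper names (uniform_gen, compose_vector, compose_choice) are hypothetical and are not part of this module; the sketch only mirrors the semantics described above (an ARRAYREF per read for x, one randomly chosen component per read for |) and ignores weights.

```perl
use strict;
use warnings;

# Hypothetical helpers, not the module's API: generators are plain coderefs.
sub uniform_gen {
    my ($min, $max) = @_;
    return sub { $min + rand($max - $min) };
}

# Analogue of "x": every read yields an ARRAYREF with one draw per component.
sub compose_vector {
    my @gens = @_;
    return sub { [ map { $_->() } @gens ] };
}

# Analogue of "|": every read delegates to one component chosen at random.
sub compose_choice {
    my @gens = @_;
    return sub { $gens[ int rand @gens ]->() };
}

my $east = compose_vector(uniform_gen(2, 4),   uniform_gen(-1, 1));
my $west = compose_vector(uniform_gen(-4, -2), uniform_gen(-1, 1));
my $step = compose_choice($east, $west);

my $vec = $step->();    # a 2-element ARRAYREF from either $east or $west
printf "%.3f %.3f\n", @$vec;
```

Because the choice is made on every read, nesting works naturally: a "choice" of "vectors" is itself just another generator.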
You are strongly encouraged to seek further elucidation at Wikipedia or any other available reference site / material.
EXPORTABLE FUNCTIONS
- gaussian(MEAN, SIGMA)
Gaussian (Normal) Distribution. This is the classic "bell curve" shape. Numbers close to the MEAN are more likely to be selected, and the value of SIGMA determines how unlikely more-distant values are. For instance, about 68% (roughly 2/3) of the "answers" will be in the range (MEAN - SIGMA) <= N <= (MEAN + SIGMA), and around 99.7% of the "answers" will be in the range (MEAN - 3 * SIGMA) <= N <= (MEAN + 3 * SIGMA). "Answers" more than 6 * SIGMA away from the MEAN are roughly a 1 in 500 million long shot.
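The one-sigma and three-sigma proportions can be checked empirically. The sketch below uses the textbook Box-Muller transform purely for illustration; it is an assumption that this resembles what the module does internally, and box_muller is a hypothetical helper name.

```perl
use strict;
use warnings;

# Box-Muller transform: a standard textbook method, used here only for
# illustration (the module's own sampling method may differ).
sub box_muller {
    my ($mean, $sigma) = @_;
    my $u1 = 1 - rand();          # in (0, 1], so log() never sees zero
    my $u2 = rand();
    my $pi = 4 * atan2(1, 1);
    return $mean + $sigma * sqrt(-2 * log($u1)) * cos(2 * $pi * $u2);
}

srand(42);                        # fixed seed so runs are repeatable
my ($mean, $sigma, $n) = (3, 1, 100_000);
my ($within1, $within3) = (0, 0);
for (1 .. $n) {
    my $d = abs(box_muller($mean, $sigma) - $mean);
    $within1++ if $d <= $sigma;
    $within3++ if $d <= 3 * $sigma;
}
printf "within 1 sigma: %.1f%%\n", 100 * $within1 / $n;
printf "within 3 sigma: %.1f%%\n", 100 * $within3 / $n;
```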
- uniform(MIN, MAX)
A Uniform Distribution, with equal chance of any N where MIN <= N < MAX. This is equivalent to Perl's standard rand() function, except you supply the MIN and MAX instead of allowing them to fall at 0 and 1 respectively. Any value within the range should be equally likely to be chosen, provided you have a "good" random number generator in your computer.
- logistic
The Logistic Distribution is used descriptively in a wide variety of fields from market research to the design of neural networks, and is also known as the hyperbolic secant squared distribution.
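For readers who want a feel for the shape, a standard logistic variate can be drawn by inverse-transform sampling, since its quantile function is log(u / (1 - u)). This is a sketch under the assumption of the standard (location 0, scale 1) form; standard_logistic is a hypothetical helper and the module's own method may differ.

```perl
use strict;
use warnings;

# Quantile (inverse CDF) of the standard logistic distribution.
sub standard_logistic {
    my $u = rand();
    $u = rand() while $u == 0;    # avoid log(0)
    return log($u / (1 - $u));
}

srand(3);                         # fixed seed so runs are repeatable
my $n = 100_000;
my ($sum, $sum_sq) = (0, 0);
for (1 .. $n) {
    my $x = standard_logistic();
    $sum    += $x;
    $sum_sq += $x * $x;
}
my $mean     = $sum / $n;                     # theory: 0
my $variance = $sum_sq / $n - $mean ** 2;     # theory: pi**2 / 3
printf "mean %.3f, variance %.3f\n", $mean, $variance;
```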
- supplied(VALUE)
- supplied(CALLBACK)
Allows the caller to supply either a constant VALUE which will always be returned as is, or a coderef CALLBACK that may use any algorithm you like to generate a suitable random number. For now, this is the main plugin methodology for this module. The supplied CALLBACK is given no arguments, and SHOULD return a numeric answer. If it returns something non-numeric, you are entirely on your own in how to interpret that, and you are probably doing it wrongly.
- gamma(ORDER, SCALE)
The Gamma Distribution function is a generalization of the chi-squared and exponential distributions, and may be given by
p(x) dx = {1 \over \Gamma(k) \theta^k} x^{k-1} e^{-x/\theta} dx for x > 0,
where k is the ORDER argument (also known as the "shape parameter") and theta is the SCALE argument (the "scale parameter").
If k is an integer, the Gamma Distribution is equivalent to the sum of k exponentially-distributed random variables, each of which has a mean of theta.
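The integer-k property can be illustrated directly: summing k exponential draws, each with mean theta, should give samples whose average is close to k * theta. The helpers below (exp_with_mean, gamma_integer_order) are hypothetical and are not the module's gamma implementation.

```perl
use strict;
use warnings;

# Inverse-transform sampling of an exponential variate with the given mean.
sub exp_with_mean {
    my ($theta) = @_;
    return -$theta * log(1 - rand());   # 1 - rand() is in (0, 1]
}

# For integer k, summing k such variates gives a Gamma(k, theta) draw.
sub gamma_integer_order {
    my ($k, $theta) = @_;
    my $sum = 0;
    $sum += exp_with_mean($theta) for 1 .. $k;
    return $sum;
}

srand(7);                               # fixed seed so runs are repeatable
my ($k, $theta, $n) = (3, 2, 50_000);
my $total = 0;
$total += gamma_integer_order($k, $theta) for 1 .. $n;
my $sample_mean = $total / $n;          # should be close to k * theta
printf "sample mean %.3f, expected about %d\n", $sample_mean, $k * $theta;
```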
- exponential(LAMBDA)
The Exponential Distribution function is often useful when modeling / simulating the time between events in certain types of system. It is also used in reliability theory and the Barometric formula in physics.
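As an example of the "time between events" use case, the sketch below accumulates exponentially distributed gaps into a list of arrival times. It assumes the parameter is a rate (events per unit time), which this documentation does not state explicitly; next_gap is a hypothetical helper written in plain Perl rather than with this module.

```perl
use strict;
use warnings;

# Assumption: $rate is events per unit time, so the mean gap is 1 / $rate.
sub next_gap {
    my ($rate) = @_;
    return -log(1 - rand()) / $rate;    # 1 - rand() is in (0, 1]
}

srand(1);                               # fixed seed so runs are repeatable
my ($rate, $clock) = (0.5, 0);
my @arrivals;
for (1 .. 10_000) {
    $clock += next_gap($rate);          # advance the clock by one random gap
    push @arrivals, $clock;
}
my $mean_gap = $clock / scalar @arrivals;
printf "mean gap %.3f, expected about %.1f\n", $mean_gap, 1 / $rate;
```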
OVERLOADED OPERATORS
- x
Allows you to compose multi-dimensional random vectors.
$randvect = $foo x $bar x $baz; # generate a three-dimensional vector
- |
Allows you to pick a single (optionally weighted) generator from some set of generators.
$cointoss = supplied(0) | supplied(1); # fair 50:50 result of either 0 or 1
OBJECT ATTRIBUTES
- $distribution->{ weight }
This setting may be used to make |-based selections favor one or more outcomes more (or less) than the remaining outcomes. The default weight for all outcomes is 1. Weights are relative, not absolute, so may be scaled however you need.
$foo = exponential 1.5;
$bar = gaussian 20, 1.25;
$foo->{ weight } = 6;
$quux = $foo | $bar; # 6:1 chance of picking $foo instead of $bar
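Relative weighting of this kind is commonly implemented by rolling a number in the range 0 to the sum of all weights and walking the list until the roll is used up. The sketch below shows that general technique with a hypothetical pick_weighted helper; it is not taken from the module's source, and it assumes (as stated above) that a missing weight defaults to 1.

```perl
use strict;
use warnings;

# Hypothetical helper, not the module's code: pick one item with probability
# proportional to its weight (weights are relative, so 6:1 equals 12:2).
sub pick_weighted {
    my @items = @_;                     # each item: { name => ..., weight => ... }
    my $total = 0;
    $total += $_->{ weight } // 1 for @items;
    my $roll = rand($total);
    for my $item (@items) {
        $roll -= $item->{ weight } // 1;
        return $item if $roll < 0;
    }
    return $items[-1];                  # guard against floating-point round-off
}

srand(11);                              # fixed seed so runs are repeatable
my %count = (foo => 0, bar => 0);
my @items = ({ name => 'foo', weight => 6 }, { name => 'bar' });
$count{ pick_weighted(@items)->{ name } }++ for 1 .. 70_000;
printf "foo:bar is roughly %.2f:1\n", $count{foo} / $count{bar};
```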
AUTHOR
The main body of this work is by Paul W Bennett.
The idea of composing probabilities together is inspired by work done by Sooraj Bhat, Ashish Agarwal, Richard Vuduc, and Alexander Gray at Georgia Tech and NYU, published around the end of 2011.
The implementation of the Gamma Distribution is by Nigel Wetters Gourlay.
CAVEATS
Almost no error checking is done. Garbage in will result in garbage out.
This is ALPHA quality software. Any aspect of it, including the API and core functionality, is likely to change at any time.
TODO
Build in more probability density functions.
Tests. Lots of very clever tests.
LICENSE
Artistic 2.0