NAME
Algorithm::RandomPointGenerator -- This module generates a set of random points in a 2D plane according to a user-specified probability distribution that is provided to the module in the form of a 2D histogram.
SYNOPSIS
# The quickest way to use the module is through the script genRand2D that you'll
# find in the examples subdirectory. You can move this script to any convenient
# location in your directory structure. Call this script as a command-line utility
# in the following manner:
genRand2D --histfile your_histogram_file.csv --bbfile your_bounding_box_file.csv
# where the '--histfile' option supplies the name of the file that contains a 2D
# histogram and the option '--bbfile' the name of the file that defines a bounding
# box in the XY-plane to which the histogram applies. The module uses the
# Metropolis-Hastings algorithm to draw random points from a probability density
# function that is approximated by the 2D histogram you supply through the
# '--histfile' option. You can also run the command
genRand2D --help
# for further information regarding these two command-line options. An invocation
# of genRand2D gives you 2000 random points that are deposited in a file whose
# name is printed out in the terminal window in which you invoke the genRand2D
# command.
# The rest of this Synopsis concerns using the module with greater control over
# the production and display of random points. Obviously, the very first thing
# you would need to do would be to import the module:
use Algorithm::RandomPointGenerator;
# Next, name the file that contains a 2D histogram for the desired density
# function for the generation of random points:
my $input_histogram_file = "histogram.csv";
# Then name the file that defines the bounding box for the random points:
my $bounding_box_file = "bounding_box.csv";
# Now construct an instance of RandomPointGenerator using a call that, assuming
# you wish to set all the constructor options, would look like:
my $generator = Algorithm::RandomPointGenerator->new(
input_histogram_file => $input_histogram_file,
bounding_box_file => $bounding_box_file,
number_of_points => 2000,
how_many_to_discard => 500,
proposal_density_width => 0.1,
y_axis_pos_direction => 'down',
output_hist_bins_along_x => 40,
);
# The role served by the different constructor options is explained later in this
# documentation. After constructing an instance of the module, you would need to
# call the following methods for generating the random points and for displaying
# their histogram:
$generator->read_histogram_file_for_desired_density();
$generator->read_file_for_bounding_box();
$generator->normalize_input_histogram();
$generator->set_sigmas_for_proposal_density();
$generator->metropolis_hastings();
$generator->write_generated_points_to_a_file();
$generator->make_output_histogram_for_generated_points();
$generator->plot_histogram_3d_surface();
# As to what is accomplished by each of the methods called above is described
# later in this documentation. Note that since several of the constructor
# parameters have defaults, a minimal call to the constructor may look as brief
# as:
my $generator = Algorithm::RandomPointGenerator->new(
input_histogram_file => $input_histogram_file,
bounding_box_file => $bounding_box_file,
);
# In this case, the number of points to be generated is set to 2000. These will
# be the points after the first 500 that are discarded to get past the effects of
# the starting state of the generator.
CHANGES
Version 1.01 downshifts the version of Perl that is required for this module. The implementation code for the module is unchanged from Version 1.0.
DESCRIPTION
Several testing protocols for "big data" research projects involving large geographic areas require a random set of points that are distributed according to a user-specified probability density function that exists in the form of a 2D histogram. This module is an implementation of the Metropolis-Hastings algorithm for generating such a set of points.
METHODS
The module provides the following methods:
- new():
-
A call to
new()
constructs a new instance of theAlgorithm::RandomPointGenerator
class. If you wanted to set all the constructor options, this call would look like:my $generator = Algorithm::RandomPointGenerator->new( input_histogram_file => $input_histogram_file, bounding_box_file => $bounding_box_file, number_of_points => 2000, how_many_to_discard => 500, proposal_density_width => 0.1, y_axis_pos_direction => 'down', );
where
input_histogram_file
is the name of the file that contains a 2D histogram as an approximation to the desired probability density function for the random points to be generated. The data in the histogram file is expected to be in CSV format. Here is a display of a very small portion of the contents of such a file for an actual geographic region:0,211407,216387,211410,205621,199122,192870, ........ 0,408221,427716,427716,427716,427716,427716,427716, ...... 0,408221,427716,427716,427716,427716,427716,427716, ...... .... .... 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,165,9282,11967,15143, ..... 0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,....
The
bounding_box_file
parameter of the constructor should delineate the portion of the plane to which the input histogram corresponds. Here is an example of the contents of an actual file supplied for this option:-71.772016, -70.431923 -34.254251, -33.203240
Apart from any comment lines, there must exist exactly two lines in the bounding-box file, with the first line indicating the left and the right limits of the horizontal coordinates and the second line indicating the lower and the upper limits of the vertical coordinates. (The values shown above are the longitude and the latitude limits for a region in Chile, in case you are curious.)
Constructor Parameters:
input_histogram_file
:-
This required parameter supplies the name of the file that contains a 2D histogram as the desired density function for the random points that the module will generate. Each line record in this file must correspond to one row of the 2D histogram. The left-to-right direction in the file will be considered to be positive direction for the x-axis. As for the positive direction for the y-axis, it is common for that to be from top to bottom when the histogram is written out to a text file as an array of integers. It is important to bear in mind this orientation of the histogram in light of the fact that a bounding box is specified typically with its y-axis going upwards (whereas the x-axis continues to be positive from left to right). This inconsistency between how a 2D histogram is typically stored in a text file and how a bounding box is likely to be specified means that if the events occur more frequently in the upper right hand corner of the bounding box, those high counts would show up in the lower right hand corner of the histogram in a text file (or in a terminal display of the contents of such a file). You can use the constructor option
y_axis_pos_direction
to reverse the positive sense of the y direction for the histogram. If you sety_axis_pos_direction
to the stringup
, then the orientation of the y axis in both the histogram and the bounding box will be the same. bounding_box_file
:-
This required parameter supplies the name of the file that contains the bounding box information. By bounding box is meant the part of the XY-plane that corresponds to the histogram supplied through the
input_histogram_file
option. Apart from any comment lines, this file must contain exactly two lines and each line must contain exactly two numeric entries. Additionally, the first entry in each of the two lines must be smaller than the second entry in the same line. The two entries in the first line define the lower and the upper bounds on the x-axis and the two entries in the second line do the same for the y-axis. number_of_points
:-
This parameter specifies the number of random points that you want the module to generate.
how_many_to_discard
:-
The Metropolis-Hastings algorithm belongs to a category of algorithms known as random-walk algorithms. Since the random walk carried out by such algorithms must be initialized with user input, it is necessary to discard the points produced until the effects the initial state have died out. This parameter specifies how many of the generated points will be discarded. This parameter is optional in the sense that it has a default value of 500.
proposal_density_width
:-
Given the current point, the Metropolis-Hastings randomly chooses a candidate for the next point. This random selection of a candidate point is carried out using what is known as the "proposal density". The module uses a bivariate Gaussian for the proposal density. The value supplied through this parameter controls the standard deviations of the proposal Gaussian in the x and the y directions. The default value for this parameter is 0.1. With that value for the parameter, the standard deviation of the proposal density along the x direction will be set to 0.1 times the width of the bounding box, and the standard deviation of the same along the y-direction to 0.1 times the height of the bounding box.
y_axis_pos_direction
:-
See the explanation above for the constructor parameter
input_histogram_file
for why you may need to use they_axis_pos_direction
parameter. The parameter takes one of two values,up
anddown
. The usual interpretation of a 2D histogram in a text file --- with the positive direction of the y-axis pointing downwards --- corresponds to the case when this parameter takes the default value ofdown
. output_hist_bins_along_x
:-
This parameter controls the resolution with which the histogram of the generated random points will be displayed. The value you supply is for the number of bins along the x-direction. The number of bins along the y-direction is set according to the aspect ratio of the bounding box.
- read_histogram_file_for_desired_density():
-
$generator->read_histogram_file_for_desired_density();
This is a required call after the constructor is invoked. As you would expect, this call reads in the histogram data for the desired probability density function for random point generation.
- read_file_for_bounding_box():
-
$generator->read_file_for_bounding_box();
This is also a required call. This call reads in the left and the right limits on the x-axis and the lower and the upper limits on the y-axis for the portion of the XY-plane for which the random points are needed.
- normalize_input_histogram():
-
$generator->normalize_input_histogram();
This normalizes the input histogram to turn it into an approximation to the desired probability density function for the random points.
- set_sigmas_for_proposal_density():
-
$generator->set_sigmas_for_proposal_density();
The Metropolis-Hastings algorithm is a random-walk algorithm that at each point first creates a candidate for the next point and then makes a probabilistic decision regarding the acceptance of the candidate point as the next point on the walk. The candidate point is drawn from what is known as the proposal density function. This module uses a bivariate Gaussian for the proposal density. You set the width of the proposal density by calling this method.
- metropolis_hastings():
-
This is the workhorse of the module, as you would expect. As its name implies, it uses the Metropolis-Hastings random-walk algorithm to generate a set of random points whose probability distribution corresponds to the input histogram.
- write_generated_points_to_a_file():
-
$generator->write_generated_points_to_a_file();
This method writes the generated points to a disk file whose name is keyed to the name of the input histogram file. The two coordinates for each generated random point are written out to a new line in this file.
- make_output_histogram_for_generated_points():
-
$generator->make_output_histogram_for_generated_points();
The name of the method speaks for itself.
- plot_histogram_3d_surface():
-
$generator->plot_histogram_3d_surface();
This method uses a Perl wrapper for Gnuplot, as provided by the module Graphics::GnuplotIF, for creating a 3D surface plot of the histogram of the random points generated by the module.
- plot_histogram_lineplot():
-
$generator->plot_histogram_lineplot();
This creates a 3D line plot display of the histogram of the generated random points.
- display_output_histogram_in_terminal_window():
-
$generator->display_output_histogram_in_terminal_window();
Useful for debugging purposes, it displays in the terminal window a two dimensional array of numbers that is the histogram of the random points generated by the module.
THE examples
DIRECTORY
Probably the most useful item in the examples
directory is the command-line script genRand2D
that can be called simply with two arguments for generating a set of random points. A call to this script looks like
genRand2D --histfile your_histogram_file.csv --bbfile your_bounding_box_file.csv
where the --histfile
option supplies the name of the file that contains a 2D input histogram and the --bbfile
option the name of the file that defines the bounding box in the XY-plane. You can also execute the command line
genRand2D --help
for displaying information as to what is required by the two options for the genRand2D
command. The command-line invocation of genRand2D
gives you 2000 random points that are deposited in a file whose name is printed out in the terminal window in which you execute the command.
To become more familiar with all of the different options you can use with this module, you should experiment with the script:
generate_random_points.pl
You can feed it different 2D histograms --- even made-up 2D histograms --- and look at the histogram of the generated random points to see how well the module is working. Keep in mind, though, if your made-up input histogram has disconnected blobs in it, the random-points that are generated may correspond to just one of the blobs. Since the process of random-point generation involves a random walk, the algorithm may not be able to hop from one blob to another in the input histogram if they are too far apart. As to what exactly you'll get by way of the output histogram would depend on your choice of the width of the proposal density.
The examples
directory contains the following histogram and bounding-box files for you to get started:
hist1.csv bb1.csv
hist2.csv bb2.csv
If you run the generate_random_points.pl
script with the hist1.csv
and bb1.csv
files, the histogram you get for the 2000 random points generated by the module will look like what you see in the file output_histogram_for_hist1.png
. On a Linux machine, you can see this histogram with the usual display
command from the ImageMagick library. And if you run generate_random_points.pl
script with the hist2.csv
and bb2.csv
files, you'll see an output histogram that should look like what you see in output_histogram_for_hist2.png
.
You should also try invoking the command-line calls:
genRand2D --histfile hist1.csv --bbfile bb1.csv
genRand2D --histfile hist2.csv --bbfile bb2.csv
REQUIRED
This module requires the following three modules:
List::Util
Graphics::GnuplotIF
Math::Big
Math::Random
EXPORT
None by design.
BUGS
Please notify the author if you encounter any bugs. When sending email, please place the string 'RandomPointGenerator' in the subject line.
INSTALLATION
Download the archive from CPAN in any directory of your choice. Unpack the archive with a command that on a Linux machine would look like:
tar zxvf Algorithm-RandomPointGenerator-1.01.tar.gz
This will create an installation directory for you whose name will be Algorithm-RandomPointGenerator-1.01
. Enter this directory and execute the following commands for a standard install of the module if you have root privileges:
perl Makefile.PL
make
make test
sudo make install
if you do not have root privileges, you can carry out a non-standard install the module in any directory of your choice by:
perl Makefile.PL prefix=/some/other/directory/
make
make test
make install
With a non-standard install, you may also have to set your PERL5LIB environment variable so that this module can find the required other modules. How you do that would depend on what platform you are working on. In order to install this module in a Linux machine on which I use tcsh for the shell, I set the PERL5LIB environment variable by
setenv PERL5LIB /some/other/directory/lib64/perl5/:/some/other/directory/share/perl5/
If I used bash, I'd need to declare:
export PERL5LIB=/some/other/directory/lib64/perl5/:/some/other/directory/share/perl5/
THANKS
I thank Srezic for pointing out that I needed to downshift the required version of Perl for this module. Fortunately, I had access to an old machine still running Perl 5.10.1. The current version is based on my testing the module on that machine.
AUTHOR
Avinash Kak, kak@purdue.edu
If you send email, please place the string "RandomPointGenerator" in your subject line to get past my spam filter.
COPYRIGHT
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.
Copyright 2014 Avinash Kak