NAME

Math::GSL::Histogram - Create and manipulate histograms of data

SYNOPSIS

use Math::GSL::Histogram qw/:all/;

my $H = gsl_histogram_alloc(100);
gsl_histogram_set_ranges_uniform($H,0,101);
gsl_histogram_increment($H, -50 );  # ignored
gsl_histogram_increment($H, 70 );   
gsl_histogram_increment($H, 85.2 );

my $G = gsl_histogram_clone($H);
my $value = gsl_histogram_get($G, 70);
my ($min,$max) = (gsl_histogram_min_val($H), gsl_histogram_max_val($H));
my $sum = gsl_histogram_sum($H);

DESCRIPTION

Here is a list of all the functions included in this module:

gsl_histogram_alloc($n) - This function allocates memory for a histogram with $n bins, and returns a pointer to a newly created gsl_histogram struct. The bins and ranges are not initialized, and should be prepared using one of the range-setting functions below in order to make the histogram ready for use.
gsl_histogram_calloc
gsl_histogram_calloc_uniform
gsl_histogram_free($h) - This function frees the histogram $h and all of the memory associated with it.
gsl_histogram_increment($h, $x) - This function updates the histogram $h by adding one (1.0) to the bin whose range contains the coordinate $x. If $x lies in the valid range of the histogram then the function returns zero to indicate success. If $x is less than the lower limit of the histogram then the function returns $GSL_EDOM, and none of the bins are modified. Similarly, if the value of $x is greater than or equal to the upper limit of the histogram then the function returns $GSL_EDOM, and none of the bins are modified. The error handler is not called, however, since it is often necessary to compute histograms for a small range of a larger dataset, ignoring the values outside the range of interest.
gsl_histogram_accumulate($h, $x, $weight) - This function is similar to gsl_histogram_increment but increases the value of the appropriate bin in the histogram $h by the floating-point number $weight.
gsl_histogram_find($h, $x) - This function finds the bin number which covers the coordinate $x in the histogram $h. The bin is located using a binary search. The search includes an optimization for histograms with uniform range, and will return the correct bin immediately in this case. If $x is found in the range of the histogram then the function returns $GSL_SUCCESS along with the bin number. If $x lies outside the valid range of the histogram then the function returns $GSL_EDOM and the error handler is invoked.
gsl_histogram_get($h, $i) - This function returns the contents of the $i-th bin of the histogram $h. If $i lies outside the valid range of indices for the histogram then the error handler is called with an error code of $GSL_EDOM and the function returns 0.
gsl_histogram_get_range($h, $i) - This function finds the upper and lower range limits of the $i-th bin of the histogram $h. If the index $i is valid then the function returns 0 to indicate success, followed by the corresponding range limits in this order: lower, then upper. The lower limit is inclusive (i.e. events with this coordinate are included in the bin) and the upper limit is exclusive (i.e. events with the coordinate of the upper limit are excluded and fall in the neighboring higher bin, if it exists). If $i lies outside the valid range of indices for the histogram then the error handler is called and the function returns an error code of $GSL_EDOM.
gsl_histogram_max($h) - This function returns the maximum upper range limit of the histogram $h. It provides a way of determining this value without accessing the gsl_histogram struct directly.
gsl_histogram_min($h) - This function returns the minimum lower range limit of the histogram $h. It provides a way of determining this value without accessing the gsl_histogram struct directly.
gsl_histogram_bins($h) - This function returns the number of bins in the histogram $h. It provides a way of determining this value without accessing the gsl_histogram struct directly.
gsl_histogram_reset($h) - This function resets all the bins in the histogram $h to zero.
gsl_histogram_calloc_range
gsl_histogram_set_ranges($h, $range, $size) - This function sets the ranges of the existing histogram $h using the array $range of size $size. The values of the histogram bins are reset to zero. The $range array should contain the desired bin limits. The ranges can be arbitrary, subject to the restriction that they are monotonically increasing. Note that the size of the $range array should be defined to be one element bigger than the number of bins. The additional element is required for the upper value of the final bin.
gsl_histogram_set_ranges_uniform($h, $xmin, $xmax) - This function sets the ranges of the existing histogram $h to cover the range $xmin to $xmax uniformly. The values of the histogram bins are reset to zero. The bin ranges are shown in the table below,
bin[0] corresponds to xmin <= x < xmin + d
bin[1] corresponds to xmin + d <= x < xmin + 2 d
......
bin[n-1] corresponds to xmin + (n-1)d <= x < xmax

where d is the bin spacing, d = (xmax-xmin)/n.
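
The bin layout in the table above can be reproduced with a few lines of plain Perl. This is only a sketch of the arithmetic, not the code the library runs: uniform_edges and uniform_bin_index are hypothetical helper names that do not exist in Math::GSL.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Math::GSL): the n+1 bin edges
# for n uniform bins covering xmin <= x < xmax.
sub uniform_edges {
    my ($n, $xmin, $xmax) = @_;
    my $d = ($xmax - $xmin) / $n;                     # bin spacing d
    my @edges = map { $xmin + $_ * $d } 0 .. $n - 1;  # lower edge of each bin
    push @edges, $xmax;                               # upper edge of the last bin
    return @edges;
}

# Hypothetical helper: index of the bin containing $x,
# or undef when $x is outside xmin <= x < xmax.
sub uniform_bin_index {
    my ($n, $xmin, $xmax, $x) = @_;
    return if $x < $xmin || $x >= $xmax;
    my $i = int( ($x - $xmin) / ( ($xmax - $xmin) / $n ) );
    $i = $n - 1 if $i >= $n;                          # guard against rounding
    return $i;
}

my @edges = uniform_edges(4, 0, 10);       # 0, 2.5, 5, 7.5, 10
my $i     = uniform_bin_index(4, 0, 10, 7.5);
print "edges: @edges\n";
print "7.5 falls in bin $i\n";
```

Because the lower limit of each bin is inclusive, a value sitting exactly on an edge (7.5 here) belongs to the bin that starts at that edge, and a value equal to xmax falls outside the histogram.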

gsl_histogram_memcpy($dest, $src) - This function copies the histogram $src into the pre-existing histogram $dest, making $dest into an exact copy of $src. The two histograms must be of the same size.
gsl_histogram_clone($src) - This function returns a pointer to a newly created histogram which is an exact copy of the histogram $src.
gsl_histogram_max_val($h) - This function returns the maximum value contained in the histogram bins.
gsl_histogram_max_bin($h) - This function returns the index of the bin containing the maximum value. In the case where several bins contain the same maximum value the smallest index is returned.
gsl_histogram_min_val($h) - This function returns the minimum value contained in the histogram bins.
gsl_histogram_min_bin($h) - This function returns the index of the bin containing the minimum value. In the case where several bins contain the same minimum value the smallest index is returned.
gsl_histogram_equal_bins_p($h1, $h2) - This function returns 1 if all of the individual bin ranges of the two histograms are identical, and 0 otherwise.
gsl_histogram_add($h1, $h2) - This function adds the contents of the bins in histogram $h2 to the corresponding bins of histogram $h1, i.e. h'_1(i) = h_1(i) + h_2(i). The two histograms must have identical bin ranges.
gsl_histogram_sub($h1, $h2) - This function subtracts the contents of the bins in histogram $h2 from the corresponding bins of histogram $h1, i.e. h'_1(i) = h_1(i) - h_2(i). The two histograms must have identical bin ranges.
gsl_histogram_mul($h1, $h2) - This function multiplies the contents of the bins of histogram $h1 by the contents of the corresponding bins in histogram $h2, i.e. h'_1(i) = h_1(i) * h_2(i). The two histograms must have identical bin ranges.
gsl_histogram_div($h1, $h2) - This function divides the contents of the bins of histogram $h1 by the contents of the corresponding bins in histogram $h2, i.e. h'_1(i) = h_1(i) / h_2(i). The two histograms must have identical bin ranges.
gsl_histogram_scale($h, $scale) - This function multiplies the contents of the bins of histogram $h by the constant $scale, i.e. h'_1(i) = h_1(i) * scale.
gsl_histogram_shift($h, $offset) - This function shifts the contents of the bins of histogram $h by the constant $offset, i.e. h'_1(i) = h_1(i) + offset.
gsl_histogram_sigma($h) - This function returns the standard deviation of the histogrammed variable, where the histogram is regarded as a probability distribution. Negative bin values are ignored for the purposes of this calculation. The accuracy of the result is limited by the bin width.
gsl_histogram_mean($h) - This function returns the mean of the histogrammed variable, where the histogram is regarded as a probability distribution. Negative bin values are ignored for the purposes of this calculation. The accuracy of the result is limited by the bin width.
gsl_histogram_sum($h) - This function returns the sum of all bin values. Negative bin values are included in the sum.
gsl_histogram_fwrite($stream, $h) - This function writes the ranges and bins of the histogram $h to the stream $stream, which has been opened by the gsl_fopen function from the Math::GSL module, in binary format. The return value is 0 for success and $GSL_EFAILED if there was a problem writing to the file. Since the data is written in the native binary format it may not be portable between different architectures.
gsl_histogram_fread($stream, $h) - This function reads into the histogram $h from the open stream $stream, which has been opened by the gsl_fopen function from the Math::GSL module, in binary format. The histogram $h must be preallocated with the correct size since the function uses the number of bins in $h to determine how many bytes to read. The return value is 0 for success and $GSL_EFAILED if there was a problem reading from the file. The data is assumed to have been written in the native binary format on the same architecture.
gsl_histogram_fprintf($stream, $h, $range_format, $bin_format) - This function writes the ranges and bins of the histogram $h line-by-line to the stream $stream (from the gsl_fopen function from the Math::GSL module) using the format specifiers $range_format and $bin_format. These should be one of the %g, %e or %f formats for floating point numbers. The function returns 0 for success and $GSL_EFAILED if there was a problem writing to the file. The histogram output is formatted in three columns, and the columns are separated by spaces, like this,
range[0] range[1] bin[0]
range[1] range[2] bin[1]
range[2] range[3] bin[2]
....
range[n-1] range[n] bin[n-1]

The values of the ranges are formatted using range_format and the value of the bins are formatted using bin_format. Each line contains the lower and upper limit of the range of the bins and the value of the bin itself. Since the upper limit of one bin is the lower limit of the next there is duplication of these values between lines but this allows the histogram to be manipulated with line-oriented tools.
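
To make the three-column layout concrete, the following plain-Perl sketch builds the same kind of lines with sprintf. It illustrates the format only and does not call the library; format_histogram is a hypothetical helper, and the range and bin values are made up for the example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Math::GSL): one line per bin,
# "lower upper value", using the given format specifiers.
sub format_histogram {
    my ($range, $bin, $range_format, $bin_format) = @_;
    my @lines;
    for my $i (0 .. $#$bin) {
        push @lines, sprintf(
            "$range_format $range_format $bin_format",
            $range->[$i], $range->[$i + 1], $bin->[$i],
        );
    }
    return @lines;
}

# Illustrative data: three bins need four range values.
my @lines = format_histogram([1, 10, 100, 1000], [2, 0, 5], '%g', '%g');
print "$_\n" for @lines;
```

Each upper limit reappears as the lower limit on the next line, which is the duplication mentioned above.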

gsl_histogram_fscanf($stream, $h) - This function reads formatted data from the stream $stream, which has been opened by the gsl_fopen function from the Math::GSL module, into the histogram $h. The data is assumed to be in the three-column format used by gsl_histogram_fprintf. The histogram $h must be preallocated with the correct length since the function uses the size of $h to determine how many numbers to read. The function returns 0 for success and $GSL_EFAILED if there was a problem reading from the file.
gsl_histogram_pdf_alloc($n) - This function allocates memory for a probability distribution with $n bins and returns a pointer to a newly initialized gsl_histogram_pdf struct. If insufficient memory is available a null pointer is returned and the error handler is invoked with an error code of $GSL_ENOMEM.
gsl_histogram_pdf_init($p, $h) - This function initializes the probability distribution $p with the contents of the histogram $h. If any of the bins of $h are negative then the error handler is invoked with an error code of $GSL_EDOM because a probability distribution cannot contain negative values.
gsl_histogram_pdf_free($p) - This function frees the probability distribution function $p and all of the memory associated with it.
gsl_histogram_pdf_sample($p, $r) - This function uses $r, a uniform random number between zero and one, to compute a single random sample from the probability distribution $p. The algorithm used to compute the sample s is given by the following formula, s = range[i] + delta * (range[i+1] - range[i]) where i is the index which satisfies sum[i] <= r < sum[i+1] and delta is (r - sum[i])/(sum[i+1] - sum[i]).
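
The sampling formula given for gsl_histogram_pdf_sample can be tried out in plain Perl. The sketch below is not the library's implementation: pdf_sample is a hypothetical helper, the search for i is a simple linear scan, and the cumulative sums are assumed to start at 0 and end at 1.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Math::GSL).
# $range: n+1 bin edges; $sum: n+1 cumulative probabilities,
# with $sum->[0] == 0 and $sum->[n] == 1; $r: uniform number in [0, 1).
sub pdf_sample {
    my ($range, $sum, $r) = @_;
    my $n = $#$range;                        # number of bins
    # Find i such that sum[i] <= r < sum[i+1] (linear scan for clarity).
    my $i = 0;
    $i++ while $i < $n - 1 && $r >= $sum->[$i + 1];
    my $delta = ($r - $sum->[$i]) / ($sum->[$i + 1] - $sum->[$i]);
    return $range->[$i] + $delta * ($range->[$i + 1] - $range->[$i]);
}

# Illustrative histogram: three bins on [0, 3) holding 1, 2 and 1 counts.
my @range = (0, 1, 2, 3);
my @bin   = (1, 2, 1);
my $total = 0;
$total += $_ for @bin;

# Cumulative probabilities: 0, 0.25, 0.75, 1
my @sum = (0);
push @sum, $sum[-1] + $_ / $total for @bin;

my $s = pdf_sample(\@range, \@sum, 0.5);
print "r = 0.5 maps to s = $s\n";
```

With r = 0.5 the index is i = 1 and delta = 0.5, so the sample lands in the middle of the second bin. Empty bins are skipped by the scan because their cumulative sum does not advance.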

EXAMPLES

The following example shows how to create a histogram with logarithmic bins with ranges [1,10), [10,100) and [100,1000).

use Math::GSL::Histogram qw/:all/;

my $h = gsl_histogram_alloc(3);

# bin[0] covers the range 1 <= x < 10
# bin[1] covers the range 10 <= x < 100
# bin[2] covers the range 100 <= x < 1000

# Four range values are needed for three bins.
my $range = [ 1.0, 10.0, 100.0, 1000.0 ];

gsl_histogram_set_ranges($h, $range, 4);
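
The descriptions of gsl_histogram_mean and gsl_histogram_sigma note that negative bin values are ignored and that accuracy is limited by the bin width. The plain-Perl sketch below shows one way such a calculation can work, assuming each bin is represented by its midpoint and weighted by its contents. That midpoint assumption, and the hypothetical helper histogram_mean_sigma, are illustrative and are not taken from this module.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Math::GSL).
# $range: n+1 bin edges; $bin: n bin values.
# Returns (mean, sigma), or an empty list if no bin is positive.
sub histogram_mean_sigma {
    my ($range, $bin) = @_;

    # First pass: weighted mean of the bin midpoints.
    my ($w, $wx) = (0, 0);
    for my $i (0 .. $#$bin) {
        next if $bin->[$i] <= 0;             # negative and empty bins add nothing
        my $x = ($range->[$i] + $range->[$i + 1]) / 2;
        $w  += $bin->[$i];
        $wx += $bin->[$i] * $x;
    }
    return if $w == 0;
    my $mean = $wx / $w;

    # Second pass: weighted variance about the mean.
    my $var = 0;
    for my $i (0 .. $#$bin) {
        next if $bin->[$i] <= 0;
        my $x = ($range->[$i] + $range->[$i + 1]) / 2;
        $var += $bin->[$i] * ($x - $mean) ** 2;
    }
    return ($mean, sqrt($var / $w));
}

# Illustrative histogram: three bins on [0, 3) holding 1, 2 and 1 counts.
my ($mean, $sigma) = histogram_mean_sigma([0, 1, 2, 3], [1, 2, 1]);
print "mean = $mean, sigma = $sigma\n";
```

Every event in a bin is treated as if it sat at the bin midpoint, so narrower bins give a result closer to the statistics of the underlying data.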

AUTHORS

Jonathan Leto <jonathan@leto.net> and Thierry Moisan <thierry.moisan@gmail.com>

COPYRIGHT AND LICENSE

Copyright (C) 2008 Jonathan Leto and Thierry Moisan

This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.