
NAME

Statistics::ANOVA::JT - Jonckheere-Terpstra statistics and test

VERSION

Version 0.01

SYNOPSIS

 use Statistics::ANOVA::JT;
 my $jt = Statistics::ANOVA::JT->new();
 $jt->load({1 => [2, 4, 6], 2 => [3, 3, 12], 3 => [5, 7, 11, 16]}); # note ordinal datanames
 my $j_value = $jt->observed(); # or expected(), variance()
 my ($z_value, $p_value) = $jt->zprob_test(ccorr => 2, tails => 1, correct_ties => 1);
 # or without pre-loading:
 $j_value = $jt->observed(data => {1 => [2, 4, 6], 2 => [5, 3, 12]});
 # or for subset of loaded data:
 $j_value = $jt->observed(lab => [1, 3]);
 

DESCRIPTION

Calculates Jonckheere-Terpstra statistics for sameness (a common population) across the ordered levels of an independent variable. Like the Kruskal-Wallis test, the statistics are based on a between-groups pooled ranking of the data. However, Kruskal-Wallis returns the same result regardless of the order of the levels, whereas the Jonckheere-Terpstra test takes into account the ordinal value of the data names. Because the names are treated only as ordinal values, the numerical intervals between them do not matter.
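As a minimal sketch of what treating the names as ordinal means, assuming numeric data names: the samples are taken in the order of a numeric sort of their names, so only the rank order of the names is used. The helper ordered_samples is hypothetical and not part of this module. Note that Perl's default string sort would misorder names such as 2, 10 and 30.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of this module): return the samples in the
# ordinal order of their numeric names. Only the order of the names is used,
# not the size of the steps between them.
sub ordered_samples {
    my ($data) = @_;    # hashref: numeric name => arrayref of observations
    my @names = sort { $a <=> $b } keys %{$data};    # numeric, not string, sort
    return map { $data->{$_} } @names;
}

my %equal_steps   = (1 => [2, 4, 6], 2  => [3, 3, 12], 3  => [5, 7, 11, 16]);
my %unequal_steps = (2 => [2, 4, 6], 10 => [3, 3, 12], 30 => [5, 7, 11, 16]);

# Both labelings give the same sequence of samples, so a statistic computed
# from that sequence (such as J) cannot tell them apart.
for my $data (\%equal_steps, \%unequal_steps) {
    print join(' | ', map { join(' ', @{$_}) } ordered_samples($data)), "\n";
}
```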

Data loading and retrieval are handled as in Statistics::Data, on which the JT object is based, so that module's other methods are also available here.

Return values are tested on installation against published examples: those in Hollander and Wolfe (1999), sample MStat output on mcardle.wisc.edu, and the final Z-value of the Wikipedia example.

SUBROUTINES/METHODS

new

 $jt = Statistics::ANOVA::JT->new();

Returns a new object for accessing methods and storing results. The object is a Statistics::Data object (it "isa" Statistics::Data).

observed

 $val = $jt->observed(); # data pre-loaded
 $val = $jt->observed(data => $hashref_of_arefs);

Returns the statistic J. For each of the k(k - 1)/2 possible pairs of the k groups, the data of the two groups are ranked together, and J is accumulated as the sum of the resulting Mann-Whitney U counts.
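The pairwise accumulation can be sketched in plain Perl. The sub jt_observed is hypothetical, not the module's code: instead of ranking each pair of groups, it counts the pairs of observations directly, which yields the same Mann-Whitney U when each tied pair counts 1/2 (the midrank convention). Numeric data names are assumed.

```perl
use strict;
use warnings;

# Sketch only (the module itself works from between-group rankings):
# J is the sum of Mann-Whitney U counts over all k(k - 1)/2 pairs of groups.
# For groups u < v (ordered by numeric name), U counts the pairs (x from u,
# y from v) with x < y, each tied pair counting 1/2.
sub jt_observed {
    my ($data) = @_;    # hashref: numeric name => arrayref of observations
    my @names = sort { $a <=> $b } keys %{$data};
    my $j = 0;
    for my $u (0 .. $#names - 1) {
        for my $v ($u + 1 .. $#names) {
            for my $x (@{ $data->{ $names[$u] } }) {
                for my $y (@{ $data->{ $names[$v] } }) {
                    $j += $x < $y ? 1 : $x == $y ? 0.5 : 0;
                }
            }
        }
    }
    return $j;
}

my %data = (1 => [2, 4, 6], 2 => [3, 3, 12], 3 => [5, 7, 11, 16]);
print jt_observed(\%data), "\n";    # prints 25 (U counts 5 + 11 + 9)
```

Reversing the order of the names (3, 2, 1 for the same three samples) gives J = 8 instead. This is how the order of the levels enters the statistic.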

If the data have not been pre-loaded, they can instead be sent as the named argument data.

expected

 $val = $jt->expected(); # data pre-loaded
 $val = $jt->expected(data => $hashref_of_arefs);

Returns the expected value of the J statistic for the given data.
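Under the null hypothesis, E(J) has the standard closed form (N^2 - sum of n_j^2)/4, where n_j is the size of group j and N the total number of observations. This is half the number of between-group pairs. A minimal sketch, with jt_expected a hypothetical name and List::Util's sum0 standing in for the List::AllUtils function listed under DEPENDENCIES:

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Expected value of J under the null hypothesis:
# E(J) = (N^2 - sum of n_j^2) / 4, where n_j are the group sizes and N their total.
sub jt_expected {
    my ($data) = @_;    # hashref: name => arrayref of observations
    my @n = map { scalar @{$_} } values %{$data};
    my $N = sum0(@n);
    return ($N**2 - sum0(map { $_**2 } @n)) / 4;
}

my %data = (1 => [2, 4, 6], 2 => [3, 3, 12], 3 => [5, 7, 11, 16]);
print jt_expected(\%data), "\n";    # prints 16.5
```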

variance

 $val = $jt->variance(); # data pre-loaded
 $val = $jt->variance(data => $hashref_of_arefs);

Returns the variance expected to occur in the J values for the given data.

By default, the method corrects for ties, using the correction given by Hollander & Wolfe (1999), Eq. 6.19, p. 204, which accounts for the number of groups of tied values and the size of each group. If correct_ties => 0, the returned value is instead the usual "null" distribution variance, without the tie correction.
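The two variants can be sketched as follows, assuming numeric observations and more than two observations in total. jt_variance is a hypothetical name. The formula is written out as we understand Hollander & Wolfe's Eq. 6.19 and has not been checked against the module's own code. Here t denotes the sizes of the groups of tied values in the pooled data.

```perl
use strict;
use warnings;
use List::Util qw(sum0);

# Null variance of J, with or without the tie correction
# (written in the form we take Hollander & Wolfe, 1999, Eq. 6.19 to have).
# Assumes more than two observations in total.
sub jt_variance {
    my ($data, $correct_ties) = @_;
    $correct_ties = 1 unless defined $correct_ties;
    my @n = map { scalar @{$_} } values %{$data};
    my $N = sum0(@n);

    # Sizes of the groups of tied values across the pooled data
    # (untied values form groups of size 1, which contribute nothing).
    my %count;
    $count{$_}++ for map { @{$_} } values %{$data};
    my @t = $correct_ties ? values %count : ();

    my $var = (
        $N * ($N - 1) * (2 * $N + 5)
      - sum0(map { $_ * ($_ - 1) * (2 * $_ + 5) } @n)
      - sum0(map { $_ * ($_ - 1) * (2 * $_ + 5) } @t)
    ) / 72;
    return $var unless @t;    # no tie correction: the usual null variance

    $var += sum0(map { $_ * ($_ - 1) * ($_ - 2) } @n)
          * sum0(map { $_ * ($_ - 1) * ($_ - 2) } @t)
          / (36 * $N * ($N - 1) * ($N - 2));
    $var += sum0(map { $_ * ($_ - 1) } @n)
          * sum0(map { $_ * ($_ - 1) } @t)
          / (8 * $N * ($N - 1));
    return $var;
}

my %data = (1 => [2, 4, 6], 2 => [3, 3, 12], 3 => [5, 7, 11, 16]);
printf "%.4f %.4f\n", jt_variance(\%data, 0), jt_variance(\%data, 1);    # prints 27.2500 27.0667
```

In the SYNOPSIS data only the two 3s are tied, so the correction changes the variance only slightly.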

zprob_test

 $p_val = $jt->zprob_test(); # data pre-loaded
 $p_val = $jt->zprob_test(data => $hashref_of_arefs);
 ($z_val, $p_val) = $jt->zprob_test(); # get z-score too

Performs a z-test on the data and returns the associated probability or, if called in list context, the z-value followed by the probability.

Rather than calculating the exact p-value, the method calculates the expected value and variance of J and uses them to normalize the observed J. It then reads the p-value off the standard normal distribution. This approximation is appropriate for "large" samples, e.g., more than 3 levels with more than eight observations per level. Otherwise, take the value returned by $jt->observed() and look it up in a table of critical J values, such as that in Hollander & Wolfe (1999), p. 649ff.
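The normalization can be sketched as below, without continuity correction (the default). jt_ztest and normal_upper_tail are hypothetical helpers. The normal tail uses the Abramowitz & Stegun 26.2.17 polynomial approximation, not necessarily what Statistics::Zed uses internally. In this sketch, tails => 2 simply doubles the one-tailed probability. The inputs are J, E(J) and the tie-corrected Var(J) for the SYNOPSIS data, computed by hand.

```perl
use strict;
use warnings;

# Upper-tail probability of the standard normal distribution, using the
# Abramowitz & Stegun 26.2.17 polynomial approximation
# (absolute error below 7.5e-8); valid for x >= 0.
sub normal_upper_tail {
    my ($x) = @_;
    my $t = 1 / (1 + 0.2316419 * $x);
    my $poly = $t * (0.319381530 + $t * (-0.356563782
             + $t * (1.781477937 + $t * (-1.821255978 + $t * 1.330274429))));
    my $density = exp(-$x * $x / 2) / sqrt(8 * atan2(1, 1));    # phi(x); 8*atan2(1,1) = 2*pi
    return $density * $poly;
}

# Normal approximation for J: z = (J - E(J)) / sqrt(Var(J)).
# Returns (z, p) in list context, p alone in scalar context.
sub jt_ztest {
    my ($j, $expected, $variance, $tails) = @_;
    $tails = 1 unless defined $tails;
    my $z = ($j - $expected) / sqrt $variance;
    my $p = normal_upper_tail(abs $z) * $tails;    # doubled when two-tailed
    return wantarray ? ($z, $p) : $p;
}

# J = 25, E(J) = 16.5, Var(J) = 27 + 1/15 for the SYNOPSIS data.
my ($z, $p) = jt_ztest(25, 16.5, 27 + 1/15, 1);
printf "z = %.4f, one-tailed p = %.4f\n", $z, $p;
```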

Optional arguments include correct_ties (as above), and tails and ccorr as in Statistics::Zed. For example, to apply a continuity correction that reduces the observed deviation of J from its expected value by 1 (as recommended in some texts), set ccorr => 2, so that half of this value is taken off on either side of the expected value. With ccorr => 1, 0.5 is taken off the observed deviation, and so on. By default, no continuity correction is applied.
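The ccorr arithmetic described above can be sketched with a hypothetical corrected_deviation sub, which takes half of ccorr off the deviation of J from E(J). Clamping the corrected deviation at zero is an assumption of this sketch, not documented behaviour of Statistics::Zed. For the SYNOPSIS data, J = 25 and E(J) = 16.5, so ccorr => 2 reduces the deviation from 8.5 to 7.5.

```perl
use strict;
use warnings;

# Hypothetical illustration of ccorr: take ccorr/2 off the absolute deviation
# of J from its expectation, keeping the sign. Clamping at zero (so the
# correction never flips the sign) is an assumption of this sketch.
sub corrected_deviation {
    my ($j, $expected, $ccorr) = @_;
    my $dev    = $j - $expected;
    my $shrunk = abs($dev) - ($ccorr || 0) / 2;
    $shrunk = 0 if $shrunk < 0;
    return $dev < 0 ? -$shrunk : $shrunk;
}

# ccorr => 0, 1, 2 for J = 25 and E(J) = 16.5
print join(', ', map { corrected_deviation(25, 16.5, $_) } 0, 1, 2), "\n";    # prints 8.5, 8, 7.5
```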

REFERENCES

Hollander, M., & Wolfe, D. A. (1999). Nonparametric statistical methods. New York, NY, US: Wiley.

DEPENDENCIES

Statistics::Data : used as a base for caching and retrieving data.

Statistics::Data::Rank : used to implement between-sample ranking.

Statistics::Zed : for z-testing with optional continuity correction and tailing.

Algorithm::Combinatorics : provides the combinations algorithm used to generate all possible pairs of data names to loop through when calculating the observed J value.

List::AllUtils : provides the sum0() function.

BUGS

Please report any bugs or feature requests to bug-statistics-anova-jt-0.01 at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-ANOVA-JT-0.01. I will be notified, and then you'll automatically be notified of progress on your bug as I make changes.

SUPPORT

You can find documentation for this module with the perldoc command.

    perldoc Statistics::ANOVA::JT


AUTHOR

Roderick Garton, <rgarton at cpan.org>

LICENSE AND COPYRIGHT

Copyright 2015 Roderick Garton.

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.

See http://dev.perl.org/licenses/ for more information.