NAME

Statistics::Data - Load, access, update one or more data lists for statistical analysis

VERSION

This is documentation for Version 0.10 of Statistics/Data.pm, released Jan 2017.

SYNOPSIS

use Statistics::Data 0.10;
my $dat = Statistics::Data->new();

# managing labelled arrays:
$dat->load({'aname' => \@data1, 'anothername' => \@data2}); # labels are arbitrary
$aref = $dat->access(label => 'aname'); # gets back a copy of @data1
$dat->add(aname => [2, 3]); # pushes new values onto loaded copy of @data1
$dat->dump_list(); # print to check if both arrays are loaded and their number of elements
$dat->unload(label => 'anothername'); # only 'aname' data remains loaded
$aref = $dat->access(label => 'aname'); # $aref is a reference to a copy of @data1
$dat->dump_vals(label => 'aname', delim => ','); # prints the values of 'aname', comma-separated

# managing multiple anonymous arrays:
$dat->load(\@data1, \@data2); # any number of anonymous arrays
$dat->add([2], [6]); # pushes a single value apiece onto copies of @data1 and @data2
$aref = $dat->access(index => 1); # returns reference to copy of @data2, with its new values
$dat->unload(index => 0); # only @data2 remains loaded, and its index is now 0

DESCRIPTION

Handles data for some other statistics modules, as in loading, updating and retrieving data for analysis. Performs no actual statistical analysis itself.

The rationale is to avoid writing the same or similar load, add, etc. methods for every statistics module; it is not to provide an omnibus API for Perl stats modules. It does, however, encompass much of the variety in how Perl stats modules do the basic handling of their data. It is used by Statistics::Sequences (and its sub-tests).

SUBROUTINES/METHODS

Manages caches of one or more lists of data for use by some other statistics modules. The lists are ordered arrays comprised of literal scalars (numbers, strings). They can be loaded, added to (updated), accessed or unloaded by referring to the index (order) in which they have been loaded (or previously added to), or by a particular label. The lists are cached within the class object's '_DATA' aref as an aref itself, optionally associated with a 'label'. The particular structures supported here to load, update, retrieve, unload data are specified under load. Any module that uses this one as its base can still use its own rules to select the appropriate list, or provide the appropriate list within the call to itself.

Constructors

new

$dat = Statistics::Data->new();

Returns a new Statistics::Data object.

clone

$new_self = $dat->clone();

Alias: clone

Returns a copy of the class object with its data loaded (if any). Note this is not a copy of any particular data but the whole blessed hash. Alternatively, use pass to get all the data added to a new object, or use access to load/add particular arrays of data into another object. Nothing modified in this new object affects the original.

Setting data

Methods to cache and uncache data into the data-object.

load

$dat->load(ARRAY);             # CASE 1 - can be updated/retrieved anonymously, or as index => i (load order)
$dat->load(AREF);            # CASE 2 - same, as aref
$dat->load(STRING => AREF);    # CASE 3 - updated/retrieved as label => 'data' (arbitrary name); or by index (order)
$dat->load({ STRING => AREF }); # CASE 4 - same as CASE 3, as hashref
$dat->load(STRING => AREF, STRING => AREF);      # CASE 5 - same as CASE 3 but with multiple named loads
$dat->load({ STRING => AREF, STRING => AREF });  # CASE 6 - same as CASE 5 but as hashref
$dat->load(AREF, AREF);  # CASE 7 - same as CASE 2 but with multiple aref loads

# Not supported:
#$dat->load(STRING => ARRAY); # not OK - use CASE 3 instead
#$dat->load([AREF, AREF]); # not OK - use CASE 7 instead
#$dat->load([ [STRING => AREF], [STRING => AREF] ]); # not OK - use CASE 5 or CASE 6 instead
#$dat->load(STRING => AREF, STRING => [AREF, AREF]); # not OK - too mixed to make sense

Alias: load_data

Cache a list of data as an array-reference. Each call removes previous loads, as does sending nothing. If data need to be cached without unloading previous loads, use the add method instead. Arguments with the following structures are acceptable as data, and will be accessible by either index or label as expected:

load ARRAY

Load an anonymous array that has no named values. For example:

$dat->load(1, 4, 7);
$dat->load(@ari);

This is loaded as a single flat list, with an undefined label, and indexed as 0. Note that trying to load a labelled dataset with an unreferenced array is wrong - the label will be "folded" into the sequence itself.

load AREF

Load a reference to a single anonymous array that has no named values, e.g.:

$dat->load([1, 4, 7]);
$dat->load(\@ari);

This is loaded as a single flat list, with an undefined label, and indexed as 0.

load ARRAY of AREF(s)

Same as above, but note that more than one unlabelled array-reference can also be loaded at once, e.g.:

$dat->load([1, 4, 7], [2, 5, 9]);
$dat->load(\@ari1, \@ari2);

Each array can be accessed, using access, by specifying index => index, the latter value representing the order in which these arrays were loaded.

load HASH of AREF(s)

Load one or more labelled references to arrays, e.g.:

$dat->load('dist1' => [1, 4, 7]);
$dat->load('dist1' => [1, 4, 7], 'dist2' => [2, 5, 9]);

This loads the array(s) with a label attribute, so that when calling access, they can be retrieved by name, e.g., passing label => 'dist1'. The load method involves a check that there is an even number of arguments, and that, if this really is a hash, all the keys are defined and not empty, and all the values are in fact array-references.
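The checks described above can be sketched in plain Perl. This is only an illustration of the stated rules (even number of arguments, defined and non-empty keys, array-reference values); the helper name is_labelled_load is hypothetical and is not part of Statistics::Data, whose own implementation may differ.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Statistics::Data): returns 1 if the
# arguments look like a flat list of label => aref pairs, otherwise 0.
sub is_labelled_load {
    my @args = @_;
    return 0 if !@args || @args % 2;    # need a non-empty, even-sized list
    while ( my ( $label, $data ) = splice @args, 0, 2 ) {
        # every label must be a defined, non-empty, plain string
        return 0 if !defined($label) || ref($label) || $label eq q{};
        # every value must be an array reference
        return 0 if ref($data) ne 'ARRAY';
    }
    return 1;
}

print is_labelled_load( dist1 => [ 1, 4, 7 ], dist2 => [ 2, 5, 9 ] )
  ? "labelled load\n"
  : "not a labelled load\n";
print is_labelled_load( data => 1, 4, 7 )
  ? "labelled load\n"
  : "not a labelled load\n";
```

The second call shows why a label followed by an unreferenced array cannot be recognised as a labelled load: the label is just one more element in a flat list.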

load HASHREF of AREF(s)

As above, but where the hash is referenced, e.g.:

$dat->load({'dist1' => [1, 4, 7], 'dist2' => [2, 5, 9]});

This means that using the following forms--including a referenced array of referenced arrays--will produce unexpected results, if they do not actually croak, and so should not be used:

$dat->load(data => @data); # no croak but wrong - puts "data" in @data - use \@data
$dat->load([\@blue_data, \@red_data]); # use unreferenced ARRAY of AREFs instead
$dat->load([ [blues => \@blue_data], [reds => \@red_data] ]); # treated as single AREF; use HASH of AREFs instead
$dat->load(blues => \@blue_data, reds => [\@red_data1, \@red_data2]); # mixed structures not supported

A warning is not thrown if any of the given arrays actually contains no data. One could usefully be thrown: a child module might depend on there actually being data to analyse statistically (why not?) but only throw an error about it late in the process, and then perhaps ambiguously. But this could cause too many warnings if multiple analyses on different datasets are being run programmatically.

add

Alias: add_data, append_data, update

Same usage as above for load. Pushes one or more values onto an already loaded list, or loads an entirely new labelled list, without clobbering what is already there (as load would). If data have not been loaded with a label, then data are appended to them according to the order of the array-refs given here; see EXAMPLES. It is even possible to skip adding anything to one previously loaded list by, e.g., calling $dat->add([], \@new_data): this adds nothing to the first loaded list, and either initialises a second array, if there is none already, or appends these data to it.
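The index-wise appending of unlabelled data can be modelled in a few lines of plain Perl. This is a sketch under assumptions: the cache here is a bare array of array references, and append_by_position is a hypothetical name, neither of which is claimed to match the module's internals.

```perl
use strict;
use warnings;

# Hypothetical model of index-wise appending: the i-th aref given is pushed
# onto the i-th cached list, which is created first if it does not exist.
sub append_by_position {
    my ( $cache, @new ) = @_;
    for my $i ( 0 .. $#new ) {
        $cache->[$i] = [] if !defined $cache->[$i];    # initialise if absent
        push @{ $cache->[$i] }, @{ $new[$i] };
    }
    return $cache;
}

my $cache = [ [ 1, 4, 7 ] ];
append_by_position( $cache, [], [ 2, 5, 9 ] );    # nothing for list 0; creates list 1
append_by_position( $cache, [8], [] );            # 8 onto list 0; list 1 untouched
print join( q{ }, @{$_} ), "\n" for @{$cache};    # prints "1 4 7 8" then "2 5 9"
```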

unload

$dat->unload(); # deletes all cached data, named or not
$dat->unload(index => POSINT); # deletes the aref loaded at the given index
$dat->unload(label => STRING); # deletes the aref loaded with the given label

Empty, clear, clobber what's in there. Does nothing if given index or label that does not refer to any loaded data. This should be used whenever any already loaded or added data are no longer required ahead of another add, including via copy or share.

share

$dat_new->share($dat_old);

Adds all the data from one Statistics::Data object to another. Changes in the new copies do not affect the originals.

Getting data

To retrieve what has been previously loaded, call access, specifying the label (for data loaded hash-wise) or the index (for data loaded array-wise) that was used to load/add the data.

For retrieving more than one previously loaded dataset, use one of the "get" methods, choosing between getting back a hash- or an array-ref, or to get back a single list, as by access, after all. These "get" methods only support retrieving data loaded as hashrefs; use access to get back index-specific loads.

access

$aref = $dat->access(); # returns the first and/or only array loaded, if any
$aref = $dat->access(index => INT); # returns the ith array loaded
$aref = $dat->access(label => STRING); # returns a particular named cache of data

Alias: get_data

Returns a reference to one array of previously loaded/added data, as identified by the given index (for a flat-list load) or label (for a hash-wise load). Same as calling get_aref_by_lab.

get_hoa, get_hoa_by_lab

$href = $dat->get_hoa(label => AREF_of_STRINGS); # retrieve 1 or more named data
$href = $dat->get_hoa(); # retrieve all named data

Returns a hashref of arefs, where the keys are the names of the data, as previously given in a load, and the values are arefs of the list of data that has been loaded for that name.

The optional argument label should be a reference to a list of one or more names that have been given as keys in a hash-wise load. Any elements in this list that have not been used as names in a load are ignored. If none of the names has been used, an empty list is returned. If there is no label argument, then all of the loaded data are returned as a hashref of arefs; if there were no named data, this is a reference to an empty hash.

This is useful in a module like Statistics::ANOVA::JT that needs to continuously cross-refer to multiple variables to make a single calculation while also being able to distinguish them by some meaningful key other than simply an index number.

For working with numerical data in particular, see the following two methods.

get_hoa_by_lab_numonly_indep

$hoa = $dat->get_hoa_by_lab_numonly_indep(label => AREF);
$hoa = $dat->get_hoa_by_lab_numonly_indep();

Returns the variables given in the argument label (an aref of strings), as by get_hoa, but with each list culled of any empty or non-numeric values. This is done by treating each variable independently, with culls on one "list" not creating a cull on any other. This is the type of data useful for an independent ANOVA.

get_hoa_by_lab_numonly_across

$hoa = $dat->get_hoa_by_lab_numonly_across(); # same as get_hoa but each list culled of NaNs at same i across lists

Returns a hashref of previously loaded variable data (as arefs) culled of any empty or non-numerical values, whereby even a valid value in one list is culled if it is at an index that is invalid in another list. This is the type of data useful for a dependent ANOVA.
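The difference between culling independently and culling across lists can be illustrated with plain hashes of arrays. The following is a minimal sketch, not the module's code: the function names cull_indep, cull_across and is_num are hypothetical, and "empty" is approximated here as undefined or containing no non-whitespace character.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# True if a value is defined, has some non-whitespace content and is numeric.
sub is_num {
    my ($v) = @_;
    return ( defined($v) && $v =~ /\S/ && looks_like_number($v) ) ? 1 : 0;
}

# Independent culling: each list is filtered on its own.
sub cull_indep {
    my ($hoa) = @_;
    my %out;
    for my $label ( keys %{$hoa} ) {
        $out{$label} = [ grep { is_num($_) } @{ $hoa->{$label} } ];
    }
    return \%out;
}

# Culling across lists: an index is kept only if it is valid in every list.
sub cull_across {
    my ($hoa) = @_;
    my @labels = keys %{$hoa};
    my $max    = 0;
    for my $label (@labels) {
        my $n = scalar @{ $hoa->{$label} };
        $max = $n if $n > $max;
    }
    my %out;
    $out{$_} = [] for @labels;
    for my $i ( 0 .. $max - 1 ) {
        next if grep { !is_num( $hoa->{$_}[$i] ) } @labels;
        push @{ $out{$_} }, $hoa->{$_}[$i] for @labels;
    }
    return \%out;
}

my %data   = ( a => [ 1, q{}, 3, 4 ], b => [ 5, 6, 'x', 8 ] );
my $indep  = cull_indep( \%data );     # a: 1 3 4    b: 5 6 8
my $across = cull_across( \%data );    # a: 1 4      b: 5 8
for my $label ( sort keys %data ) {
    print "$label indep: @{ $indep->{$label} }; across: @{ $across->{$label} }\n";
}
```

Independent culling can leave the lists with different lengths, which is acceptable for independent groups; culling across lists keeps the observations paired, as a dependent design requires.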

get_aoa, get_aoa_by_lab

$aref_of_arefs = $dat->get_aoa_by_lab(label => AREF);
$aref_of_arefs = $dat->get_aoa_by_lab(); # all loaded data

Returns a reference to an array where each value is itself an array of data, as separately loaded under a different name or anonymously, in the order that they were loaded. If no label value is defined, all the loaded data are returned as a list of arefs.

get_aref_by_lab

$aref = $dat->get_aref_by_lab(label => STRING);
$aref = $dat->get_aref_by_lab();

Returns a reference to a single, previously loaded array of data, as specified in the named argument label. The array is empty if no data have been loaded, or if there is none with the given label. If label is not defined, the last-loaded data, if any, are returned (as an aref).

ndata

$n = $dat->ndata();

Returns the number of loaded variables.

labels

$aref = $dat->labels();

Returns a reference to an array of all the datanames (labels), if any.

Checking data

all_full

$bool = $dat->all_full(AREF); # test data are valid before loading them
$bool = $dat->all_full(label => STRING); # checking after loading/adding the data (or key in 'index')

Checks not only that the data array, as named or indexed, exists, but also that it is full: that is, it has no empty elements, each element being checked with hascontent.

all_numeric

$bool = $dat->all_numeric(); # test data first-loaded, if any
$bool = $dat->all_numeric(AREF); # test these data are valid before loading them
$bool = $dat->all_numeric(label => STRING); # check specific data after loading/adding them by a 'label' or by their 'index' order
($aref, $bool) = $dat->all_numeric([3, '', 4.7, undef, 'b']); # returns ([3, 4.7], 0); - same for any loaded data

Given an aref of data, or a reference to data previously loaded (see access), tests the numeracy of each element. Called in scalar context, returns a boolean indicating whether all the data in this aref are defined and not empty (using nocontent in String::Util) and, where they have content, whether they are all numerical (using looks_like_number in Scalar::Util). Called in list context, returns the data (as an aref) less any values that failed this test, followed by the boolean. If the requested data do not exist, returns undef.
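The context-sensitive return described above can be sketched with wantarray. This is a stand-in written for illustration only: all_numeric_sketch is a hypothetical name, it accepts only an aref (not a label or index), and it approximates the String::Util content check with a simple non-whitespace match.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical stand-in for the documented behaviour, not the module's code.
# Scalar context: boolean; list context: (aref of valid values, boolean).
sub all_numeric_sketch {
    my ($aref) = @_;
    my @purged =
      grep { defined($_) && $_ =~ /\S/ && looks_like_number($_) } @{$aref};
    my $bool = @purged == @{$aref} ? 1 : 0;
    return wantarray ? ( \@purged, $bool ) : $bool;
}

my ( $kept, $ok ) = all_numeric_sketch( [ 3, q{}, 4.7, undef, 'b' ] );
print "kept: @{$kept}; all numeric: $ok\n";    # kept: 3 4.7; all numeric: 0

my $bool = all_numeric_sketch( [ 1, 2, 3 ] );
print "all numeric: $bool\n";                  # all numeric: 1
```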

all_proportions

$bool = $dat->all_proportions(AREF); # test data are valid before loading them
$bool = $dat->all_proportions(label => STRING); # checking after loading/adding the data  (or key in 'index')

Ensure data are all proportions. Sometimes, the data a module needs are all proportions, ranging from 0 to 1 inclusive. A dataset might have to be cleaned

all_counts

$bool = $dat->all_counts(AREF); # test data are valid before loading them
$bool = $dat->all_counts(label => STRING); # checking after loading/adding the data  (or key in 'index')
($aref, $bool) = $dat->all_counts(AREF);

Returns true if all values in given data are real positive integers or zero, as well as satisfying "hascontent" and "looks_like_number" methods; false otherwise. Called in list context, returns aref of data culled of any values that are false on this basis, and then the boolean. For example, [2.2, 3, 4] and [-1, 3, 4] both fail, but [1, 3, 4] is true. Integer test is simply if $v == int($v).
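The integer test mentioned above is easy to try out in isolation. The sketch below is an illustration only, with hypothetical names (is_count, all_counts_sketch); it combines the content and numeric checks with the $v == int($v) test and the requirement that the value be zero or greater.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical predicate: true for a defined, non-empty, numeric value that is
# zero or a positive integer.
sub is_count {
    my ($v) = @_;
    return 0 if !defined($v) || $v !~ /\S/ || !looks_like_number($v);
    return ( $v >= 0 && $v == int($v) ) ? 1 : 0;
}

# Scalar context: boolean; list context: (aref of valid values, boolean).
sub all_counts_sketch {
    my ($aref) = @_;
    my @kept = grep { is_count($_) } @{$aref};
    my $bool = @kept == @{$aref} ? 1 : 0;
    return wantarray ? ( \@kept, $bool ) : $bool;
}

for my $data ( [ 2.2, 3, 4 ], [ -1, 3, 4 ], [ 1, 3, 4 ] ) {
    my $bool = all_counts_sketch($data);
    print join( q{, }, @{$data} ), ' => ', ( $bool ? 'true' : 'false' ), "\n";
}
```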

all_pos

$bool = $dat->all_pos(AREF); # test data are valid before loading them
$bool = $dat->all_pos(label => STRING); # checking after loading/adding the data  (or key in 'index')
($aref, $bool) = $dat->all_pos(AREF);

Returns true if all values in given data are greater than zero, as well as "hascontent" and "looks_like_number"; false otherwise. Called in list context, returns aref of data culled of any values that are false on this basis, and then the boolean.

equal_n

$num = $dat->equal_n(AREF); # test data are valid before loading them
$num = $dat->equal_n(label => STRING); # checking after loading/adding the data  (or key in 'index')

If the given data or aref of variable names all have the same number of elements, then that number is returned; otherwise 0.
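As a rough illustration of this return convention, the following plain-Perl sketch compares the sizes of several array references. The name equal_n_sketch and its calling style (a flat list of arefs) are assumptions made for the example, not the module's interface.

```perl
use strict;
use warnings;

# Hypothetical sketch: returns the common number of elements if all the given
# array references have the same size, otherwise 0.
sub equal_n_sketch {
    my @arefs = @_;
    return 0 if !@arefs;
    my $n = scalar @{ $arefs[0] };
    for my $aref (@arefs) {
        return 0 if @{$aref} != $n;
    }
    return $n;
}

print equal_n_sketch( [ 1, 4, 7 ], [ 2, 5, 9 ] ), "\n";    # prints 3
print equal_n_sketch( [ 1, 4, 7 ], [ 2, 5 ] ),    "\n";    # prints 0
```

Because 0 signals unequal sizes, a return of 0 cannot by itself distinguish that case from lists that are all empty.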

idx_anumeric

$aref = $dat->idx_anumeric(AREF); # test data are valid before loading them
$aref = $dat->idx_anumeric(label => STRING); # checking after loading/adding the data  (or key in 'index')

Given an aref (or the label or index by which it was previously loaded), returns a reference to an array of indices for that array where the values are either undefined, empty or non-numerical.
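A minimal sketch of the same idea in plain Perl follows. It assumes an aref is passed directly; idx_anumeric_sketch is a hypothetical name, and "empty" is approximated as undefined or without any non-whitespace character.

```perl
use strict;
use warnings;
use Scalar::Util qw(looks_like_number);

# Hypothetical sketch: returns an aref of the indices whose values are
# undefined, empty or non-numeric.
sub idx_anumeric_sketch {
    my ($aref) = @_;
    my @bad = grep {
        my $v = $aref->[$_];
        !( defined($v) && $v =~ /\S/ && looks_like_number($v) )
    } 0 .. $#{$aref};
    return \@bad;
}

my $bad = idx_anumeric_sketch( [ 3, q{}, 4.7, undef, 'b' ] );
print "invalid at indices: @{$bad}\n";    # invalid at indices: 1 3 4
```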

Dumping data

dump_vals

$dat->dump_vals(delim => ", "); # assumes the first (only?) loaded array should be dumped
$dat->dump_vals(index => INT, delim => ", "); # dump the i'th loaded array
$dat->dump_vals(label => STRING, delim => ", "); # dump the array loaded/added with the given "label"

Prints to STDOUT a space-separated line (ending with "\n") of a loaded/added data's elements. Optionally, give a value for delim to specify how the elements in each array should be separated; default is a single space.

dump_list

Dumps a list (using Text::SimpleTable) of the data currently loaded, without showing their actual elements. List is firstly by index, then by label (if any), then gives the number of elements in the associated array.

EXAMPLES

1. Multivariate data

In a study of how doing mental arithmetic affects arousal in self and others, three male frogs were maths-trained and then, as they did their calculations, were measured for pupillary dilation and perceived attractiveness. After four runs, average measures per frog can be loaded:

my $frogs = Statistics::Data->new();
$frogs->load(Names => [qw/Freddo Kermit Larry/], Pupil => [59.2, 77.7, 56.1], Attract => [3.11, 8.79, 6.99]);

But one more frog still had to graduate from training, and data are now ready for loading:

$frogs->add(Names => ['Sleepy'], Pupil => [83.4], Attract => [5.30]);
$frogs->dump_vals(label => 'Pupil'); # prints "59.2 77.7 56.1 83.4" : all 4 frogs' pupil data for analysis by some module

Another frog has been trained, measures taken:

$frogs->add(Pupil => [93], Attract => [6.47], Names => ['Jack']); # add yet another frog's data
$frogs->dump_vals(label => 'Pupil'); # prints "59.2 77.7 56.1 83.4 93": all 5 frogs' pupil data

Now we run another experiment, taking measures of heart-rate, and can add them to the current load of data for analysis:

$frogs->add(Heartrate => [.70, .50, .44, .67, .66]); # add entire new array for all frogs
print "heartrate data are bung" if ! $frogs->all_proportions(label => 'Heartrate'); # validity check (could do before add)
$frogs->dump_list(); # see all four data-arrays now loaded, each with 5 observations (1 per frog), i.e.:
.-------+-----------+----.
| index | label     | N  |
+-------+-----------+----+
| 0     | Names     | 5  |
| 1     | Attract   | 5  |
| 2     | Pupil     | 5  |
| 3     | Heartrate | 5  |
'-------+-----------+----'

2. Using as a base module

As Statistics::Sequences, and so its sub-modules, use this module as their base, it doesn't have to do much data-managing itself:

use feature 'say'; # needed for say() below
use Statistics::Sequences;
my $seq = Statistics::Sequences->new();
$seq->load(qw/f b f b b/); # using Statistics::Data method
say $seq->p_value(stat => 'runs', exact => 1); # using Statistics::Sequences::Runs method

Or if these data were loaded directly within Statistics::Data, the data can be shared around modules that use it as a base:

use feature 'say'; # needed for say() below
use Statistics::Data;
use Statistics::Sequences::Runs;
my $dat = Statistics::Data->new();
my $runs = Statistics::Sequences::Runs->new();
$dat->load(qw/f b f b b/);
$runs->pass($dat);
say $runs->p_value(exact => 1);

DIAGNOSTICS

Don't know how to load/add the given data: Croaked when attempting to load or add data with an unsupported data structure where the first argument is a reference. See the examples under load for valid (and invalid) ways of sending data to them.
Data for accessing need to be loaded: Croaked when calling access, or any methods that use it internally -- viz., dump_vals and the validity checks all_numeric -- when it is called with a label for data that have not been loaded, or did not load successfully.
Data for unloading need to be loaded: Croaked when calling unload with an index or a label attribute and the data these refer to have not been loaded, or did not load successfully.

DEPENDENCIES

List::AllUtils - used for its all method when testing loads

Number::Misc - used for its is_even method when testing loads

String::Util - used for its hascontent and nocontent methods

Scalar::Util - required for all_numeric

Text::SimpleTable - required for dump_list

BUGS AND LIMITATIONS

Some methods rely on accessing previously loaded data but should permit performing their operations on data submitted directly to them, just as, e.g., $dat->all_numeric(\@data) is ok. This is handled internally for now, but should be handled in the same way by modules using this one as their base; at the moment they have to check for an aref passed to their data-manipulating methods ahead of accessing any data loaded by this module.

Please report any bugs or feature requests to bug-statistics-data-0.01 at rt.cpan.org, or through the web interface at http://rt.cpan.org/NoAuth/ReportBug.html?Queue=Statistics-Data-0.01. This will notify the author, and then you'll automatically be notified of progress on your bug as any changes are made.

SUPPORT

You can find documentation for this module with the perldoc command.

perldoc Statistics::Data

You can also look for information at:

RT: CPAN's request tracker

http://rt.cpan.org/NoAuth/Bugs.html?Dist=Statistics-Data-0.10

AnnoCPAN: Annotated CPAN documentation

http://annocpan.org/dist/Statistics-Data-0.10

CPAN Ratings

http://cpanratings.perl.org/d/Statistics-Data-0.10

Search CPAN

http://search.cpan.org/dist/Statistics-Data-0.10/

AUTHOR

Roderick Garton, <rgarton at cpan.org>

LICENSE AND COPYRIGHT

This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License. See perl.org for more information.

