NAME
Graphics::Skullplot::ClassifyColumns - simple type inference of columns of tabular data
VERSION
Version 0.01
SYNOPSIS
use Graphics::Skullplot::ClassifyColumns;
my $cc = Graphics::Skullplot::ClassifyColumns->new( data => $data );
my $plot_cols =
$cc->classify_columns_simple( { indie_count => $indie_count, } );
DESCRIPTION
Graphics::Skullplot::ClassifyColumns is a stripped-down version of an old experimental module of mine called Data::Classify. I expect to return to that project and develop a more elaborate system of plug-ins to target different kinds of databases and so on, most likely under the name Table::TypeInference.
This particular module just needs a "classify_columns_simple" routine that works well enough to figure out how to plot some data via ggplot2 in R (i.e. the "Graphics::Skullplot" project).
- new
-
Creates a new Graphics::Skullplot::ClassifyColumns object.
Takes a hashref as an argument, with named fields identical to the names of the object attributes. These attributes are:
- data
-
A required field, columns of data as an array of array references, with a header in the first row.
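As a minimal sketch of that layout (the column names and values below are invented for illustration, not taken from the module):

```perl
use strict;
use warnings;

# Hypothetical sample data: the first row is the header,
# every following row is one record.
my $data = [
    [ 'date',       'region', 'sales' ],
    [ '2018-05-01', 'east',   120     ],
    [ '2018-05-01', 'west',    95     ],
    [ '2018-05-02', 'east',   130     ],
];

# Separate the header from the data rows without modifying $data.
my @header = @{ $data->[0] };
my @rows   = @{ $data }[ 1 .. $#{ $data } ];
printf "%d columns, %d data rows\n", scalar(@header), scalar(@rows);
```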
- classify_columns_simple
-
Note: here "simple" might be thought of as "stub": This does the simplest possible categorization using only a single numeric hint for the number of independent fields.
The presumption here is that the incoming data is organized like the output of a typical SQL "GROUP BY" select: the x-axis in the first column, a number of columns of dependent data at the end, and (possibly) a certain number of categorical variables (ones with a small number of allowed values) in between.
This returns a hash indicating how different columns should be handled in the plotting stage. The keys are:
x (rename: indie_x)
y (set only when there is a single dependent column)
gb_cats
dep_fields (rename: dependents_y)
Example usage:
my $cc = Graphics::Skullplot::ClassifyColumns->new( data => $data );
my $opt = { indie_count => 1, };
my $plot_cols_href = $cc->classify_columns_simple( $opt );
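The following is only a rough sketch of how such a split could work, not the module's actual code. It assumes that "indie_count" counts the x column plus any categorical columns that follow it; the helper name "split_columns_sketch" is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the module. Assumption: the first
# $indie_count columns are independent (x first, then any categorical
# columns) and all remaining columns are dependent.
sub split_columns_sketch {
    my ( $header, $indie_count ) = @_;
    my @cols   = @{ $header };                       # work on a copy
    my @indies = splice( @cols, 0, $indie_count );   # @cols now holds the dependents
    my $x      = shift @indies;                      # what is left are the categoricals
    my %plot_cols = (
        x          => $x,
        gb_cats    => [ @indies ],
        dep_fields => [ @cols ],
    );
    # 'y' is only set when there is exactly one dependent column.
    $plot_cols{ y } = $cols[0] if @cols == 1;
    return \%plot_cols;
}

my $plot_cols = split_columns_sketch( [ 'date', 'region', 'sales' ], 2 );
print "x: $plot_cols->{ x }, y: $plot_cols->{ y }\n";
```

With an "indie_count" of 1 and the same header, this sketch would leave "gb_cats" empty and treat both "region" and "sales" as dependent fields, so "y" would not be set.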
- column_types
-
Given a reference to tabular data in an array-of-arrays format (with a header expected in the first row), tries to infer the rough data type of each column.
Returns a list (or aref) of the type codes, in sequence.
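One plausible way to do this, shown here purely as a sketch, is to classify every cell and keep the most frequent type per column. The majority-vote approach, the helper names "rough_type" and "column_types_sketch", and the two-way number/string split are all assumptions made for illustration.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical stand-in for the module's per-value classifier.
sub rough_type {
    my ($value) = @_;
    return looks_like_number($value) ? ':NUMBER:' : ':STRING:';
}

# Sketch: skip the header row, tally a type for each cell, and keep
# the most frequent type for each column.
sub column_types_sketch {
    my ($data) = @_;
    my @rows = @{ $data };
    shift @rows;    # drop the header from the copy; $data is untouched
    my @tallies;
    for my $row (@rows) {
        for my $i ( 0 .. $#{ $row } ) {
            $tallies[$i]{ rough_type( $row->[$i] ) }++;
        }
    }
    my @types;
    for my $tally (@tallies) {
        # Highest count first; ties are broken by name in this sketch.
        my ($top) = sort { $tally->{$b} <=> $tally->{$a} || $a cmp $b }
                    keys %{ $tally };
        push @types, $top;
    }
    return \@types;
}

my $types = column_types_sketch( [
    [ 'region', 'sales' ],
    [ 'east',   120     ],
    [ 'west',   95      ],
] );
print join( ' ', @{ $types } ), "\n";
```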
- classify
-
A wrapper around Scalar::Classify's "classify", which also subdivides the string category, looking for datetime types.
The type is most often (but not limited to) one of the following:
ARRAY
HASH
:NUMBER:
:STRING:
This code examines any string values to see if a date/time code is more appropriate:
:DATE:
:DATETIME:
:TIME:
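The refinement of strings into date/time codes might look something like the sketch below. It does not call Scalar::Classify; "ref" and "looks_like_number" stand in for it so the example runs with core Perl only. The patterns are hypothetical and cover only ISO-8601-style dates and 24-hour times, and the module's own regexps may well differ.

```perl
use strict;
use warnings;
use Scalar::Util qw( looks_like_number );

# Hypothetical patterns: ISO-8601-style dates and 24-hour times only.
my $date_pat     = qr/\d{4}-\d{2}-\d{2}/;
my $time_pat     = qr/\d{2}:\d{2}(?::\d{2})?/;
my $datetime_pat = qr/$date_pat(?:T| )$time_pat/;

sub classify_sketch {
    my ($value) = @_;
    return undef unless defined $value;    # undef is passed through in this sketch
    return ref($value) if ref($value);     # e.g. ARRAY, HASH
    return ':NUMBER:'   if looks_like_number($value);
    # Only plain strings reach this point; look for date/time shapes.
    return ':DATETIME:' if $value =~ /\A$datetime_pat\z/;
    return ':DATE:'     if $value =~ /\A$date_pat\z/;
    return ':TIME:'     if $value =~ /\A$time_pat\z/;
    return ':STRING:';
}

for my $value ( '2018-05-22', '2018-05-22 10:30:00', '10:30', 42, 'hello' ) {
    printf "%-20s %s\n", $value, classify_sketch($value);
}
```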
- most_common
-
Given a hash of numeric counts, returns the key of the maximum count.
In the case of a tie, the return value will be one of the tied keys; which one is undefined.
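A minimal sketch of this behaviour follows; the name "most_common_sketch" is hypothetical and the module's own implementation may differ. Because Perl does not guarantee the order of hash keys, a tie in this sketch is likewise resolved arbitrarily.

```perl
use strict;
use warnings;

# Hypothetical re-implementation for illustration only.
sub most_common_sketch {
    my ($counts) = @_;
    my ( $best_key, $best_count );
    for my $key ( keys %{ $counts } ) {
        my $count = $counts->{ $key };
        # Keep the first key seen, then replace it only on a strictly
        # larger count, so ties go to whichever key comes first.
        if ( !defined $best_count || $count > $best_count ) {
            ( $best_key, $best_count ) = ( $key, $count );
        }
    }
    return $best_key;
}

my $winner = most_common_sketch( { ':NUMBER:' => 7, ':STRING:' => 2 } );
print "$winner\n";
```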
- define_regxeps
-
Generates a hashref of locally useful regexps.
These are mostly intended to identify dates and times. TODO just look up existing solutions, e.g. Regexp::Common.
AUTHOR
Joseph Brenner, <doom@kzsu.stanford.edu>, 22 May 2018
COPYRIGHT AND LICENSE
Copyright (C) 2018 by Joseph Brenner
This program is free software; you can redistribute it and/or modify it under the terms of either: the GNU General Public License as published by the Free Software Foundation; or the Artistic License.
No warranty is provided with this code.
See http://dev.perl.org/licenses/ for more information.