NAME
Statistics::FactorAnalysis - A Perl implementation of Factor Analysis using the Principal Component Method.
VERSION
This document describes Statistics::FactorAnalysis version 0.0.2
SYNOPSIS
use Statistics::FactorAnalysis;
# Data is entered as a reference to a LoL. In this case each nested LIST corresponds to a separate variable - thus 'format' option is set to 'variable'.
my $data = [
[qw/ 1038 369 622 1731 1109 1274 517 2043 1106 201 665 593 1117 563 2448 2201 1036 2715 700 593 394 1097 212 /],
[qw/ 1348 1483 749 1658 401 952 1039 1488 791 1344 488 591 744 1472 1076 1475 784 1170 384 450 1035 938 1179 /],
[qw/ 4472 4388 2174 3527 5587 3454 2560 6247 2238 2778 4399 1750 4738 2918 6680 3141 3872 6634 2017 3458 1922 3374 2768 /],
[qw/ 2627 3407 2299 3094 2721 2705 2814 2804 2155 2500 2503 2701 3058 2914 2940 2596 2723 2710 3022 2557 2652 2920 2687 /],
[qw/ 6466 3596 153 3335 1921 3255 437 4486 2769 755 91 155 480 1954 5697 5327 1263 9577 52 268 68 2797 122 /],
[qw/ 2366 3984 300 837 1304 1909 3800 1994 2135 2089 5148 1956 1513 2160 1943 1918 2036 4800 1100 816 937 1327 918 /],
[qw/ 6862 5746 4220 5739 5646 4848 7089 5160 5514 6083 5187 4491 5154 6029 5870 4923 5287 5901 4055 4765 6213 3894 4694 /],
];
# Create a Statistics::FactorAnalysis object that checks the variable distributions.
my $fac = Statistics::FactorAnalysis->new(dist_check => 1);
# Set the compulsory format option - like any Moose attribute, it can also be set in the constructor.
$fac->format('variable');
# Set the compulsory LoL option: a reference to the LoL holding the data.
$fac->LoL($data);
# Load the data.
$fac->load_data;
# The distribution check printed warnings, so log-transform the data.
use Math::Cephes qw(:explog);
for my $row (@{$data}) { for my $col (@{$row}) { $col = log10($col); }}
# Re-load data.
$fac->load_data;
# Perform a PCA to inspect the variance explained by each principal component.
$fac->pca;
# Print the PC variances.
$fac->pca_print_variance;
# If the first 2 PCs explain more than 75% of the variance use 2 factors, otherwise use 3.
my @cumulative = $fac->return_pca_cumulative_variances;
my $factors = $cumulative[1] > 0.75 ? 2 : 3;
# Set the chosen number of factors.
$fac->factors($factors);
# Request Varimax rotation of the loadings.
$fac->rotate(1);
# Perform the factor analysis.
$fac->fac();
# Print the results - to access the data directly use the return methods (see DIRECT DATA ACCESS/RETURN METHODS).
$fac->fac_print_summary;
# Print the results using the rotated loadings - only available when 'rotate' is set to 1.
$fac->fac_print_rotated_summary;
# Retrieve the loadings as a LoL (one array reference per factor).
my @loadings = $fac->return_loadings;
DESCRIPTION
Factor analysis is a statistical method that describes the variability of a large set of observed variables in terms of a smaller set of unobserved variables termed factors. It rests on the premise that the data observed for the many variables are in some way a function of these factors, which cannot be measured directly. The observed variables are modelled as linear combinations of the factors. Factor analysis is related to principal component analysis (PCA). However, whereas PCA accounts for all of the variability in the variables, factor analysis estimates how much of the variability is due to common factors (the "communality"). See http://en.wikipedia.org/wiki/Factor_analysis.
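In standard textbook notation (the symbols are not taken from this module), the orthogonal factor model for p observed variables X with mean vector mu and m common factors F can be written as below. L is the p x m matrix of loadings l_ij, epsilon holds the specific factors with diagonal covariance Psi = diag(psi_1, ..., psi_p), and F is assumed to have zero mean and identity covariance and to be uncorrelated with epsilon. The communality h_i^2 is the part of the variance of variable i that the common factors explain:

```latex
\mathbf{X} - \boldsymbol{\mu} = \mathbf{L}\mathbf{F} + \boldsymbol{\varepsilon},
\qquad
\boldsymbol{\Sigma} = \operatorname{Cov}(\mathbf{X}) = \mathbf{L}\mathbf{L}^{\top} + \boldsymbol{\Psi},
\qquad
\sigma_{ii} = \underbrace{\ell_{i1}^{2} + \cdots + \ell_{im}^{2}}_{h_i^{2}\ (\text{communality})} + \psi_i .
```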
METHODS
new
Object constructor. Any of the options listed under OPTIONS may be passed as arguments when the object is constructed.
my $fac = Statistics::FactorAnalysis->new(dist_check => 1);
load_data
Loads the data into the object. You must first set the 'LoL' and 'format' options (these can also be set during object construction). 'LoL' is a reference to a list of lists (LoL) containing the data, while 'format' specifies how that LoL is organised. If your data are laid out as a table (i.e. each nested array holds one observation), use 'table'. Thus for this example of 7 variables with 23 observations (of random data) we load as:
my $data = [
# Variables: 1, 2, 3, 4, 5, 6, 7,
[qw/ 1038 1348 4472 2627 6466 2366 6862 /], # obs1
[qw/ 369 1483 4388 3407 3596 3984 5746 /], # obs2
[qw/ 622 749 2174 2299 153 300 4220 /], # obs3
[qw/ 1731 1658 3527 3094 3335 837 5739 /], # ...
[qw/ 1109 401 5587 2721 1921 1304 5646 /],
[qw/ 1274 952 3454 2705 3255 1909 4848 /],
[qw/ 517 1039 2560 2814 437 3800 7089 /],
[qw/ 2043 1488 6247 2804 4486 1994 5160 /],
[qw/ 1106 791 2238 2155 2769 2135 5514 /],
[qw/ 201 1344 2778 2500 755 2089 6083 /],
[qw/ 665 488 4399 2503 91 5148 5187 /],
[qw/ 593 591 1750 2701 155 1956 4491 /],
[qw/ 1117 744 4738 3058 480 1513 5154 /],
[qw/ 563 1472 2918 2914 1954 2160 6029 /],
[qw/ 2448 1076 6680 2940 5697 1943 5870 /],
[qw/ 2201 1475 3141 2596 5327 1918 4923 /],
[qw/ 1036 784 3872 2723 1263 2036 5287 /],
[qw/ 2715 1170 6634 2710 9577 4800 5901 /],
[qw/ 700 384 2017 3022 52 1100 4055 /],
[qw/ 593 450 3458 2557 268 816 4765 /],
[qw/ 394 1035 1922 2652 68 937 6213 /],
[qw/ 1097 938 3374 2920 2797 1327 3894 /],
[qw/ 212 1179 2768 2687 122 918 4694 /], # obs23
];
$fac->format(q{table});
$fac->LoL($data);
$fac->load_data;
For the same sample of 7 variables with 23 observations, if each nested list instead holds all the observations of a single variable, as below, pass 'variable' to the format option:
my $data = [ # obs 1, 2, 3, ..., 23
[qw/ 1038 369 622 1731 1109 1274 517 2043 1106 201 665 593 1117 563 2448 2201 1036 2715 700 593 394 1097 212 /],
[qw/ 1348 1483 749 1658 401 952 1039 1488 791 1344 488 591 744 1472 1076 1475 784 1170 384 450 1035 938 1179 /],
[qw/ 4472 4388 2174 3527 5587 3454 2560 6247 2238 2778 4399 1750 4738 2918 6680 3141 3872 6634 2017 3458 1922 3374 2768 /],
[qw/ 2627 3407 2299 3094 2721 2705 2814 2804 2155 2500 2503 2701 3058 2914 2940 2596 2723 2710 3022 2557 2652 2920 2687 /],
[qw/ 6466 3596 153 3335 1921 3255 437 4486 2769 755 91 155 480 1954 5697 5327 1263 9577 52 268 68 2797 122 /],
[qw/ 2366 3984 300 837 1304 1909 3800 1994 2135 2089 5148 1956 1513 2160 1943 1918 2036 4800 1100 816 937 1327 918 /],
[qw/ 6862 5746 4220 5739 5646 4848 7089 5160 5514 6083 5187 4491 5154 6029 5870 4923 5287 5901 4055 4765 6213 3894 4694 /],
];
$fac->format(q{variable});
$fac->LoL($data);
$fac->load_data;
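The two formats hold the same numbers and differ only in orientation: a 'table' LoL is the transpose of a 'variable' LoL. The minimal pure-Perl sketch below illustrates this with a hypothetical helper, table_to_variable, which is not part of the module's API; it may be handy if your data arrive in one orientation but you want to inspect or pre-process them in the other.

```perl
use strict;
use warnings;
use Carp qw(croak);

# Hypothetical helper (not part of Statistics::FactorAnalysis): transpose a
# 'table'-format LoL (one nested array per observation) into a
# 'variable'-format LoL (one nested array per variable).
sub table_to_variable {
    my ($table) = @_;
    my $n_vars = scalar @{ $table->[0] };
    # Every observation must contain a value for every variable.
    for my $row ( @{$table} ) {
        croak 'ragged table: rows differ in length' if @{$row} != $n_vars;
    }
    my @variables;
    for my $v ( 0 .. $n_vars - 1 ) {
        # Collect the v-th value of every observation.
        push @variables, [ map { $_->[$v] } @{$table} ];
    }
    return \@variables;
}

# The first three observations of the 'table' example above.
my $table = [
    [qw/ 1038 1348 4472 2627 6466 2366 6862 /],
    [qw/ 369 1483 4388 3407 3596 3984 5746 /],
    [qw/ 622 749 2174 2299 153 300 4220 /],
];
my $variables = table_to_variable($table);
print join( q{ }, @{$_} ), qq{\n} for @{$variables};
```

Because transposition is its own inverse, the same helper also converts a 'variable' LoL back into a 'table' LoL.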
PRINCIPAL COMPONENT ANALYSIS METHODS
This module performs PCA using the Statistics::PCA module, but introduces some additional options for added flexibility, e.g. standardise and divisor (see OPTIONS). Performing a PCA first may be useful when making initial decisions about the number of factors to use.
pca
Performs an optional PCA of the loaded data.
pca_print_variance
Alias for the original Statistics::PCA print_variance method. Prints a table of the PC standard deviations, proportions of variance and cumulative variances to STDOUT.
pca_print_eigenvectors
Alias for the original Statistics::PCA print_eigenvectors method. Prints a table of the individual eigenvectors to STDOUT.
pca_print_transform
Alias for the original Statistics::PCA print_transform method. Prints a table of the PCA-transformed data to STDOUT.
pca_summary
Alias for the original Statistics::PCA results method. Prints a summary of the PCA results to STDOUT.
FACTOR ANALYSIS METHODS
fac
Estimates the parameters of the factor model using the Principal Component Method.
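As a sketch of the textbook Principal Component Method (the module's internals may differ in detail), the estimated loadings for m factors are built from the m largest eigenvalue/eigenvector pairs (lambda_j, e_j) of the sample covariance matrix S, which is effectively the correlation matrix when the variables are standardised (standardise set to 'Y'). The communalities and specific variances then follow from the loadings:

```latex
\tilde{\mathbf{L}} = \Bigl[\, \sqrt{\hat\lambda_1}\,\hat{\mathbf{e}}_1 \;\Big|\; \sqrt{\hat\lambda_2}\,\hat{\mathbf{e}}_2 \;\Big|\; \cdots \;\Big|\; \sqrt{\hat\lambda_m}\,\hat{\mathbf{e}}_m \,\Bigr],
\qquad
\tilde h_i^{2} = \sum_{j=1}^{m} \tilde\ell_{ij}^{2},
\qquad
\tilde\psi_i = s_{ii} - \tilde h_i^{2},
\qquad
\sum_{i=1}^{p} \tilde\ell_{ij}^{2} = \hat\lambda_j .
```

Because each e_j has unit length, factor j explains lambda_j of the total variance s_11 + ... + s_pp, i.e. a proportion lambda_j / p when the variables are standardised.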
fac_print_loadings
Prints a table to STDOUT of the loadings generated by the fac method.
fac_print_rotated_loadings
Prints a table to STDOUT of the rotated loadings generated by the fac method when the rotate option is set to 1.
fac_print_communalities
Prints a table to STDOUT of the communalities generated by the fac method.
fac_print_variance_explained
Prints a table to STDOUT of the variance explained by each factor, as generated by the fac method.
fac_print_summary
Prints a table to STDOUT summarising all data generated by the fac method.
fac_print_rotated_summary
Prints a table to STDOUT summarising all data generated by the fac method, using the rotated loadings.
DIRECT DATA ACCESS/RETURN METHODS
return_variable_number
Description: Returns the total number of variables.
Usage: my $var_num = $fac->return_variable_number;
Return type: Number.
return_variable_measurements
Description: Returns the total number of observations.
Usage: my $obs_num = $fac->return_variable_measurements;
Return type: Number.
return_total_variance
Description: Returns the sum of the variances of the analysed data.
Usage: my $variance = $fac->return_total_variance;
Return type: Number.
return_total_communality
Description: Returns the sum of the communalities of the analysed data.
Usage: my $communality = $fac->return_total_communality;
Return type: Number.
return_total_percentage_explained_by_factors
Description: Returns the total percentage of variance explained by the factors.
Usage: my $percentage = $fac->return_total_percentage_explained_by_factors;
Return type: Number.
return_variances
Description: Returns the variances of the analysed variables.
Usage: my @variances = $fac->return_variances;
Return type: LIST.
return_communalities
Description: Returns the individual communalities for the variables.
Usage: my @communalities = $fac->return_communalities;
Return type: LIST.
return_variances_explained_by_factors
Description: Returns the variance explained by each of the factors for the loadings generated by the PC method.
Usage: my @variances_explained = $fac->return_variances_explained_by_factors;
Return type: LIST.
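Assuming the module follows the standard definitions, communalities and variances explained are both sums of squared loadings: a communality sums over the factors for one variable, while a factor's variance explained sums over the variables for one factor. The sketch below computes both from a made-up loadings LoL laid out like the documented return value of return_loadings (one nested array per factor); communalities and variances_explained are hypothetical helpers, not module methods.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Hypothetical loadings for 2 factors and 3 variables, laid out like the
# documented return value of return_loadings: one nested array per factor.
my @loadings = (
    [ 0.9, 0.8, 0.1 ],    # factor 1
    [ 0.1, 0.2, 0.7 ],    # factor 2
);

# Communality of variable i: sum of its squared loadings across all factors.
sub communalities {
    my @factors = @_;
    my $n_vars  = scalar @{ $factors[0] };
    return map {
        my $i = $_;
        sum( map { $_->[$i]**2 } @factors );
    } 0 .. $n_vars - 1;
}

# Variance explained by factor j: sum of its squared loadings across all variables.
sub variances_explained {
    my @factors = @_;
    return map {
        my $factor = $_;
        sum( map { $_**2 } @{$factor} );
    } @factors;
}

my @h2        = communalities(@loadings);
my @explained = variances_explained(@loadings);
printf "communalities:      %s\n", join q{ }, map { sprintf '%.2f', $_ } @h2;
printf "variance explained: %s\n", join q{ }, map { sprintf '%.2f', $_ } @explained;
```

Both lists sum to the same total, which should correspond to return_total_communality if the module uses these definitions. Because Varimax is an orthogonal rotation, applying the same helpers to rotated loadings leaves the communalities unchanged and only redistributes the variance explained among the factors.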
return_variances_explained_by_rotated_factors
Description: Returns the variance explained by each of the factors for the rotated loadings generated by Varimax rotation of the original loadings.
Usage: my @variances_explained = $fac->return_variances_explained_by_rotated_factors;
Return type: LIST.
return_percentages_explained_by_factors
Description: Returns, for each of the observed variables, the percentage of its variance explained by the factors.
Usage: my @percentage = $fac->return_percentages_explained_by_factors;
Return type: LIST.
return_pca_cumulative_variances
Description: Returns the cumulative variances for each successive Principal Component generated by the PCA.
Usage: my @cumulative_variance = $fac->return_pca_cumulative_variances;
Return type: LIST.
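Assuming these values are proportions of the total variance (as the 0.75 threshold in the SYNOPSIS suggests), each entry is the running sum of the PC variances (eigenvalues) divided by their total. The sketch below shows that calculation on made-up eigenvalues; cumulative_proportions is a hypothetical helper, not a module method.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Cumulative proportion of total variance explained by successive PCs,
# given the PC variances (eigenvalues) sorted from largest to smallest.
sub cumulative_proportions {
    my @lambda  = @_;
    my $total   = sum(@lambda);
    my $running = 0;
    return map { $running += $_; $running / $total } @lambda;
}

# Hypothetical eigenvalues for 4 standardised variables (they sum to 4).
my @cumulative = cumulative_proportions( 2.2, 1.0, 0.5, 0.3 );

# The same decision rule as in the SYNOPSIS.
my $factors = $cumulative[1] > 0.75 ? 2 : 3;
printf "PC%d: %.3f\n", $_ + 1, $cumulative[$_] for 0 .. $#cumulative;
print "factors: $factors\n";
```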
return_orthogonal_matrix
Description: Returns a LoL of the orthogonal matrix generated by Varimax rotation.
Usage: for ($fac->return_orthogonal_matrix) { print join(q{ }, @{$_}), qq{\n} }
Return type: LoL.
return_loadings
Description: Returns a LoL of the loadings generated by PC method factor analysis - each nested array contains the loadings for a single factor.
Usage: for ($fac->return_loadings) { print join(q{ }, @{$_}), qq{\n} }
Return type: LoL.
return_rotated_loadings
Description: Returns a LoL of the rotated loadings generated by Varimax rotation - each nested array contains the rotated loadings for a single factor.
Usage: for ($fac->return_rotated_loadings) { print join(q{ }, @{$_}), qq{\n} }
Return type: LoL.
OPTIONS
COMPULSORY DATA INPUT OPTIONS
format
Purpose: Defines the format of the LoL being passed to the object. Use 'variable' if the nested arrays hold the data of the different variables, or 'table' if they hold the data of the different observations. See METHODS.
Values: 'table', 'variable'.
Default value:
LoL
Purpose: Used for passing the data to the object. Accepts a reference to a LoL containing the data.
Values: Reference to LoL.
Default value:
OPTIONAL DATA CHECKS
dist_check
Purpose: Tells the object whether to check the skewness and kurtosis of each variable during the load_data method call. Warnings are printed to STDOUT if any variable deviates beyond the acceptable cutoffs.
Values: '1', '0'.
Default value: '0'.
dist_croak
Purpose: Causes Statistics::FactorAnalysis to croak during the load_data method call, instead of printing warnings to STDOUT, if any variable deviates beyond the acceptable cutoffs.
Values: '1', '0'.
Default value: '0'.
skewness
Purpose: Sets the cutoff value for skewness.
Values: Numeric.
Default value: 0.8.
kurtosis
Purpose: Sets the cutoff value for kurtosis.
Values: Numeric.
Default value: 3.
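The distribution check compares each variable's skewness and kurtosis against these cutoffs. The sketch below shows one common way to compute the statistics, using moment-based sample skewness and *excess* kurtosis (about 0 for normal data) and comparing absolute values. These estimator choices are assumptions, not a description of the module's code, and skew_kurt and exceeds_cutoffs are hypothetical helpers.

```perl
use strict;
use warnings;
use Carp qw(croak);
use List::Util qw(sum);

# Moment-based sample skewness (g1) and excess kurtosis (g2, ~0 for normal data).
# ASSUMPTION: the module's own estimators may differ (e.g. bias corrections).
sub skew_kurt {
    my @x    = @_;
    my $n    = @x;
    my $mean = sum(@x) / $n;
    my ( $m2, $m3, $m4 ) = ( 0, 0, 0 );
    for my $value (@x) {
        my $d = $value - $mean;
        $m2 += $d**2;
        $m3 += $d**3;
        $m4 += $d**4;
    }
    $_ /= $n for $m2, $m3, $m4;    # turn sums into central moments
    croak 'constant variable: skewness and kurtosis undefined' if $m2 == 0;
    return ( $m3 / $m2**1.5, $m4 / $m2**2 - 3 );
}

# Hypothetical check: flag a variable whose |skewness| or |excess kurtosis|
# exceeds the given cutoffs (e.g. the defaults 0.8 and 3).
sub exceeds_cutoffs {
    my ( $values, $skew_cutoff, $kurt_cutoff ) = @_;
    my ( $g1, $g2 ) = skew_kurt( @{$values} );
    return abs($g1) > $skew_cutoff || abs($g2) > $kurt_cutoff;
}

for my $sample ( [ 1, 2, 3, 4, 5 ], [ 1, 1, 1, 1, 10 ] ) {
    my ( $g1, $g2 ) = skew_kurt( @{$sample} );
    printf "%-14s skew=%.2f kurt=%.2f flagged=%s\n", "@{$sample}", $g1, $g2,
        exceeds_cutoffs( $sample, 0.8, 3 ) ? 'yes' : 'no';
}
```

For right-skewed positive data, a log transform such as the one in the SYNOPSIS typically pulls the skewness towards zero.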
OPTIONAL DATA ANALYSIS OPTIONS
standardise
Purpose: Tells the object whether to standardise the variables (to mean zero and variance one) before subjecting them to the principal component method.
Values: 'Y', 'N'.
Default value: 'Y'.
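Standardising a variable means converting it to z-scores: subtract its mean and divide by its standard deviation. The sketch below shows this for a single variable; standardise is a hypothetical helper, and the use of divisor N for the standard deviation is an assumption chosen to match the default divisor of '0'.

```perl
use strict;
use warnings;
use Carp qw(croak);
use List::Util qw(sum);

# Standardise one variable to mean 0 and variance 1 (z-scores).
# ASSUMPTION: the standard deviation uses divisor N, matching the default
# 'divisor' of '0'; with divisor N-1 the z-scores scale slightly differently.
sub standardise {
    my @x    = @_;
    my $n    = @x;
    my $mean = sum(@x) / $n;
    my $sd   = sqrt( sum( map { ( $_ - $mean )**2 } @x ) / $n );
    croak 'cannot standardise a constant variable' if $sd == 0;
    return map { ( $_ - $mean ) / $sd } @x;
}

my @z = standardise( 2, 4, 4, 4, 5, 5, 7, 9 );
print join( q{ }, @z ), qq{\n};    # prints: -1.5 -0.5 -0.5 -0.5 0 0 1 2
```

After standardisation every variable has unit variance, so the covariance matrix of the standardised data is the correlation matrix of the original data and no variable dominates merely because of its measurement scale.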
factors
Purpose: Sets the number of factors to be used for the factor model.
Values: Numeric.
Default value: 3.
rotate
Purpose: Tells the object whether to perform Varimax rotation of the PC-generated loadings using the Statistics::PCA::Varimax module.
Values: 'Y', 'N'.
Default value: 'N'.
divisor
Purpose: Sets the divisor used when generating the covariance matrix. To divide by N pass '0'; to divide by N-1 pass '-1'.
Values: '0', '-1'.
Default value: '0'.
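Since the option values are offsets from N, the denominator is simply N plus the divisor. The sketch below computes a covariance matrix from a 'variable'-format LoL under that reading; covariance_matrix is a hypothetical helper for illustration, not the code Statistics::PCA uses internally.

```perl
use strict;
use warnings;
use Carp qw(croak);
use List::Util qw(sum);

# Covariance matrix of a 'variable'-format LoL (one nested array per variable).
# $divisor mirrors the 'divisor' option: '0' divides by N, '-1' by N-1.
sub covariance_matrix {
    my ( $variables, $divisor ) = @_;
    croak 'divisor must be 0 or -1' unless $divisor == 0 || $divisor == -1;
    my $n           = scalar @{ $variables->[0] };
    my $denominator = $n + $divisor;
    # Centre every variable on its own mean.
    my @centred = map {
        my @values = @{$_};
        my $mean   = sum(@values) / $n;
        [ map { $_ - $mean } @values ];
    } @{$variables};
    my @cov;
    for my $i ( 0 .. $#centred ) {
        for my $j ( 0 .. $#centred ) {
            my $cross = sum( map { $centred[$i][$_] * $centred[$j][$_] } 0 .. $n - 1 );
            $cov[$i][$j] = $cross / $denominator;
        }
    }
    return \@cov;
}

my $data   = [ [ 1, 2, 3, 4 ], [ 2, 4, 6, 8 ] ];
my $cov_n  = covariance_matrix( $data, 0 );     # divide by N
my $cov_n1 = covariance_matrix( $data, -1 );    # divide by N-1
printf "N:   %s\n", join q{ }, map { sprintf '%.4f', $_ } map { @{$_} } @{$cov_n};
printf "N-1: %s\n", join q{ }, map { sprintf '%.4f', $_ } map { @{$_} } @{$cov_n1};
```

Dividing by N-1 gives the unbiased sample estimate; the difference shrinks as the number of observations grows.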
eigen_method
Purpose: Selects the module used to perform the eigendecomposition: pass 'C' for Math::Cephes, 'M' for Math::MatrixReal, or 'G' for the GSL C library routine wrapped by Math::GSL::Linalg::SVD.
Values: 'M', 'C', 'G'.
Default value: 'M'.
TABLE PRINTING METHOD OPTIONS
cutoff
Purpose: Turns on cutoffs for printing loadings - if a loading is below cutoff_value, the cutoff_null string is printed instead.
Values: '0', '-1'.
Default value: '0'.
cutoff_value
Purpose: Sets the cutoff value for printing loadings.
Values: Numeric.
Default value: 0.1.
cutoff_null
Purpose: Sets the string to print in place of the loading if the loading is below the cutoff value.
Values: String.
Default value: ''.
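Suppressing small loadings is a common way to make a loadings table easier to read. The sketch below mimics the behaviour described by the cutoff options for a single row of loadings; format_loading is a hypothetical helper, and comparing the loading's absolute value (so large negative loadings are kept) is an assumption.

```perl
use strict;
use warnings;

# Hypothetical formatter mimicking the cutoff options: a loading whose
# magnitude is below $cutoff_value is replaced by $cutoff_null.
# ASSUMPTION: the comparison uses the absolute value of the loading.
sub format_loading {
    my ( $loading, $cutoff_value, $cutoff_null ) = @_;
    return abs($loading) < $cutoff_value ? $cutoff_null : sprintf( '%.3f', $loading );
}

# Hypothetical loadings of one variable on four factors.
my @row   = ( 0.912, 0.045, -0.387, -0.062 );
my @cells = map { format_loading( $_, 0.1, q{} ) } @row;
print join( "\t", @cells ), qq{\n};
```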
DEPENDENCIES
'Carp' => '1.08', 'Moose' => '0.93', 'MooseX::NonMoose' => '0.07', 'Statistics::PCA' => '0.0.1', 'Statistics::PCA::Varimax' => '0.0.2', 'Math::GSL::Linalg::SVD' => '0.0.2', 'List::Util' => '1.22',
AUTHOR
Daniel S. T. Hughes <dsth@cantab.net>
SEE ALSO
Statistics::PCA, Statistics::PCA::Varimax, Math::GSL::Linalg::SVD.
BUGS
This software is at an early stage of development. I'm sure there will be bugs.
LICENCE AND COPYRIGHT
Copyright (c) 2009, Daniel S. T. Hughes <dsth@cantab.net>. All rights reserved.
This module is free software; you can redistribute it and/or modify it under the same terms as Perl itself. See perlartistic.
DISCLAIMER OF WARRANTY
Because this software is licensed free of charge, there is no warranty for the software, to the extent permitted by applicable law. Except when otherwise stated in writing the copyright holders and/or other parties provide the software "as is" without warranty of any kind, either expressed or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. The entire risk as to the quality and performance of the software is with you. Should the software prove defective, you assume the cost of all necessary servicing, repair, or correction.
In no event unless required by applicable law or agreed to in writing will any copyright holder, or any other party who may modify and/or redistribute the software as permitted by the above licence, be liable to you for damages, including any general, special, incidental, or consequential damages arising out of the use or inability to use the software (including but not limited to loss of data or data being rendered inaccurate or losses sustained by you or third parties or a failure of the software to operate with any other software), even if such holder or other party has been advised of the possibility of such damages.