NAME
Algorithm::LibLinear::DataSet
SYNOPSIS
use feature 'say';
use Algorithm::LibLinear::DataSet;
use Algorithm::LibLinear::ScalingParameter;
my $data_set = Algorithm::LibLinear::DataSet->new(data_set => [
+{ feature => +{ 1 => 0.708333, 2 => 1, 3 => 1, ... }, label => 1, },
+{ feature => +{ 1 => 0.583333, 2 => -1, 3 => 0.333333, ... }, label => -1, },
+{ feature => +{ 1 => 0.166667, 2 => 1, 3 => -0.333333, ... }, label => 1, },
...
]);
my $data_set = Algorithm::LibLinear::DataSet->load(fh => \*DATA);
my $data_set = Algorithm::LibLinear::DataSet->load(filename => 'liblinear_file');
my $data_set = Algorithm::LibLinear::DataSet->load(string => "+1 1:0.70833 ...");
say $data_set->size;
my $scaled_data_set = $data_set->scale(parameter => Algorithm::LibLinear::ScalingParameter->new(...));
say $data_set->as_string; # '+1 1:0.70833 2:1 3:1 ...'
__DATA__
+1 1:0.708333 2:1 3:1 4:-0.320755 5:-0.105023 6:-1 7:1 8:-0.419847 9:-1 10:-0.225806 12:1 13:-1
-1 1:0.583333 2:-1 3:0.333333 4:-0.603774 5:1 6:-1 7:1 8:0.358779 9:-1 10:-0.483871 12:-1 13:1
+1 1:0.166667 2:1 3:-0.333333 4:-0.433962 5:-0.383562 6:-1 7:-1 8:0.0687023 9:-1 10:-0.903226 11:-1 12:-1 13:1
...
DESCRIPTION
This class represents a set of feature vectors, each paired with its gold (ground-truth) label.
METHODS
new(data_set => \@data_set)
Constructor.
data_set is an ArrayRef of HashRefs, each with two keys: feature and label. The value of feature is a HashRef representing a (sparse) feature vector: each key is a feature index and the corresponding value is a real number. Indices must be >= 1. The value of label is an integer giving the class label to which the feature vector belongs.
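To make the expected shape concrete, the following is a minimal sketch of how one line of LIBSVM/LIBLINEAR-format text corresponds to a single element of data_set. The helper parse_line is hypothetical and is not part of Algorithm::LibLinear; it only illustrates the mapping between the text format and the HashRef structure described above.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Algorithm::LibLinear: converts one
# LIBSVM/LIBLINEAR-format line ("label index:value index:value ...") into
# the { feature => {...}, label => ... } shape described for data_set.
sub parse_line {
    my ($line) = @_;
    my ($label, @pairs) = split ' ', $line;
    my %feature;
    for my $pair (@pairs) {
        my ($index, $value) = split /:/, $pair, 2;
        die "feature index must be >= 1: $index" if $index < 1;
        $feature{$index} = $value + 0;    # store the value as a number
    }
    return +{ feature => \%feature, label => $label + 0 };
}

my $instance = parse_line('+1 1:0.708333 2:1 3:1 12:1 13:-1');
# Only indices 1, 2, 3, 12 and 13 are stored. Indices 4..11 are simply
# absent from the hash, which is what makes the vector sparse.
```

An ArrayRef of such HashRefs would then have the structure that the constructor expects.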
load([fh => \*FH] [, filename => $path] [, string => $string])
Class method. Loads a data set in LIBSVM/LIBLINEAR format from a file handle (fh), a file (filename), or a string (string).
as_string
Dumps the data set as a string in LIBSVM/LIBLINEAR format.
scale(parameter => $scaling_parameter)
Returns a scaled data set. parameter is an instance of Algorithm::LibLinear::ScalingParameter.
After scaling, each feature value in the data set lies within [$scaling_parameter->lower_bound, $scaling_parameter->upper_bound].
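As an illustration of what mapping values into a [lower_bound, upper_bound] interval can look like, here is a sketch of conventional per-feature min-max scaling. This is an assumption about the general technique, not a description of this module's implementation; scale_value and its handling of a degenerate range are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper (not the module's implementation): linearly maps
# $value from the observed range [$min, $max] onto [$lower, $upper].
sub scale_value {
    my ($value, $min, $max, $lower, $upper) = @_;
    # Degenerate range: returning the lower bound is an arbitrary choice
    # made for this sketch.
    return $lower if $max == $min;
    return $lower + ($upper - $lower) * ($value - $min) / ($max - $min);
}

# Values of feature 1 taken from the three SYNOPSIS instances.
my @observed = (0.166667, 0.583333, 0.708333);
my ($min, $max) = (sort { $a <=> $b } @observed)[0, -1];

# Scale into [-1, 1]: the smallest observed value lands on the lower bound,
# the largest on the upper bound, and the rest fall in between.
my @scaled = map { scale_value($_, $min, $max, -1, 1) } @observed;
```

In practice the bounds would come from the ScalingParameter instance rather than being hard-coded as they are here.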
size
Returns the number of instances (labeled feature vectors) in the data set.