NAME
Data::Maker - Simple, flexible and extensible generation of realistic data
SYNOPSIS
An extremely basic example:
    use Data::Maker;
    my $maker = Data::Maker->new(
      record_count => 10_000,
      fields => [
        { name => 'phone', format => '(\d\d\d)\d\d\d-\d\d\d\d' }
      ]
    );
    while (my $record = $maker->next_record) {
      print $record->phone . "\n";
    }
A more complete example:
    use Data::Maker;
    use Data::Maker::Field::Person::LastName;
    use Data::Maker::Field::Person::FirstName;
    my $maker = Data::Maker->new(
      record_count => 10_000,
      delimiter => "\t",
      fields => [
        {
          name => 'lastname',
          class => 'Data::Maker::Field::Person::LastName'
        },
        {
          name => 'firstname',
          class => 'Data::Maker::Field::Person::FirstName'
        },
        {
          name => 'phone',
          class => 'Data::Maker::Field::Format',
          args => {
            format => '(\d\d\d)\d\d\d-\d\d\d\d',
          }
        },
      ]
    );
    while (my $record = $maker->next_record) {
      print $record->delimited . "\n";
    }
DESCRIPTION
Whatever kind of test or demonstration data you need, Data::Maker will help you make lots of it.
And if you happen to need one of the various types of data that are available as predefined field types, it will be even easier.
CONSTRUCTOR
- new PARAMS
-
Returns a new Data::Maker object. Any PARAMS passed to the constructor will be set as properties of the object.
OBJECT METHODS
- BUILD
-
BUILD is a Moose hook that runs immediately after the object is created. Data::Maker currently uses it only to seed the randomness, if a seed was provided.
- field_by_name
-
Given the name of a field, this method returns the corresponding Field object.
- next_record
-
This method not only gets the next record, but it also triggers the generation of the data itself.
- new_or_cached
-
This method is not used yet, though I keep hoping the object_cache() code in next_record will call this method instead of having the code inline there. That code is only used once in this form, so I'm perhaps being too picky.
- in_progress NAME
-
This method is used to get the already-generated value of a field in the list, before the entire record has been created and blessed as a Record object. This was created for, and is mostly useful for, fields that depend upon the values of other fields. For example, the Data::Maker::Field::Person::Gender class uses this, so that the gender of the person will match the first name of the person.
- header
-
Prints out a delimited list of all of the field labels. This happens only if a delimiter was provided to the Data::Maker object.
ATTRIBUTES
The following Moose attributes are used (the data type of each attribute is also listed):
- fields (ArrayRef[HashRef])
-
A list of hashrefs, each of which describes one field to be generated. Each field needs to define the subclass of Data::Maker::Field that is used to generate that field. The order of the fields has some relevance, particularly in the context of Data::Maker::Record. For example, the delimited method returns the fields in the order in which they are listed here.
Note: It may make more sense in the future for each field to have a "sequence" attribute, so that methods such as delimited could return them in a different order than that in which they are generated. The order in which fields are generated matters in the event that one field relies on data from another (for example, the Data::Maker::Field::Person::Gender field class relies on a first name that must have already been generated).
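The ordering constraint described in the note can be illustrated with a small standalone sketch. It does not use Data::Maker itself: the field list, the make key and the make_record function are all hypothetical names invented for this illustration, and the real module's internals may differ. It only shows why a field that depends on another must be generated after it.

```perl
use strict;
use warnings;

# Hypothetical field list (not the Data::Maker API): each entry has a name
# and a generator that receives the values generated so far for this record.
my @fields = (
    { name => 'firstname', make => sub { 'Alice' } },
    {
        name => 'greeting',
        make => sub {
            my ($so_far) = @_;
            # Relies on 'firstname' having been generated already.
            return "Hello, $so_far->{firstname}";
        },
    },
);

# Generate fields strictly in list order, so later fields can read the
# in-progress values of earlier ones.
sub make_record {
    my %in_progress;
    for my $field (@fields) {
        $in_progress{ $field->{name} } = $field->{make}->( \%in_progress );
    }
    return \%in_progress;
}

my $record = make_record();
print join( "\t", map { $record->{ $_->{name} } } @fields ), "\n";
```

If the two entries were swapped, the greeting generator would see no first name yet, which is the situation the in_progress method and the field order are meant to avoid.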
Data::Maker::Field::Code - Use a code reference to generate the data. This is useful for generating a value for a field that is based on the value of another field.
Data::Maker::Field::DateTime - Generates a random DateTime, using DateTime::Event::Random.
Data::Maker::Field::File - Provide your own file of seed data.
Data::Maker::Field::Format - Specify a format for the data to follow. The following regexp-inspired atoms are supported:
\d: Digit
\w: Word character
\W: Word character, with all letters uppercase
\l: Letter
\L: Uppercase letter
\x: Hex character (00, f2, 97, b4, etc)
\X: Uppercase hex character (00, F2, 97, B4, etc)
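To make the idea of format atoms concrete, here is a minimal standalone sketch that expands three of them (\d, \l and \L) and copies every other character literally. The expand_format function is a hypothetical helper written for this illustration. It is not the implementation used by Data::Maker::Field::Format, and it assumes that \l means a lowercase letter.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Data::Maker: handles only a subset of
# the atoms listed above. Assumes \l produces a lowercase letter.
my @lower = ( 'a' .. 'z' );
my @upper = ( 'A' .. 'Z' );
my %atom  = (
    d => sub { int rand 10 },             # one random digit
    l => sub { $lower[ rand @lower ] },   # one random lowercase letter
    L => sub { $upper[ rand @upper ] },   # one random uppercase letter
);

sub expand_format {
    my ($format) = @_;
    # Replace each recognized backslash atom with one random character;
    # everything else in the format is left untouched.
    ( my $out = $format ) =~ s/\\([dlL])/$atom{$1}->()/ge;
    return $out;
}

my $phone = expand_format('(\d\d\d)\d\d\d-\d\d\d\d');
print "$phone\n";
```

The format string is the same one used for the phone field in the synopsis, so the result has the shape (NNN)NNN-NNNN with random digits.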
Data::Maker::Field::Person::FirstName - A built-in field class for generating (mostly Anglo) first (given) names.
Data::Maker::Field::Person::MiddleName - A built-in field class for generating middle initials (I realize it's called MiddleName). It should eventually be able to generate middle names or initials.
Data::Maker::Field::Person::LastName - A built-in field class for generating (mostly Anglo) surnames.
Data::Maker::Field::Person::Gender - Given a field that represents a given name, this class uses Text::GenderFromName to guess the gender (currently returning only "M" or "F"). If it is not able to guess the gender, it returns "U" (unknown).
Data::Maker::Field::Person::SSN - A simple example of a class that can be added to meet your own needs. This class uses Data::Maker::Field::Format to create a formatted string of random digits.
- record_count (Num)
-
The number of records to generate.
- object_cache (HashRef)
-
Used internally by Data::Maker to ensure reuse of the field objects for each row. This is important because certain objects have large data sets inside them.
- data_sources (HashRef)
-
Used internally by Data::Maker. It's a hashref to store open file handles.
- record_counts (HashRef)
-
A hashref of record counts. Not sure why this was used. It's mentioned in Data::Maker::Field::File.
- delimiter
-
The optional delimiter. It can be any string, but is usually a comma, tab, pipe, etc.
- generated (Num)
-
Returns the number of records that have been generated so far.
- seed (Num)
-
The optional random seed. Provide a seed to ensure that the randomly-generated data comes out the same each time you run it.
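The effect of a fixed seed can be shown with core Perl alone. This sketch assumes the seed ends up in Perl's built-in srand, which this documentation does not state explicitly. The random_digits function is a hypothetical helper, not part of Data::Maker.

```perl
use strict;
use warnings;

# Hypothetical helper: reseed Perl's generator, then draw $count digits.
sub random_digits {
    my ( $seed, $count ) = @_;
    srand($seed);
    return join '', map { int rand 10 } 1 .. $count;
}

# Two runs with the same seed yield the same sequence of digits.
my $first  = random_digits( 42, 10 );
my $second = random_digits( 42, 10 );
print $first eq $second ? "same\n" : "different\n";
```

Reproducible output of this kind is handy when generated data feeds a test suite, because a failure can be replayed with the same records.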
CONTRIBUTORS
Thanks to my employer, Informatics Corporation of America, for its commitment to Perl and to giving back to the Perl community. Thanks to Mark Frost for the idea about optionally seeding the randomness to ensure the same output each time a program is run, if that's what you want to do.
AUTHOR
John Ingram (john@funnycow.com)
LICENSE
Copyright 2010 by John Ingram. All rights reserved. This program is free software; you can redistribute it and/or modify it under the same terms as Perl itself.