NAME
dvp_gen_parser - creates an in-memory Perl data validator
SYNOPSIS
$ dvp_gen_parser -m MyParser data.yml.spec
options:
-m|module : parser module name
-v|verbose: print the generated grammar
-d|debug : print debug messages
DESCRIPTION
This is the command-line utility of Data::Validate::Perl. It reads the data structure specification and creates the parser module directly. For most users, this is the only program they need to run in order to use the module.
DATA SPECIFICATION FORMAT
Each data item has a symbolic prefix, much like in Perl:
1. scalar: $
2. array: @
3. hash: %
4. value: '
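To make the notation concrete, here is a minimal Perl sketch of how one rule line can be split into its declared item and the items it allows. It assumes one rule per line in the form <prefix><name>: <item> <item> ..., as in the examples in this section. parse_spec_line is a hypothetical helper for illustration only; it is not how Data::Validate::Perl actually reads a specification.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Validate::Perl): split a rule
# such as "$foo: 'bar1 'bar2" into its prefix, name and allowed items.
sub parse_spec_line {
    my ($line) = @_;
    # a rule is: <$|@|%><name> ':' <items separated by whitespace>
    my ($sigil, $name, $rest) = $line =~ /^\s*([\$\@%])(\w+)\s*:\s*(.*?)\s*$/
        or return;    # not a rule line (e.g. a comment)
    my @items;
    for my $tok (split /\s+/, $rest) {
        if ($tok =~ /^([\$\@%'])(.+)$/) {
            # prefixed item: scalar ($), array (@), hash (%) or value (')
            push @items, { prefix => $1, name => $2 };
        }
        else {
            # bare keyword such as TEXT
            push @items, { prefix => '', name => $tok };
        }
    }
    return { sigil => $sigil, name => $name, items => \@items };
}

my $rule = parse_spec_line(q{$foo: 'bar1 'bar2});
printf "%s%s accepts %s\n", $rule->{sigil}, $rule->{name},
    join ', ', map { $_->{prefix} . $_->{name} } @{ $rule->{items} };
```

Keeping the prefix separate from the name mirrors the convention above: the prefix alone tells you whether an item is a scalar, an array, a hash or a literal value.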
- SCALAR
# the value of scalar 'foo' can be any string
$foo: TEXT
# the value of scalar 'foo' must be 'bar'
$foo: 'bar
# the value of scalar 'foo' can be either 'bar1' or 'bar2'
# note there is only one single-quote, at the beginning of each value
$foo: 'bar1 'bar2
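The scalar rules amount to a membership test. The sketch below is a hypothetical hand-written equivalent, not code produced by dvp_gen_parser: check_scalar is an illustrative name, and it assumes that TEXT means "any defined, non-reference value".

```perl
use strict;
use warnings;

# Hypothetical hand-written equivalent of the scalar rules above.
# check_scalar($value, @allowed): with no @allowed values (the TEXT case)
# any plain string is accepted; otherwise $value must equal one of them.
sub check_scalar {
    my ($value, @allowed) = @_;
    return 0 if !defined $value || ref $value;    # must be a plain scalar
    return 1 unless @allowed;                     # $foo: TEXT
    return scalar grep { $_ eq $value } @allowed; # $foo: 'bar1 'bar2
}

print check_scalar('bar2', 'bar1', 'bar2') ? "ok\n" : "rejected\n";
```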
- ARRAY
# the array 'foo' may contain a scalar called 'bar'
# the possible values of 'bar' can be defined like above
@foo: $bar
# the array 'foo' may contain another array called 'bar'
# the structure of 'bar' needs to be defined somewhere
@foo: @bar
# the array 'foo' may contain another 2 arrays 'bar1' and 'bar2'
@foo: @bar1 @bar2
# the array 'foo' may contain 1 array called 'bar1' and 1 hash
# called 'bar2'
@foo: @bar1 %bar2
# the array 'foo' can also be a simple array made up of scalars
@foo: 'bar1 'bar2 'bar3
Generic array rules:
1. each declared array item may or may NOT appear (it is optional), except for the case of a single item;
2. the relationship between multiple declared array items is OR;
3. once an array is declared, any item not in the declaration is rejected;
4. an array referred to by another data structure need not be declared; if it is not, it is treated as a simple array containing any number of scalar values (anonymous array).
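As a rough illustration of rules 3 and 4, here is a minimal sketch for a simple value array such as @foo: 'bar1 'bar2 'bar3. Both subs are hypothetical and only show the behaviour the rules describe; they are not the generated parser, and the single-item exception in rule 1 is not modelled.

```perl
use strict;
use warnings;

# Hypothetical check for an array declared as  @foo: 'bar1 'bar2 'bar3
# Returns a list of error messages; an empty list means the array is valid.
sub check_value_array {
    my ($aref, @allowed) = @_;
    return ('not an array reference') unless ref $aref eq 'ARRAY';
    my %ok = map { $_ => 1 } @allowed;
    my @errors;
    for my $elem (@$aref) {
        # rules 1 and 2: any declared value may appear, in any mix
        # rule 3: elements outside the declaration are rejected
        push @errors, 'unexpected element ' . (defined $elem ? $elem : 'undef')
            if ref $elem || !defined $elem || !$ok{$elem};
    }
    return @errors;
}

# Rule 4: an undeclared (anonymous) array may hold any number of scalars,
# so only nested references are reported.
sub check_anonymous_array {
    my ($aref) = @_;
    return ('not an array reference') unless ref $aref eq 'ARRAY';
    return map { 'non-scalar element' } grep { ref $_ } @$aref;
}

my @errors = check_value_array([qw(bar1 bar3 zoo)], qw(bar1 bar2 bar3));
print "$_\n" for @errors;
```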
- HASH
# the hash 'foo' may contain a key 'bar', whose value is 'zoo'
%foo: $bar
$bar: 'zoo
# the hash 'foo' may contain a key 'bar', whose value is an array
%foo: @bar
# the hash 'foo' may contain a key 'bar', whose value is another
# hash
%foo: %bar
# the hash 'foo' may contain 2 keys: 'bar1' whose value is an array,
# 'bar2' whose value is a hash
%foo: @bar1 %bar2
# the hash 'foo' may contain 3 keys: 'bar1' whose value is an array,
# 'bar2' whose value is a hash, 'bar3' whose value is a scalar
%foo: @bar1 %bar2 $bar3
Generic hash rules:
1. each declared hash key may or may NOT appear (it is optional), except for the case of a single key;
2. the relationship between multiple declared keys is OR;
3. once a hash is declared, any key not in the declaration is rejected;
4. a hash referred to by another data structure need not be declared; if it is not, it is treated as a simple hash containing any number of scalar key/value pairs (anonymous hash).
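The hash rules can be sketched the same way. Below, check_hash is a hypothetical helper for a hash declared as %foo: @bar1 %bar2 $bar3: each declared key maps to the prefix of its value, undeclared keys are reported (rule 3), and missing keys are not (rule 1). As with the array sketch, the single-key exception is not modelled.

```perl
use strict;
use warnings;

# Map a declaration prefix to the ref() result its value must have.
my %ref_type_of = ('$' => '', '@' => 'ARRAY', '%' => 'HASH');

# Hypothetical check: check_hash($href, key => prefix, ...)
# Returns a list of error messages; an empty list means the hash is valid.
sub check_hash {
    my ($href, %declared) = @_;
    return ('not a hash reference') unless ref $href eq 'HASH';
    my @errors;
    for my $key (sort keys %$href) {
        # rule 3: keys outside the declaration are rejected
        unless (exists $declared{$key}) {
            push @errors, "unexpected key '$key'";
            next;
        }
        # the value must match the declared kind: scalar, array or hash
        my $want = $ref_type_of{ $declared{$key} };
        push @errors, "key '$key' has the wrong type"
            unless ref($href->{$key}) eq $want;
    }
    # rule 1: declared keys are optional, so missing keys are not errors
    return @errors;
}

my @errors = check_hash(
    { bar1 => [1, 2], bar3 => 'x', baz => 1 },
    bar1 => '@', bar2 => '%', bar3 => '$',
);
print "$_\n" for @errors;
```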
USECASE
Requirement
Say you have a program that reads the postcode and area code of every city in mainland China from a YAML file. Before using the information, you want to make sure the file has the expected structure and contains proper data.
A sample of this YAML file looks like this:
---
- city: Beijing
  postcode: 100000
  areacode: 010
- city: Shanghai
  postcode: 200000
  areacode: 021
- city: Hangzhou
  postcode: 310000
  areacode: 0571
- city: Kunming
  postcode: 650000
  areacode: 0871
...
STEP 1: Write the Specification
The structure is simple: the top-level container is an array, and each item in it is a hash with 3 keys. The corresponding specification can be written as:
@root: %record
%record: $city $postcode $areacode
This is enough to walk through the data. If you know the full list of city names that may appear, you can add one more line to enumerate them:
$city: 'Beijing 'Shanghai 'Hangzhou 'Kunming ...
Save these lines in a file called city_yaml.spec.
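For comparison, here is roughly what city_yaml.spec asks the generated parser to enforce, written by hand in plain Perl. count_errors is a hypothetical name and this is not the generated code; only the four cities from the sample file are listed in %known_city, where your real list would go. The input is the kind of structure a YAML loader returns for the sample: an array reference of hash references.

```perl
use strict;
use warnings;

# Hypothetical hand-written equivalent of city_yaml.spec.
my %known_city = map { $_ => 1 } qw(Beijing Shanghai Hangzhou Kunming);
my %record_key = map { $_ => 1 } qw(city postcode areacode);

sub count_errors {
    my ($data) = @_;
    return 1 unless ref $data eq 'ARRAY';        # @root
    my $errors = 0;
    for my $record (@$data) {
        if (ref $record ne 'HASH') {             # @root: %record
            $errors++;
            next;
        }
        for my $key (keys %$record) {
            my $value = $record->{$key};
            # %record: $city $postcode $areacode (scalar values only)
            if (!$record_key{$key} || ref $value) { $errors++; next }
            # $city: 'Beijing 'Shanghai 'Hangzhou 'Kunming
            $errors++
                if $key eq 'city' && !(defined $value && $known_city{$value});
        }
    }
    return $errors;
}

my $data = [
    { city => 'Beijing',  postcode => '100000', areacode => '010' },
    { city => 'Hangzhou', postcode => '310000', areacode => '0571' },
];
print 'errors: ', count_errors($data), "\n";
```

Writing and maintaining this by hand for every data format is exactly the work the specification plus dvp_gen_parser is meant to save.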
STEP 2: Create the Parser Module
Simply call the command-line tool dvp_gen_parser like this:
$ dvp_gen_parser -m City::Record::Validator city_yaml.spec
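This creates Validator.pm in the current working directory. Move it to the directory your main script loads modules from; Perl looks up City::Record::Validator as City/Record/Validator.pm, so for the script in STEP 3 that is ../lib/perl5/City/Record/ relative to the script's own directory.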
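If you prefer to script this step, here is a minimal sketch using the core modules File::Path, File::Copy and File::Spec. install_module is a hypothetical helper; it assumes you know which library directory your script adds with "use lib".

```perl
use strict;
use warnings;
use File::Copy qw(move);
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: move a generated .pm file into a library tree so
# that "use City::Record::Validator;" finds it as City/Record/Validator.pm.
sub install_module {
    my ($pm_file, $lib_dir, $module) = @_;
    my @parts = split /::/, $module;
    my $file  = pop(@parts) . '.pm';              # last component names the file
    my $dir   = File::Spec->catdir($lib_dir, @parts);
    make_path($dir);                              # no-op if it already exists
    my $dest = File::Spec->catfile($dir, $file);
    move($pm_file, $dest) or die "cannot move $pm_file to $dest: $!";
    return $dest;
}

# e.g. run from a directory holding both Validator.pm and lib/perl5:
# install_module('Validator.pm', 'lib/perl5', 'City::Record::Validator');
```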
STEP 3: Put Together
Finally, write a simple script that loads the YAML file and calls the module's method to validate the data:
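# demonstration purpose only
use strict;
use warnings;
use FindBin;
use YAML::XS qw/Load/;
# the generated module is expected under ../lib/perl5 relative to this script
use lib "$FindBin::RealBin/../lib/perl5";
use City::Record::Validator;
# slurp the YAML file named on the command line
open my $F, '<', $ARGV[0] or die "cannot open file to read: $!";
my $cont = do { local $/; <$F> };
close $F;
my $data = Load($cont);
# validate the in-memory data structure
my $validator = City::Record::Validator::->new();
$validator->parse($data);
# YYNBerr() returns the number of errors the parser encountered
print STDERR '#error = ', $validator->YYNBerr(), "\n";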
That is it.
Conclusion
The advantage of this module is that it processes data in memory, so it is not bound to any specific file format. As long as Perl understands the target format and can load it, the generated module can check the resulting data in memory.
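For example, the same records could just as well arrive as JSON. The sketch below uses the core module JSON::PP to produce the same kind of structure YAML::XS returns; the records are illustrative, and the final call is shown as a comment because it needs the generated City::Record::Validator from STEP 2.

```perl
use strict;
use warnings;
use JSON::PP;

# The same kind of records as the YAML sample, stored as JSON instead.
my $json_text = <<'END_JSON';
[
  { "city": "Beijing", "postcode": "100000", "areacode": "010" },
  { "city": "Kunming", "postcode": "650000", "areacode": "0871" }
]
END_JSON

# decode() yields an array reference of hash references, just like Load()
my $data = JSON::PP->new->decode($json_text);
printf "%d records, first city: %s\n", scalar @$data, $data->[0]{city};

# The validator only sees $data, so the STEP 3 calls apply unchanged:
#   my $validator = City::Record::Validator::->new();
#   $validator->parse($data);
```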
TROUBLESHOOTING
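Parse::Yapp provides a debug flag that controls how much runtime information the parser's state machine prints. This flag is controlled by an environment variable named YYDEBUG and is off by default. Its value is a bit field; 31 (0x1F) turns on every debug category Parse::Yapp defines: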
$ YYDEBUG=31 validate.pl data.yml
BUGS
1. no support for required hash key(s);
2. hash key namespace clash;
LICENSE AND COPYRIGHT
Copyright 2014 Dongxu Ma.
This program is free software; you can redistribute it and/or modify it under the terms of the Artistic License (2.0). You may obtain a copy of the full license at: