NAME
Data::Type - robust and extensible data- and valuetype system
SYNOPSIS
use Data::Type qw(:all +ALL);
is STD::EMAIL or warn;
warn if isnt STD::CREDITCARD( 'MASTERCARD', 'VISA' );
try
{
    valid( '9999-12-31 23:59:59', DB::DATETIME );
}
catch Data::Type::Exception with
{
    my $e = shift;    # the caught exception object
    print $e->to_string;
};
DESCRIPTION
A lot of CPAN modules have a common purpose: reporting whether data has certain "characteristics". Email::Valid is an illustrious example: it reports whether a string has the characteristics of an email address. The address() method reports this by returning 'yes' or 'no'. Another module, another behaviour: Business::ISSN tests for the characteristics of an International Standard Serial Number and does this via an is_valid method returning true or false. And so on and so on. Data::Type was created with modularity, introspectability and usability in mind.
The resulting key concepts are:
a unified interface to type related CPAN modules (via Data::Type)
generic, fun to extend and simple API (see Data::Type::Docs::RFC)
parameterized types (e.g. STD::VARCHAR(80))
alternatively exception-based or functional problem reports (valid() versus is())
localization via Locale::Maketext ("Localization" in Data::Type)
syntactic sugar (die unless is BIO::DNA)
generic access through DBI to a catalog of data types and more (see Data::Type::Query)
This module relies, as much as is plausible, on CPAN modules doing the job in the backend. For instance, Regexp::Common does a lot of the regular expression testing. Email::Valid takes care of the EMAIL type. Date::Parse can be exploited to do the background work for the DATE type.
DOCUMENTATION
You will find a gentle introduction in Data::Type::Docs. It also navigates you through the rest of the documentation. Advanced users should keep on reading here.
SUPPORTED TYPES
All types are grouped and thus belong to a collection. The collection is identified by a short id. All members live in a namespace that is prefixed with that id (uppercased).
- Standard Collection ('STD')
This is a heterogeneous collection of datatypes which is loaded by default. It contains various types backed by CPAN modules (e.g. business, credit card, email, markup, regexps) and some everyday things. See Data::Type::Collection::Std.
- W3C/XML-Schema Collection ('W3C')
A nearly 1-to-1 use of the XML::Schema datatypes. It is nearly complete and works off the shelf. Please visit the XML Schema homepage at http://www.w3.org/TR/xmlschema-2/ for detailed documentation. See Data::Type::Collection::W3C.
- Database Collection ('DB')
Common database table types (VARCHAR, TINYTEXT, TIMESTAMP, etc.). See Data::Type::Collection::DB.
- Biological Collection ('BIO')
Everything that is related to biological matters (DNA, RNA, etc.). See Data::Type::Collection::Bio.
- Chemistry Collection ('CHEM')
Everything that is related to chemical matters (Atoms, etc.). See Data::Type::Collection::Chem.
- Perl5 Collection ('PERL')
Reserved and undecided. See Data::Type::Collection::Perl.
- Perl6 Apocalypse Collection ('PERL6')
Placeholder for the datatypes suggested by Apocalypse and Synopsis 6 for Perl 6. See Data::Type::Collection::Perl6.
[Note] ALL is an alias for all available collections at once.
[Note] Please consider the same constraints as for CPAN namespaces when using or suggesting a new ID. A short discussion on the http://sf.net/projects/datatype mailing list is rewarded with gratitude and respect.
API
FUNCTIONS
valid( $value, @types )
Verifies a value against one or more types or facets. This function throws a Data::Type::Exception on failure.
try
{
    valid( 'muenalan<haaar..harr>cpan.org', STD::EMAIL );
}
catch Data::Type::Exception with
{
    my $e = shift;    # the caught exception object
    dump( $e );
};
is( $type )
$scalar = is( $value, $type );
$scalar = is( $type ); # $_ is used as $value
Returns true or false instead of throwing exceptions. This is for the exception haters. For reporting, the exceptions are stored in the $Data::Type::err array reference.
is( 'muenalan<haaar..harr>cpan.org', STD::EMAIL ) or die dump($Data::Type::err);
[Note] dump() is part of Data::Dump. You can use any dumping routine or format a string with printf, of course.
If the first argument is a $dt, is() uses $_ instead of $value. This allows syntactic sugar like:
foreach( @nucleotide_samples )
{
    email_to( $SETI ) unless is BIO::DNA; # Sends "Non-terrestrial genome found. Suspected sequence '$_'."
}
[Note] Don't take that example too seriously. It could also have been plain RNA. Better would have been unless is (BIO::DNA, BIO::RNA).
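The relationship between the exception style of valid() and the boolean style of is(), including the fallback to $_, can be sketched in plain Perl. This is only an illustration and not Data::Type's implementation: check_or_die, check_ok, @err and the code reference standing in for a type object are all hypothetical names.

```perl
use strict;
use warnings;

our @err;    # hypothetical stand-in for the $Data::Type::err array reference

# Exception style: dies with a message when the check fails.
sub check_or_die {
    my ( $value, $check ) = @_;
    die "value '$value' failed the check\n" unless $check->($value);
    return 1;
}

# Boolean style: wraps the exception style and collects the error instead.
# When called with a single argument, the value is taken from $_.
sub check_ok {
    my $check = pop;
    my $value = @_ ? shift : $_;
    @err = ();
    my $ok = eval { check_or_die( $value, $check ); 1 };
    push @err, $@ unless $ok;
    return $ok ? 1 : 0;
}

# Assumption for this sketch: DNA consists of the letters A, C, G and T only.
my $is_dna = sub { defined $_[0] && $_[0] =~ /\A[ACGT]+\z/ };

my @rejected;
for ( 'GATTACA', 'GAUUACA' ) {
    push @rejected, $_ unless check_ok($is_dna);
}
print "rejected: @rejected\n";
print "last error: $err[0]";
```

The boolean wrapper keeps the failure details available after the call, which is the same trade-off the text describes for is() and $Data::Type::err.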
isnt( $type )
$scalar = isnt( $value, $type );
$scalar = isnt( $type ); # $_ is used as $value
A negation of "is( $type )", or better, an idiom for "not is". These are all semantically identical constructs:
die if isnt STD::EMAIL;
die if not is STD::EMAIL;
die unless is STD::EMAIL;
[Note] die if is not STD::EMAIL would be wrong (even if it is the most natural form). STD::EMAIL is not a package, but the function STD::EMAIL(). So a less ambiguous form would be
die unless is STD::EMAIL();
because it cautions one not to confuse package vs. function names.
summary( $value, @types )
$scalar = summary( $value, @types );
@entries = summary( $value, @types ); # list context
In scalar context, returns the textual representation of the facet set. This gives you a clue about how the type verification process is driven. You can use it to prompt a web user to correct invalid form fields.
print summary( $cc , STD::CREDITCARD( 'VISA' ) );
[Note] A real $dt->test is employed to collect the required information. Therefore the $value argument is required, because it dictates the executed code.
In list context, summary returns an array of Data::Type::Entry objects.
print $_->expected for summary( $cc , STD::CREDITCARD( 'VISA' ) );
CLASS METHODS
The method interface is thoroughly described in Data::Type::Docs::RFC.
Data::Type->set_locale( 'id' )
If there is an implemented locale package under Data::Type::L18N::<id>, you can switch to that language with this method. Only text that may be shown to an end user is subject to localization. Developers must live with English.
[Note] Visit the "LOCALIZATION" section below for extensive information.
LOCALIZATION
All localization is done via Locale::Maketext. The package Data::Type::L18N is the base class, while Data::Type::L18N::<id> is a concrete implementation.
LOCALES
$Data::Type::L18N::de
German. Not very complete.
$Data::Type::L18N::eng
Complete English dictionary.
To switch to your favorite locale at runtime, use the set_locale method of Data::Type (of course, the locale must be implemented).
use Data::Type qw(:all +DB);
Data::Type->set_locale( 'de' ); # set to german texts
...
Visit the "LOCALIZATION" section in Data::Type::Docs::Howto for more on adding your own language.
[Note] Localization is only used for texts which will be shown to the user in some way, via the summary() function or an exception. This should help when developing, for example, web applications with Data::Type, where you simply forward problems to the user in the correct language.
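The base class and concrete locale layout described above follows the usual Locale::Maketext pattern. The following is a minimal self-contained sketch of that pattern, not the contents of Data::Type::L18N: the My::L10N package names and the message text are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical base class; it plays the role Data::Type::L18N has in the text.
package My::L10N;
use Locale::Maketext;
our @ISA = ('Locale::Maketext');

# Concrete English lexicon: _AUTO lets unknown keys pass through as-is.
package My::L10N::en;
our @ISA     = ('My::L10N');
our %Lexicon = ( _AUTO => 1 );

# Concrete German lexicon, keyed by the English message.
package My::L10N::de;
our @ISA     = ('My::L10N');
our %Lexicon = (
    _AUTO => 1,
    'Value [_1] is not an email address' =>
        'Wert [_1] ist keine E-Mail-Adresse',
);

package main;

# Ask the base class for a language handle, then translate one message.
for my $id (qw(en de)) {
    my $lh = My::L10N->get_handle($id) or die "no locale '$id'";
    print $lh->maketext( 'Value [_1] is not an email address', 'foo' ), "\n";
}
```

Adding a language in this pattern means adding one more package with its own %Lexicon, which matches the Data::Type::L18N::<id> naming described above.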
EXPORT
No functions are exported by default, but the STD collection is imported by default.
FUNCTIONS
is, isnt, valid, dvalid, catalog, toc, summary, try and with.
Exporter sets are:
':all' [qw(is isnt valid dvalid catalog toc summary try with)]
':valid' or ':is' [qw(is isnt valid dvalid)]
':try' [qw(try with)]
DATATYPES
You can control the datatypes to be exported with the following parameter.
+<uppercased collection id> (i.e. BIO, DB, ... )
The STD collection is always loaded (and currently you cannot unload it). Currently the following collections are available: DB, BIO, PERL, PERL6 (see above). The special collection ALL is a synonym for all available collections.
Example:
use Data::Type qw(:all +BIO); # ..export the BIO collection
use Data::Type qw(:all +DB); # ..the DB collection
use Data::Type qw(:all +ALL); # ..and all available collections
[Note] Data::Type pollutes namespaces en masse, but mitigates this by restricting itself to UPPERCASED namespaces. These are generally reserved and therefore hopefully not often used. If you have conflicts with legacy code, use the export options below.
OPTIONS
MASTER PREFIX
With this option you change the default datatype aliases. If you use this option, all aliases are prefixed with that string. The option is identified by a starting "<" and an ending ">". Take care not to produce invalid package or function names (spaces etc.). So if you want to stop namespace pollution and have all datatypes sent to a single namespace (e.g. <any::>), invoke Data::Type like this:
use Data::Type qw(:all <dt::> +BIO +DB);
die unless is dt::STD::EMAIL;
All later code accessing datatypes must then use this prefix. It doesn't need to be a namespace; <__> would be absolutely valid, because the aliases are created via a string fed to "eval" in perlfunc. So this is valid:
use Data::Type qw(:all <__>);
die unless is __STD::EMAIL;
[Note] Generally all datatypes are dispatched via an "AUTOLOAD" in perlfunc routine in the Data::Type::Proxy namespace. Via runtime code generation an alias subroutine is created to hop to the original call.
sub DB::ENUM { Data::Type::Proxy::db_enum( @_ ) };
In this example any use of DB::ENUM gets redirected to the Data::Type::Object::db_enum interface (don't call it directly!).
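How a prefixed alias can be generated from a string passed to eval is sketched below. This is a simplified illustration under assumptions, not Data::Type's actual code: My::Proxy, bio_dna and make_alias are hypothetical names, and the returned hash reference merely stands in for a type object.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the proxy namespace mentioned in the text.
package My::Proxy;
sub bio_dna { return { type => 'BIO::DNA', args => [@_] } }

package main;

# Create "sub <prefix><alias> { <target>( @_ ) }" at runtime via string eval.
sub make_alias {
    my ( $prefix, $alias, $target ) = @_;
    my $code = sprintf 'sub %s%s { %s( @_ ) }; 1', $prefix, $alias, $target;
    eval $code or die "cannot create alias: $@";
    return $prefix . $alias;
}

make_alias( '',     'BIO::DNA', 'My::Proxy::bio_dna' );    # default alias
make_alias( 'dt::', 'BIO::DNA', 'My::Proxy::bio_dna' );    # like <dt::>
make_alias( '__',   'BIO::DNA', 'My::Proxy::bio_dna' );    # like <__>

# All three aliases forward their arguments to the same target.
my $plain      = BIO::DNA('a');
my $namespaced = dt::BIO::DNA('b');
my $underlined = __BIO::DNA('c');
print "$_->{type} called with $_->{args}[0]\n"
    for $plain, $namespaced, $underlined;
```

Because the alias name is just text pasted into the generated source, any prefix that yields a legal Perl identifier works, which is why a prefix like __ is acceptable.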
UNDERSCORE
A single occurrence of _ within the import parameters will activate UNDERSCORE namespace resolution. That is, instead of using the COLLECTION::TYPE scheme for the datatypes, the '::' part is replaced with an '_' (underscore). In terms of namespace pollution, this is a sterile solution.
So if you want everything within Data::Type::, use:
use Data::Type qw(:all _ <Data::Type::> +ALL);
die unless is Data::Type::STD_EMAIL(); # default was STD::EMAIL
Unless a MASTER_PREFIX is defined, UNDERSCORE will export the types into the caller package:
use Data::Type qw(:all _ +ALL);
die unless is STD_EMAIL(); # default was STD::EMAIL
If a MASTER_PREFIX is defined but contains no package name, UNDERSCORE will export the types into Data::Type::. This can be somewhat confusing. Use explicit package names within the MASTER_PREFIX to avoid this ambiguity.
package main;
use Data::Type qw(:all _ <main::TYPE_> +ALL);
die unless is TYPE_STD_EMAIL(); # default was STD::EMAIL
If I hadn't introduced main:: in the MASTER_PREFIX, I would have exported the types into Data::Type::, remember:
use Data::Type qw(:all _ <TYPE_> +ALL);
die unless is Data::Type::TYPE_STD_EMAIL(); # default was STD::EMAIL
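The naming rules above can be summarized as a small function. This is a sketch inferred from the examples in this section only, not taken from the Data::Type source; alias_name and its named parameters are hypothetical.

```perl
use strict;
use warnings;

# Compute the fully qualified alias for a type, given the import options.
#   type       - default alias, e.g. 'STD::EMAIL'
#   prefix     - MASTER PREFIX without the angle brackets (optional)
#   underscore - true if the UNDERSCORE option was given
#   caller     - package that imports Data::Type
sub alias_name {
    my (%opt) = @_;
    my $name   = $opt{type};
    my $prefix = defined $opt{prefix} ? $opt{prefix} : '';

    # Without UNDERSCORE the prefix is simply prepended.
    return $prefix . $name unless $opt{underscore};

    $name =~ s/::/_/g;    # STD::EMAIL becomes STD_EMAIL
    return $prefix . $name if $prefix =~ /::/;         # explicit package
    return "Data::Type::$prefix$name" if length $prefix;
    return "$opt{caller}::$name";                      # no prefix at all
}

my @cases = (
    { prefix => 'Data::Type::' },
    {},
    { prefix => 'main::TYPE_' },
    { prefix => 'TYPE_' },
);
for my $case (@cases) {
    print alias_name(
        type       => 'STD::EMAIL',
        underscore => 1,
        caller     => 'main',
        %$case,
    ), "\n";
}
```

The four cases correspond, in order, to the four UNDERSCORE examples shown in this section.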
DEBUG
Increases the debug level by one. Place it multiple times for increased verbosity.
use Data::Type qw(:all DEBUG++ DEBUG++);
would yield debug level 2. To decrease the debug level by one, use DEBUG--:
use Data::Type qw(:all DEBUG++ +BIO DEBUG--);
would turn the debug level up during the import of the BIO collection and then back to the default.
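The left-to-right effect of DEBUG++ and DEBUG-- in an import list can be sketched as follows. This is an illustration only: debug_levels is a hypothetical helper, and both the default level of 0 and the refusal to go below 0 are assumptions of this sketch.

```perl
use strict;
use warnings;

# Walk an import list left to right. Returns the final debug level and a
# hash reference with the level in effect when each +COLLECTION was reached.
sub debug_levels {
    my @args  = @_;
    my $level = 0;    # assumed default level
    my %at_import;
    for my $arg (@args) {
        if    ( $arg eq 'DEBUG++' ) { $level++ }
        elsif ( $arg eq 'DEBUG--' ) { $level-- if $level > 0 }
        elsif ( $arg =~ /\A\+(\w+)\z/ ) { $at_import{$1} = $level }
    }
    return ( $level, \%at_import );
}

my ( $final, $seen ) = debug_levels(qw(:all DEBUG++ +BIO DEBUG--));
print "BIO imported at level $seen->{BIO}, final level $final\n";
```

The position of the DEBUG options relative to the collections matters, since each option only affects what is processed after it.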
PREREQUISITES
General
Class::Maker (0.05.17), Regexp::Box (0.01), Error (0.15), IO::Extended (0.06), Tie::ListKeyedHash (0.41), Data::Iter (0), Class::Multimethods (1.70), Attribute::Util (0.01), DBI (1.30), Text::TabularDisplay (1.18), String::ExpandEscapes (0.01), XML::LibXSLT (1.53)
Additionally required
The following modules are eval'ed at runtime if required. Data::Type delays loading them until a datatype actually uses them. This has some pros and cons. For example, you may notice a small delay the first time a datatype is used.
If you install this module via CPAN, all modules below are also required and should be installed automatically if you have set up CPAN correctly. Even if you never intend to use some of the datatypes, they are strictly required. But this shouldn't hurt too much.
- Locale::Language (2.21)
- Business::CreditCard (0.27)
- Email::Valid (0.15)
- Business::UPC (0.04)
- HTML::Lint (1.26)
- Business::CINS (1.13)
- Date::Parse (2.27)
- Net::IPv6Addr (0.2)
- Business::ISSN (0.90)
- Regexp::Common (2.113)
- X500::DN (0.28)
- Locale::SubCountry (0)
- XML::Schema (0.07)
  (used by W3C::ANYURI, W3C::BASE64BINARY, W3C::BOOLEAN, W3C::BYTE, W3C::DATE, W3C::DATETIME, W3C::DECIMAL, W3C::DOUBLE, W3C::DURATION, W3C::ENTITIES, W3C::ENTITY, W3C::FLOAT, W3C::GDAY, W3C::GMONTH, W3C::GMONTHDAY, W3C::GYEAR, W3C::GYEARMONTH, W3C::HEXBINARY, W3C::ID, W3C::IDREF, W3C::IDREFS, W3C::INT, W3C::INTEGER, W3C::LANGUAGE, W3C::LONG, W3C::NAME, W3C::NCNAME, W3C::NEGATIVEINTEGER, W3C::NMTOKEN, W3C::NMTOKENS, W3C::NONNEGATIVEINTEGER, W3C::NONPOSITIVEINTEGER, W3C::NORMALIZEDSTRING, W3C::NOTATION, W3C::POSITIVEINTEGER, W3C::QNAME, W3C::SHORT, W3C::STRING, W3C::TIME, W3C::TOKEN, W3C::UNSIGNEDBYTE, W3C::UNSIGNEDINT, W3C::UNSIGNEDLONG, W3C::UNSIGNEDSHORT)
- XML::Parser (2.34)
- Pod::Find (0.24)
EXAMPLES
You can find typical uses in Data::Type::Docs::Howto and some scripts may reside in t/ and contrib/ of this distribution.
CONTACT
Sourceforge (http://sf.net/projects/datatype) is hosting a project dedicated to this module. I also enjoy receiving your comments, suggestions and reports via http://rt.cpan.org or http://testers.cpan.org.
AUTHOR
Murat Uenalan, <muenalan@cpan.org>
SEE ALSO
All the basics are described in Data::Type::Docs. It also navigates you through the rest of the documentation.
Data::Type::Docs::FAQ, Data::Type::Docs::FOP, Data::Type::Docs::Howto, Data::Type::Docs::RFC, Data::Type::Facet, Data::Type::Filter, Data::Type::Query, Data::Type::Collection::Std
And these CPAN modules:
Data::Types, String::Checker, Regexp::Common, Data::FormValidator, HTML::FormValidator, CGI::FormMagick::Validator, CGI::Validate, Email::Valid::Loose, Embperl::Form::Validate, Attribute::Types, String::Pattern, Class::Tangram, WWW::Form
W3C XML Schema datatypes
http://www.w3.org/TR/xmlschema-2/
Synopsis 6 by Damian Conway, Allison Randal
http://www.perl.com/pub/a/2003/04/09/synopsis.html?page=3