NAME
Sah::FAQ - Frequently asked questions
VERSION
This document describes version 0.9.51 of Sah::FAQ (from Perl distribution Sah), released on 2022-10-21.
GENERAL
General
Why use a schema (a.k.a "Turing tarpit")? Why not use pure Perl?
Schema language is a specialized language (DSL) that should be more concise to write than equivalent Perl code for common validation tasks. Its goal is never to be as powerful as Perl.
90% of the time, my schemas are variations of simple cases like:
"str*"
["str", {"len_between": [1, 10], "match": "some regex"}]
["str", {"in": ["a", "b", "c", ...]}]
["array", {"of": "some_other_type"}]
["hash", {"keys": {"key1": "some schema", ...}, "req_keys": [...], ...}]
and writing schemas is faster and less tedious/error-prone than writing equivalent Perl code, plus Data::Sah can generate JavaScript code and human description text for me. For more complex validation I stay with Sah until it starts to get unwieldy. It usually can go pretty far since I can add functions and custom clauses to its types; it's for the very complex and dynamic validation needs that I go pure Perl. Your mileage may vary.
What does "Sah" mean?
Sah is an Indonesian word meaning "valid" or "legal". It was picked because it is short.
The previous incarnation of this module used the namespace Data::Schema, which was started in 2009 and deprecated in 2011 in favor of "Sah".
Comparison to other schema languages and type systems
Comparison to JSON schema?
JSON schema limits its type system to that supported by JSON/JavaScript.
JSON schema's syntax is simpler.
Its metaschema (schema for the schema) is only about 130 lines. There are no shortcut forms.
JSON schema's features are more limited.
No expression, no function.
Comparison to Data::Rx?
TBD
Comparison to Data::FormValidator (DFV)?
TBD
Comparison to Moose types?
TBD
SYNTAX
General
Why is req not enabled by default?
I am following SQL's behavior. A type declaration like:
INT
in SQL means NULL is allowed, while:
INT NOT NULL
means NULL is not allowed. The latter is equivalent to specifying this in Sah:
int*
One could argue that setting req to 1 by default is safer or more convenient, so that int should mean ["int", "req", 1] while something like int? would mean ["int", "req", 0]. But this is simply a design choice, and each option has its pros and cons. Nullable by default can also be convenient in some cases, like when specifying program options where most of the options are optional.
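The difference between int and int* can be sketched in plain Perl. This is a hand-written illustration of the semantics described above, not code generated by Data::Sah; check_int and its $req parameter are hypothetical names used only for this example.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Sah): hand-written equivalent of
# the schemas "int" ($req false) and "int*" ($req true).
sub check_int {
    my ($value, $req) = @_;
    # An undefined value passes only when req is not set,
    # just like a nullable INT column in SQL.
    return $req ? 0 : 1 unless defined $value;
    # Otherwise the value must look like an integer.
    return $value =~ /\A[+-]?\d+\z/ ? 1 : 0;
}

print "int  with undef: ", check_int(undef, 0), "\n";  # accepted
print "int* with undef: ", check_int(undef, 1), "\n";  # rejected
print "int* with 42:    ", check_int(42,    1), "\n";  # accepted
```

With nullable-by-default, the shorter spelling (int) is the permissive one, and the trailing asterisk is the only thing needed to tighten it.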
How about adding a default_req configuration in Data::Sah then?
In general I am against compiler configuration which changes language behavior (think PHP's register_globals or magic_quotes_* settings). In this case, it would give a simple schema like int an ambiguous meaning (is an undefined value allowed or not? It depends on the compiler configuration).
Why int instead of integer? Why req instead of required? str instead of string? Etc.
This is also a design choice. To be consistent, either we abbreviate or we don't. Although there is very little reason to abbreviate when it comes to disk/memory size (compared to the eras of early Unix or C language), there are other limited resources to consider: source code column width (usually still around 80 characters in many best practices) and developer's time/energy (typing more takes more time and effort).
I want to make it possible for short schemas to be specified on a single line. For example compare:
[integer => {required => 1, minimum => 0, maximum => 100, divisible_by => 2}]
versus:
[int => {req=>1, min=>0, max=>100, div_by=>2}]
The latter is not that much less readable than the first, but is less tedious to type, especially if you write lots of schemas.
Therefore, the decision is to use commonly used (and unambiguous) abbreviations for type and clause names.
How to express "not-something"? Why isn't there a not or not_in clause?
There are generally no not_CLAUSE clauses. Instead, a generic !CLAUSE syntax is provided. Examples:
// an integer that is not 0
["int", {"!is": 0}]
// a username that is not one of the forbidden/reserved ones
["str", {"!in": ["root", "admin", "superuser"]}]
How to state in as well as !in in the same clause set?
You can't do this since it will cause a conflict:
["str", {"in": ["a","b","c"], "!in": ["x","y","z"]}]
However, you can do this:
["str", {"clset&": [{"in": ["a","b","c"]}, {"!in": ["x","y","z"]}]}]
How to express mutual failure ("if A fails, B must also fail")?
You can use the if clause and negate both clauses. For example:
"if": [{"!div_by": 2}, {"!div_by": 5}]
How about "len_in" clause for str? Or "values_uniq" for hash? Or perhaps "len_div_by"? Or some other clauses that test a property/transform of a value?
Except for some commonly used cases like len_between, min_len, max_len, allowed_keys, forbidden_keys, to validate a certain property of the value (instead of the raw value itself), you can use the generic prop clause:
// check hash values are unique
["hash", {"prop": ["values", ["array", {"uniq":1}]]}]
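As a rough illustration of what the prop example above checks, here is a plain-Perl sketch that takes the values of a hash and tests them for uniqueness. It is not how Data::Sah implements prop or uniq; hash_values_uniq is a hypothetical name, and comparing values as strings (with undef treated as its own distinct value) is an assumption made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: true when no two values of the hash are equal.
# Values are compared as strings; undef is treated as a distinct value.
sub hash_values_uniq {
    my ($hash) = @_;
    my %seen;
    for my $value (values %$hash) {
        my $key = defined $value ? "defined:$value" : "undef";
        return 0 if $seen{$key}++;
    }
    return 1;
}

print hash_values_uniq({ a => 1, b => 2 }) ? "unique\n" : "duplicates\n";
print hash_values_uniq({ a => 1, b => 1 }) ? "unique\n" : "duplicates\n";
```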
General advice when writing schemas?
Avoid any or all if you know that data is of a certain type
For performance and ease of reflection, it is better to create a custom clause than using the any type, especially with long list of alternatives. An example:
// dns_record is either a_record, mx_record, ns_record, cname_record, ...
["any", "of", [
  "a_record",
  "mx_record",
  "ns_record",
  "cname_record",
  ...
]]
// base_record
["hash", "keys", {
  "owner": "str*",
  "ttl": "int",
}]
// a_record
["base_record", "merge.normal.keys", {
  "type": ["str*", "is", "A"],
  "address": "str*"
}]
// mx_record
["base_record", "merge.normal.keys", {
  "type": ["str*", "is", "MX"],
  "host": "str*",
  "prio": "int"
}]
...
If you see the declaration above, every record is a hash. So it is better to declare dns_record as a hash instead of an any. But we need to select a different schema based on the type key. We can develop a custom clause like this:
["hash", "select_schema_on_key", ["type", {
  "A": "a_record",
  "MX": "mx_record",
  "NS": "ns_record",
  "CNAME": "cname_record",
  ...
}]]
This will be faster.
Hash
How does Sah check allowed/unallowed keys?
If the keys clause is specified, then by default only keys defined in the keys clause are allowed, unless the .restrict attribute is set to false, in which case the clause places no restriction on allowed keys. The same applies to re_keys.
If allowed_keys and/or allowed_keys_re clause is specified, then only keys matching those clauses are allowed. This is in addition to restriction placed by other clauses, of course.
How do I specify schemas for some keys, but still allow some other keys?
Set the .restrict attribute for keys or re_keys to false. Example:
["hash", {
"keys": {"a": "int", "b": "int"},
"keys.restrict": 0,
"allowed_keys": ["a", "b", "c", "d", "e"]
}]
The above schema allows keys a, b, c, d, e and specifies schemas for the values of a and b. Another example:
["hash", {
"keys": {"a": "int", "b": "int"},
"keys.restrict": 0,
"allowed_keys_re": "^[ab_]"
}]
The above schema specifies schemas for the values of a and b but still allows other keys beginning with an underscore. Note that the pattern as written also matches any key beginning with a or b.
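To make the interaction concrete, the following plain-Perl sketch hand-implements the first schema in this answer (keys for a and b with restriction turned off, plus allowed_keys listing a to e). It only illustrates the behavior described here and is not Data::Sah's generated code; check_hash, %allowed, and @typed are hypothetical names.

```perl
use strict;
use warnings;

my %allowed = map { $_ => 1 } qw(a b c d e);  # the "allowed_keys" list
my @typed   = qw(a b);                        # keys that have an "int" schema

# Hypothetical hand-written validator for the schema described above.
sub check_hash {
    my ($hash) = @_;
    # allowed_keys: reject any key outside the list.
    for my $key (keys %$hash) {
        return 0 unless $allowed{$key};
    }
    # keys: a and b must be integers when defined ("int" allows undef).
    for my $key (@typed) {
        next unless defined $hash->{$key};
        return 0 unless $hash->{$key} =~ /\A[+-]?\d+\z/;
    }
    return 1;
}

print check_hash({ a => 1, c => "anything" }) ? "valid\n" : "invalid\n";
print check_hash({ a => 1, f => 2 })          ? "valid\n" : "invalid\n";
```

Because keys.restrict is off, the keys clause only constrains the values of a and b; the decision about which key names may appear is left entirely to allowed_keys.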
What is the difference between the keys and req_keys clauses?
req_keys requires keys to exist, but their values are governed by the schemas in keys or re_keys. Here are the four possible combinations, each with its schema:
To require a hash key to exist, but allow its value to be undef:
["hash", "keys", {"a": "int"}, "req_keys", ["a"]]
To allow a hash key to not exist, but when it exists it must not be undef:
["hash", "keys", {"a": "int*"}]
To allow a hash key to not exist, or its value to be undef when exists:
["hash", "keys", {"a": "int"}]
To require a hash key to exist and its value to not be undef:
["hash", "keys", {"a": "int*"}, "req_keys", ["a"]]
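The four combinations come down to two independent questions: must the key exist, and must its value be defined? The sketch below spells that out in plain Perl. It is an illustration only, not Data::Sah output; check_key_a and its two flag parameters are hypothetical names.

```perl
use strict;
use warnings;

# Hypothetical helper, not Data::Sah code: checks key "a" of a hash.
#   $key_required   : "a" is listed in req_keys
#   $value_required : the schema for "a" is "int*" rather than "int"
sub check_key_a {
    my ($hash, $key_required, $value_required) = @_;
    return $key_required   ? 0 : 1 unless exists  $hash->{a};
    return $value_required ? 0 : 1 unless defined $hash->{a};
    return $hash->{a} =~ /\A[+-]?\d+\z/ ? 1 : 0;
}

my %samples = (
    'key missing' => {},
    'value undef' => { a => undef },
    'value 1'     => { a => 1 },
);

for my $label (sort keys %samples) {
    for my $combo ([0, 0], [0, 1], [1, 0], [1, 1]) {
        my ($key_required, $value_required) = @$combo;
        printf "%-11s  req_keys=%d  schema=%-4s  => %s\n",
            $label, $key_required, ($value_required ? 'int*' : 'int'),
            check_key_a($samples{$label}, $key_required, $value_required)
                ? 'ok' : 'fail';
    }
}
```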
Merging and hash keys?
XXX (Turn off hash merging using the '' Data::ModeMerge options key.)
HOMEPAGE
Please visit the project's homepage at https://metacpan.org/release/Sah.
SOURCE
Source repository is at https://github.com/perlancar/perl-Sah.
AUTHOR
perlancar <perlancar@cpan.org>
CONTRIBUTING
To contribute, you can send patches by email/via RT, or send pull requests on GitHub.
Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:
% prove -l
If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.
COPYRIGHT AND LICENSE
This software is copyright (c) 2022, 2020, 2019, 2017, 2016, 2015, 2014, 2013, 2012 by perlancar <perlancar@cpan.org>.
This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.
BUGS
Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Sah
When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.