NAME

Data::Sah::Compiler::Prog - Base class for programming language compilers

VERSION

This document describes version 0.917 of Data::Sah::Compiler::Prog (from Perl distribution Data-Sah), released on 2024-02-16.

SYNOPSIS

DESCRIPTION

This class is derived from Data::Sah::Compiler. It is used as a base class for compilers that compile schemas into code (validators) in several programming languages, Perl (Data::Sah::Compiler::perl) and JavaScript (Data::Sah::Compiler::js) being two of them. (Other similar programming languages like PHP and Ruby might also be supported later if needed.)

Compilers using this base class are flexible in the kind of code they produce:

  • configurable validator return type

    Can generate a validator that returns a simple bool result, a str, or a full data structure (containing errors, warnings, and potentially other information).

  • configurable data term

    For flexibility in combining the validator code with other code, e.g. putting it inside a subroutine wrapper (see Perinci::Sub::Wrapper) or embedding it directly in your source code (see Dist::Zilla::Plugin::Rinci::Validate).
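To illustrate what a configurable data term buys you, here is a minimal sketch in plain Perl. It does not use Data::Sah, and the helper name gen_range_expr is hypothetical. The same clause logic is rendered against two different data terms, so the resulting expression can be dropped into different surrounding code.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Data::Sah): render a range check as a
# Perl expression string against an arbitrary data term.
sub gen_range_expr {
    my ($data_term, $min, $max) = @_;
    return "($data_term >= $min) && ($data_term <= $max)";
}

# Data term is the first sub argument, e.g. inside a wrapper subroutine.
my $expr_arg  = gen_range_expr('$_[0]', 0, 10);
my $check_arg = eval "sub { $expr_arg }" or die $@;

# Data term is a hash element, e.g. embedded in code that already has $args.
my $expr_hash  = gen_range_expr('$args->{n}', 0, 10);
my $check_hash = eval "sub { my \$args = shift; $expr_hash }" or die $@;

print $check_arg->(5)           ? "ok\n" : "fail\n";   # prints "ok"
print $check_hash->({n => 11})  ? "ok\n" : "fail\n";   # prints "fail"
```

Only the data term string changes between the two uses; the check itself is rendered identically.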

HOW IT WORKS

The compiler generates code in the following form:

EXPR && EXPR2 && ...

where EXPR can be a single expression or multiple expressions joined by the comma operator (which both Perl and JavaScript support). Each EXPR is typically generated from a single schema clause. A pseudo-example of generated JavaScript code:

(data >= 0)  // from clause: min => 0
&&
(data <= 10) // from clause: max => 10

Another example, a fuller translation of the schema [int => {min=>0, max=>10}] to Perl, returning a string result (error message) instead of a boolean:

# from clause: req => 0
!defined($data) ? 1 : (

    # type check
    ($data =~ /^[+-]?\d+$/ ? 1 : ($err //= "Data is not an integer", 0))

    &&

    # from clause: min => 0
    ($data >=  0 ? 1 : ($err //= "Must be at least 0", 0))

    &&

    # from clause: max => 10
    ($data <= 10 ? 1 : ($err //= "Must be at most 10", 0))

)

The final validator code will add enclosing subroutine and variable declaration, loading of modules, etc.
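As a rough illustration of that last step, the expression above can be wrapped by hand in a subroutine that declares the variables it needs. This is a sketch only: the subroutine name validate_int_0_10 is hypothetical, and the code actually generated by the compiler will differ in detail.

```perl
use strict;
use warnings;

# Hand-written sketch of a validator for [int => {min=>0, max=>10}] that
# returns an error message string, or an empty string on success.
sub validate_int_0_10 {
    my ($data) = @_;
    my $err;
    my $ok = !defined($data) ? 1 : (
        # type check
        ($data =~ /^[+-]?\d+$/ ? 1 : ($err //= "Data is not an integer", 0))
        &&
        # from clause: min => 0
        ($data >= 0 ? 1 : ($err //= "Must be at least 0", 0))
        &&
        # from clause: max => 10
        ($data <= 10 ? 1 : ($err //= "Must be at most 10", 0))
    );
    return $ok ? "" : $err;
}

for my $value (5, -1, 11, "abc", undef) {
    my $msg = validate_int_0_10($value);
    printf "%-5s => %s\n", $value // 'undef', $msg eq "" ? "valid" : $msg;
}
```

Because the clauses are chained with &&, evaluation stops at the first failing clause, so only the first error message is recorded.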

Note: Current assumptions/hard-coded things for the supported languages: ternary operator (? :), semicolon as statement separator.

COMPILATION DATA KEYS

  • use_dpath => bool

    Convenience. This is set when the code needs to track the data path, which is when the return_type argument is set to something other than bool_valid or bool_valid+val and the schema has subschemas. The data path is used when generating the error message string, to help point to the item in the data structure (an array element, a hash value) that fails validation. It is not needed when the validator only returns true/false, nor when we do not recurse into subschemas.

  • data_term => ARRAY

    Input data term. Set to $cd->{args}{data_term} or a temporary variable (if $cd->{args}{data_term_is_lvalue} is false). Hooks should use this instead of $cd->{args}{data_term} directly because, aside from the aforementioned temporary variable, the data term can also change. For example, if the default.temp or prefilters.temp attribute is set, generated code will operate on another temporary variable to avoid modifying the original data; and when the .input attribute is set, generated code will operate on a variable other than data.

  • subs => ARRAY

    Contains pairs of subroutine names and definition code strings, e.g. [ [_sahs_zero => 'sub _sahs_zero { $_[0] == 0 }'], [_sahs_nonzero => 'sub _sahs_nonzero { $_[0] != 0 }'] ]. For flexibility, you'll need to do this bit of arranging yourself to get the final usable code you can compile in your chosen programming language.

  • vars => HASH

  • coerce_to => str

    Retrieved from the schema's x.$COMPILER.coerce_to attribute. Each type handler might have its own default value.
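The "arranging" mentioned for the subs key might look something like the following sketch. It assumes a list shaped like the example above; the helper name assemble_subs and the de-duplication step are illustrative assumptions, not behavior documented for the compiler.

```perl
use strict;
use warnings;

# Hypothetical helper: join the definition strings from a subs-style list,
# emitting each subroutine name only once.
sub assemble_subs {
    my ($subs) = @_;
    my %seen;
    return join "", map { $_->[1] . "\n" } grep { !$seen{ $_->[0] }++ } @$subs;
}

my $subs = [
    [_sahs_zero    => 'sub _sahs_zero { $_[0] == 0 }'],
    [_sahs_nonzero => 'sub _sahs_nonzero { $_[0] != 0 }'],
    [_sahs_zero    => 'sub _sahs_zero { $_[0] == 0 }'],   # duplicate, skipped
];

my $code = assemble_subs($subs);
eval "$code; 1" or die $@;   # define the subroutines in the current package

my @results = (_sahs_zero(0) ? 1 : 0, _sahs_nonzero(0) ? 1 : 0);
print "@results\n";          # prints "1 0"
```

In real use the assembled definitions would be placed ahead of the validator expression in the final source string, rather than evaluated separately.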

INTERNAL VARIABLES IN THE GENERATED CODE

The generated code maintains the following variables. The _sahv_ prefix stands for "Sah validator"; it is used to minimize clashes with data_term.

  • _sahv_dpath => ARRAY

    Analogous to spath in compilation data, this variable stands for "data path" and is used to track location within data. If a clause is checking each element of an array (like the 'each_elem' or 'elems' array clause), this variable will be adjusted accordingly. Error messages thus can be more informative by pointing more exactly where in the data the problem lies.

  • tmp_data_term => ANY

    As explained in the compile() method, this is used to store a temporary value when checking against clauses.

  • _sahv_stack => ARRAY

    This variable is used to store validation result of subdata. It is only used if the validator is returning a string or full structure, not a single boolean value. See Data::Sah::Compiler::js::TH::hash for an example.

  • _sahv_x

    Usually used as temporary variable in short, anonymous functions.
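The role of the data path variable can be sketched by hand as follows. This is not generated code: the subroutine name validate_int_array and the "@<path>: <message>" error format are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hand-written sketch: validate an array of integers while tracking the
# current position in a data path array.
sub validate_int_array {
    my ($data) = @_;
    return "Data is not an array" unless ref($data) eq 'ARRAY';

    my $err;
    my $_sahv_dpath = [];                  # path to the item being checked
    for my $i (0 .. $#$data) {
        push @$_sahv_dpath, $i;            # descend into element $i
        my $elem = $data->[$i];
        unless (defined($elem) && $elem =~ /\A[+-]?\d+\z/) {
            # Assumed message format: "@<path>: <message>"
            $err //= '@' . join('/', @$_sahv_dpath) . ': Not an integer';
        }
        pop @$_sahv_dpath;                 # leave element $i
        last if defined $err;
    }
    return $err // '';
}

print validate_int_array([1, 2, 'x', 4]), "\n";   # prints "@2: Not an integer"
```

Pushing before the check and popping after it keeps the path correct at every nesting level, which is why the message can name the exact failing element.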

ATTRIBUTES

These usually need not be set/changed by users.

hc => OBJ

Instance of Data::Sah::Compiler::human, to generate error messages.

comment_style => STR

Specify how comments are written in the target language. Either 'cpp' (// comment), 'shell' (# comment), 'c' (/* comment */), or 'ini' (; comment). Each programming language subclass will set this, for example, the perl compiler sets this to 'shell' while js sets this to 'cpp'.

var_sigil => STR

concat_op => STR

logical_and_op => STR

logical_not_op => STR

METHODS

new() => OBJ

$c->compile(%args) => RESULT

Generate a validator (function) for the given schema.

Aside from base class' arguments, this class supports these arguments (suffix * denotes required argument):

  • cache

    Bool, default false. If set to true, will generate validators for base schemas when possible, compile them into functions in the Data::Sah::_GeneratedValidators::* namespace, then have the generated validator code call these functions. This results in smaller validator code and shorter compilation time, especially for large/complex schemas that are composed from subschemas. But it also adds a (usually insignificant) overhead of multiple function calls when validating with the generated code.

    Only relevant when "name" argument is set. When a certain named function is already defined, avoid generating the function declaration again and instead call the defined function.

  • data_term

    Str. A variable name or an expression in the target language that contains the data, defaults to var_sigil + name if not specified.

  • data_term_is_lvalue

    Bool, default true. Whether data_term can be assigned to.

  • tmp_data_name

    Str. Normally need not be set manually, as it will be set to "tmp_" . data_name. Used to store temporary data during clause evaluation.

  • tmp_data_term

    Str. Normally need not be set manually, as it will be set to var_sigil . tmp_data_name. Used to store temporary data during clause evaluation. For example, in JavaScript, the 'int' and 'float' types accept strings in the type check. But for further checking with the clauses (like 'min', 'max', 'divisible_by') the string data needs to be converted to a number first. Likewise with prefiltering. This variable holds the temporary value, and the clauses compare against it. At the end of the clauses, the original data_term is restored. So the output validator code for the schema [int => min => 1] will look something like:

    // type check 'int'
    (type(data)=='number' && Math.round(data)==data || parseInt(data)==data)

    &&

    // convert to number
    (tmp_data = type(data)=='number' ? data : parseFloat(data), true)

    &&

    // check clause 'min'
    (tmp_data >= 1)

  • err_term

    Str. A variable name or lvalue expression to store error message(s), defaults to var_sigil + err_NAME (e.g. $err_data in the Perl compiler).

  • var_prefix

    Str, default "_sahv_". Prefix for variables declared by generated code.

  • sub_prefix

    Str, default "_sahs_". Prefix for subroutines declared by generated code.

  • code_type

    Str, default "validator". The kind of code to generate. For now the only valid (and default) value is 'validator'. Compiler can perhaps generate other kinds of code in the future.

  • return_type

    Str, default "bool_valid". Specify what kind of return value the generated code should produce. Either bool_valid, bool_valid+val, str_errmsg, str_errmsg+val, or hash_details.

    bool_valid means generated validator code should just return true/false depending on whether validation succeeds/fails.

    bool_valid+val is like bool_valid, but instead of just bool_valid the validator code will return a two-element arrayref [bool_valid, val] where val is the final value of data (after setting of default, coercion, etc.)

    str_errmsg means validation should return an error message string (the first one encountered) if validation fails and an empty string/undef if validation succeeds.

    str_errmsg+val is like str_errmsg, but instead of just str_errmsg the validator code will return a two-element arrayref [str_errmsg, val] where val is the final value of data (after setting of default, coercion, etc.)

    hash_details means validation should return a full hash data structure. From this structure you can check whether validation succeeds, retrieve all the collected errors/warnings, etc.

  • coerce

    Bool, default true. If set to false, will not include coercion code.

  • debug

    Bool, default false. This is a general debugging option which should turn on all debugging-related options, e.g. produce more comments in the generated code, etc. Each compiler might have more specific debugging options.

    If turned on, specific debugging options can be explicitly turned off afterwards, e.g. debug=>1, debug_log=>0 will turn on all debugging options but turn off the debug_log setting.

    Currently turning on debug means:

    - Turning on the other debug_* options, like debug_log
    - Prefixing error message with msgpath

  • debug_log

    Bool, default false. Whether to add logging to generated code. This aids in debugging generated code, especially for more complex validation.

  • comment

    Bool, default true. If set to false, generated code will be devoid of comments.

  • human_hash_values

    Hash. Optional. Will be passed to hash_values argument during compile() by human compiler.
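The difference between the return_type values can be sketched with a small stand-alone helper. The name shape_result is hypothetical and the helper is not part of the compiler. hash_details is left out because the layout of that structure is not described here.

```perl
use strict;
use warnings;

# Hypothetical helper: shape one validation outcome according to return_type.
# $err is undef when validation succeeded; $val is the final data value.
sub shape_result {
    my ($return_type, $err, $val) = @_;
    my $ok = defined($err) ? 0 : 1;
    return $ok                 if $return_type eq 'bool_valid';
    return [$ok, $val]         if $return_type eq 'bool_valid+val';
    return $err // ''          if $return_type eq 'str_errmsg';
    return [$err // '', $val]  if $return_type eq 'str_errmsg+val';
    die "Unsupported return_type '$return_type' in this sketch";
}

my $r = shape_result('str_errmsg+val', 'Must be at least 0', -1);
print "$r->[0] (value: $r->[1])\n";   # prints "Must be at least 0 (value: -1)"
```

The +val variants are useful when defaults or coercion may change the data, since the caller gets the final value back along with the verdict.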

$c->comment($cd, @args) => STR

Generate a comment. For example, in perl compiler:

$c->comment($cd, "123"); # -> "# 123\n"

Will return an empty string if compile argument comment is set to false.
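How comment_style and the comment argument might interact can be sketched as below. This is a stand-alone illustration, not the module's own implementation: format_comment is a hypothetical function, and details such as the trailing newline after a 'c' style comment are assumptions.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a comment generator.
# $enabled plays the role of the "comment" compile argument.
sub format_comment {
    my ($style, $enabled, @parts) = @_;
    return '' unless $enabled;
    my $text = join '', @parts;
    $text =~ s/\n+/ /g;                      # keep the comment on one line
    return "// $text\n"    if $style eq 'cpp';
    return "# $text\n"     if $style eq 'shell';
    return "/* $text */\n" if $style eq 'c';
    return "; $text\n"     if $style eq 'ini';
    die "Unknown comment style '$style'";
}

print format_comment('shell', 1, '123');   # prints "# 123"
print format_comment('cpp',   1, '123');   # prints "// 123"
```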

HOMEPAGE

Please visit the project's homepage at https://metacpan.org/release/Data-Sah.

SOURCE

Source repository is at https://github.com/perlancar/perl-Data-Sah.

AUTHOR

perlancar <perlancar@cpan.org>

CONTRIBUTING

To contribute, you can send patches by email/via RT, or send pull requests on GitHub.

Most of the time, you don't need to build the distribution yourself. You can simply modify the code, then test via:

% prove -l

If you want to build the distribution (e.g. to try to install it locally on your system), you can install Dist::Zilla, Dist::Zilla::PluginBundle::Author::PERLANCAR, Pod::Weaver::PluginBundle::Author::PERLANCAR, and sometimes one or two other Dist::Zilla- and/or Pod::Weaver plugins. Any additional steps required beyond that are considered a bug and can be reported to me.

COPYRIGHT AND LICENSE

This software is copyright (c) 2024, 2022, 2021, 2020, 2019, 2018, 2017, 2016, 2015, 2014, 2013, 2012 by perlancar <perlancar@cpan.org>.

This is free software; you can redistribute it and/or modify it under the same terms as the Perl 5 programming language system itself.

BUGS

Please report any bugs or feature requests on the bugtracker website https://rt.cpan.org/Public/Dist/Display.html?Name=Data-Sah

When submitting a bug or request, please include a test-file or a patch to an existing test-file that illustrates the bug or desired feature.