NAME

Unicode::Precis::Preparation - RFC 8264 PRECIS Framework - Preparation

SYNOPSIS

use Unicode::Precis::Preparation qw(prepare IdentifierClass);
$result = prepare($string, IdentifierClass);
%result = prepare($string, IdentifierClass);

DESCRIPTION

Unicode::Precis::Preparation prepares a Unicode string or UTF-8 bytestring according to the PRECIS framework.

Note that the term "UTF-8" in this document is used in its proper sense.

Function

prepare ( $string, [ $stringclass ], [ UnicodeVersion => $version ] )

Checks whether a string conforms to the specified string class.

Parameters:

$string

The string to be checked: either a Unicode string or a bytestring.

Note that a bytestring is not upgraded to a Unicode string; it is treated as a UTF-8 sequence.
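The two input forms can be illustrated with core Perl alone. The following is a minimal sketch, not part of this module: it uses the core Encode module to build the same text as a Unicode string and as a UTF-8 bytestring, and shows that their lengths differ. The sample text is a hypothetical input.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical sample text: "a" followed by U+3042 (HIRAGANA LETTER A).
# As a Unicode string it holds two characters.
my $unicode = "a\x{3042}";

# The same text as a UTF-8 bytestring: U+3042 occupies three bytes.
my $bytes = encode('UTF-8', $unicode);

printf "characters: %d\n", length $unicode;
printf "bytes:      %d\n", length $bytes;
```

Either form may be passed as $string; the bytestring form is checked as a UTF-8 sequence without being upgraded.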

$stringclass

One of the constants ValidUTF8 (default), IdentifierClass or FreeFormClass (for the latter two, see RFC 8264).

UnicodeVersion => $version

If a version of Unicode is given, the repertoire is restricted to that version. By default, the repertoire of the Unicode version supported by the Perl running this module is available.

Returns:

In scalar context: a true value if the string conforms to the specified string class; otherwise a false value.

In list context: a list of key/value pairs describing the result in detail, with these keys:

result

One of the property values described in "Constants".

offset

If the check fails, the offset from the beginning of the string. If it succeeds, the length of the string.

The offset or length is counted in bytes for a bytestring and in characters for a Unicode string.

length

When the check fails, the length of the disallowed character. The length is 1 to 4 for a bytestring, always 1 for a Unicode string, and undefined for an invalid sequence.

ord

The Unicode scalar value of the character, when the length item is set.
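The return values described above might be used as follows. This is a minimal sketch assuming only the interface documented here; the input string is a hypothetical example chosen because IdentifierClass does not allow spaces.

```perl
use strict;
use warnings;
use Unicode::Precis::Preparation qw(prepare IdentifierClass);

# Hypothetical input: contains a space (U+0020).
my $string = 'foo bar';

if (prepare($string, IdentifierClass)) {
    # Scalar (boolean) context: a true value means the string conforms.
    print "string conforms to IdentifierClass\n";
} else {
    # List context: collect the detail pairs into a hash.
    my %result = prepare($string, IdentifierClass);
    printf "check failed at offset %d\n", $result{offset};

    if (defined $result{length}) {
        # length and ord are set for a disallowed character.
        printf "offending character: U+%04X (length %d)\n",
            $result{ord}, $result{length};
    } else {
        # length is undefined for an invalid sequence.
        print "invalid sequence\n";
    }
}
```

Calling prepare() twice keeps the sketch simple; a single call in list context would also suffice, since the result item reports whether the check succeeded.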

Constants

FreeFormClass
IdentifierClass
ValidUTF8

String classes. ValidUTF8 is an extension provided by this module.

UNASSIGNED
PVALID
CONTEXTJ
CONTEXTO
DISALLOWED

Property values representing results. PVALID indicates a successful result.

Exports

Nothing is exported by default. prepare() and the constants may be exported with the :all tag.
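As an illustration of the :all tag, the following minimal sketch imports prepare() together with the constants and checks one string against two string classes. The input string is a hypothetical example.

```perl
use strict;
use warnings;
# Import prepare() and all constants at once.
use Unicode::Precis::Preparation qw(:all);

# Hypothetical input containing a space.
my $string = 'foo bar';

# Check the same string against both RFC 8264 string classes.
my $id_ok   = prepare($string, IdentifierClass);
my $free_ok = prepare($string, FreeFormClass);

print 'IdentifierClass: ', ($id_ok   ? 'conforms' : 'does not conform'), "\n";
print 'FreeFormClass:   ', ($free_ok ? 'conforms' : 'does not conform'), "\n";
```

Individual names can also be imported explicitly, as the SYNOPSIS does.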

RESTRICTIONS

prepare() cannot check Unicode strings on EBCDIC platforms.

Unicode versions

String classes

Derived properties are based on Unicode 6.3.0 or later. Some characters have property values that are incompatible with Unicode versions prior to 6.0.0 (see also RFC 6452). Property values of characters added in Unicode versions after 6.3.0 may change in the future.

Contextual rules

Character properties checked by contextual rules are based on the Unicode version supported by a recent version of Perl. Some characters have property values that are incompatible with Unicode 6.3.0.

SEE ALSO

Unicode::Precis.

RFC 8264 PRECIS Framework: Preparation, Enforcement, and Comparison of Internationalized Strings in Application Protocols. https://tools.ietf.org/html/rfc8264.

AUTHOR

Hatuka*nezumi - IKEDA Soji, <hatuka@nezumi.nu>

COPYRIGHT AND LICENSE

Copyright (C) 2015, 2018 by Hatuka*nezumi - IKEDA Soji

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself. For more details, see the full text of the licenses at <http://dev.perl.org/licenses/>.

This program is distributed in the hope that it will be useful, but without any warranty; without even the implied warranty of merchantability or fitness for a particular purpose.