NAME

Locale::Unicode - Unicode Locale Identifier compliant with BCP47 and CLDR

SYNOPSIS

use Locale::Unicode;
my $locale = Locale::Unicode->new( 'ja-Kana-t-it' ) ||
    die( Locale::Unicode->error );
say $locale; # ja-Kana-t-it

# Some undefined locale in Cyrillic script
my $locale = Locale::Unicode->new( 'und-Cyrl' );
$locale->transform( 'und-latn' );
$locale->mechanism( 'ungegn-2007' );
say $locale; # und-Cyrl-t-und-latn-m0-ungegn-2007
# A locale in Cyrillic, transformed from Latin, according to a UNGEGN specification dated 2007.

# Enabling fatal exceptions
use v5.34;
use experimental 'try';
no warnings 'experimental';
try
{
    my $locale = Locale::Unicode->new( 'x', fatal => 1 );
    # More code
}
catch( $e )
{
    say "Oops: ", $e->message;
}

Or, you could set the global variable $FATAL_EXCEPTIONS instead:

use v5.34;
use experimental 'try';
no warnings 'experimental';
local $Locale::Unicode::FATAL_EXCEPTIONS = 1;
try
{
    my $locale = Locale::Unicode->new( 'x' );
    # More code
}
catch( $e )
{
    say "Oops: ", $e->message;
}

This API detects when a method is called in object context and returns the current object:

$locale->translation( 'my-software' )->tz( 'jptyo' )->ca( 'japanese' )

In scalar or list context, the value returned is the last value set.

$locale->translation( 'my-software' ); # my-software
$locale->translation( 'other-software' ); # other-software

VERSION

v0.3.11

DESCRIPTION

This module implements the Unicode LDML (Locale Data Markup Language) extensions.

It does not enforce the standard, and is merely an API to construct, access and modify locales. It is your responsibility to set the right values.

The only requirement is to provide a proper language, which is a 2 or 3-character code, a privateuse tag, or a grandfathered language tag.

For your convenience, a summary of key elements of the standard can be found in this documentation.

It is lightweight and fast, with no dependencies outside of Scalar::Util and Want. It requires perl v5.10.1 or later.

The object stringifies. Once its string value is computed, it is cached and re-used until the object is changed. Thus, repeated calls to as_string, or repeated stringification, do not incur any speed penalty from recomputing what has not changed.
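
The caching behaviour described above can be pictured with the following minimal sketch. The class My::CachedTag, its set method and its builds counter are hypothetical names used only for illustration; this is not how Locale::Unicode itself is implemented.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical class for illustration only; not part of Locale::Unicode.
package My::CachedTag;
use overload '""' => \&as_string, fallback => 1;

sub new
{
    my( $class, %parts ) = @_;
    return( bless( { parts => { %parts }, cache => undef, builds => 0 }, $class ) );
}

# Changing a component discards the cached string.
sub set
{
    my( $self, $key, $value ) = @_;
    $self->{parts}->{ $key } = $value;
    $self->{cache} = undef;
    return( $self );
}

# The string is rebuilt only when no cached copy exists.
sub as_string
{
    my $self = shift( @_ );
    unless( defined( $self->{cache} ) )
    {
        $self->{builds}++;
        my @values = @{ $self->{parts} }{qw( language script territory )};
        $self->{cache} = join( '-', grep { defined( $_ ) } @values );
    }
    return( $self->{cache} );
}

package main;

my $tag = My::CachedTag->new( language => 'ja', territory => 'JP' );
say "$tag";           # ja-JP (built once)
say $tag->as_string;  # ja-JP (served from the cache)
$tag->set( script => 'Kana' );
say "$tag";           # ja-Kana-JP (rebuilt after the change)
say $tag->{builds};   # 2
```

The counter shows that the string is only rebuilt after a component has changed, which is the property the paragraph above relies on.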

See the LDML specifications for more information on what composes a Unicode language identifier.

CONSTRUCTOR

new

# Sets the language 'en'
my $locale = Locale::Unicode->new( 'en' );
# Sets the language 'en' with territory 'GB'
my $locale = Locale::Unicode->new( 'en-GB' );
# Sets the language 'en' with script 'Latn' and territory 'AU'
my $locale = Locale::Unicode->new( 'en-Latn-AU' );
my $locale = Locale::Unicode->new( 'he-IL-u-ca-hebrew-tz-jeruslm' );
my $locale = Locale::Unicode->new( 'ja-Kana-t-it' );
my $locale = Locale::Unicode->new( 'und-Latn-t-und-cyrl' );
my $locale = Locale::Unicode->new( 'und-Cyrl-t-und-latn-m0-ungegn-2007' );
my $locale = Locale::Unicode->new( 'de-u-co-phonebk-ka-shifted' );
# Machine translated from German to Japanese using an undefined vendor
my $locale = Locale::Unicode->new( 'ja-t-de-t0-und' );
$locale->script( 'Kana' );
$locale->country_code( 'JP' );
# Now: ja-Kana-JP-t-de-t0-und

This takes a locale compliant with the BCP47 standard, as upgraded by the LDML specifications, and an optional hash or hash reference of options, and returns a new object.

The locale provided is parsed and its components can be accessed and modified using all the methods of this class API.

If a hash or hash reference of options is provided, it will be used to set or modify the components from the locale provided.

If an error occurs, an exception object is set and undef is returned in scalar context, or an empty list in list context. The exception object can then be retrieved using error, such as:

my $locale = Locale::Unicode->new( $something_bad ) ||
    die( Locale::Unicode->error );

METHODS

All the methods below are context sensitive.

If they are called in object context, they return the current Locale::Unicode object for chaining. Otherwise, they return the current value. If that value is undef, they return undef in scalar context, but an empty list in list context.

Also, if an error occurs, they set an exception object and return undef in scalar context, or an empty list in list context.
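
The difference between returning undef and returning an empty list can be sketched with core Perl only. The function get_component_sketch and the $store hash below are hypothetical and are not the module's own code; detecting object context, which this module does with Want, is outside the scope of the sketch.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical accessor for illustration; not the module's own code.
sub get_component_sketch
{
    my( $store, $name ) = @_;
    my $value = $store->{ $name };
    # wantarray() is true in list context and defined-but-false in scalar context.
    return( wantarray() ? () : undef ) unless( defined( $value ) );
    return( $value );
}

my $store = { language => 'ja' };
# Scalar context: a missing value comes back as undef.
my $script = get_component_sketch( $store, 'script' );
# List context: a missing value contributes nothing to the list.
my @values = (
    get_component_sketch( $store, 'language' ),
    get_component_sketch( $store, 'script' ),
);
say defined( $script ) ? 'defined' : 'undef';  # undef
say scalar( @values );                         # 1
```

Returning an empty list in list context avoids building a list that contains a stray undef element.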

apply

my $hash_reference = Locale::Unicode->parse( 'ja-Kana-t-it' );
$locale->apply( $hash_reference );

Provided with a hash reference of key-value pairs, this will set each corresponding method with the associated value.

If a property provided has no corresponding method, it emits a warning if warnings are enabled.

It returns the current object upon success, or sets an error object upon error and returns undef in scalar context, or an empty list in list context.

as_string

Returns the Locale object as a string, based on its latest attributes set.

The string value returned is computed only once, and further calls to as_string return a cached value unless changes were made to the Locale attributes.

Boolean values are expressed as true for true values and false for false values. However, if a value is true for a given locale component, it is not explicitly stated by default, since the LDML specifications indicate that it is implicitly true. If, however, you want the true boolean value to be displayed nevertheless, make sure to set the global variable $EXPLICIT_BOOLEAN to a true value.

For example:

my $locale = Locale::Unicode->new( 'ko-Kore-KR', {
    # You can also use 1 or 'yes' as per the specifications
    colNumeric => 'true',
    colCaseFirst => 'upper'
});
say $locale; # ko-Kore-KR-u-kf-upper-kn

local $Locale::Unicode::EXPLICIT_BOOLEAN = 1;
my $locale = Locale::Unicode->new( 'ko-Kore-KR', {
    # You can also use 1 or 'yes' as per the specifications
    colNumeric => 'true',
    colCaseFirst => 'upper'
});
say $locale; # ko-Kore-KR-u-kf-upper-kn-true

base

my $locale = Locale::Unicode->new( 'en-US' );
say $locale->base; # en-US

my $locale = Locale::Unicode->new( 'en-Latn-US-posix-t-de-AT-t0-und-x0-medical' );
say $locale->base; # en-Latn-US-posix
$locale->base( 'ja-JP' );
say $locale->base; # ja-JP
say $locale; # ja-JP-t-de-AT-t0-und-x0-medical

This method sets or gets the base part of the locale.

The base part is composed of the language_id, an optional script, an optional territory and zero or more variants.

If a value is provided, it will replace the base of the current locale object.

If an improper base value is provided, it will set an error object and return undef in scalar context and an empty list in list context.

It returns the current base as a string.

break_exclusion

my $locale = Locale::Unicode->new( 'ja' );
$locale->break_exclusion( 'hani-hira-kata' );
# Now: ja-u-dx-hani-hira-kata

This is a Unicode Dictionary Break Exclusion Identifier that specifies scripts to be excluded from dictionary-based text break (for words and lines).

Sets or gets the Unicode extension dx

ca

This is an alias for "calendar"

calendar

my $locale = Locale::Unicode->new( 'th' );
$locale->calendar( 'buddhist' );
# or:
# $locale->ca( 'buddhist' );
# Now: th-u-ca-buddhist
# which is Thai with the Buddhist calendar

Sets or gets the Unicode extension ca, which is a calendar identifier.

See the section on "BCP47 EXTENSIONS" for the proper values.

canonical

This returns a clone of the current object, formatted as per the Unicode locale canonical specifications.

This means that:

variant

Variants are sorted and converted to lower case.

my $locale = Locale::Unicode->new( 'en-Scouse-fonipA' );
say $locale->canonical; # en-fonipa-scouse

Any duplicates are removed as per the LDML specifications.

my $locale = Locale::Unicode->new( 'de-1996-fonipa-1996' );
say $locale->canonical; # de-1996-fonipa

territory

The territory is converted to upper case.

my $locale = Locale::Unicode->new( 'en-us' );
say $locale->canonical; # en-US

# Spanish as spoken in South America
my $locale = Locale::Unicode->new( 'es-005' );
say $locale->canonical; # es-005

script

Script is formatted in title case.

my $locale = Locale::Unicode->new( 'ja-kana-jp' );
say $locale->canonical; # ja-Kana-JP

language

The language code is converted to lower case.

The special language code root is replaced by und.

See the LDML specifications for more information.

See also the method "normalise" in Locale::Unicode::Data
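
The rules above can be sketched in plain Perl as follows. The function canonical_core_sketch is hypothetical and is not the module's canonical method: it only handles a language followed by an optional script, an optional territory and variants, and ignores extended language subtags and all extensions.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper, not part of Locale::Unicode. It canonicalises only the
# language, script, territory and variant subtags of a simple tag.
sub canonical_core_sketch
{
    my $tag = shift( @_ );
    my( $lang, @rest ) = split( /-/, $tag );
    $lang = lc( $lang );
    $lang = 'und' if( $lang eq 'root' );
    my( $script, $territory, %seen, @variants );
    foreach my $subtag ( @rest )
    {
        # A script is 4 letters and can only come first.
        if( !defined( $script ) && !defined( $territory ) && !@variants &&
            $subtag =~ /^[A-Za-z]{4}$/ )
        {
            $script = ucfirst( lc( $subtag ) );
        }
        # A territory is 2 letters or 3 digits and comes before any variant.
        elsif( !defined( $territory ) && !@variants &&
               $subtag =~ /^(?:[A-Za-z]{2}|[0-9]{3})$/ )
        {
            $territory = uc( $subtag );
        }
        # Everything else is treated as a variant; duplicates are dropped.
        else
        {
            my $variant = lc( $subtag );
            push( @variants, $variant ) unless( $seen{ $variant }++ );
        }
    }
    my @parts = ( $lang );
    push( @parts, $script ) if( defined( $script ) );
    push( @parts, $territory ) if( defined( $territory ) );
    push( @parts, sort { $a cmp $b } @variants );
    return( join( '-', @parts ) );
}

say canonical_core_sketch( 'en-Scouse-fonipA' );     # en-fonipa-scouse
say canonical_core_sketch( 'de-1996-fonipa-1996' );  # de-1996-fonipa
say canonical_core_sketch( 'ja-kana-jp' );           # ja-Kana-JP
say canonical_core_sketch( 'root' );                 # und
```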

cf

This is an alias for "cu_format"

clone

Clones the current object and returns the newly instantiated copy.

If an error occurs, this sets an exception object and returns undef in scalar context, and an empty list in list context.

co

my $locale = Locale::Unicode->new( 'de' );
$locale->collation( 'phonebk' );
$locale->ka( 'shifted' );
# Now: de-u-co-phonebk-ka-shifted

This is a Unicode collation identifier that specifies a type of collation (sort order).

This is an alias for "collation"

colAlternate

my $locale = Locale::Unicode->new( 'de' );
$locale->collation( 'phonebk' );
$locale->ka( 'shifted' );
# Now: de-u-co-phonebk-ka-shifted

$locale->colAlternate( 'noignore' );
# or similarly:
$locale->colAlternate( 'non-ignorable' );

Sets alternate handling for variable weights.

Sets or gets the Unicode extension ka

See "Collation Options" for more information.

colBackwards

$locale->colBackwards(1); # true
# Now: kb-true
$locale->colBackwards(0); # false
# Now: kb-false

Sets collation boolean value for backward collation weight.

Sets or gets the Unicode extension kb

See "Collation Options" for more information.

colCaseFirst

$locale->colCaseFirst( undef ); # false (default)
$locale->colCaseFirst( 'upper' );
$locale->colCaseFirst( 'lower' );

Sets or gets the Unicode extension kf

See "Collation Options" for more information.

colCaseLevel

$locale->colCaseLevel(1); # true
# Now: kc-true
$locale->colCaseLevel(0); # false
# Now: kc-false

Sets collation boolean value for case level.

Sets or gets the Unicode extension kc

See "Collation Options" for more information.

colHiraganaQuaternary

$locale->colHiraganaQuaternary(1); # true
# Now: kh-true
$locale->colHiraganaQuaternary(0); # false
# Now: kh-false

Sets collation parameter key for special Hiragana handling.

Sets or gets the Unicode extension kh

See "Collation Options" for more information.

collation

my $locale = Locale::Unicode->new( 'fr' );
$locale->collation( 'emoji' );
# Now: fr-u-co-emoji

my $locale = Locale::Unicode->new( 'de' );
$locale->collation( 'phonebk' );
# Now: de-u-co-phonebk
# which is: German using Phonebook sorting

Sets or gets the Unicode extension co

This specifies a type of collation (sort order).

See "Unicode extensions" for possible values and more information on the standard.

See also "Collation Options" for more on collation options.

colNormalisation

This is an alias for colNormalization

colNormalization

$locale->colNormalization(1); # true
# Now: kk-true
$locale->colNormalization(0); # false
# Now: kk-false

Sets collation parameter key for normalisation.

Sets or gets the Unicode extension kk

See "Collation Options" for more information.

colNumeric

$locale->colNumeric(1); # true
# Now: kn-true
$locale->colNumeric(0); # false
# Now: kn-false

Sets collation parameter key for numeric handling.

Sets or gets the Unicode extension kn

See "Collation Options" for more information.

colReorder

my $locale = Locale::Unicode->new( 'en' );
$locale->colReorder( 'latn-digit' );
# Now: en-u-kr-latn-digit
# Reorder digits after Latin characters.

my $locale = Locale::Unicode->new( 'en' );
$locale->colReorder( 'arab-cyrl-others-symbol' );
# Now: en-u-kr-arab-cyrl-others-symbol
# Reorder Arabic characters first, then Cyrillic, and put
# symbols at the end—after all other characters.

Sets collation reorder codes.

Sets or gets the Unicode extension kr

See "Collation Options" for more information.

shiftedGroup

This is an alias for "colValue"

colStrength

$locale->colStrength( 'level1' );
# Now: ks-level1
# or, equivalent:
$locale->colStrength( 'primary' );

$locale->colStrength( 'level2' );
# or, equivalent:
$locale->colStrength( 'secondary' );

$locale->colStrength( 'level3' );
# or, equivalent:
$locale->colStrength( 'tertiary' );

$locale->colStrength( 'level4' );
# or, equivalent:
$locale->colStrength( 'quaternary' );
$locale->colStrength( 'quarternary' );

$locale->colStrength( 'identic' );
# or, equivalent:
$locale->colStrength( 'identical' );

Sets the collation parameter key for collation strength used for comparison.

Sets or gets the Unicode extension ks

See "Collation Options" for more information.

colValue

$locale->colValue( 'currency' );
$locale->colValue( 'punct' );
$locale->colValue( 'space' );
$locale->colValue( 'symbol' );

Sets the collation value for the last reordering group to be affected by ka-shifted.

Sets or gets the Unicode extension kv

See "Collation Options" for more information.

colVariableTop

Sets the string value for the variable top.

Sets or gets the Unicode extension vt

See "Collation Options" for more information.

core

my $locale = Locale::Unicode->new( 'ja-Kana-JP-t-de-AT-t0-und-u-ca-japanese-tz-jptyo' );
say $locale->core; # ja-Kana-JP
my $locale = Locale::Unicode->new( 'es-001-valencia-t-und-latn-m0-ungegn-2007' );
say $locale->core; # es-001-valencia

This is a read-only method.

It returns the core part of the locale, which is composed of a 2 to 3-character language code, an optional script, an optional country or region code, and optional variant IDs.

country_code

my $locale = Locale::Unicode->new( 'en' );
$locale->country_code( 'US' );
# Now: en-US
$locale->country_code( 'GB' );
# Now: en-GB

Sets or gets the country code part of the locale.

A country code should be an ISO 3166 2-letter code, but keep in mind that the LDML (Locale Data Markup Language) accepts old data to ensure stability.

Note that when you set a country code, it will automatically unset any region code.

my $locale = Locale::Unicode->new( 'en-001' );
say $locale->region; # 001
$locale->country_code( 'US' );
say $locale->region; # undef
say $locale; # en-US

You can use "territory" alternatively.

cu

my $locale = Locale::Unicode->new( 'ja' );
$locale->cu( 'jpy' );
# Now: ja-u-cu-jpy
# which is the Japanese yen

This is a Unicode currency identifier that specifies a type of currency (ISO 4217 code).

This is an alias for "currency"

cu_format

# Using minus sign symbol for negative numbers
$locale->cf( 'standard' );
# Using parentheses for negative numbers
$locale->cf( 'account' );

This is a currency format identifier such as standard or account

Sets or gets the Unicode extension cf

See the section on "BCP47 EXTENSIONS" for the proper values.

currency

my $locale = Locale::Unicode->new( 'ja' );
$locale->currency( 'jpy' );
# or
# $locale->cu( 'jpy' );
# Now: ja-u-cu-jpy
# which is the Japanese yen

Sets or gets the Unicode extension cu.

This specifies a currency as an ISO 4217 code.

d0

This is an alias for "destination"

dest

This is an alias for "destination"

destination

Sets or gets the Transformation extension d0 for destination.

See the section on "Transform extensions" for more information.

dx

This is an alias for "break_exclusion"

em

This is an alias for "emoji"

emoji

This is a Unicode Emoji Presentation Style Identifier that specifies a request for the preferred emoji presentation style.

Sets or gets the Unicode extension em.

error

Used as a mutator, this sets an exception object and returns a Locale::Unicode::NullObject in object context (such as when chaining), undef in scalar context, or an empty list in list context.

The Locale::Unicode::NullObject class prevents the perl error of Can't call method "%s" on an undefined value (see perldiag). Upon the last method chained, undef is returned in scalar context or an empty list in list context.

For example:

my $locale = Locale::Unicode->new( 'ja' );
$locale->translation( 'my-software' )->transform_locale( $bad_value )->tz( 'jptyo' ) ||
    die( $locale->error );

In this example, jptyo will never be set, because transform_locale triggered an exception and returned a Locale::Unicode::NullObject object that catches all further method calls. Eventually, we get the error and die.
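
The null object idea can be sketched with core Perl as shown below. The classes My::NullObject and My::Tag are hypothetical stand-ins, and the sketch is simplified: the null object is merely false in boolean context, whereas the text above states that the real class ends up returning undef or an empty list at the end of the chain.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical stand-in for a null object. The real
# Locale::Unicode::NullObject may be implemented differently.
package My::NullObject;
use overload 'bool' => sub { 0 }, fallback => 1;

sub new { return( bless( {}, shift( @_ ) ) ); }
# Every unknown method returns the null object itself, so a chain keeps going.
sub AUTOLOAD { return( $_[0] ); }
# Defined explicitly so that object destruction does not reach AUTOLOAD.
sub DESTROY {}

# Hypothetical class whose setter returns a null object upon error.
package My::Tag;

sub new { return( bless( {}, shift( @_ ) ) ); }
sub error { return( $_[0]->{error} ); }

sub tz
{
    my( $self, $value ) = @_;
    unless( defined( $value ) && $value =~ /^[a-z0-9]{3,8}$/ )
    {
        $self->{error} = 'Invalid time zone value';
        return( My::NullObject->new );
    }
    $self->{tz} = $value;
    return( $self );
}

sub ca
{
    my( $self, $value ) = @_;
    $self->{ca} = $value;
    return( $self );
}

package main;

my $tag = My::Tag->new;
# tz() fails, so ca() is called on the null object and changes nothing.
my $result = $tag->tz( 'not a zone' )->ca( 'japanese' );
say $tag->error unless( $result );                            # Invalid time zone value
say defined( $tag->{ca} ) ? 'ca was set' : 'ca was not set';  # ca was not set
```

Because every call on the null object succeeds, the chain never dies with a "Can't call method" error, and the caller can still test the result and read the error afterwards.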

extended

# Chinese, Mandarin, Simplified script, as used in China
my $locale = Locale::Unicode->new( 'zh-cmn-Hans-CN' );
say $locale->extended; # cmn

# Mandarin Chinese, Simplified script, as used in China
my $locale = Locale::Unicode->new( 'cmn-Hans-CN' );
say $locale->extended; # undef
say $locale->script; # Hans

# Chinese, Cantonese, as used in Hong Kong SAR
my $locale = Locale::Unicode->new( 'zh-yue-HK' );
say $locale->extended; # yue

Sets or gets the extended language subtags. As per the standard, a language ID may be followed by up to 3 extended language subtags. However, the standard states: "Although the ABNF production 'extlang' permits up to three extended language tags in the language tag, extended language subtags MUST NOT include another extended language subtag in their 'Prefix'. That is, the second and third extended language subtag positions in a language tag are permanently reserved and tags that include those subtags in that position are, and will always remain, invalid."

The regular expression in Locale::Unicode supports the extended language subtag inherited by Unicode from BCP47, although it is not strictly supported by the standard. This is done in order to ensure maximum portability and flexibility.

false

This is read-only and returns a Locale::Unicode::Boolean object representing a false value.

fatal

$locale->fatal(1); # Enable fatal exceptions
$locale->fatal(0); # Disable fatal exceptions
my $bool = $locale->fatal;

Sets or gets the boolean value that determines whether to die upon exception. If set to true, then instead of setting an exception object, this module will die with an exception object, which you can then catch using try. For example:

use v5.34; # to be able to use try-catch blocks in perl
use experimental 'try';
no warnings 'experimental';
try
{
    my $locale = Locale::Unicode->new( 'x', fatal => 1 );
}
catch( $e )
{
    say "Error occurred: ", $e->message;
    # Error occurred: Invalid locale value "x" provided.
}

first_day

This is a Unicode First Day Identifier that specifies the preferred first day of the week for calendar display.

Sets or gets the Unicode extension fw.

Its values are sun, mon, tue, wed, thu, fri and sat.

fw

This is an alias for "first_day"

grandfathered

# auto-detects and sets an irregular grandfathered language tag
$locale->grandfathered( 'i-klingon' );
# sets a regular grandfathered language tag
$locale->grandfathered( 'zh-hakka' );

Sets or gets a regular or irregular grandfathered language tag.

These are old-style language tags. Most of them remain valid, but their format has since changed, and most of them have been superseded.

This is a convenience method that takes a language tag and, based on its value, calls either grandfathered_regular or grandfathered_irregular.

If you set a grandfathered language tag, this will automatically unset the language, language3 or privateuse tag value.

The regular expression in Locale::Unicode supports the grandfathered language subtag inherited by Unicode from BCP47, although it is not strictly supported by the standard. This is done in order to ensure maximum portability and flexibility.

grandfathered_irregular

$locale->grandfathered_irregular( 'en-GB-oed' );
$locale->grandfathered_irregular( 'i-ami' );
$locale->grandfathered_irregular( 'i-bnn' );
$locale->grandfathered_irregular( 'i-default' );
$locale->grandfathered_irregular( 'i-enochian' );
$locale->grandfathered_irregular( 'i-hak' );
$locale->grandfathered_irregular( 'i-klingon' );
$locale->grandfathered_irregular( 'i-lux' );
$locale->grandfathered_irregular( 'i-mingo' );
$locale->grandfathered_irregular( 'i-navajo' );
$locale->grandfathered_irregular( 'i-pwn' );
$locale->grandfathered_irregular( 'i-tao' );
$locale->grandfathered_irregular( 'i-tay' );
$locale->grandfathered_irregular( 'i-tsu' );
$locale->grandfathered_irregular( 'sgn-BE-FR' );
$locale->grandfathered_irregular( 'sgn-BE-NL' );
$locale->grandfathered_irregular( 'sgn-CH-DE' );

Sets or gets an irregular grandfathered language tag.

Setting a value, including undef, will unset the language, language3, privateuse or grandfathered_regular tag value.

grandfathered_regular

$locale->grandfathered_regular( 'art-lojban' );
$locale->grandfathered_regular( 'cel-gaulish' );
$locale->grandfathered_regular( 'no-bok' );
$locale->grandfathered_regular( 'no-nyn' );
$locale->grandfathered_regular( 'zh-guoyu' );
$locale->grandfathered_regular( 'zh-hakka' );
$locale->grandfathered_regular( 'zh-min' );
$locale->grandfathered_regular( 'zh-min-nan' );
$locale->grandfathered_regular( 'zh-xiang' );

Sets or gets a regular grandfathered language tag.

Setting a value, including undef, will unset the language, language3, privateuse or grandfathered_irregular tag value.

h0

This is an alias for "hybrid"

hc

This is an alias for "hour_cycle"

hour_cycle

This is a Unicode Hour Cycle Identifier that specifies the preferred time cycle.

Sets or gets the Unicode extension hc.

hybrid

my $locale = Locale::Unicode->new( 'ru' );
$locale->transform( 'en' );
$locale->hybrid(1); # true
# or
# $locale->hybrid( 'hybrid' );
# or
# $locale->h0( 'hybrid' );
# Now: ru-t-en-h0-hybrid
# Hybrid Cyrillic - Runglish

my $locale = Locale::Unicode->new( 'en' );
$locale->transform( 'zh-hant' );
$locale->hybrid( 'hybrid' );
# Now: en-t-zh-hant-h0-hybrid
# which is Hybrid Latin - Chinglish

Those are Hybrid Locale Identifiers indicating that the t value is a language that is mixed into the main language tag to form a hybrid.

Sets or gets the Transformation extension h0.

See the section on "Transform extensions" for more information.

i0

This is an alias for "input"

k0

This is an alias for "keyboard"

input

my $locale = Locale::Unicode->new( 'zh' );
$locale->input( 'pinyin' );
# Now: zh-t-i0-pinyin

This is an Input Method Engine transformation.

Sets or gets the Transformation extension i0.

See the section on "Transform extensions" for more information.

ka

This is an alias for "colAlternate"

kb

This is an alias for "colBackwards"

kc

This is an alias for "colCaseLevel"

keyboard

my $locale = Locale::Unicode->new( 'en' );
$locale->keyboard( 'dvorak' );
# Now: en-t-k0-dvorak

This is a keyboard transformation, such as used by client-side virtual keyboards.

Sets or gets the Transformation extension k0.

See the section on "Transform extensions" for more information.

kf

This is an alias for "colCaseFirst"

kh

This is an alias for "colHiraganaQuaternary"

kk

This is an alias for "colNormalization"

kn

This is an alias for "colNumeric"

kr

This is an alias for "colReorder"

ks

This is an alias for "colStrength"

kv

This is an alias for "colValue"

lang

# current value: fr-FR
$obj->lang( 'de' );
# Now: de-FR

Sets or gets the language part of this locale object.

Note that when you set a 2-letter language code, it will automatically unset any 3-character language code you may have previously set.

For example:

$obj->lang( 'ja' );
# locale is now set with language code 'ja'
$obj->lang3( 'jpn' );
# locale is now set with 3-characters language code 'jpn'
say $obj->lang; # undef

lang3

my $locale = Locale::Unicode->new( 'ja' );
say $locale; # ja
$locale->language3( 'jpn' );
say $locale->language; # undef
$locale->script( 'Kana' );
# Now: jpn-Kana

Sets or gets the 3-letter ISO 639-2 code. Keep in mind, however, that to ensure stability, the LDML (Locale Data Markup Language) also uses old data.

If you set the 3-character language code, it will replace any previously set 2-character language code.

language

This is an alias for lang

language3

This is an alias for lang3

language_extended

my $locale = Locale::Unicode->new( 'zh-cmn-TW' );
say $locale->language; # zh
say $locale->language3; # undef
say $locale->language_id; # zh
say $locale->extended; # cmn
say $locale->language_extended; # zh-cmn
say $locale->country_code; # TW

my $locale = Locale::Unicode->new( 'ja-JP' );
say $locale->language; # ja
say $locale->extended; # undef
say $locale->language_extended; # ja

# Okinawan spoken in Japan Southern islands
my $locale = Locale::Unicode->new( 'ryu-JP' );
say $locale->language; # undef
say $locale->language3; # ryu
say $locale->language_id; # ryu
say $locale->language_extended; # ryu

Read-only. This method returns the extended form of the language subtag, which means the 2 to 3-character language ID and an optional extended language subtag.

Extended language subtags serve to provide more granularity to a locale, complementing the primary language subtag.

For example:

zh-cmn-Hans-CN (Chinese, Mandarin, Simplified script, as used in China)
zh-yue-HK (Chinese, Cantonese, as used in Hong Kong SAR)

However, with Unicode LDML, this is deprecated, and, for example, zh-cmn-TW would be normalised to just zh-TW. See "normalise" in Locale::Unicode::Data for more information.

language_id

$locale->language_id( 'ja' );
$locale->language_id( 'ryu' );
$locale->language_id( 'und' );
# Unset the language ID
$locale->language_id( undef );
my $str = $locale->language_id;

Sets or gets a language ID.

In mutator mode, if the language ID provided is 3 characters long, then language3 will be called to set it, otherwise language will be called.

In accessor mode, it returns the language ID, whether it is a 2-character ID accessible via language, or a 3-character ID accessible via language3.

lb

This is an alias for "line_break"

line_break

This is a Unicode Line Break Style Identifier that specifies a preferred line break style corresponding to the CSS level 3 line-break option.

Sets or gets the Unicode extension lb.

line_break_word

This is a Unicode Line Break Word Identifier that specifies a preferred line break word handling behavior corresponding to the CSS level 3 word-break option.

Sets or gets the Unicode extension lw.

locale

This is an alias for lang

locale3

This is an alias for lang3

lw

This is an alias for "line_break_word"

m0

This is an alias for "mechanism"

measurement

This is a Unicode Measurement System Identifier that specifies a preferred measurement system.

Sets or gets the Unicode extension ms.

mechanism

my $locale = Locale::Unicode->new( 'und-Latn' );
$locale->transform( 'ru' );
$locale->mechanism( 'ungegn-2007' );
# Now: und-Latn-t-ru-m0-ungegn-2007
# representing a transformation according to the United Nations Group of
# Experts on Geographical Names (UNGEGN) 2007 specification

This is a transformation mechanism referencing an authority or rules for a type of transformation.

Sets or gets the Transformation extension m0.

See the section on "Transform extensions" for more information.

merge

my $locale1 = Locale::Unicode->new( 'ja-JP' );
my $locale2 = Locale::Unicode->new( 'ja-Kana-hepburn-heploc' );
say $locale1->merge( $locale2 ); # ja-Kana-JP-hepburn-heploc

Provided with another Locale::Unicode object, or a locale string, this will merge all of that object's properties into the current object used to call this method.

Since a locale can have multiple variants, merging two locale objects will merge the variants, while avoiding duplicates, like so:

my $locale1 = Locale::Unicode->new( 'ja-Kana-posix-hepburn' );
my $locale2 = Locale::Unicode->new( 'ja-JP-hepburn-heploc' );
say $locale1->merge( $locale2 ); # ja-Kana-JP-posix-hepburn-heploc

Note that it will not sort the variants. For that, use the canonical method.

See also the method "normalise" in Locale::Unicode::Data

It returns the current object.
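
How variants can be merged without duplicates and without sorting is sketched below with core Perl. The function merge_variants_sketch is hypothetical and only mirrors the behaviour described above for variants; it is not the module's own code.

```perl
use strict;
use warnings;
use feature 'say';

# Hypothetical helper for illustration; not the module's own code.
sub merge_variants_sketch
{
    my( $current, $other ) = @_;
    my %seen;
    # Keep the first occurrence of each variant and preserve the original order.
    return( grep { !$seen{ lc( $_ ) }++ } @$current, @$other );
}

my @merged = merge_variants_sketch( [qw( posix hepburn )], [qw( hepburn heploc )] );
say join( '-', @merged );  # posix-hepburn-heploc
```

The order of first appearance is kept, which matches the variant order in the second merge example above.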

ms

This is an alias for "measurement"

mu

This is an alias for "unit"

nu

This is an alias for "number"

number

This is a Unicode Number System Identifier that specifies a type of number system.

Sets or gets the Unicode extension nu.

overlong

my $locale = Locale::Unicode->new( 'en-US' );
say $locale->overlong; # undef
say $locale->country_code; # US
say $locale->territory; # US
# Changing to overlong USA
$locale->overlong( 'USA' );
say $locale->overlong; # USA
say $locale->country_code; # undef
say $locale->territory; # undef

But doing the following will not yield what you expect, because the overlong territory would be confused with an extended language subtag.

# Italian at Vatican City
my $locale = Locale::Unicode->new( 'it-VAT' );
say $locale->overlong; # undef
say $locale->extended; # VAT

# Spanish as spoken at Panama
my $locale = Locale::Unicode->new( 'es-PAN-valencia' );
say $locale->overlong; # undef
say $locale->extended; # PAN

Thus, you cannot expect to have the value for overlong set. However, you can set it yourself directly by passing a value to the method overlong.

Sets or gets an overlong country code.

You can normalise those overlong country codes to their normal equivalent by using "normalise" in Locale::Unicode::Data.

private

my $locale = Locale::Unicode->new( 'ja-JP' );
$locale->private( 'something-else' );
# Now: ja-JP-x-something-else

This serves to set or get the value for a private subtag.

privateuse

$locale->privateuse( 'x-abc' );
my $str = $locale->privateuse;

Sets or gets the privateuse language tag.

Note that this use is deprecated. See the LDML specifications.

The regular expression in Locale::Unicode supports the privateuse language subtag inherited by Unicode from BCP47, although it is not strictly supported by the standard. This is done in order to ensure maximum portability and flexibility.

region

# current value: fr-FR
$locale->region( '150' );
# Now: fr-150

Sets or gets the region part of a Unicode locale.

This is a world region represented by a 3-digits code.

Note that when you set a region code, it will automatically unset any country code.

my $locale = Locale::Unicode->new( 'en-US' );
say $locale->country_code; # US
$locale->region( '001' );
say $locale->country_code; # undef
say $locale; # en-001

Also, since region codes are padded with leading zeroes, be careful not to turn them inadvertently into integers, so that 001 does not become 1. This is particularly true if you store them in an SQL database, where the DBI driver might treat them as numbers. You would then have to use bind_param.
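
The effect of numeric coercion on a region code can be seen with core Perl alone. The variable names below are only illustrative.

```perl
use strict;
use warnings;
use feature 'say';

my $region = '001';
# Numeric context silently drops the leading zeroes.
my $as_number = $region + 0;
say $as_number;  # 1
# If only the number is available, the 3-digit form can be rebuilt.
my $restored = sprintf( '%03d', $as_number );
say $restored;   # 001
# Comparing as strings shows that the two forms are not interchangeable.
say( ( $as_number eq $region ) ? 'same' : 'different' );  # different
```

With DBI, passing an explicit string type to bind_param, for example SQL_VARCHAR as exported by use DBI qw( :sql_types ), is one way to ask the driver to treat the value as text. Whether a given driver honours it is something to check in its documentation.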

Below are the known region codes:

001  World
002  Africa
003  North America
005  South America
009  Oceania
011  Western Africa
013  Central America
014  Eastern Africa
015  Northern Africa
017  Middle Africa
018  Southern Africa
019  Americas
021  Northern America
029  Caribbean
030  Eastern Asia
034  Southern Asia
035  Southeast Asia
039  Southern Europe
053  Australasia
054  Melanesia
057  Micronesian Region
061  Polynesia
142  Asia
143  Central Asia
145  Western Asia
150  Europe
151  Eastern Europe
154  Northern Europe
155  Western Europe
202  Sub-Saharan Africa
419  Latin America

region_override

my $locale = Locale::Unicode->new( 'en-GB' );
$locale->region_override( 'uszzzz' );
# Now: en-GB-u-rg-uszzzz
# which is a locale for British English but with region-specific defaults set to US.

This is a Unicode Region Override that specifies an alternate country code or region to use for obtaining certain region-specific default values.

Sets or gets the Unicode extension rg.

reset

When provided with any argument, this will reset the cached value computed by "as_string".

rg

This is an alias for "region_override"

s0

This is an alias for "source"

script

# current value: zh-Hans
$locale->script( 'Hant' );
# Now: zh-Hant

Sets or gets the script part of the locale identifier.

sd

This is an alias for "subdivision"

sentence_break

This is a Unicode Sentence Break Suppressions Identifier that specifies a set of data to be used for suppressing certain sentence breaks.

Sets or gets the Unicode extension ss.

source

This is a transformation source for non-languages or scripts, such as fullwidth-halfwidth conversion.

Sets or gets the Transformation extension s0.

See the section on "Transform extensions" for more information.

ss

This is an alias for "sentence_break"

subdivision

my $locale = Locale::Unicode->new( 'gsw' );
$locale->subdivision( 'chzh' );
# or
# $locale->sd( 'chzh' );
# Now: gsw-u-sd-chzh

my $locale = Locale::Unicode->new( 'en-US' );
$locale->sd( 'usca' );
# Now: en-US-u-sd-usca

This is a Unicode Subdivision Identifier that specifies a regional subdivision used for the locale. This is typically a state in the U.S., a department in France, a prefecture in Japan, or a province in Canada.

Sets or gets the Unicode extension sd.

Be careful of the rule in the standard: the country code at the start of the subdivision must match the region subtag of the locale. For example, en-CA-u-sd-gbsct would be invalid because gb in gbsct does not match the region subtag CA.
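Since this module does not enforce the standard, you may want to check yourself that the leading country code of a subdivision agrees with the region subtag. Below is a minimal sketch of such a check. The function sd_matches_region is a hypothetical helper, not part of this module, and it only covers the case where the locale has a region subtag.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Unicode:
# returns 1 if the subdivision code starts with the region code, 0 otherwise.
sub sd_matches_region
{
    my( $region, $sd ) = @_;
    return( 0 ) unless( defined( $region ) && defined( $sd ) );
    # Compare case-insensitively, since the region is usually written in
    # uppercase and the subdivision in lowercase
    my $prefix = substr( $sd, 0, length( $region ) );
    return( lc( $prefix ) eq lc( $region ) ? 1 : 0 );
}

# en-US-u-sd-usca: 'us' matches 'US'
print sd_matches_region( 'US', 'usca' ) ? "valid\n" : "invalid\n";
# en-CA-u-sd-gbsct: 'gb' does not match 'CA'
print sd_matches_region( 'CA', 'gbsct' ) ? "valid\n" : "invalid\n";
```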

t0

This is an alias for "translation"

t_private

my $locale = Locale::Unicode->new( 'ja' );
$locale->transform( 'de' );
$locale->translation( 'und' );
$locale->t_private( 'medical' );
# Now: ja-t-de-t0-und-x0-medical

This is a private transformation subtag.

Sets or gets the Transformation private subtag x0.

territory

my $locale = Locale::Unicode->new( 'en' );
# Sets the country code to 'US'
$locale->territory( 'US' );
# Now: en-US
$locale->territory( 'GB' );
# Now: en-GB
# Sets the region to 150
$locale->territory( 150 );

Sets or gets the country code or the region code part of the locale.

A country code should be a 2-letter ISO 3166 code, but keep in mind that the LDML (Locale Data Markup Language) accepts old data to ensure stability.

A world region is represented by a 3-digit code.

In mutator mode, this method sets one or the other, depending on the value provided.

In accessor mode, this will return the country code, if any, or the region code.
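To illustrate the distinction between the two kinds of values, here is a minimal sketch that classifies a value as a 2-letter country code or a 3-digit region code. The function territory_kind is a hypothetical helper written for this illustration; it is not the way this module decides internally, and the labels it returns are arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Locale::Unicode:
# tells apart a 2-letter country code from a 3-digit world region code.
sub territory_kind
{
    my $value = shift;
    return unless( defined( $value ) );
    return( 'country_code' ) if( $value =~ /\A[A-Za-z]{2}\z/ );
    return( 'region' ) if( $value =~ /\A\d{3}\z/ );
    # Neither form was recognised
    return;
}

for my $value ( 'US', 150, '001', 1 )
{
    my $kind = territory_kind( $value ) // 'unrecognised';
    print "$value: $kind\n";
}
```

Note that the integer 1 is not recognised, whereas the string '001' is, which is one more reason to keep region codes as strings.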

time_zone

This is a Unicode Timezone Identifier that specifies a time zone.

Sets or gets the Unicode extension tz.

timezone

This is an alias for "time_zone"

transform

my $locale = Locale::Unicode->new( 'ja' );
$locale->transform( 'it' );
# Now: ja-t-it
# which is Japanese, transformed from Italian

my $locale = Locale::Unicode->new( 'ja-Kana' );
$locale->transform( 'it' );
# Now: ja-Kana-t-it
# which is Japanese Katakana, transformed from Italian

# 'und' is undefined and is perfectly valid
my $locale = Locale::Unicode->new( 'und-Latn' );
$locale->transform( 'und-cyrl' );
# Now: und-Latn-t-und-cyrl
# which is Latin script, transformed from the Cyrillic script

Sets or gets the Transformation extension t.

This takes either a string representing a locale or a Locale::Unicode object.

If a string is provided, it will be converted to a Locale::Unicode object.

The resulting value is passed to "transform_locale".

This method is convenient since you do not have to concern yourself with whether the value you provide is an object or not.

It returns the current object for chaining.

transform_locale

my $locale = Locale::Unicode->new( 'ja' );
my $locale2 = Locale::Unicode->new( 'it' );
$locale->transform_locale( $locale2 );
# Now: ja-t-it
my $object = $locale->transform_locale;

Sets or gets a Locale::Unicode object used to indicate the original locale subject to transformation.

This will trigger an exception if the value set is anything other than a Locale::Unicode object or an object of an inheriting class.

See the section on "Transform extensions" for more information.

translation

my $locale = Locale::Unicode->new( 'ja' );
$locale->transform( 'de' );
$locale->translation( 'und' );
# Now: ja-t-de-t0-und
# Japanese translated from German by an undefined vendor

This is used to indicate content that has been machine translated, or a request for a particular type of machine translation of content.

Sets or gets the Transformation extension t0.

See the section on "Transform extensions" for more information.

true

This is read-only and returns a Locale::Unicode::Boolean object representing a true value.

tz

This is an alias for "time_zone"

unit

This is a Measurement Unit Preference Override that specifies an override for measurement unit preference.

Sets or gets the Unicode extension mu.

va

This is an alias for "variant"

variant

This is a Unicode Variant Identifier that specifies a special variant used for locales.

Sets or gets the Unicode extension va.

variants

This returns the variant part of the locale as an array reference of variant subtags.

It will always return an array reference whether any variant is set or not.

my $locale = Locale::Unicode->new( 'en-fonipa-scouse' );
my $ref = $locale->variants; # ['fonipa', 'scouse']

You could reliably do something like:

if( scalar( @{$locale->variants} ) > 1 )
{
    # Do something
}

Note that the proper canonical format of a locale has the variants sorted in alphabetical order.
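As a minimal sketch of that ordering, assuming you hold the variant subtags in a plain array, they can be sorted before being joined back into a locale string. This uses core Perl only and is not a method of this module; the variable names are arbitrary.

```perl
use strict;
use warnings;

# Variant subtags in arbitrary order
my @variants = ( 'scouse', 'fonipa' );

# Lowercase them, then sort alphabetically
my @lowercased = map { lc( $_ ) } @variants;
my @canonical  = sort { $a cmp $b } @lowercased;

# Rebuild the locale string with the language subtag first
my $string = join( '-', 'en', @canonical );
print "$string\n"; # en-fonipa-scouse
```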

vt

This is an alias for "colVariableTop"

x0

This is an alias for "t_private"

CLASS FUNCTIONS

matches

Provided with a BCP47 locale, this returns a hash reference of its components if it matches the BCP47 regular expression, which can be accessed as the global class variable $LOCALE_RE.

If nothing matches, it returns an empty string in scalar context, or an empty list in list context.