NAME

Unicode::Util - Unicode-aware versions of built-in Perl functions

VERSION

This document describes Unicode::Util version 0.06.

SYNOPSIS

use feature 'say';  # needed for say() below
use Unicode::Util qw( graph_length code_length byte_length );

# grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";

say graph_length($grapheme);          # 1
say code_length($grapheme);           # 2
say byte_length($grapheme, 'UTF-8');  # 4

DESCRIPTION

This module provides Unicode-aware versions of Perl’s built-in string functions, tailored to work on grapheme clusters as opposed to code points or bytes.

FUNCTIONS

Each function can be imported by name. The :all tag imports every function, and the :length tag imports only the length functions.

graph_length($string)

Returns the length of the given string in grapheme clusters. This is the closest match to the number of “characters” that most people would count in a printed string.
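
To illustrate what counting grapheme clusters means, here is a minimal sketch in core Perl. It is not this module's implementation: count_graphemes is a hypothetical helper, and it assumes Perl 5.12 or later, where \X matches an extended grapheme cluster.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of Unicode::Util: counts grapheme
# clusters by matching \X repeatedly and counting the matches.
sub count_graphemes {
    my ($string) = @_;
    my $count = () = $string =~ /\X/g;
    return $count;
}

my $yu_acute = "\x{44E}\x{301}";  # yu + combining acute accent

print count_graphemes($yu_acute), "\n";  # 1
print length($yu_acute), "\n";           # 2 (built-in length counts code points)
```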

code_length($string)
code_length($string, $normal_form)

Returns the length of the given string in code points. This is likely the number of “characters” that many programmers and programming languages would count in a string. If the optional Unicode normalization form is supplied, the length will be of the string as if it had been normalized to that form.

Valid normalization forms are C or NFC, D or NFD, KC or NFKC, and KD or NFKD.
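
The effect of the optional normalization form can be sketched with the core Unicode::Normalize module. This is an illustration under stated assumptions rather than the module's own code: normalized_code_length and the %normalizer table are hypothetical names introduced here.

```perl
use strict;
use warnings;
use Unicode::Normalize qw(NFC NFD NFKC NFKD);

# Hypothetical lookup table accepting both the short and long form names.
my %normalizer = (
    C  => \&NFC,  NFC  => \&NFC,
    D  => \&NFD,  NFD  => \&NFD,
    KC => \&NFKC, NFKC => \&NFKC,
    KD => \&NFKD, NFKD => \&NFKD,
);

# Hypothetical helper: code point length, optionally after normalization.
sub normalized_code_length {
    my ($string, $form) = @_;
    return length $string unless defined $form;
    my $normalize = $normalizer{$form}
        or die "Unknown normalization form: $form\n";
    return length $normalize->($string);
}

my $e_acute = "e\x{301}";  # e + combining acute accent

print normalized_code_length($e_acute), "\n";         # 2
print normalized_code_length($e_acute, 'NFC'), "\n";  # 1 (composes to U+00E9)
print normalized_code_length($e_acute, 'D'), "\n";    # 2
```

The same string can therefore report different code point lengths depending on the form, which is why the form is worth stating explicitly when lengths are compared.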

byte_length($string)
byte_length($string, $encoding)
byte_length($string, $encoding, $normal_form)

Returns the length of the given string in bytes, as if it were encoded using the specified encoding, or UTF-8 if no encoding is supplied. If the optional Unicode normalization form is supplied, the length will be of the string as if it had been normalized to that form.
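
A byte length of this kind can be approximated with the core Encode module by encoding the string and measuring the result. The sketch below is an assumption about the general technique, not the module's code; encoded_byte_length is a hypothetical helper and it leaves out the normalization step.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Hypothetical helper: length in bytes after encoding, defaulting to UTF-8.
sub encoded_byte_length {
    my ($string, $encoding) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    my $copy = $string;  # work on a copy so the caller's value is untouched
    return length encode($encoding, $copy);
}

my $yu_acute = "\x{44E}\x{301}";  # two code points, two bytes each in UTF-8
my $euro     = "\x{20AC}";        # euro sign

print encoded_byte_length($yu_acute), "\n";        # 4
print encoded_byte_length($euro), "\n";            # 3
print encoded_byte_length($euro, 'UTF-16LE'), "\n"; # 2
```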

graph_chop($string)

Returns the given string with the last grapheme cluster chopped off. Does not modify the original value, unlike the built-in chop.
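
The difference from the built-in chop can be shown with a short core Perl sketch. It is illustrative only: chop_last_grapheme is a hypothetical helper, and it assumes Perl 5.12 or later for extended grapheme cluster matching with \X.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a copy with the final grapheme cluster
# removed, leaving the argument unchanged.
sub chop_last_grapheme {
    my ($string) = @_;
    (my $copy = $string) =~ s/\X\z//;
    return $copy;
}

my $text = "ab\x{44E}\x{301}";  # "ab" + yu + combining acute accent

my $by_cluster = chop_last_grapheme($text);
print length($by_cluster), "\n";  # 2 (the whole cluster was removed)

my $by_code_point = $text;
chop $by_code_point;              # built-in chop removes only the accent
print length($by_code_point), "\n";  # 3
```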

graph_reverse($string)

Returns the given string with its grapheme clusters in reverse order.
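
Reversing by grapheme cluster keeps combining marks attached to their base characters, which the built-in reverse does not. The following is a minimal sketch of that idea in core Perl, not the module's implementation; reverse_graphemes is a hypothetical helper and assumes Perl 5.12 or later.

```perl
use strict;
use warnings;

# Hypothetical helper: split into grapheme clusters with \X, reverse the
# list of clusters, and join them back together.
sub reverse_graphemes {
    my ($string) = @_;
    return join '', reverse $string =~ /\X/g;
}

my $text = "a\x{44E}\x{301}b";  # a, yu + combining acute accent, b

my $by_cluster    = reverse_graphemes($text);
my $by_code_point = scalar reverse $text;

# The cluster-aware result keeps the accent on the yu.
print $by_cluster eq "b\x{44E}\x{301}a" ? "clusters intact\n" : "unexpected\n";

# The built-in reverse moves the accent in front of the yu.
print $by_code_point eq "b\x{301}\x{44E}a" ? "accent detached\n" : "unexpected\n";
```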

TODO

graph_substr, graph_index, graph_rindex

SEE ALSO

Unicode::GCString, String::Multibyte, Perl6::Str, http://perlcabal.org/syn/S32/Str.html

AUTHOR

Nick Patch <patch@cpan.org>

COPYRIGHT AND LICENSE

© 2011–2012 Nick Patch

This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.