NAME
Unicode::Util - Unicode-aware versions of built-in Perl functions
VERSION
This document describes Unicode::Util version 0.06.
SYNOPSIS
use Unicode::Util qw( graph_length code_length byte_length );
# grapheme cluster ю́: Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";
say graph_length($grapheme); # 1
say code_length($grapheme); # 2
say byte_length($grapheme, 'UTF-8'); # 4
DESCRIPTION
This module provides Unicode-aware versions of Perl’s built-in string functions, tailored to work on grapheme clusters as opposed to code points or bytes.
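The three units can be compared using core Perl alone. This is a minimal sketch and not necessarily how Unicode::Util works internally; it assumes only the built-in `length`, the `\X` regex escape for an extended grapheme cluster, and the core Encode module.

```perl
use strict;
use warnings;
use Encode qw( encode );

# Cyrillic small letter yu + combining acute accent
my $grapheme = "\x{44E}\x{301}";

# The built-in length counts code points.
my $code_points = length $grapheme;

# Encoding first and then taking the length counts bytes.
my $bytes = length( encode('UTF-8', $grapheme) );

# Counting \X matches counts grapheme clusters.
my $clusters = () = $grapheme =~ /\X/g;

print "$clusters $code_points $bytes\n";    # prints "1 2 4"
```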
FUNCTIONS
Functions may each be exported explicitly, or by using the :all tag for everything or the :length tag for the length functions.
- graph_length($string)
-
Returns the length of the given string in grapheme clusters. This is the closest to the number of “characters” that many people would count in a printed string.
- code_length($string)
- code_length($string, $normal_form)
-
Returns the length of the given string in code points. This is likely the number of “characters” that many programmers and programming languages would count in a string. If the optional Unicode normalization form is supplied, the length will be of the string as if it had been normalized to that form.
Valid normalization forms are C or NFC, D or NFD, KC or NFKC, and KD or NFKD.
- byte_length($string)
- byte_length($string, $encoding)
- byte_length($string, $encoding, $normal_form)
-
Returns the length of the given string in bytes, as if it were encoded using the specified encoding or UTF-8 if no encoding is supplied. If the optional Unicode normalization form is supplied, the length will be of the string as if it had been normalized to that form.
- graph_chop($string)
-
Returns the given string with the last grapheme cluster chopped off. Does not modify the original value, unlike the built-in chop.
- graph_reverse($string)
-
Returns the given string with its grapheme clusters in reverse order.
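The behavior described for the normalization argument and for the chop and reverse functions can be approximated with core modules. The `sketch_*` helpers below are hypothetical names invented for illustration; they are not part of Unicode::Util and may differ from its actual implementation. The sketch assumes only `Unicode::Normalize::normalize`, `Encode::encode`, and the `\X` regex escape.

```perl
use strict;
use warnings;
use Encode qw( encode );
use Unicode::Normalize qw( normalize );

# Hypothetical helper: code points, optionally after normalization.
sub sketch_code_length {
    my ($str, $form) = @_;
    $str = normalize($form, $str) if defined $form;
    return length $str;
}

# Hypothetical helper: bytes in the given encoding (UTF-8 by default),
# optionally after normalization.
sub sketch_byte_length {
    my ($str, $encoding, $form) = @_;
    $encoding = 'UTF-8' unless defined $encoding;
    $str = normalize($form, $str) if defined $form;
    return length( encode($encoding, $str) );
}

# Hypothetical helper: remove the last grapheme cluster from a copy,
# leaving the caller's variable untouched.
sub sketch_graph_chop {
    my ($str) = @_;
    $str =~ s/\X\z//;
    return $str;
}

# Hypothetical helper: reverse cluster order, keeping each cluster intact.
sub sketch_graph_reverse {
    my ($str) = @_;
    return join '', reverse $str =~ /\X/g;
}

# "e" + combining acute accent composes to a single code point under NFC.
my $e_acute = "e\x{301}";
print sketch_code_length($e_acute), "\n";                    # 2
print sketch_code_length($e_acute, 'NFC'), "\n";             # 1
print sketch_byte_length($e_acute), "\n";                    # 3
print sketch_byte_length($e_acute, 'UTF-8', 'NFC'), "\n";    # 2
```

Note that the built-in `reverse` applied to `"a\x{44E}\x{301}b"` would move the combining accent in front of its base letter, whereas the cluster-based version keeps the pair together.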
TODO
graph_substr, graph_index, graph_rindex
SEE ALSO
Unicode::GCString, String::Multibyte, Perl6::Str, http://perlcabal.org/syn/S32/Str.html
AUTHOR
Nick Patch <patch@cpan.org>
COPYRIGHT AND LICENSE
© 2011–2012 Nick Patch
This library is free software; you can redistribute it and/or modify it under the same terms as Perl itself.