

Muldis::D::Ext::Text - Muldis D extension for character string data types and operators


This document is Muldis::D::Ext::Text version 0.98.0.


This document is part of the Muldis D language specification, whose root document is Muldis::D; you should read that root document before you read this one, which provides subservient details.


Muldis D has a mandatory core set of system-defined (eternally available) entities, which is referred to as the Muldis D core or the core; they are the minimal entities that all Muldis D implementations need to provide; they are mutually self-describing and are used to bootstrap the language; any entities outside the core, called Muldis D extensions, are non-mandatory and are defined in terms of the core or each other, but the reverse isn't true.

This current Text document describes the system-defined Muldis D Text Extension, which consists of character string data types and operators: essentially all the generic ones that a typical programming language should have, except for the bare minimum needed for bootstrapping Muldis D, which are defined in the language core instead.

This current document does not describe the polymorphic operators that all types, or some types including core types, have defined over them; said operators are defined once for all types in Muldis::D::Core.

This documentation is pending.


These functions implement commonly used character string operations.


function sys.std.Text.catenation (Text <-- $topic? : array_of.Text)

This function results in the catenation of the N element values of its argument; it is a reduction operator that recursively takes each consecutive pair of input values and catenates them together (catenation is associative) until just one is left, which is the result. If topic has zero values, then catenation results in the empty string value, which is the identity value for catenation. Note that this operation is also known as T~.


function sys.std.Text.replication (Text <-- $topic : Text, $count : NNInt)

This function results in the catenation of count instances of topic. Note that this operation is also known as Tx.
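The two functions above can be sketched in Python as a minimal illustration; this is not Muldis D code, and the Python names catenation and replication are only stand-ins for the system-defined functions. The sketch assumes the array argument is modelled as any Python sequence of strings.

```python
from functools import reduce


def catenation(topic=()):
    """Reduce a sequence of strings with concatenation.

    The empty string is passed as the initial value because it is the
    identity for concatenation, so an empty input yields "".
    """
    return reduce(lambda left, right: left + right, topic, "")


def replication(topic, count):
    """Catenate `count` instances of `topic`; `count` models NNInt."""
    if count < 0:
        raise ValueError("count must be a non-negative integer")
    return catenation([topic] * count)
```

Under these assumptions, replication with a count of zero falls back on the identity value and gives the empty string, which mirrors the zero-element case of catenation.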


function sys.std.Text.len_in_nfd_codes (NNInt <-- $topic : Text)

This function results in the length of its argument in Unicode canonical decomposed normal form (NFD) abstract codepoints, or in other words, in the actual length of the argument since Muldis D explicitly works natively at the abstract codepoint abstraction level.
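As a rough illustration of counting NFD codepoints, the following Python sketch uses the standard unicodedata module. It is an approximation under one stated assumption: Python strings are not stored in NFD, so the sketch normalizes explicitly before counting, whereas Muldis D is described above as working at that level natively.

```python
import unicodedata


def len_in_nfd_codes(topic):
    """Count codepoints after canonical decomposition (NFD)."""
    return len(unicodedata.normalize("NFD", topic))
```

For instance, a precomposed "e with acute accent" (U+00E9) decomposes into a base letter plus a combining mark, so it counts as two codepoints here.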


function sys.std.Text.len_in_graphs (NNInt <-- $topic : Text)

This function results in the length of its argument in language-independent graphemes.


function sys.std.Text.has_substr (Bool <-- $look_in : Text, $look_for : Text, $fixed_start? : Bool, $fixed_end? : Bool)

This function results in Bool:true iff its look_for argument is a substring of its look_in argument as per the optional fixed_start and fixed_end constraints, and Bool:false otherwise. If fixed_start or fixed_end is Bool:true, then look_for must occur right at the start or end, respectively, of look_in in order for has_substr to result in Bool:true; if either flag is Bool:false, its additional constraint doesn't apply. Each of the fixed_[start|end] parameters is optional and defaults to Bool:false if no explicit argument is given to it. Note that has_substr will handle the common special cases of SQL's "LIKE" operator for patterns like ['foo', '%foo', 'foo%', '%foo%'], but see also the is_like function which provides the full generality of SQL's "LIKE", such as 'foo%bar%baz'.


function sys.std.Text.has_not_substr (Bool <-- $look_in : Text, $look_for : Text, $fixed_start? : Bool, $fixed_end? : Bool)

This function is exactly the same as sys.std.Text.has_substr except that it results in the opposite boolean value when given the same arguments.
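The behaviour of the two substring predicates can be sketched in Python as follows. This is an illustrative reading, not the specification: in particular, treating both flags being true as an exact match is an assumption drawn from the correspondence with the SQL pattern 'foo' mentioned above.

```python
def has_substr(look_in, look_for, fixed_start=False, fixed_end=False):
    """Substring test with optional anchoring at either end."""
    if fixed_start and fixed_end:
        # Assumed to correspond to the LIKE pattern 'foo': whole-string match.
        return look_in == look_for
    if fixed_start:
        # Corresponds to the LIKE pattern 'foo%'.
        return look_in.startswith(look_for)
    if fixed_end:
        # Corresponds to the LIKE pattern '%foo'.
        return look_in.endswith(look_for)
    # Corresponds to the LIKE pattern '%foo%'.
    return look_for in look_in


def has_not_substr(look_in, look_for, fixed_start=False, fixed_end=False):
    """Logical negation of has_substr for the same arguments."""
    return not has_substr(look_in, look_for, fixed_start, fixed_end)
```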


These functions implement commonly used text normalization operations which are relatively simple or whose details are fully specified by the Unicode standard; examples are folding letters to lower or upper case, removing combining characters like accent marks and other diacritics from base letters, removing or normalizing whitespace, or converting text from a larger to a smaller character repertoire such as to ASCII. By contrast, operations such as stemming or removing common words or expanding abbreviations are not done by these functions and are best implemented by a third party language extension or library. You can use these functions as a basis for making comparison or ranking or collation operators that ignore some distinctions between values such as their case or accents, such as to do case-insensitive or accent-insensitive or whitespace-insensitive matching or indexing or sorting; the actual system-defined matching operators are still sensitive to case et al, but you can pretend they're not by having them work with the results of these normalization functions rather than on the inputs to these functions. This is useful when you want to emulate the semantics of insensitive though possibly preserving systems over Muldis D.


function sys.std.Text.upper (Text <-- $topic : Text)

This function results in the normalization of its argument where any letters considered to be (small) lowercase are folded to (capital) uppercase.


function sys.std.Text.lower (Text <-- $topic : Text)

This function results in the normalization of its argument where any letters considered to be (capital) uppercase are folded to (small) lowercase.


function sys.std.Text.accents_stripped (Text <-- $topic : Text)

This function results in the normalization of its argument where any accent marks or diacritics are removed from letters, leaving just the primary letters.
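One common way to strip accents, shown here as a Python sketch rather than as the definition of this function, is to decompose the text to NFD and then drop every codepoint in a Unicode "Mark" general category. Whether Muldis D implementations use exactly this rule is an assumption.

```python
import unicodedata


def accents_stripped(topic):
    """Drop combining marks, keeping only the base characters."""
    decomposed = unicodedata.normalize("NFD", topic)
    # General categories Mn, Mc and Me all start with "M" (marks).
    # Letters with no canonical decomposition, such as U+00F8, are
    # left unchanged by this approach.
    return "".join(
        ch for ch in decomposed
        if not unicodedata.category(ch).startswith("M")
    )
```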


function sys.std.Text.ASCII (Text <-- $topic : Text, $mark? : Text)

This function results in the normalization of its topic argument where each character not in the 7-bit ASCII repertoire is replaced with the ASCII character string specified by its mark argument; if mark is the empty string, then the non-ASCII characters are simply stripped. This function is quite simple and does not do a smart replace with sequences of similar looking ASCII characters. The mark parameter is optional and defaults to the empty string if no explicit argument is given to it.
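A minimal Python sketch of this replacement rule follows. The name ascii_only is hypothetical, chosen to avoid clashing with Python's built-in ascii, and the sketch assumes that replacement happens once per non-ASCII codepoint.

```python
def ascii_only(topic, mark=""):
    """Replace each non-ASCII codepoint with `mark` (default: strip it)."""
    # Codepoints 0..127 make up the 7-bit ASCII repertoire.
    return "".join(ch if ord(ch) < 128 else mark for ch in topic)
```

With the default mark the non-ASCII characters simply disappear; passing a mark such as "?" leaves a visible placeholder at each position instead.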


function sys.std.Text.trim (Text <-- $topic : Text)

This function results in the normalization of its argument where any leading or trailing whitespace characters are trimmed, but no other changes are made, including to any whitespace bounded by non-whitespace characters.
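The idea of building an insensitive comparison from these normalization functions can be sketched in Python. Here the built-in str.strip and str.lower stand in for trim and lower, and the helper name same_ignoring_case_and_edges is hypothetical; the comparison itself stays fully sensitive and is merely applied to normalized values.

```python
def same_ignoring_case_and_edges(left, right):
    """Compare two strings after trimming and lowercasing both."""
    def normalize(value):
        # Stand-ins for the trim and lower functions described above.
        return value.strip().lower()

    # The equality test is an ordinary case-sensitive comparison; only
    # its inputs have been normalized.
    return normalize(left) == normalize(right)
```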


These functions implement commonly used operations for matching text against a pattern or performing substitutions of characters for others; included are both the functionality of SQL's simple "LIKE" pattern matching operator and support for Perl 5's regular expressions and Perl 6's rules. All of these functions are case-sensitive et al as per is_identical unless explicitly given flags to do otherwise, where applicable; or just use them to search results of normalization functions if you need to. Note that Perl 5.10+ is also an inspiration such that its regular expression feature is algorithm-agnostic and can both have new algorithms plugged in and have multiple system-defined ones. Note that a lot of this section is still TODO, with several useful functions missing, or more complicated parts like the Perl pattern matching may be separated off into their own language extensions later.


function sys.std.Text.is_like (Bool <-- $look_in : Text, $look_for : Text, $escape? : Text)

This function results in Bool:true iff its look_in argument is matched by the pattern given in its look_for argument, and Bool:false otherwise. This function implements the full generalization of SQL's simple "LIKE" pattern matching operator. Any characters in look_for are matched literally except for the 2 wildcard characters _ (match any single character) and % (match any string of 0..N characters); the preceding assumes that the escape argument is the empty string (or is missing). If escape is a character, then that character is also special and its lone occurrence in look_for will no longer match itself as per the 2 wildcard characters; rather it will be used in look_for to indicate when the pattern wishes to match a literal _ or % or the escape character itself literally. For example, if \ is used as the escape character, then you use \_, \%, \\ to match the literal wildcard characters or itself, respectively. Note that this operation is also known as is match using like or like.
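The matching rules above can be illustrated with a Python sketch that translates a LIKE pattern into a regular expression. The helper like_to_regex is hypothetical, and two behaviours are assumptions not stated in this document: an escape character followed by an ordinary character is taken to match that character literally, and a pattern ending in a lone escape character is rejected.

```python
import re


def like_to_regex(look_for, escape=""):
    """Translate a LIKE pattern into a compiled regular expression."""
    if len(escape) > 1:
        raise ValueError("escape must be empty or a single character")
    parts = []
    i = 0
    while i < len(look_for):
        ch = look_for[i]
        if escape and ch == escape:
            # The character after the escape is matched literally.
            i += 1
            if i >= len(look_for):
                raise ValueError("pattern ends with a lone escape character")
            parts.append(re.escape(look_for[i]))
        elif ch == "_":
            parts.append(".")    # any single character
        elif ch == "%":
            parts.append(".*")   # any string of 0..N characters
        else:
            parts.append(re.escape(ch))
        i += 1
    # DOTALL lets the wildcards match newline characters too.
    return re.compile("".join(parts), re.DOTALL)


def is_like(look_in, look_for, escape=""):
    """True iff the whole of look_in is matched by the LIKE pattern."""
    return like_to_regex(look_for, escape).fullmatch(look_in) is not None
```

The sketch uses a whole-string match because a LIKE pattern without a leading or trailing % is anchored at both ends. Note that "_" here matches one Python codepoint, which only approximates the character level at which Muldis D operates.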


function sys.std.Text.is_not_like (Bool <-- $look_in : Text, $look_for : Text, $escape? : Text)

This function is exactly the same as sys.std.Text.is_like except that it results in the opposite boolean value when given the same arguments; it implements SQL's "NOT LIKE". Note that this operation is also known as is not match using like or !like or not-like.


Go to Muldis::D for the majority of distribution-internal references, and Muldis::D::SeeAlso for the majority of distribution-external references.


Darren Duncan (


This file is part of the formal specification of the Muldis D language.

Muldis D is Copyright © 2002-2009, Muldis Data Systems, Inc.

See the LICENSE AND COPYRIGHT of Muldis::D for details.


The TRADEMARK POLICY in Muldis::D applies to this file too.


The ACKNOWLEDGEMENTS in Muldis::D apply to this file too.