TITLE
Synopsis_2 - Bits and Pieces
AUTHOR
Larry Wall <larry@wall.org>
VERSION
Maintainer: Larry Wall <larry@wall.org>
Date: 10 Aug 2004
Last Modified: 5 Dec 2007
Number: 2
Version: 121
This document summarizes Apocalypse 2, which covers small-scale lexical items and typological issues. (These Synopses also contain updates to reflect the evolving design of Perl 6 over time, unlike the Apocalypses, which are frozen in time as "historical documents". These updates are not marked--if a Synopsis disagrees with its Apocalypse, assume the Synopsis is correct.)
One-pass parsing
To the extent allowed by sublanguages' parsers, Perl is parsed using a one-pass, predictive parser. That is, lookahead of more than one "longest token" is discouraged. The currently known exceptions to this are where the parser must:
Locate the end of interpolated expressions that begin with a sigil and might or might not end with brackets.
Recognize that a reduce operator is not really beginning a [...] composer.
Lexical Conventions
In the abstract, Perl is written in Unicode, and has consistent Unicode semantics regardless of the underlying text representations.
Perl can count Unicode line and paragraph separators as line markers, but that behavior had better be configurable so that Perl's idea of line numbers matches what your editor thinks about Unicode lines.
Unicode horizontal whitespace is counted as whitespace, but it's better not to use thin spaces where they will make adjoining tokens look like a single token. On the other hand, Perl doesn't use indentation as syntax, so you are free to use any whitespace anywhere that whitespace makes sense. Comments always count as whitespace.
For some syntactic purposes, Perl distinguishes bracketing characters from non-bracketing. Bracketing characters are defined as any Unicode characters with either bidirectional mirrorings or Ps/Pe properties.
In practice, though, you're safest using matching characters with Ps/Pe properties, though ASCII angle brackets are a notable exception, since they're bidirectional but not in the Ps/Pe set.
Characters with no corresponding closing character do not qualify as opening brackets. This includes the second section of the Unicode BidiMirroring data table, as well as U+201A and U+201E.
If a character is already used in Ps/Pe mappings, then any entry in BidiMirroring is ignored (both forward and backward mappings). For any given Ps character, the next Pe codepoint (in numerical order) is assumed to be its matching character even if that is not what you might guess using left-right symmetry. Therefore U+298D maps to U+298E, not U+2990, and U+298F maps to U+2990, not U+298E. Neither U+298E nor U+2990 are valid bracket openers, despite having reverse mappings in the BidiMirroring table.
The U+301D codepoint has two closing alternatives, U+301E and U+301F; Perl 6 only recognizes the one with lower code point number, U+301E, as the closing brace. This policy also applies to new one-to-many mappings introduced in the future.
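The Ps/Pe part of the bracket rule can be sketched outside Perl. The following Python is only an illustration under stated assumptions: the names closing_bracket and EXCLUDED_OPENERS are hypothetical, the category data comes from Python's unicodedata module rather than from any Perl 6 implementation, and the BidiMirroring half of the rule (which is what admits the ASCII angle brackets) is not modelled at all.

```python
import sys
import unicodedata

# Openers the text excludes explicitly because they have no closing partner.
EXCLUDED_OPENERS = {"\u201A", "\u201E"}

def closing_bracket(ch):
    """Return the closer for a Ps opener, or None if ch cannot open a bracket."""
    if unicodedata.category(ch) != "Ps" or ch in EXCLUDED_OPENERS:
        return None
    # The next Pe codepoint in numerical order is taken as the match.
    for cp in range(ord(ch) + 1, sys.maxunicode + 1):
        if unicodedata.category(chr(cp)) == "Pe":
            return chr(cp)
    return None
```

Under these assumptions, closing_bracket("\u298D") yields U+298E rather than the visually symmetric U+2990, and closing_bracket("\u301D") yields U+301E, the lower of the two candidate closers.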
Whitespace and Comments
POD sections may be used reliably as multiline comments in Perl 6. Unlike in Perl 5, POD syntax now requires that =begin comment and =end comment delimit a POD block correctly without the need for =cut. (In fact, =cut is now gone.) The format name does not have to be comment -- any unrecognized format name will do to make it a comment. (However, bare =begin and =end probably aren't good enough, because all comments in them will show up in the formatted output.)
We have single paragraph comments with =for comment as well. That lets =for keep its meaning as the equivalent of a =begin and =end combined. As with =begin and =end, a comment started in code reverts to code afterwards.
Since there is a newline before the first =, the POD form of comment counts as whitespace equivalent to a newline.
Except within a string literal, a # character always introduces a comment in Perl 6. There are two forms of comment based on #. Embedded comments require the # to be followed by one or more opening bracketing characters.
All other uses of # are interpreted as single-line comments that work just as in Perl 5, starting with a # character and ending at the subsequent newline. They count as whitespace equivalent to newline for purposes of separation. Unlike in Perl 5, # may not be used as the delimiter in quoting constructs.
Embedded comments are supported as a variant on quoting syntax, introduced by # plus any user-selected bracket characters (as defined in "Lexical Conventions" above):
say #( embedded comment ) "hello, world!";

$object\#{ embedded comments }.say;

$object\ #「
    embedded comments
」.say;
Brackets may be nested, following the same policy as ordinary quote brackets.
There must be no space between the # and the opening bracket character. (There may be the visual appearance of space for some double-wide characters, however, such as the corner quotes above.)
An embedded comment is not allowed as the first thing on the line.
#sub foo      # line-end comment
#{            # ILLEGAL, syntax error
#  ...
#}
If you wish to have a comment there, you must disambiguate it to either an embedded comment or a line-end comment. You can put a space in front of it to make it an embedded comment:
 #sub foo      # line end comment
 #{            # okay, comment
  ...          # extends
 }             # to here
Or you can put something other than a single # to make it a line-end comment. Therefore, if you are commenting out a block of code using the line-comment form, we recommend that you use ##, or # followed by some whitespace, preferably a tab to keep any tab formatting consistent:
##sub foo
##{            # okay
##  ...
##}

# sub foo
# {            # okay
#   ...
# }

#   sub foo
#   {          # okay
#     ...
#   }
However, it's often better to use pod comments because they are implicitly line-oriented. And if you have an intelligent syntax highlighter that will mark pod comments in a different color, there's less visual need for a # on every line.
For all quoting constructs that use user-selected brackets, you can open with multiple identical bracket characters, which must be closed by the same number of closing brackets. Counting of nested brackets applies only to pairs of brackets of the same length as the opening brackets:
say #{{
    This comment contains unmatched } and { { { {   (ignored)
    Plus a nested {{ ... }} pair                    (counted)
}} q<< <<woot>> >>   # says " <<woot>> "
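The counting rule for repeated brackets can be made concrete with a small scanner. This Python sketch is an illustration only: the function name skip_embedded_comment and the PAIRS table are hypothetical, only four ASCII bracket pairs are handled, and it is not how any Perl 6 parser is actually written. It counts nesting only for runs of the same length as the opening run and ignores shorter runs.

```python
# Hypothetical, ASCII-only bracket table for this sketch.
PAIRS = {"(": ")", "[": "]", "{": "}", "<": ">"}

def skip_embedded_comment(src, pos):
    """Return the index just past the embedded comment that starts at src[pos]."""
    if src[pos] != "#" or pos + 1 >= len(src) or src[pos + 1] not in PAIRS:
        raise ValueError("not an embedded comment")
    opener = src[pos + 1]
    closer = PAIRS[opener]
    # Measure the run of identical opening brackets after the '#'.
    n = 0
    while pos + 1 + n < len(src) and src[pos + 1 + n] == opener:
        n += 1
    open_run, close_run = opener * n, closer * n
    i = pos + 1 + n
    depth = 1
    while i < len(src):
        if src.startswith(open_run, i):       # nested run of the same length
            depth += 1
            i += n
        elif src.startswith(close_run, i):    # matching closing run
            depth -= 1
            i += n
            if depth == 0:
                return i
        else:                                 # shorter runs are plain text
            i += 1
    raise ValueError("unterminated embedded comment")
```

With a doubled opener, single braces inside the comment are skipped, while a nested {{ ... }} pair is counted and must be closed before the comment ends.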
Note however that bare circumfix or postcircumfix <<...>> is not a user-selected bracket, but the ASCII variant of the «...» interpolating word list. Only # and the q-style quoters (including m, s, tr, and rx) enable subsequent user-selected brackets.
Some languages such as C allow you to escape newline characters to combine lines. Other languages (such as regexes) allow you to backslash a space character for various reasons. Perl 6 generalizes this notion to any kind of whitespace. Any contiguous whitespace (including comments) may be hidden from the parser by prefixing it with \. This is known as the "unspace". An unspace can suppress any of several whitespace dependencies in Perl. For example, since Perl requires an absence of whitespace between a noun and a postfix operator, using unspace lets you line up postfix operators:
%hash\  .{$key}
@array\ .[$ix]
$subref\.($arg)
As a special case to support the use above, a backslashed dot where a postfix is expected is considered a degenerate form of unspace. Note that whitespace is not allowed before that, hence
$subref \.($arg)
is a syntax error (two terms in a row). And
foo \.($arg)
will be parsed as a list operator with an argument:
foo(\$_.($arg))
However, other forms of unspace may usefully be preceded by whitespace. (Unary uses of backslash may therefore never be followed by whitespace or they would be taken as an unspace.)
Other postfix operators may also make use of unspace:
$number\  ++;
$number\  --;
1+3\      i;
$object\  .say();
$object\#{ your ad here }.say
Another normal use of a you-don't-see-this-space is typically to put a dotted postfix on the next line:
$object\ # comment
.say

$object\#[ comment
].say

$object\
.say
But unspace is mainly about language extensibility: it lets you continue the line in any situation where a newline might confuse the parser, regardless of your currently installed parser. (Unless, of course, you override the unspace rule itself...)
Although we say that the unspace hides the whitespace from the parser, it does not hide whitespace from the lexer. As a result, unspace is not allowed within a token. Additionally, line numbers are still counted if the unspace contains one or more newlines. A # following such a newline is always an end-of-line comment, as described above. Since Pod chunks count as whitespace to the language, they are also swallowed up by unspace. Heredoc boundaries are suppressed, however, so you can split excessively long heredoc intro lines like this:
ok(q:to'CODE', q:to'OUTPUT', \
"Here is a long description", \ # --more--
todo(:parrøt<0.42>, :dötnet<1.2>));
    ...
    CODE
    ...
    OUTPUT
To the heredoc parser that just looks like:
ok(q:to'CODE', q:to'OUTPUT', "Here is a long description", todo(:parrøt<0.42>, :dötnet<1.2>));
    ...
    CODE
    ...
    OUTPUT
Note that this is one of those cases in which it is fine to have whitespace before the unspace. (Note also that the example above is not meant to spec how the test suite works. :)
An unspace may contain a comment, but a comment may not contain an unspace. In particular, end-of-line comments do not treat backslash as significant. If you say:
#\ (...
it is an end-of-line comment, not an embedded comment. Write:
\ #( ... )
to mean the other thing.
In general, whitespace is optional in Perl 6 except where it is needed to separate constructs that would be misconstrued as a single token or other syntactic unit. (In other words, Perl 6 follows the standard longest-token principle, or in the case of large constructs, a prefer-shifting-to-reducing principle. See "Grammatical Categories" below for more on how a Perl program is analyzed into tokens.)
This is an unchanging deep rule, but the surface ramifications of it change as various operators and macros are added to or removed from the language, which we expect to happen because Perl 6 is designed to be a mutable language. In particular, there is a natural conflict between postfix operators and infix operators, either of which may occur after a term. If a given token may be interpreted as either a postfix operator or an infix operator, the infix operator requires space before it. Postfix operators may never have intervening space, though they may have an intervening dot. If further separation is desired, an embedded comment may be used as described above, as long as no whitespace occurs outside the embedded comment.
For instance, if you were to add your own infix:<++> operator, then it must have space before it. The normal autoincrementing postfix:<++> operator may never have space before it, but may be written in any of these forms:
$x++

$x.++

$x\ .++

$x\#( comment ).++
$x\#((( comment ))).++

$x\
.++

$x\         # comment
            # inside unspace
.++

$x\         # comment
            # inside unspace
++          # (but without the optional postfix dot)

$x\#『      comment
            more comment
』.++

$x\#[   comment 1
comment 2
=begin podstuff
whatever (pod comments ignore current parser state)
=end podstuff
comment 3
].++
A consequence of the postfix rule is that (except when delimiting a quote or terminating an unspace) a dot with whitespace in front of it is always considered a method call on $_ where a term is expected. If a term is not expected at this point, it is a syntax error. (Unless, of course, there is an infix operator of that name beginning with dot. You could, for instance, define a Fortranly infix:<.EQ.> if the fit took you. But you'll have to be sure to always put whitespace in front of it, or it would be interpreted as a postfix method call instead.)
For example,
foo .method
and
foo
.method
will always be interpreted as
foo $_.method
but never as
foo.method
Use some variant of
foo\
.method
if you mean the postfix method call.
One consequence of all this is that you may no longer write a Num as 42. with just a trailing dot. You must instead say either 42 or 42.0. In other words, a dot following a number can only be a decimal point if the following character is a digit. Otherwise the postfix dot will be taken to be the start of some kind of method call syntax, whether long-dotty or not. (The .123 form with a leading dot is still allowed however when a term is expected, and is equivalent to 0.123 rather than $_.123.)
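The decimal-point rule amounts to a one-character lookahead, which a regular expression can express directly. The sketch below is Python rather than Perl 6, and it is deliberately simplified: the names NUMBER and leading_number are hypothetical, and exponents, radix prefixes and digit separators are ignored.

```python
import re

# A dot belongs to the number only when a digit follows it.
NUMBER = re.compile(r"\d+(?:\.\d+)?|\.\d+")

def leading_number(src):
    """Return the numeric literal at the start of src, or None if there is none."""
    match = NUMBER.match(src)
    return match.group(0) if match else None
```

Under this rule "42.0" is consumed whole, while in "42.say" or "42." only "42" is the number and the dot is left over for whatever follows.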
Built-In Data Types
In support of OO encapsulation, there is a new fundamental datatype: P6opaque. External access to opaque objects is always through method calls, even for attributes.
Perl 6 has an optional type system that helps you write safer code that performs better. The compiler is free to infer what type information it can from the types you supply, but will not complain about missing type information unless you ask it to.
Types are officially compared using name equivalence rather than structural equivalence. However, we're rather liberal in what we consider a name. For example, the name includes the version and authority associated with the module defining the type (even if the type itself is "anonymous"). Beyond that, when you instantiate a parametric type, the arguments are considered part of the "long name" of the resulting type, so one Array of Int is equivalent to another Array of Int. (Another way to look at it is that the type instantiation "factory" is memoized.) Typename aliases are considered equivalent to the original type.
This name equivalence of parametric types extends only to parameters that can be considered immutable (or that at least can have an immutable snapshot taken of them). Two distinct classes are never considered equivalent even if they have the same attributes because classes are not considered immutable.
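The "memoized factory" view of parametric types can be illustrated in Python. This is an analogy under stated assumptions, not a description of Perl 6 internals: array_of is a hypothetical factory, and functools.lru_cache stands in for the memoization, so the same (hashable, hence effectively immutable) parameter always yields the identical type object.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def array_of(element_type):
    """Build (once per parameter) a list subclass tagged with its element type."""
    name = f"Array of {element_type.__name__}"
    return type(name, (list,), {"of": element_type})

# Two separately written classes stay distinct even with identical bodies.
class PointA:
    x = 0

class PointB:
    x = 0
```

Here array_of(int) is array_of(int) holds because the factory call is cached, whereas PointA and PointB are never the same type despite having the same attributes.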
Perl 6 supports the notion of properties on various kinds of objects. Properties are like object attributes, except that they're managed by the individual object rather than by the object's class.
According to S12, properties are actually implemented by a kind of mixin mechanism, and such mixins are accomplished by the generation of an individual anonymous class for the object (unless an identical anonymous class already exists and can safely be shared).
Properties applied to objects constructed at compile-time, such as variables and classes, are also called traits. Traits cannot be changed at run-time. Changes to run-time properties are done via mixin instead, so that the compiler can optimize based on declared traits.
Perl 6 is an OO engine, but you're not generally required to think in OO when that's inconvenient. However, some built-in concepts such as filehandles will be more object-oriented in a user-visible way than in Perl 5.
A variable's type is a constraint indicating what sorts of values the variable may contain. More precisely, it's a promise that the object or objects contained in the variable are capable of responding to the methods of the indicated "role". See S12 for more about roles.
# $x can contain only Int objects
my Int $x;
A variable may itself be bound to a container type that specifies how the container works, without specifying what kinds of things it contains.
# $x is implemented by the MyScalar class
my $x is MyScalar;
Constraints and container types can be used together:
# $x can contain only Int objects,
# and is implemented by the MyScalar class
my Int $x is MyScalar;
Note that $x is also initialized to ::Int. See below for more on this.
my Dog $spot by itself does not automatically call a Dog constructor. It merely assigns an undefined Dog prototype object to $spot:
my Dog $spot;           # $spot is initialized with ::Dog
my Dog $spot = Dog;     # same thing

$spot.defined;          # False
say $spot;              # "Dog"
Any class name used as a value by itself is an undefined instance of that class's prototype, or protoobject. See S12 for more on that. (Any type name in rvalue context is parsed as a list operator indicating a typecast, but an argumentless one of these degenerates to a typecast of undef, producing the protoobject.)
To get a real Dog object, call a constructor method such as new:
my Dog $spot .= new;
my Dog $spot = $spot.new;    # .= is rewritten into this
You can pass in arguments to the constructor as well:
my Dog $cerberus .= new(heads => 3);
my Dog $cerberus = $cerberus.new(heads => 3);   # same thing
If you say
my int @array is MyArray;
you are declaring that the elements of @array are native integers, but that the array itself is implemented by the MyArray class. Untyped arrays and hashes are still perfectly acceptable, but have the same performance issues they have in Perl 5.
To get the number of elements in an array, use the .elems method. You can also ask for the total string length of an array's elements, in bytes, codepoints or graphemes, using these methods .bytes, .codes or .graphs respectively on the array. The same methods apply to strings as well.
There is no .length method for either arrays or strings, because length does not specify a unit.
Built-in object types start with an uppercase letter. This includes immutable types (e.g. Int, Num, Complex, Rat, Str, Bit, Regex, Set, Junction, Code, Block, List, Seq), as well as mutable (container) types, such as Scalar, Array, Hash, Buf, Routine, Module, etc.
Non-object (native) types are lowercase: int, num, complex, rat, buf, bit. Native types are primarily intended for declaring compact array storage. However, Perl will try to make those look like their corresponding uppercase types if you treat them that way. (In other words, it does autoboxing. Note, however, that sometimes repeated autoboxing can slow your program more than the native type can speed it up.)
Some object types can behave as value types. Every object can produce a "WHICH" value that uniquely identifies the object for hashing and other value-based comparisons. Normal objects just use their address in memory, but if a class wishes to behave as a value type, it can define a .WHICH method that makes different objects look like the same object if they happen to have the same contents.
Variables with non-native types can always contain undefined values, such as Undef, Whatever and Failure objects. See S04 for more about failures (i.e. unthrown exceptions):
my Int $x = undef;    # works
Variables with native types do not support undefinedness: it is an error to assign an undefined value to them:
my int $y = undef; # dies
Conjecture: num might support the autoconversion of undef to NaN, since the floating-point form can represent this concept. Might be better to make that conversion optional though, so that the rocket designer can decide whether to self-destruct immediately or shortly thereafter.
Variables of non-native types start out containing an undefined value unless explicitly initialized to a defined value.
Every object supports a HOW function/method that returns the metaclass instance managing it, regardless of whether the object is defined:
'x'.HOW.methods;   # get available methods for strings
Str.HOW.methods;   # same thing with the prototype object Str
HOW(Str).methods;  # same thing as function call

'x'.methods;       # this is likely an error - not a meta object
Str.methods;       # same thing
(For a prototype system (a non-class-based object system), all objects are merely managed by the same meta object.)
Perl 6 intrinsically supports big integers and rationals through its system of type declarations. Int automatically supports promotion to arbitrary precision, as well as holding Inf and NaN values. Note that Int assumes 2's complement arithmetic, so +^1 == -2 is guaranteed. (Native int operations need not support this on machines that are not natively 2's complement. You must convert to and from Int to do portable bitops on such ancient hardware.)
(Num may support arbitrary-precision floating-point arithmetic, but is not required to unless we can do so portably and efficiently. Num must support the largest native floating point format that runs at full speed.)
Rat supports arbitrary precision rational arithmetic. However, dividing two Int objects using infix:</> produces a fraction of Num type, not a ratio. You can produce a ratio by using infix:<div> on two integers instead.
Lower-case types like int and num imply the native machine representation for integers and floating-point numbers, respectively, and do not promote to arbitrary precision, though larger representations are always allowed for temporary values. Unless qualified with a number of bits, int and num types represent the largest native integer and floating-point types that run at full speed.
Numeric values in untyped variables use Int and Num semantics rather than int and num.
Perl 6 should by default make standard IEEE floating point concepts visible, such as Inf (infinity) and NaN (not a number). Within a lexical scope, pragmas may specify the nature of temporary values, and how floating point is to behave under various circumstances. All IEEE modes must be lexically available via pragma except in cases where that would entail heroic efforts to bypass a braindead platform.
The default floating-point modes do not throw exceptions but rather propagate Inf and NaN. The boxed object types may carry more detailed information on where overflow or underflow occurred. Numerics in Perl are not designed to give the identical answer everywhere. They are designed to give the typical programmer the tools to achieve a good enough answer most of the time. (Really good programmers may occasionally do even better.) Mostly this just involves using enough bits that the stupidities of the algorithm don't matter much.
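Several of the numeric behaviours described above for Int, Num and Rat have close analogues in Python, which makes them easy to try out. The sketch below is only an analogy: it uses Python's int, float and fractions.Fraction, not Perl 6 types, and the variable names are illustrative.

```python
import math
from fractions import Fraction

big = 2 ** 100 + 1                   # integers promote to arbitrary precision
flipped = ~1                         # bitwise complement with two's-complement semantics
quotient = 7 / 2                     # dividing two integers yields a floating-point value
ratio = Fraction(7, 2)               # an exact ratio has to be requested explicitly
overflow = 1e308 * 10                # overflow produces inf instead of raising
not_a_number = overflow - overflow   # inf - inf propagates as nan
```

As in the text, the complement of 1 is -2 regardless of the machine's native integer width, and floating-point overflow propagates as Inf and NaN rather than as an exception.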
A Str is a Unicode string object. There is no corresponding native str type. However, since a Str object may fill multiple roles, we say that a Str keeps track of its minimum and maximum Unicode abstraction levels, and plays along nicely with the current lexical scope's idea of the ideal character, whether that is bytes, codepoints, graphemes, or characters in some language. For all builtin operations, all Str positions are reported as position objects, not integers. These StrPos objects point into a particular string at a particular location independent of abstraction level, either by tracking the string and position directly, or by generating an abstraction-level independent representation of the offset from the beginning of the string that will give the same results if applied to the same string in any context. This is assuming the string isn't modified in the meanwhile; a StrPos is not a "marker" and is not required to follow changes to a mutable string. For instance, if you ask for the positions of matches done by a substitution, the answers are reported in terms of the original string (which may now be inaccessible!), not as positions within the modified string. (However, if you use .pos on the modified string, it will report the position of the end of the substitution in terms of the new string.)
The subtraction of two StrPos objects gives a StrLen object, which is also not an integer, because the string between two positions also has multiple integer interpretations depending on the units. A given StrLen may know that it represents 18 bytes, 7 codepoints, 3 graphemes, and 1 letter in Malayalam, but it might only know this lazily because it actually just hangs onto the two StrPos endpoints within the string that in turn may or may not just lazily point into the string. (The lazy implementation of StrLen is much like a Range object in that respect.)
If you use integers as arguments where position objects are expected, it will be assumed that you mean the units of the current lexically scoped Unicode abstraction level. (Which defaults to graphemes.) Otherwise you'll need to coerce to the proper units:
substr($string, 42.as(Bytes), 1.as(ArabicChars))
Of course, such a dimensional number will fail if used on a string that doesn't provide the appropriate abstraction level.
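To see why a single integer cannot stand for a string length, it helps to measure one string in several units. The Python sketch below is an approximation under stated assumptions: the function name lengths is hypothetical, bytes are counted in UTF-8 (the byte count depends on the encoding chosen), and graphemes are approximated by counting codepoints that are not combining marks, which is cruder than real grapheme segmentation.

```python
import unicodedata

def lengths(s):
    """Return byte, codepoint and approximate grapheme counts for s."""
    return {
        "bytes": len(s.encode("utf-8")),   # assumes UTF-8 storage
        "codes": len(s),                   # Python str length is in codepoints
        # Approximation: a grapheme starts at each non-combining codepoint.
        "graphs": sum(1 for ch in s if unicodedata.combining(ch) == 0),
    }
```

For the two-codepoint string "e" followed by U+0301 (combining acute accent), this reports 3 bytes, 2 codepoints and 1 grapheme, so any integer position is meaningless until its unit is known.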
If a StrPos or StrLen is forced into a numeric context, it will assume the units of the current Unicode abstraction level. It is erroneous to pass such a non-dimensional number to a routine that would interpret it with the wrong units.
A Buf is a stringish view of an array of integers, and has no Unicode or character properties without explicit conversion to some kind of Str. (A buf is the native counterpart.) Typically it's an array of bytes serving as a buffer. Bitwise operations on a Buf treat the entire buffer as a single large integer. Bitwise operations on a Str generally fail unless the Str in question can provide an abstract Buf interface somehow. Coercion to Buf should generally invalidate the Str interface. As a generic type Buf may be instantiated as (or bound to) any of buf8, buf16, or buf32 (or to any type that provides the appropriate Buf interface), but when used to create a buffer Buf defaults to buf8.
Unlike Str types, Buf types prefer to deal with integer string positions, and map these directly to the underlying compact array as indices. That is, these are not necessarily byte positions--an integer position just counts over the number of underlying positions, where one position means one cell of the underlying integer type. Builtin string operations on Buf types return integers and expect integers when dealing with positions. As a limiting case, buf8 is just an old-school byte string, and the positions are byte positions. Note, though, that if you remap a section of buf32 memory to be buf8, you'll have to multiply all your positions by 4.
Ordinarily a term beginning with * indicates a global function or type name, but by itself, the * term captures the notion of "Whatever", which is applied lazily by whatever operator it is an argument to. Generally it can just be thought of as a "glob" that gives you everything it can in that argument position. For instance:
if $x ~~ 1..* {...}                 # if 1 <= $x <= +Inf
my ($a,$b,$c) = "foo" xx *;         # an arbitrary long list of "foo"
if /foo/ ff * {...}                 # a latching flipflop
@slice = @x[*;0;*];                 # any Int
@slice = %x{*;'foo'};               # any keys in domain of 1st dimension
@array[*]                           # flattens, unlike @array[]
(*, *, $x) = (1, 2, 3);             # skip first two elements
                                    # (same as lvalue "undef" in Perl 5)
Whatever is an undefined prototype object derived from Any. As a type it is abstract, and may not be instantiated as a defined object. If for a particular MMD dispatch, nothing in the MMD system claims it, it dispatches as an Any with an undefined value, and usually blows up constructively. If you say
say 1 + *;
you should probably not expect it to yield a reasonable answer, unless you think an exception is reasonable. Since the Whatever object is effectively immutable, the optimizer is free to recognize * and optimize in the context of what operator it is being passed to.
A variant of * is the ** term. It is generally understood to be a multidimension form of * when that makes sense.
Other uses for * will doubtless suggest themselves over time. These can be given meaning via the MMD system, if not the compiler. In general a Whatever should be interpreted as maximizing the degrees of freedom in a dwimmey way, not as a nihilistic "don't care anymore--just shoot me".
Native types
Values with these types autobox to their uppercase counterparts when you treat them as objects:
bit         single native bit
int         native signed integer
uint        native unsigned integer (autoboxes to Int)
buf         native buffer (finite seq of native ints or uints, no Unicode)
num         native floating point
complex     native complex number
bool        native boolean
Since native types cannot represent Perl's concept of undefined values, in the absence of explicit initialization, native floating-point types default to NaN, while integer types (including bit) default to 0. The complex type defaults to NaN + NaN.i. A buf type of known size defaults to a sequence of 0 values. If any native type is explicitly initialized to * (the Whatever type), it is left uninitialized.
If a buf type is initialized with a Unicode string value, the string is decomposed into Unicode codepoints, and each codepoint shoved into an integer element. If the size of the buf type is not specified, it takes its length from the initializing string. If the size is specified, the initializing string is truncated or 0-padded as necessary. If a codepoint doesn't fit into a buf's integer type, a parse error is issued if this can be detected at compile time; otherwise a warning is issued at run time and the overflowed buffer element is filled with an appropriate replacement character, either U+FFFD (REPLACEMENT CHARACTER) if the element's integer type is at least 16 bits, or U+007F (DELETE) if the larger value would not fit. If any other conversion is desired, it must be specified explicitly. In particular, no conversion to UTF-8 or UTF-16 is attempted; that must be specified explicitly. (As it happens, conversion to a buf type based on 32-bit integers produces valid UTF-32 in the native endianness.)
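The buf initialization rules can be walked through with a short model. This Python sketch makes several assumptions that are not in the text: the function name init_buf is hypothetical, elements are treated as unsigned integers of the given bit width, and the compile-time error and run-time warning are not modelled; only the resulting element values are.

```python
def init_buf(text, bits, size=None):
    """Model the element values of a buf initialized from a Unicode string."""
    limit = 1 << bits                            # assumes unsigned elements
    replacement = 0xFFFD if bits >= 16 else 0x7F
    # Decompose into codepoints; substitute the replacement on overflow.
    cells = [ord(ch) if ord(ch) < limit else replacement for ch in text]
    if size is not None:
        # Truncate or 0-pad to the declared size.
        cells = cells[:size] + [0] * (size - len(cells))
    return cells
```

For example, init_buf("hi", 8, 4) gives [104, 105, 0, 0], and a codepoint above U+00FF placed in an 8-bit buffer becomes 0x7F because U+FFFD itself would not fit.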
Undefined types
These can behave as values or objects of any class, except that defined always returns false. One can create them with the built-in undef and fail functions. (See S04 for how failures are handled.)
Undef       Undefined (variants serve as prototype objects of classes)
Whatever    Wildcard (like undef, but subject to do-what-I-mean via MMD)
Failure     Failure (lazy exceptions, thrown if not handled properly)
Whenever you declare any kind of type, class, module, or package, you're automatically declaring an undefined prototype value with the same name.
Immutable types
Objects with these types behave like values, i.e. $x === $y is true if and only if their types and contents are identical (that is, if $x.WHICH eqv $y.WHICH).
Bit         Perl single bit (allows traits, aliasing, undef, etc.)
Int         Perl integer (allows Inf/NaN, arbitrary precision, etc.)
Str         Perl string (finite sequence of Unicode characters)
Num         Perl number
Complex     Perl complex number
Bool        Perl boolean
Exception   Perl exception
Code        Base class for all executable objects
Block       Executable objects that have lexical scopes
List        Lazy Perl list (composed of immutables and iterators)
Seq         Completely evaluated (hence immutable) sequence
Range       A pair of Ordered endpoints; gens immutables when iterated
Set         Unordered collection of values that allows no duplicates
Bag         Unordered collection of values that allows duplicates
Junction    Set with additional behaviors
Pair        A single key-to-value association
Mapping     Set of Pairs with no duplicate keys
Signature   Function parameters (left-hand side of a binding)
Capture     Function call arguments (right-hand side of a binding)
Blob        An undifferentiated mass of bits
Mutable types
Objects with these types have distinct .WHICH values that do not change even if the object's contents change. (Routines are considered mutable because they can be wrapped in place.)
Scalar      Perl scalar
Array       Perl array
Hash        Perl hash
KeyHash     Perl hash that autodeletes values matching default
KeySet      KeyHash of Bool (does Set in list/array context)
KeyBag      KeyHash of UInt (does Bag in list/array context)
Buf         Perl buffer (a stringish array of memory locations)
IO          Perl filehandle
Routine     Base class for all wrappable executable objects
Sub         Perl subroutine
Method      Perl method
Submethod   Perl subroutine acting like a method
Macro       Perl compile-time subroutine
Regex       Perl pattern
Match       Perl match, usually produced by applying a pattern
Package     Perl 5 compatible namespace
Module      Perl 6 standard namespace
Class       Perl 6 standard class namespace
Role        Perl 6 standard generic interface/implementation
Grammar     Perl 6 pattern matching namespace
Any         Perl 6 object (default parameter type, excludes Junction)
Object      Perl 6 object (either Any or Junction)
A KeyHash
differs from a normal Hash
in how it handles default values. If the value of a KeyHash
element is set to the default value for the KeyHash
, the element is deleted. If undeclared, the default default for a KeyHash
is 0 for numeric types, False
for boolean types, and the null string for string and buffer types. A KeyHash
of a Object
type defaults to the undefined prototype for that type. More generally, the default default is whatever defined value an undef
would convert to for that value type. A KeyHash
of Scalar
deletes elements that go to either 0 or the null string. A KeyHash
also autodeletes keys for normal undef values (that is, those undefined values that do not contain an unthrown exception).
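The autodeletion rule can be modelled outside Perl 6. What follows is a minimal sketch in Python rather than Perl 6; the class name KeyHashModel and its default argument are invented for illustration, and the sketch leaves out the handling of undef values.

```python
class KeyHashModel(dict):
    """Hypothetical model of a KeyHash: storing the default deletes the key."""

    def __init__(self, default=0):
        super().__init__()
        self.default = default

    def __setitem__(self, key, value):
        if value == self.default:
            # Setting an element to the default removes it entirely.
            self.pop(key, None)
        else:
            super().__setitem__(key, value)

    def __missing__(self, key):
        # Reading an absent key yields the default without creating the key.
        return self.default


counts = KeyHashModel(default=0)
counts["a"] = 3
counts["b"] = 0   # equals the default, so it is never stored
counts["a"] = 0   # deletes the existing element
```

After these assignments the model holds no elements at all, and reading any key gives back the default.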
A KeySet
is a KeyHash
of booleans with a default of False
. If you use the Hash
interface and increment an element of a KeySet
its value becomes true (creating the element if it doesn't exist already). If you decrement the element it becomes false and is automatically deleted. When not used as a Hash
(that is, when used as an Array
or list or Set
object) a KeySet
behaves as a Set
of its keys. (Since the only possible value of a KeySet
is the True
value, it need not be represented in the actual implementation with any bits at all.)
A KeyBag
is a KeyHash
of UInt
with default of 0. If you use the Hash
interface and increment an element of a KeyBag
its value is increased by one (creating the element if it doesn't exist already). If you decrement the element the value is decreased by one; if the value goes to 0 the element is automatically deleted. When not used as a Hash
(that is, when used as an Array
or list or Bag
object) a KeyBag
behaves as a Bag
of its keys, with each key replicated the number of times specified by its corresponding value. (Use .kv
or .pairs
to suppress this behavior in list context.)
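The two views of a KeyBag can be sketched as follows. This is a Python model, not Perl 6; KeyBagModel and its method names are hypothetical, and treating a decrement of an absent key as a no-op is an assumption that the text does not settle.

```python
class KeyBagModel:
    """Hypothetical model of a KeyBag: non-negative counts, zero deletes."""

    def __init__(self):
        self._counts = {}

    def increment(self, key):
        # Creates the element if it does not exist yet.
        self._counts[key] = self._counts.get(key, 0) + 1

    def decrement(self, key):
        if key not in self._counts:
            return  # assumption: nothing to decrement
        self._counts[key] -= 1
        if self._counts[key] == 0:
            del self._counts[key]  # a count of 0 removes the element

    def pairs(self):
        # Hash-style view: one (key, count) pair per key.
        return sorted(self._counts.items())

    def as_bag(self):
        # Bag-style view: each key repeated according to its count.
        return sorted(k for k, n in self._counts.items() for _ in range(n))


bag = KeyBagModel()
for word in ["ant", "bee", "ant"]:
    bag.increment(word)
bag.decrement("bee")
```

Here pairs() plays the role of .pairs, while as_bag() shows the replication that happens in list context.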
Value types
Explicit types are optional. Perl variables have two associated types: their "value type" and their "implementation type". (More generally, any container has an implementation type, including subroutines and modules.) The value type is stored as its of
property, while the implementation type of the container is just the object type of the container itself. The word returns
is allowed as an alias for of
.
The value type specifies what kinds of values may be stored in the variable. A value type is given as a prefix or with the of
keyword:
my Dog $spot;
my $spot of Dog;
In either case this sets the of
property of the container to Dog
.
Subroutines have a variant of the of
property, as
, that sets the as
property instead. The as
property specifies a constraint (or perhaps coercion) to be enforced on the return value (either by explicit call to return
or by implicit fall-off-the-end return). This constraint, unlike the of
property, is not advertised as the type of the routine. You can think of it as the implicit type signature of the (possibly implicit) return statement. It's therefore available for type inferencing within the routine but not outside it. If no as
type is declared, it is assumed to be the same as the of
type, if declared.
sub get_pet() of Animal {...} # of type, obviously
sub get_pet() returns Animal {...} # of type
our Animal sub get_pet() {...} # of type
sub get_pet() as Animal {...} # as type
A value type on an array or hash specifies the type stored by each element:
my Dog @pound; # each element of the array stores a Dog
my Rat %ship; # the value of each entry stores a Rat
The key type of a hash may be specified as a shape trait--see S09.
Implementation types
The implementation type specifies how the variable itself is implemented. It is given as a trait of the variable:
my $spot is Scalar; # this is the default
my $spot is PersistentScalar;
my $spot is DataBase;
Defining an implementation type is the Perl 6 equivalent to tying a variable in Perl 5. But Perl 6 variables are tied directly at declaration time, and for performance reasons may not be tied with a run-time tie
statement unless the variable is explicitly declared with an implementation type that does the Tieable
role.
However, package variables are always considered Tieable
by default. As a consequence, all named packages are also Tieable
by default. Classes and modules may be viewed as differently tied packages. Looking at it from the other direction, classes and modules that wish to be bound to a global package name must be able to do the Package
role.
Hierarchical types
A non-scalar type may be qualified, in order to specify what type of value each of its elements stores:
my Egg $cup; # the value is an Egg
my Egg @carton; # each elem is an Egg
my Array of Egg @box; # each elem is an array of Eggs
my Array of Array of Egg @crate; # each elem is an array of arrays of Eggs
my Hash of Array of Recipe %book; # each value is a hash of arrays of Recipes
Each successive of
makes the type on its right a parameter of the type on its left. Parametric types are named using square brackets, so:
my Hash of Array of Recipe %book;
actually means:
my Hash[of => Array[of => Recipe]] %book;
Because the actual variable can be hard to find when complex types are specified, there is a postfix form as well:
my Hash of Array of Recipe %book; # HoHoAoRecipe
my %book of Hash of Array of Recipe; # same thing
The as
form may be used in subroutines:
my sub get_book ($key) as Hash of Array of Recipe {...}
Alternately, the return type may be specified within the signature:
my sub get_book ($key --> Hash of Array of Recipe) {...}
There is a slight difference, insofar as the type inferencer will ignore an as
but pay attention to -->
or prefix type declarations, also known as the of
type. Only the inside of the subroutine pays attention to as
, and essentially coerces the return value to the indicated type, just as if you'd coerced each return expression.
You may also specify the of
type as the of
trait (with returns
allowed as a synonym):
my Hash of Array of Recipe sub get_book ($key) {...}
my sub get_book ($key) of Hash of Array of Recipe {...}
my sub get_book ($key) returns Hash of Array of Recipe {...}
Polymorphic types
Anywhere you can use a single type you can use a set of types, for convenience specifiable as if it were an "or" junction:
my Int|Str $error = $val; # can assign if $val~~Int or $val~~Str
Fancier type constraints may be expressed through a subtype:
subset Shinola of Any where {.does(DessertWax) and .does(FloorTopping)};
if $shimmer ~~ Shinola {...} # $shimmer must do both interfaces
Since the terms in a parameter could be viewed as a set of constraints that are implicitly "anded" together (the variable itself supplies type constraints, and where
clauses or tree matching just add more constraints), we relax this to allow juxtaposition of types to act like an "and" junction:
# Anything assigned to the variable $mitsy must conform
# to the type Fish and either the Cat or Dog type...
my Cat|Dog Fish $mitsy = new Fish but { int rand 2 ?? .does Cat
!! .does Dog };
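The "or" and "and" readings can be sketched with ordinary predicates. This is a Python model rather than Perl 6; the helpers any_of and all_of and the class CatFish are invented for illustration.

```python
def any_of(*types):
    """Model of an "or" junction of types, as in Int|Str."""
    return lambda value: any(isinstance(value, t) for t in types)


def all_of(*checks):
    """Model of juxtaposed constraints, which are "anded" together."""
    return lambda value: all(check(value) for check in checks)


class Fish: pass
class Cat: pass
class Dog: pass
class CatFish(Fish, Cat): pass  # hypothetical class that is both


int_or_str = any_of(int, str)
# Corresponds to the constraint "Cat|Dog Fish" in the example above.
cat_or_dog_fish = all_of(any_of(Cat, Dog), any_of(Fish))
```

A plain Fish fails the combined check because it is neither a Cat nor a Dog, while a CatFish passes.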
Parameter types
Parameters may be given types, just like any other variable:
sub max (int @array is rw) {...}
sub max (@array of int is rw) {...}
Generic types
Within a declaration, a class variable (either by itself or following an existing type name) declares a new type name and takes its parametric value from the actual type of the parameter it is associated with. It declares the new type name in the same scope as the associated declaration.
sub max (Num ::X @array) {
push @array, X.new();
}
The new type name is introduced immediately, so two such types in the same signature must unify compatibly if they have the same name:
sub compare (Any ::T $x, T $y) {
return $x eqv $y;
}
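The unification requirement can be modelled as a run-time check. The sketch below is Python, not Perl 6; it approximates eqv with == and stands in for what a Perl 6 implementation might do at binding time.

```python
def compare(x, y):
    """Model of: sub compare (Any ::T $x, T $y)."""
    T = type(x)                # ::T captures the actual type of $x
    if not isinstance(y, T):   # the later use of T must agree with it
        raise TypeError(f"expected {T.__name__}, got {type(y).__name__}")
    return x == y
```

Calling compare(1, "a") is rejected in this model because the second argument does not match the type captured from the first.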
Return types
On a scoped subroutine, a return type can be specified before or after the name. We call all return types "return types", but distinguish two kinds of return types, the as
type and the of
type, because the of
type is normally an "official" named type and declares the official interface to the routine, while the as
type is merely a constraint on what may be returned by the routine from the routine's point of view.
our sub lay as Egg {...} # as type
our Egg sub lay {...} # of type
our sub lay of Egg {...} # of type
our sub lay (--> Egg) {...} # of type
my sub hat as Rabbit {...} # as type
my Rabbit sub hat {...} # of type
my sub hat of Rabbit {...} # of type
my sub hat (--> Rabbit) {...} # of type
If a subroutine is not explicitly scoped, it belongs to the current namespace (module, class, grammar, or package), as if it were scoped with the our
scope modifier. Any return type must go after the name:
sub lay as Egg {...} # as type
sub lay of Egg {...} # of type
sub lay (--> Egg) {...} # of type
On an anonymous subroutine, any return type can only go after the sub
keyword:
$lay = sub as Egg {...}; # as type
$lay = sub of Egg {...}; # of type
$lay = sub (--> Egg) {...}; # of type
but you can use a scope modifier to introduce an of
prefix type:
$lay = my Egg sub {...}; # of type
$hat = my Rabbit sub {...}; # of type
Because they are anonymous, you can change the my
modifier to our
without affecting the meaning.
The return type may also be specified after a -->
token within the signature. This doesn't mean exactly the same thing as as
. The of
type is the "official" return type, and may therefore be used to do type inferencing outside the sub. The as
type only makes the return type available to the internals of the sub so that the return
statement can know its context, but outside the sub we don't know anything about the return value, as if no return type had been declared. The prefix form specifies the of
type rather than the as
type, so the routine
my Fish sub wanda ($x) { ... }
is known to return an object of type Fish, as if you'd said:
my sub wanda ($x --> Fish) { ... }
not as if you'd said
my sub wanda ($x) as Fish { ... }
It is possible for the of
type to disagree with the as
type:
my Squid sub wanda ($x) as Fish { ... }
or equivalently,
my sub wanda ($x --> Squid) as Fish { ... }
This is not lying to yourself--it's lying to the world. Having a different inner type is useful if you wish to hold your routine to a stricter standard than you let on to the outside world, for instance.
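The split between the advertised type and the inner type can be sketched with a decorator. This is a Python model, not Perl 6; the decorator returns, its parameters of and as_, and the attribute of_type are all hypothetical names.

```python
def returns(of=None, as_=None):
    """Hypothetical decorator: 'of' is advertised, 'as_' is enforced inside."""
    inner = as_ if as_ is not None else of  # as defaults to of

    def decorate(func):
        def wrapper(*args, **kwargs):
            result = func(*args, **kwargs)
            # The inner type coerces every returned value.
            return inner(result) if inner is not None else result

        wrapper.of_type = of  # the only type outside code may rely on
        return wrapper

    return decorate


@returns(of=object, as_=int)
def wanda(x):
    return x  # coerced to int on the way out


@returns(of=float)
def lay():
    return "2.5"  # no as type declared, so the of type is applied
```

In this model wanda advertises only object to its callers while holding itself to int internally, which mirrors the stricter inner standard described above.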
Names and Variables
The $Package'var syntax is gone. Use $Package::var instead.
Perl 6 includes a system of sigils to mark the fundamental structural type of a variable:
$    scalar (object)
@    ordered array
%    unordered hash (associative array)
&    code/rule/token/regex
::   package/module/class/role/subset/enum/type/grammar
@@   slice view of @
Within a declaration, the & sigil also declares the visibility of the subroutine name without the sigil within the scope of the declaration:
my &func := sub { say "Hi" };
func; # calls &func
Within a signature or other declaration, the :: sigil followed by an identifier marks a type variable that also declares the visibility of a package/type name without the sigil within the scope of the declaration. The first such declaration within a scope is assumed to be an unbound type, and takes the actual type of its associated argument. With subsequent declarations in the same scope the use of the sigil is optional, since the bare type name is also declared.
A declaration nested within must not use the sigil if it wishes to refer to the same type, since the inner declaration would rebind the type. (Note that the signature of a pointy block counts as part of the inner block, not the outer block.)
Sigils indicate overall interface, not the exact type of the bound object. Different sigils imply different minimal abilities.
$x may be bound to any object, including any object that can be bound to any other sigil. Such a scalar variable is always treated as a singular item in any kind of list context, regardless of whether the object is essentially composite or unitary. It will not automatically dereference to its contents unless placed explicitly in some kind of dereferencing context. In particular, when interpolating into list context, $x never expands its object to anything other than the object itself as a single item, even if the object is a container object containing multiple items.
@x may be bound to an object of the Array class, but it may also be bound to any object that does the Positional role, such as a List, Seq, Range, Buf, or Capture. The Positional role implies the ability to support postcircumfix:<[ ]>.
Likewise, %x may be bound to any object that does the Associative role, such as Pair, Mapping, Set, Bag, KeyHash, or Capture. The Associative role implies the ability to support postcircumfix:<{ }>.
&x may be bound to any object that does the Callable role, such as any Block or Routine. The Callable role implies the ability to support postcircumfix:<( )>.
::x may be bound to any object that does the Abstraction role, such as a typename, package, module, class, role, grammar, or any other protoobject with .HOW hooks. This Abstraction role implies the ability to do various symbol table and/or typological manipulations which may or may not be supported by any given abstraction. Mostly though it just means that you want to give some abstraction an official name that you can then use later in the compilation without any sigil.
In any case, the minimal container role implied by the sigil is checked at binding time at the latest, and may fail earlier (such as at compile time) if a semantic error can be detected sooner. If you wish to bind an object that doesn't yet do the appropriate role, you must either stick with the generic $ sigil, or mix in the appropriate role before binding to a more specific sigil.
An object is allowed to support both Positional and Associative. An object that does not support Positional may not be bound directly to @x. However, any construct such as %x that can interpolate the contents of such an object into list context can automatically construct a list value that may then be bound to an array variable. Subscripting such a list does not imply subscripting back into the original object.
Unlike in Perl 5, you may no longer put whitespace between a sigil and its following name or construct.
Ordinary sigils indicate normally scoped variables, either lexical or package scoped. Oddly scoped variables include a secondary sigil (a twigil) that indicates what kind of strange scoping the variable is subject to:
$foo     ordinary scoping
$.foo    object attribute accessor
$^foo    self-declared formal parameter
$*foo    global variable
$+foo    contextual variable
$?foo    compiler hint variable
$=foo    pod variable
$<foo>   match variable, short for $/{'foo'}
$!foo    explicitly private attribute (mapped to $foo though)
Most variables with twigils are implicitly declared or assumed to be declared in some other scope, and don't need a "my" or "our". Attribute variables are declared with has, though.
Sigils are now invariant. $ always means a scalar variable, @ an array variable, and % a hash variable, even when subscripting. Variables such as @array and %hash in scalar context simply return themselves as Array and Hash objects.
In string contexts, container objects automatically stringify to appropriate (white-space separated) string values. In numeric contexts, the number of elements in the container is returned. In boolean contexts, a true value is returned if and only if there are any elements in the container.
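The three context rules for containers can be modelled with Python's conversion hooks. This is a sketch in Python, not Perl 6, and ContainerModel is a hypothetical name.

```python
class ContainerModel:
    """Hypothetical container showing string, numeric and boolean contexts."""

    def __init__(self, *items):
        self.items = list(items)

    def __str__(self):
        # String context: whitespace-separated elements.
        return " ".join(str(item) for item in self.items)

    def __int__(self):
        # Numeric context: the number of elements.
        return len(self.items)

    def __bool__(self):
        # Boolean context: true only if there are any elements.
        return len(self.items) > 0


letters = ContainerModel("a", "b", "c")
empty = ContainerModel()
```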
To get a Perlish representation of any object, use the .perl method. Like the Data::Dumper module in Perl 5, the .perl method will put quotes around strings, square brackets around list values, curlies around hash values, constructors around objects, etc., so that Perl can evaluate the result back to the same object.
To get a formatted representation of any scalar value, use the .fmt('%03d') method to do an implicit sprintf on the value.
To format an array value separated by commas, supply a second argument: .fmt('%03d', ', '). To format a hash value or list of pairs, include formats for both key and value in the first string: .fmt('%s: %s', "\n").
Subscripts now consistently dereference the container produced by whatever was to their left. Whitespace is not allowed between a variable name and its subscript. However, there is a corresponding dot form of each subscript (@foo.[1] and %bar.{'a'}). Constant string subscripts may be placed in angles, so %bar.{'a'} may also be written as %bar<a> or %bar.<a>.
Slicing is specified by the nature of the subscript, not by the sigil.
The context in which a subscript is evaluated is no longer controlled by the sigil either. Subscripts are always evaluated in list context.
If you need to force inner context to scalar, we now have convenient single-character context specifiers such as + for numbers and ~ for strings:
@x[f()] = g(); # list context for f() and g()
@x[f()] = +g(); # list context for f(), scalar context for g()
@x[+f()] = g(); # scalar context for f() and g()
# -- see S03 for "SIMPLE" lvalues
@x[f()] = @y[g()]; # list context for f() and g()
@x[f()] = +@y[g()]; # list context for f() and g()
@x[+f()] = @y[g()]; # scalar context for f(), list context for g()
@x[f()] = @y[+g()]; # list context for f(), scalar context for g()
There is a need to distinguish list assignment from list binding. List assignment works exactly as it does in Perl 5, copying the values. There's a new := binding operator that lets you bind names to Array and Hash objects without copying, in the same way as subroutine arguments are bound to formal parameters. See S06 for more about binding.
An argument list may be captured into an object with backslashed parens:
$args = \(1,2,3,:mice<blind>)
Values in a Capture object are parsed as ordinary expressions, marked as invocant, positional, named, and so on.
Like List objects, Capture objects are immutable in the abstract, but evaluate their arguments lazily. Before everything inside a Capture is fully evaluated (which happens at compile time when all the arguments are constants), the eventual value may well be unknown. All we know is that we have the promise to make the bits of it immutable as they become known.
Capture objects may contain multiple unresolved iterators such as feeds or slices. How these are resolved depends on what they are eventually bound to. Some bindings are sensitive to multiple dimensions while others are not.
You may retrieve parts from a Capture object with a prefix sigil operator:
$args = \3; # same as "$args = \(3)"
$$args; # same as "$args as Scalar" or "Scalar($args)"
@$args; # same as "$args as Array" or "Array($args)"
%$args; # same as "$args as Hash" or "Hash($args)"
When cast into an array, you can access all the positional arguments; into a hash, all named arguments; into a scalar, its invocant.
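The three casts can be pictured as three views onto one object. The following is a Python model, not Perl 6; CaptureModel and its method names are hypothetical, and it evaluates its arguments eagerly, unlike a real Capture.

```python
class CaptureModel:
    """Hypothetical model of a Capture: invocant, positionals, nameds."""

    def __init__(self, *positional, invocant=None, **named):
        self.invocant = invocant
        self.positional = tuple(positional)
        self.named = dict(named)

    def as_scalar(self):
        # Like $$args: the invocant.
        return self.invocant

    def as_array(self):
        # Like @$args: all positional arguments.
        return list(self.positional)

    def as_hash(self):
        # Like %$args: all named arguments.
        return dict(self.named)


# Rough analogue of \(1,2,3,:mice<blind>), which has no invocant.
args = CaptureModel(1, 2, 3, mice="blind")
```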
All prefix sigil operators accept one positional argument, evaluated in scalar context as an rvalue. They can interpolate in strings if called with parentheses. The special syntax form $() translates into $( $/ ) to operate on the current match object; the same applies to @() and %().
Capture objects fill the ecological niche of references in Perl 6. You can think of them as "fat" references, that is, references that can capture not only the current identity of a single object, but also the relative identities of several related objects. Conversely, you can think of Perl 5 references as a degenerate form of Capture when you want to refer only to a single item.
A signature object (Signature) may be created with colon-prefixed parens:
my ::MySig ::= :(Int, Num, Complex, Status :mice)
Expressions inside the signature are parsed as parameter declarations rather than ordinary expressions. See S06 for more details on the syntax for parameters.
Signature objects bound to ::t variables may be used within another signature to apply additional type constraints. When applied to a Capture argument of form \$x, the signature allows you to specify the types of parameters that would otherwise be untyped:
:(Num Dog|Cat $numdog, MySig \$a ($i,$j,$k,$mousestatus))
Unlike in Perl 5, the notation &foo merely stands for the foo function as a Code object without calling it. You may call any Code object with parens after it (which may, of course, contain arguments):
&foo($arg1, $arg2);
Whitespace is not allowed before the parens, but there is a corresponding .() operator, plus the "unspace" forms that allow you to insert optional whitespace and comments between the backslash and the dot:
&foo\ .($arg1, $arg2);
&foo\#[ embedded comment ].($arg1, $arg2);
With multiple dispatch, &foo may not be sufficient to uniquely name a specific function. In that case, the type may be refined by using a signature literal as a postfix operator:
&foo:(Int,Num)
It still just returns a Code object. A call may also be partially applied by using the .assuming method:
&foo.assuming(1,2,3,:mice<blind>)
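Partial application of this kind has a close analogue in Python's functools.partial, which is used here purely as a comparison. The function foo and its parameters below are hypothetical.

```python
from functools import partial


def foo(a, b, c, d, mice="sighted"):
    """Hypothetical routine with three leading positionals and one named."""
    return (a, b, c, d, mice)


# Rough analogue of &foo.assuming(1,2,3,:mice<blind>):
# the first three positionals and the named argument are fixed in advance.
foo_assumed = partial(foo, 1, 2, 3, mice="blind")
```

The result is still a callable object; it simply needs fewer arguments than the original.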
Slicing syntax is covered in S09. A multidimensional slice will be done with semicolons between individual slice sublists. Each such slice sublist is evaluated lazily.
To make a slice subscript return something other than values, append an appropriate adverb to the subscript.
@array = <A B>;
@array[0,1,2]; # returns 'A', 'B', undef
@array[0,1,2]:p; # returns 0 => 'A', 1 => 'B'
@array[0,1,2]:kv; # returns 0, 'A', 1, 'B'
@array[0,1,2]:k; # returns 0, 1
@array[0,1,2]:v; # returns 'A', 'B'
%hash = (:a<A>, :b<B>);
%hash<a b c>; # returns 'A', 'B', undef
%hash<a b c>:p; # returns a => 'A', b => 'B'
%hash<a b c>:kv; # returns 'a', 'A', 'b', 'B'
%hash<a b c>:k; # returns 'a', 'b'
%hash<a b c>:v; # returns 'A', 'B'
The adverbial forms all weed out non-existing entries.
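The difference between a plain slice and the adverbial forms can be modelled in a few lines. This is a Python sketch, not Perl 6; slice_with is a hypothetical helper, pairs are shown as tuples, and undef is shown as None.

```python
def slice_with(container, keys, adverb=None):
    """Model of subscript adverbs; container is a list or a dict."""

    def exists(key):
        if isinstance(container, dict):
            return key in container
        return 0 <= key < len(container)

    if adverb is None:
        # Plain slice: missing entries come back as None (undef).
        return [container[k] if exists(k) else None for k in keys]

    present = [k for k in keys if exists(k)]  # adverbs skip missing entries
    if adverb == "p":
        return [(k, container[k]) for k in present]
    if adverb == "kv":
        return [x for k in present for x in (k, container[k])]
    if adverb == "k":
        return present
    if adverb == "v":
        return [container[k] for k in present]
    raise ValueError(f"unknown adverb: {adverb}")
```

For example, slice_with(["A", "B"], [0, 1, 2]) keeps a placeholder for index 2, whereas slice_with(["A", "B"], [0, 1, 2], "v") drops it.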
In numeric context (i.e. when cast into Int or Num), a Hash object becomes the number of pairs contained in the hash. In a boolean context, a Hash object is true if there are any pairs in the hash. In either case, any intrinsic iterator would be reset. (If hashes do carry an intrinsic iterator (as they do in Perl 5), there will be a .reset method on the hash object to reset the iterator explicitly.)
Sorting a list of pairs should sort on their keys by default, then on their values. Sorting a list of lists should sort on the first elements, then the second elements, etc. For more on sort see S29.
Many of the special variables of Perl 5 are going away. Those that apply to some object such as a filehandle will instead be attributes of the appropriate object. Those that are truly global will have global alphabetic names, such as $*PID or @*ARGS.
Any remaining special variables will be lexically scoped. This includes $_ and @_, as well as the new $/, which is the return value of the last regex match. $0, $1, $2, etc., are aliases into the $/ object.
The $#foo notation is dead. Use @foo.end or @foo[*-1] instead. (Or @foo.shape[$dimension] for multidimensional arrays.)
Names
An identifier is composed of an alphabetic character followed by any sequence of alphanumeric characters. The definitions of alphabetic and numeric include appropriate Unicode characters. Underscore is always considered alphabetic.
A name is anything that is a legal part of a variable name (not counting the sigil). This includes:
$foo # simple identifiers
$Foo::Bar::baz # compound identifiers separated by ::
$Foo::($bar)::baz # compound identifiers that perform interpolations
$42 # numeric names
$! # certain punctuational variables
When not used as a sigil, the semantic function of :: within a name is to force the preceding portion of the name to be considered a package through which the subsequent portion of the name is to be located. If the preceding portion is null, it means the package is unspecified and must be searched for according to the nature of what follows. Generally this means that an initial :: following the main sigil is a no-op on names that are known at compile time, though :: can also be used to introduce an interpolation (see below). Also, in the absence of another sigil, :: can serve as its own sigil indicating intentional use of a not-yet-declared package name.
Unlike in Perl 5, if a sigil is followed by comma, semicolon, colon, or any kind of bracket or whitespace (including Unicode brackets and whitespace), it will be taken to be a sigil without a name rather than a punctuational variable. This allows you to use sigils as coercion operators:
print $( foo() ) # foo called in item context
print @@( foo() ) # foo called in slice context
The bare sigil is parsed as a list operator in rvalue context, so these mean the same thing:
print $ foo() # foo called in item context
print @@ foo() # foo called in slice context
In declarative contexts bare sigils may be used as placeholders for anonymous variables:
my ($a, $, $c) = 1..3;
print unless (state $)++;
Ordinary package-qualified names look like they do in Perl 5:
$Foo::Bar::baz # the $baz variable in package Foo::Bar
Sometimes it's clearer to keep the sigil with the variable name, so an alternate way to write this is:
Foo::Bar::<$baz>
This is resolved at compile time because the variable name is a constant.
The following pseudo-package names are reserved in the first position:
MY          # Lexical variables declared in the current scope
OUR         # Package variables declared in the current package
GLOBAL      # Builtin variables and functions
PROCESS     # process-related globals
OUTER       # Lexical variables declared in the outer scope
CALLER      # Contextual variables in the immediate caller's scope
CONTEXT     # Contextual variables in any context's scope
SUPER       # Package variables declared in inherited classes
COMPILING   # Lexical variables in the scope being compiled
Other all-caps names are semi-reserved. We may add more of them in the future, so you can protect yourself from future collisions by using mixed case on your top-level packages. (We promise not to break any existing top-level CPAN package, of course. Except maybe ACME, and then only for coyotes.)
You may interpolate a string into a package or variable name using ::($expr) where you'd ordinarily put a package or variable name. The string is allowed to contain additional instances of ::, which will be interpreted as package nesting. You may only interpolate entire names, since the construct starts with ::, and either ends immediately or is continued with another :: outside the parens. Most symbolic references are done with this notation:
$foo = "Bar";
$foobar = "Foo::Bar";
$::($foo) # package-scoped $Bar
$::("MY::$foo") # lexically-scoped $Bar
$::("*::$foo") # global $Bar
$::($foobar) # $Foo::Bar
$::($foobar)::baz # $Foo::Bar::baz
$::($foo)::Bar::baz # $Bar::Bar::baz
$::($foobar)baz # ILLEGAL at compile time (no operator baz)
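The way an interpolated string containing :: is split into nested packages can be modelled with nested dictionaries. This is a Python sketch, not Perl 6; lookup and the packages table are hypothetical, and the model ignores sigils and the lexical and package search order.

```python
def lookup(root, name):
    """Resolve a name such as 'Foo::Bar::baz' by walking nested dicts."""
    node = root
    for part in name.split("::"):
        node = node[part]  # each :: steps into the next package
    return node


# Hypothetical symbol tables: package Foo::Bar and package Bar::Bar.
packages = {
    "Foo": {"Bar": {"baz": 1}},
    "Bar": {"Bar": {"baz": 2}},
}
foo = "Bar"
foobar = "Foo::Bar"
```

With these tables, lookup(packages, foobar + "::baz") reaches the baz in Foo::Bar, and lookup(packages, foo + "::Bar::baz") reaches the one in Bar::Bar.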
Note that unlike in Perl 5, initial :: doesn't imply global. Package names are searched for from inner lexical scopes to outer, then from inner packages to outer. Variable names are searched for from inner lexical scopes to outer, but unlike package names are looked for in only the current package and the global package.
The global namespace is the last place it looks in either case. You must use the * (or GLOBAL) package on the front of the string argument to force the search to start in the global namespace.
Use the MY pseudopackage to limit the lookup to the current lexical scope, and OUR to limit the scopes to the current package scope.
When "strict" is in effect (which is the default except for one-liners), non-qualified variables (such as $x and @y) are only looked up from lexical scopes, but never from package scopes.
To bind package variables into a lexical scope, simply say our ($x, @y). To bind global variables into a lexical scope, predeclare them with use:
use GLOBAL <$IN $OUT>;
Or just refer to them as $*IN and $*OUT.
To do direct lookup in a package's symbol table without scanning, treat the package name as a hash:
Foo::Bar::{'&baz'} # same as &Foo::Bar::baz
GLOBAL::<$IN> # Same as $*IN
Foo::<::Bar><::Baz> # same as Foo::Bar::Baz
The :: before the subscript is required here, because the Foo::Bar{...} syntax is reserved for defining an autovivifiable protoobject along with its initialization closure (see S12).
Unlike ::() symbolic references, this does not parse the argument for ::, nor does it initiate a namespace scan from that initial point. In addition, for constant subscripts, it is guaranteed to resolve the symbol at compile time.
The null pseudo-package is reserved to mean the same search list as an ordinary name search. That is, the following are all identical in meaning:
$foo
$::{'foo'}
::{'$foo'}
$::<foo>
::<$foo>
That is, each of them scans lexical scopes outward, and then the current package scope (though the package scope is then disallowed when "strict" is in effect).
As a result of these rules, you can write any arbitrary variable name as either of:
$::{'!@#$#@'}
::{'$!@#$#@'}
You can also use the ::<> form as long as there are no spaces in the name.
The current lexical symbol table is now accessible through the pseudo-package MY. The current package symbol table is visible as pseudo-package OUR. The OUTER name refers to the MY symbol table immediately surrounding the current MY, and OUTER::OUTER is the one surrounding that one.
our $foo = 41;
say $::foo; # prints 41, :: is no-op
{
    my $foo = 42;
    say MY::<$foo>; # prints "42"
    say $MY::foo; # same thing
    say $::foo; # same thing, :: is no-op here
    say OUR::<$foo>; # prints "41"
    say $OUR::foo; # same thing
    say OUTER::<$foo>; # prints "41" (our $foo is also lexical)
    say $OUTER::foo; # same thing
}
You may not use any lexically scoped symbol table, either by name or by reference, to add symbols to a lexical scope that is done compiling. (We reserve the right to relax this if it turns out to be useful though.)
The CALLER package refers to the lexical scope of the (dynamically scoped) caller. The caller's lexical scope is allowed to hide any variable except $_ from you. In fact, that's the default, and a lexical variable must have the trait "is context" to be visible via CALLER. ($_, $! and $/ are always contextual.) If the variable is not visible in the caller, it returns failure. Variables whose names are visible at the point of the call but that come from outside that lexical scope are controlled by the scope in which they were originally declared. Hence the visibility of CALLER::<$+foo> is determined where $+foo is actually declared, not by the caller's scope. Likewise CALLER::CALLER::<$x> depends only on the declaration of $x visible in your caller's caller.
Any lexical declared with the is context trait is by default considered readonly outside the current lexical scope. You may add a trait argument of <rw> to allow called routines to modify your value. $_, $!, and $/ are context<rw> by default. In any event, your lexical scope can always access the variable as if it were an ordinary my; the restriction on writing applies only to called subroutines.
The CONTEXT pseudo-package is just like CALLER except that it starts in the current dynamic scope and from there scans outward through all dynamic scopes until it finds a contextual variable of that name in that context's lexical scope. (Use of $+FOO is equivalent to CONTEXT::<$FOO> or $CONTEXT::FOO.) If after scanning all the lexical scopes of each dynamic scope, there is no variable of that name, it looks in the * package. If there is no variable in the * package and the variable is a scalar, it then looks in %*ENV for the identifier of the variable, that is, in the environment variables passed to the program. If the value is not found there, it returns failure. Unlike CALLER, CONTEXT will see a contextual variable that is declared in the current scope; however, it will not be writeable via CONTEXT unless declared "is context<rw>", even if the variable itself is modifiable in that scope. (If it is, you should just use the bare variable itself to modify it.) Note that $+_ will always see the $_ in the current scope, not the caller's scope. You may use CALLER::<$+foo> to bypass a contextual definition of $foo in your current context, such as to initialize it with the outer contextual value:
my $foo is context = CALLER::<$+foo>;
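The fallback order for a CONTEXT lookup (dynamic scopes, then the * package, then %*ENV, then failure) can be sketched as follows. This is a Python model, not Perl 6; context_lookup, Failure and the argument names are hypothetical, and each scope is assumed to list only the variables it declared "is context".

```python
class Failure(LookupError):
    """Stand-in for a Perl 6 failure."""


def context_lookup(name, dynamic_scopes, global_package, env):
    """Find a contextual scalar by name.

    dynamic_scopes is ordered innermost first; each entry maps names to
    the values of variables declared 'is context' in that scope.
    """
    for scope in dynamic_scopes:
        if name in scope:
            return scope[name]       # first contextual definition wins
    if name in global_package:
        return global_package[name]  # then the * package
    if name in env:
        return env[name]             # then the environment (scalars only)
    raise Failure(name)
```

A caller further out is consulted only when no inner dynamic scope defines the name, and the environment is consulted last of all.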
The CONTEXT package is only for internal overriding of contextual information, modelled on how environmental variables work among processes. Despite the fact that the CONTEXT package reflects the current process's environment variables, at least where those are not hidden by lower-level declarations, the CONTEXT package should not be considered isomorphic to the current set of environment variables. Subprocesses are passed only the global %*ENV values. They do not see any lexical variables or their values, unless you copy those values into %*ENV to change what subprocesses see:
temp %*ENV{LANG} = $+LANG; # may be modified by parent
system "greet";
There is no longer any special package hash such as %Foo::. Just subscript the package object itself as a hash object, the key of which is the variable name, including any sigil. The package object can be derived from a type name by use of the :: postfix operator:
    MyType::<$foo>
    MyType.::.{'$foo'}       # same thing with dots
    MyType\ .::\ .{'$foo'}   # same thing with unspaces
(Directly subscripting the type with either square brackets or curlies is reserved for various generic type-theoretic operations. In most other matters type names and package names are interchangeable.)
Typeglobs are gone. Use binding (:= or ::=) to do aliasing. Individual variable objects are still accessible through the hash representing each symbol table, but you have to include the sigil in the variable name now: MyPackage::{'$foo'} or the equivalent MyPackage::<$foo>.
Truly global variables live in the * package: $*UID, %*ENV. (The * may be omitted if you import the name from the GLOBAL package.) $*foo is short for $*::foo, suggesting that the variable is "wild carded" into every package.
For an ordinary Perl program running by itself, the GLOBAL and PROCESS namespaces are considered synonymous. However, in certain situations (such as shared hosting under a webserver), the actual process may contain multiple virtual processes, each running its own "main" code. In this case, the GLOBAL namespace holds variables that properly belong to the individual virtual process, while the PROCESS namespace holds variables that properly belong to the actual process as a whole. From the viewpoint of the GLOBAL namespace there is little difference, since process variables that normally appear in GLOBAL are automatically imported from PROCESS. However, the process as a whole may place restrictions on the mutability of process variables as seen by the individual subprocesses. Also, individual subprocesses may not create new process variables. If the process wishes to grant subprocesses the ability to communicate via the PROCESS namespace, it must supply a writeable variable to all the subprocesses granted that privilege.
When these namespaces are so distinguished, the * shortcut always refers to GLOBAL. There is no twigil shortcut for PROCESS.
Standard input is $*IN, standard output is $*OUT, and standard error is $*ERR. The magic command-line input handle is $*ARGS. The arguments themselves come in @*ARGS. See also "Declaring a MAIN subroutine" in S06.
Magical file-scoped values live in variables with a = secondary sigil. $=DATA is the name of your DATA filehandle, for instance. All pod structures are available through %=POD (or some such). As with *, the = may also be used as a package name: $=::DATA.
Magical lexically scoped values live in variables with a ? secondary sigil. These are all values that are known to the compiler, and may in fact be dynamically scoped within the compiler itself, and only appear to be lexically scoped because dynamic scopes of the compiler resolve to lexical scopes of the program. All $? variables are considered constants, and may not be modified after being compiled in. The user is also allowed to define (or redefine) such constants:
    constant $?TABSTOP = 4;     # assume heredoc tabs mean 4 spaces
(Note that the constant declarator always evaluates its initialization expression at compile time.)
$?FILE and $?LINE are your current file and line number, for instance. ? is not a shortcut for a package name like * is. Instead of $?OUTER::SUB you probably want to write OUTER::<$?SUB>. Within code that is being run during the compile, such as BEGIN blocks, or macro bodies, or constant initializers, the compiler variables must be referred to as (for instance) COMPILING::<$?LINE> if the bare $?LINE would be taken to be the value during the compilation of the currently running code rather than the eventual code of the user's compilation unit. For instance, within a macro body $?LINE is the line within the macro body, but COMPILING::<$?LINE> is the line where the macro was invoked. See below for more about the COMPILING pseudo package.
Here are some possibilities:
    $?OS        Which operating system am I compiled for?
    $?OSVER     Which operating system version am I compiled for?
    $?PERLVER   Which Perl version am I compiled for?
    $?FILE      Which file am I in?
    $?LINE      Which line am I at?
    $?PACKAGE   Which package am I in?
    @?PACKAGE   Which nested packages am I in?
    $?MODULE    Which module am I in?
    @?MODULE    Which nested modules am I in?
    $?CLASS     Which class am I in? (as variable)
    @?CLASS     Which nested classes am I in?
    $?ROLE      Which role am I in? (as variable)
    @?ROLE      Which nested roles am I in?
    $?GRAMMAR   Which grammar am I in?
    @?GRAMMAR   Which nested grammars am I in?
    $?PARSER    Which Perl grammar was used to parse this statement?
    &?ROUTINE   Which routine am I in?
    @?ROUTINE   Which nested routines am I in?
    &?BLOCK     Which block am I in?
    @?BLOCK     Which nested blocks am I in?
    $?LABEL     Which innermost block label am I in?
    @?LABEL     Which nested block labels am I in?
All the nested @? variables are ordered from the innermost to the outermost, so @?BLOCK[0] is always the same as &?BLOCK.
Note that some of these things have parallels in the * space at run time:
    $*OS        Which OS I'm running under
    $*OSVER     Which OS version I'm running under
    $*PERLVER   Which Perl version I'm running under
You should not assume that these will have the same value as their compile-time cousins.
While $? variables are constant to the run time, the compiler has to have a way of changing these values at compile time without getting confused about its own $? variables (which were frozen in when the compile-time code was itself compiled). The compiler can talk about these compiler-dynamic values using the COMPILING pseudopackage.
References to COMPILING variables are automatically hoisted into the context currently being compiled. Setting or temporizing a COMPILING variable sets or temporizes the incipient $? variable in the surrounding lexical context that is being compiled. If nothing in the context is being compiled, an exception is thrown.
    $?FOO // say "undefined";     # probably says undefined
    BEGIN { COMPILING::<$?FOO> = 42 }
    say $?FOO;                    # prints 42
    {
        say $?FOO;                # prints 42
        BEGIN { temp COMPILING::<$?FOO> = 43 } # temporizes to *compiling* block
        say $?FOO;                # prints 43
        BEGIN { COMPILING::<$?FOO> = 44 }
        say $?FOO;                # prints 44
        BEGIN { say COMPILING::<$?FOO> }  # prints 44, but $?FOO probably undefined
    }
    say $?FOO;                    # prints 42 (left scope of temp above)
    $?FOO = 45;                   # always an error
    COMPILING::<$?FOO> = 45;      # an error unless we are compiling something
Note that CALLER::<$?FOO> might discover the same variable as COMPILING::<$?FOO>, but only if the compiling context is the immediate caller. Likewise OUTER::<$?FOO> might or might not get you to the right place. In the abstract, COMPILING::<$?FOO> goes outwards dynamically until it finds a compiling scope, and so is guaranteed to find the "right" $?FOO. (In practice, the compiler hopefully keeps track of its current compiling scope anyway, so no scan is needed.)
Perceptive readers will note that this subsumes various "compiler hints" proposals. Crazy readers will wonder whether this means you could set an initial value for other lexicals in the compiling scope. The answer is yes. In fact, this mechanism is probably used by the exporter to bind names into the importer's namespace.
The currently compiling Perl parser is switched by modifying COMPILING::<$?PARSER>. Lexically scoped parser changes should temporize the modification. Changes from here to end-of-compilation unit can just assign or bind it. In general, most parser changes involve deriving a new grammar and then pointing COMPILING::<$?PARSER> at that new grammar. Alternately, the tables driving the current parser can be modified without derivation, but at least one level of anonymous derivation must intervene from the standard Perl grammar, or you might be messing up someone else's grammar. Basically, the current grammar has to belong only to the current compiling scope. It may not be shared, at least not without explicit consent of all parties. No magical syntax at a distance. Consent of the governed, and all that.
Literals
A single underscore is allowed only between any two digits in a literal number, where the definition of digit depends on the radix. Underscores are not allowed anywhere else in any numeric literal, including next to the radix point or exponentiator, or at the beginning or end.
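The underscore rule can be expressed as a pattern: runs of digits joined by single underscores, with an optional fractional part built the same way. The following Python sketch is only a model of that rule; the function name valid_underscores and its digit_class parameter are assumptions for illustration, and it deliberately ignores radix prefixes and exponents.

```python
import re


def valid_underscores(digits, digit_class="0-9"):
    """Return True if underscores appear only singly and only between digits.

    digit_class is a regex character-class body, e.g. "0-9A-Fa-f" for hex.
    Radix markers and exponents are not handled here.
    """
    digit = f"[{digit_class}]"
    group = rf"{digit}+(?:_{digit}+)*"          # digits, single underscores inside
    pattern = re.compile(rf"{group}(?:\.{group})?")
    return pattern.fullmatch(digits) is not None
```

With this model, "1_000_000" and "3.141_592" are accepted, while "1__000", "_1", "1_", "1_.5" and "1._5" are rejected.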
Initial 0 no longer indicates octal numbers by itself. You must use an explicit radix marker for that. Pre-defined radix prefixes include:
    0b          base 2, digits 0..1
    0o          base 8, digits 0..7
    0d          base 10, digits 0..9
    0x          base 16, digits 0..9,a..f (case insensitive)
The general radix form of a number involves prefixing with the radix in adverbial form:
    :10<42>             same as 0d42 or 42
    :16<DEAD_BEEF>      same as 0xDEADBEEF
    :8<177777>          same as 0o177777 (65535)
    :2<1.1>             same as 0b1.1 (0d1.5)
Extra digits are assumed to be represented by a..z and A..Z, so you can go up to base 36. (Use A and B for base twelve, not T and E.) Alternately you can use a list of digits in decimal:
    :60[12,34,56]       # 12 * 3600 + 34 * 60 + 56
    :100[3,'.',14,16]   # pi
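As a rough check of the arithmetic behind the digit-list form, here is a Python sketch that evaluates a radix and a list of decimal digit values, with an optional '.' entry marking the radix point. The function name from_digit_list is hypothetical, and exact fractions are used only to keep the result free of rounding.

```python
from fractions import Fraction


def from_digit_list(radix, digits):
    """Evaluate a list of digit values in the given radix.

    digits holds integers, plus at most one "." marking the radix point.
    """
    value = Fraction(0)
    scale = None  # None until the radix point has been seen
    for d in digits:
        if d == ".":
            if scale is not None:
                raise ValueError("more than one radix point")
            scale = Fraction(1)
            continue
        if not 0 <= d < radix:
            raise ValueError(f"digit {d} out of range for radix {radix}")
        if scale is None:
            value = value * radix + d      # integer part: shift and add
        else:
            scale /= radix                 # fractional part: next place value
            value += d * scale
    return value
```

For the two examples above this gives 45296 for the base 60 list and 3.1416 (as the fraction 31416/10000) for the base 100 list.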
Any radix may include a fractional part. A dot is never ambiguous because you have to tell it where the number ends:
    :16<dead_beef.face>     # fraction
    :16<dead_beef>.face     # method call
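To make the fractional reading concrete, the following Python sketch computes the value of the text between the angle brackets for a given radix. It is an illustration only: radix_literal is a hypothetical name, underscores are simply stripped rather than validated, and the result is an exact fraction.

```python
from fractions import Fraction

DIGITS = "0123456789abcdefghijklmnopqrstuvwxyz"  # digit values 0..35


def radix_literal(radix, body):
    """Value of the body of a :RADIX<body> literal, e.g. (16, "dead_beef.face")."""
    text = body.replace("_", "").lower()   # underscore placement not checked here
    whole, _, frac = text.partition(".")
    value = Fraction(0)
    for ch in whole:
        value = value * radix + _digit(ch, radix)
    scale = Fraction(1)
    for ch in frac:
        scale /= radix
        value += _digit(ch, radix) * scale
    return value


def _digit(ch, radix):
    """Numeric value of one digit character, checked against the radix."""
    d = DIGITS.find(ch)
    if d < 0 or d >= radix:
        raise ValueError(f"{ch!r} is not a digit in radix {radix}")
    return d
```

Under this model the fractional hex example is 0xDEADBEEF plus 0xFACE / 16**4, and the binary 1.1 is 3/2.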
Only base 10 (in any form) allows an additional exponentiator starting with 'e' or 'E'. All other radixes must either rely on the constant folding properties of ordinary multiplication and exponentiation, or supply the equivalent two numbers as part of the string, which will be interpreted as they would outside the string, that is, as decimal numbers by default:
    :16<dead_beef> * 16**8
    :16<dead_beef*16**8>
It's true that only radixes that define e as a digit are ambiguous that way, but with any radix it's not clear whether the exponentiator should be 10 or the radix, and this makes it explicit:
    0b1.1e10                        ILLEGAL, could be read as any of:

    :2<1.1> * 2 ** 10               1536
    :2<1.1> * 10 ** 10              15,000,000,000
    :2<1.1> * :2<10> ** :2<10>      6
So we write those as
    :2<1.1*2**10>                   1536
    :2<1.1*10**10>                  15,000,000,000
    :2«1.1*:2<10>**:2<10>»          6
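The first two forms can be checked with a small Python model: the mantissa is read in the stated radix, while the base and exponent after the * are read as decimal, as the text specifies. The name radix_with_scale is an assumption, and the nested form in the third line (radix literals inside the string) is not handled by this sketch.

```python
from fractions import Fraction


def radix_with_scale(radix, body):
    """Evaluate "MANTISSA" or "MANTISSA*BASE**EXP" for a :RADIX<...> literal.

    MANTISSA is in the given radix; BASE and EXP are plain decimal integers.
    """
    mantissa, star, rest = body.partition("*")
    whole, _, frac = mantissa.partition(".")
    digits = (whole + frac).replace("_", "")
    # int() accepts bases 2..36; dividing by radix**len(frac) places the point.
    value = Fraction(int(digits, radix), radix ** len(frac))
    if not star:
        return value
    base, op, exp = rest.partition("**")
    if not op:
        raise ValueError("expected BASE**EXP after '*'")
    return value * Fraction(int(base)) ** int(exp)
```

This reproduces the values in the table: 1536 when the exponent base is 2, and 15,000,000,000 when it is 10.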
The generic string-to-number converter will recognize all of these forms (including the * form, since constant folding is not available to the run time). Also allowed in strings are leading plus or minus, and maybe a trailing Units type for an implied scaling. Leading and trailing whitespace is ignored. Note also that leading 0 by itself never implies octal in Perl 6.
Any of the adverbial forms may be used as a function:
    :2($x)      # "bin2num"
    :8($x)      # "oct2num"
    :10($x)     # "dec2num"
    :16($x)     # "hex2num"
Think of these as setting the default radix, not forcing it. Like Perl 5's old oct() function, any of these will recognize a number starting with a different radix marker and switch to the other radix. However, note that the :16() converter function will interpret leading 0b or 0d as hex digits, not radix switchers.
Characters indexed by hex numbers can be interpolated into strings by introducing them with "\x", followed by either a bare hex number ("\x263a") or a hex number in square brackets ("\x[263a]"). Similarly, "\o12" and "\o[12]" interpolate octals, while "\d1234" and "\d[1234]" interpolate decimals--but generally you should be using hex in the world of Unicode. Multiple characters may be specified within any of the bracketed forms by separating the numbers with commas: "\x[41,42,43]". You must use the bracketed form to disambiguate if the unbracketed form would "eat" too many characters, because all of the unbracketed forms eat as many characters as they think look like digits in the radix specified. None of these notations work in normal Perl code. They work only in interpolations and regexes and the like.
The old \123 form is now illegal, as is the \0123 form. Only \0 remains, and then only if the next character is not in the range '0'..'7'. Octal characters must use \o notation. Note also that backreferences are no longer represented by \1 and the like--see S05.
The qw/foo bar/ quote operator now has a bracketed form: <foo bar>. When used as a subscript it performs a slice equivalent to {'foo','bar'}. Elsewhere it is equivalent to a parenthesized list of strings: ('foo','bar'). Since parentheses are generally reserved just for precedence grouping, they merely autointerpolate in list context. Therefore
@a = 1, < x y >, 2;
is equivalent to:
@a = 1, ('x', 'y'), 2;
which is the same as:
@a = 1, 'x', 'y', 2;
In scalar context, though, the implied parentheses are not removed, so
$a = < a b >;
is equivalent to:
$a = ('a', 'b');
which, because the list is assigned to a scalar, is autopromoted into an Array object:
$a = ['a', 'b'];
Likewise, if bound to a scalar parameter, <a b> will be treated as a single list object, but if bound to a slurpy parameter, it will auto-flatten.
But note that under the parenthesis-rewrite rule, a single value will still act like a scalar value. These are all the same:
$a = < a >;
$a = ('a');
$a = 'a';
And if bound to a scalar parameter, no list is constructed. To force a single value to become a list object in scalar context, you should use ['a'] for clarity as well as correctness.
Much like the relationship between single quotes and double quotes, single angles do not interpolate while double angles do. The double angles may be written either with French quotes, «$foo @bar[]», or with "Texas" quotes, <<$foo @bar[]>>, as the ASCII workaround. The implicit split is done after interpolation, but respects quotes in a shell-like fashion, so that «'$foo' "@bar[]"» is guaranteed to produce a list of two "words" equivalent to ('$foo', "@bar[]").
Pair notation is also recognized inside «...» and such "words" are returned as Pair objects.
Colon pairs (but not arrow pairs) are recognized within double angles. In addition, the double angles allow for comments beginning with #. These comments work exactly like ordinary comments in Perl code. That is, # at beginning of line is always a line-end comment, otherwise a following bracket sequence implies an inline comment; also, unlike in the shells, any literal # must be quoted, even ones without whitespace in front of them, but note that this comes more or less for free with a colon pair like :char<#x263a>.
There is now a generalized adverbial form of Pair notation. The following table shows the correspondence to the "fatarrow" notation:
    Fat arrow           Adverbial pair  Paren form
    =========           ==============  ==========
    a => 1              :a
    a => 0              :!a
    a => 0              :a(0)
    a => $x             :a($x)
    a => 'foo'          :a<foo>         :a(<foo>)
    a => <foo bar>      :a<foo bar>     :a(<foo bar>)
    a => «$foo @bar»    :a«$foo @bar»   :a(«$foo @bar»)
    a => {...}          :a{...}         :a({...})
    a => [...]          :a[...]         :a([...])
    a => $a             :$a
    a => @a             :@a
    a => %a             :%a
    a => $$a            :$$a
    a => @$$a           :@$$a           (etc.)
    a => %foo<a>        %foo<a>:p
    '' => $x            :($x)
    '' => <x>           :<x>
    '' => ($x,$y)       :($x,$y)
    '' => [$x,$y]       :[$x,$y]
    '' => {$x => $y}    :{$x => $y}
The fatarrow construct may be used only where a term is expected because it's considered an expression in its own right, since the fatarrow is parsed as a normal infix operator (even when autoquoting an identifier on its left). The adverbial forms are considered special tokens and are recognized in various positions in addition to term position. In particular, when used where an infix would be expected they modify the previous operator, ignoring the intervening term or parenthesized argument. The form is also used to rename parameter declarations and to modify the meaning of various quoting forms. When appended to an identifier, the adverbial syntax is used to generate variants of that identifier; this syntax is used for naming operators such as infix:<+> and multiply dispatched grammatical rules such as statement_control:if. When so used the adverb is considered part of the name, so infix:<+> and infix:<-> are two different operators. Likewise prefix:<+> is different from infix:<+>.
Either fatarrow or adverbial pair notation may be used to pass named arguments as terms to a function or method. After a call with parenthesized arguments, only adverbial syntax may be used to pass additional arguments. This is typically used to pass an extra block:
find($directory) :{ when not /^\./ }
This actually falls out from the preceding rules because the adverbial block is in operator position, so it modifies the "find operator".
Note that as usual the {...} form can indicate either a closure or a hash depending on the contents. It does not indicate a subscript despite being parsed as one.
Note also that the <a b> form is not a subscript and is therefore equivalent not to .{'a','b'} but rather to ('a','b'). Bare <a> turns into ('a') rather than ('a',).
Two or more adverbs can always be strung together without intervening punctuation anywhere a single adverb is acceptable. When used as named arguments in an argument list, you may put a comma between them, because they're just ordinary named arguments to the function, and a fatarrow pair would work the same. However, this comma is allowed only when the first pair occurs where a term is expected. Where an infix operator is expected, the adverb is always taken as modifying the nearest preceding operator that is not hidden within parentheses, and if you string together multiple such pairs, you may not put commas between them, since that would cause subsequent pairs to look like terms. (The fatarrow form is not allowed at all in operator position.) See S06 for the use of adverbs as named arguments.
The negated form (:!a) and the sigiled forms (:$a, :@a, :%a) never take an argument and don't care what the next character is. They are considered complete.
The other forms of adverb (including the bare :a form) always look for an immediate bracketed argument, and will slurp it up. If that's not intended, you must use whitespace between the adverb and the opening bracket. The syntax of individual adverbs is the same everywhere in Perl 6. There are no exceptions based on whether an argument is wanted or not. (There is a minor exception for quote and regex adverbs, which accept only parentheses as their bracketing operator, and ignore other brackets, which must be placed in parens if desired. See "Paren form" in the table above.)
Except as noted above, the parser always looks for the brackets. Despite not indicating a true subscript, the brackets are similarly parsed as postfix operators. As postfixes the brackets may be separated from their initial :foo with either unspace or dot (or both), but nothing else.
Regardless of syntax, adverbs used as named arguments generally show up as optional named parameters to the function in question--even if the function is an operator or macro. The function in question neither knows nor cares how weird the original syntax was.
In addition to q and qq, there is now the base form Q which does no interpolation unless explicitly modified to do so. So q is really short for Q:q and qq is short for Q:qq. In fact, all quote-like forms derive from Q with adverbs:
    q//         Q :q //
    qq//        Q :qq //
    rx//        Q :regex //
    s///        Q :subst ///
    tr///       Q :trans ///
Adverbs such as :regex change the language to be parsed by switching to a different parser. This can completely change the interpretation of any subsequent adverbs as well as the quoted material itself.
    q:s//       Q :q :scalar //
    rx:s//      Q :regex :sigspace //
Generalized quotes may now take adverbs:
    Short       Long            Meaning
    =====       ====            =======
    :x          :exec           Execute as command and return results
    :w          :words          Split result on words (no quote protection)
    :ww         :quotewords     Split result on words (with quote protection)
    :q          :single         Interpolate \\, \q and \' (or whatever)
    :qq         :double         Interpolate with :s, :a, :h, :f, :c, :b
    :s          :scalar         Interpolate $ vars
    :a          :array          Interpolate @ vars
    :h          :hash           Interpolate % vars
    :f          :function       Interpolate & calls
    :c          :closure        Interpolate {...} expressions
    :b          :backslash      Interpolate \n, \t, etc. (implies :q at least)
    :to         :heredoc        Parse result as heredoc terminator
                :regex          Parse as regex
                :subst          Parse as substitution
                :trans          Parse as transliteration
                :code           Quasiquoting
You may omit the first colon by joining an initial Q, q, or qq with a single short form adverb, which produces forms like:
    qw /a b c/;                         # P5-esque qw// meaning q:w
    Qc '...{$x}...';                    # Q:c//, interpolate only closures
    qqx/$cmd @args/                     # equivalent to P5's qx//
(Note that qx// doesn't interpolate.)
If you want to abbreviate further, just define a macro:
    macro qx { 'qq:x ' }          # equivalent to P5's qx//
    macro qTO { 'qq:x:w:to ' }    # qq:x:w:to//
    macro circumfix:<❰ ❱> ($expr) { q:code{ $expr.quoteharder } }
All the uppercase adverbs are reserved for user-defined quotes. All Unicode delimiters above Latin-1 are reserved for user-defined quotes.
A consequence of the previous item is that we can now say:
%hash = qw:c/a b c d {@array} {%hash}/;
or
%hash = qq:w/a b c d {@array} {%hash}/;
to interpolate items into a qw. Conveniently, arrays and hashes interpolate with only whitespace separators by default, so the subsequent split on whitespace still works out. (But the built-in «...» quoter automatically does interpolation equivalent to qq:ww/.../. The built-in <...> is equivalent to q:w/.../.)
Whitespace is allowed between the "q" and its adverb: q :w /.../.
For these "q" forms the choice of delimiters has no influence on the semantics. That is, '', "", <>, «», ``, (), [], and {} have no special significance when used in place of // as delimiters. There may be whitespace before the opening delimiter. (This is mandatory for parens, because q() is a subroutine call and q:w(0) is an adverb with arguments.) Other brackets may also require whitespace when they would be understood as an argument to an adverb in something like q:z<foo>//. A colon may never be used as the delimiter since it will always be taken to mean another adverb regardless of what's in front of it. Nor may a # character be used as the delimiter since it is always taken as whitespace (specifically, as a comment).
New quoting constructs may be declared as macros:
macro quote:<qX> (*%adverbs) {...}
Note: macro adverbs are automatically evaluated at macro call time if the adverbs are included in the parse. If an adverb needs to affect the parsing of the quoted text of the macro, then an explicit named parameter may be passed on as a parameter to the "is parsed" subrule, or used to select which subrule to invoke.
You may interpolate double-quotish text into a single-quoted string using the \qq[...] construct. Other "q" forms also work, including user-defined ones, as long as they start with "q". Otherwise you'll just have to embed your construct inside a \qq[...].
Bare scalar variables always interpolate in double-quotish strings. Bare array, hash, and subroutine variables may never be interpolated. However, any scalar, array, hash or subroutine variable may start an interpolation if it is followed by a sequence of one or more bracketed dereferencers: that is, any of:
1. An array subscript
2. A hash subscript
3. A set of parentheses indicating a function call
4. Any of 1 through 3 in their dot form
5. A method call that includes argument parentheses
6. A sequence of one or more unparenthesized method calls, followed by any of 1 through 5
In other words, this is legal:
"Val = $a.ord.fmt('%x')\n"
and is equivalent to
"Val = { $a.ord.fmt('%x') }\n"
In order to interpolate an entire array, it's necessary now to subscript with empty brackets:
print "The answers are @foo[]\n"
Note that this fixes the spurious "@" problem in double-quoted email addresses.
As with Perl 5 array interpolation, the elements are separated by a space. (Except that a space is not added if the element already ends in some kind of whitespace. In particular, a list of pairs will interpolate with a tab between the key and value, and a newline after the pair.)
In order to interpolate an entire hash, it's necessary to subscript with empty braces or angles:
print "The associations are:\n%bar{}"
print "The associations are:\n%bar<>"
Note that this avoids the spurious "%" problem in double-quoted printf formats.
By default, keys and values are separated by tab characters, and pairs are terminated by newlines. (This is almost never what you want, but if you want something polished, you can be more specific.)
In order to interpolate the result of a sub call, it's necessary to include both the sigil and parentheses:
print "The results are &baz().\n"
The function is called in scalar context. (If it returns a list anyway, that list is interpolated as if it were an array in string context.)
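Several of the rules above fall back on interpolating a list "as if it were an array". A rough Python model of that joining rule follows: elements are separated by a single space unless the element already ends in whitespace, and a pair renders as key, tab, value, newline. The names interpolate_items and pair_text are hypothetical, and no separator is added after the last element, matching Perl 5 array interpolation.

```python
def pair_text(key, value):
    """Default rendering of one pair: key, tab, value, newline."""
    return f"{key}\t{value}\n"


def interpolate_items(items):
    """Join items the way the text describes array interpolation."""
    out = []
    for i, item in enumerate(items):
        text = str(item)
        out.append(text)
        is_last = i == len(items) - 1
        ends_in_space = bool(text) and text[-1].isspace()
        if not is_last and not ends_in_space:
            out.append(" ")   # separator only where no whitespace already ends the item
    return "".join(out)
```

Because each rendered pair ends in a newline, a list of pairs comes out one pair per line with no extra spaces.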
In order to interpolate the result of a method call without arguments, it's necessary to include parentheses or extend the call with something ending in brackets:
print "The attribute is $obj.attr().\n"
print "The attribute is $obj.attr<Jan>.\n"
The method is called in scalar context. (If it returns a list, that list is interpolated as if it were an array.)
It is allowed to have a cascade of argumentless methods as long as the last one ends with parens:
print "The attribute is %obj.keys.sort.reverse().\n"
(The cascade is basically counted as a single method call for the end-bracket rule.)
Multiple dereferencers may be stacked as long as each one ends in some kind of bracket:
print "The attribute is @baz[3](1,2,3){$xyz}<blurfl>.attr().\n"
Note that the final period above is not taken as part of the expression since it doesn't introduce a bracketed dereferencer.
A bare closure also interpolates in double-quotish context. It may not be followed by any dereferencers, since you can always put them inside the closure. The expression inside is evaluated in scalar (string) context. You can force list context on the expression using the list operator if necessary.
The following means the same as the previous example.
print "The attribute is { @baz[3](1,2,3){$xyz}<blurfl>.attr }.\n"
The final parens are unnecessary since we're providing "real" code in the curlies. If you need to have double quotes that don't interpolate curlies, you can explicitly remove the capability:
qq:c(0) "Here are { $two uninterpolated } curlies";
or equivalently:
qq:!c "Here are { $two uninterpolated } curlies";
Alternately, you can build up capabilities from single quote to tell it exactly what you do want to interpolate:
q:s 'Here are { $two uninterpolated } curlies';
Secondary sigils (twigils) have no influence over whether the primary sigil interpolates. That is, if $a interpolates, so do $^a, $*a, $=a, $?a, $.a, etc. It only depends on the $.
No other expressions interpolate. Use curlies.
A class method may not be directly interpolated. Use curlies:
print "The dog bark is {Dog.bark}.\n"
The old disambiguation syntax:
${foo[$bar]}
${foo}[$bar]
is dead. Use closure curlies instead:
{$foo[$bar]}
{$foo}[$bar]
(You may be detecting a trend here...)
To interpolate a topical method, use curlies: "{.bark}".
To interpolate a function call without a sigil, use curlies: "{abs $var}".
And so on.
Backslash sequences still interpolate, but there's no longer any \v to mean vertical tab, whatever that is... (\v now matches vertical whitespace in a regex.) Literal character representations are:
    \a          BELL
    \b          BACKSPACE
    \t          TAB
    \n          LINE FEED
    \f          FORM FEED
    \r          CARRIAGE RETURN
    \e          ESCAPE
There's also no longer any \L, \U, \l, \u, or \Q. Use curlies with the appropriate function instead: "{ucfirst $word}".
You may interpolate any Unicode codepoint by name using \c and square brackets:
"\c[NEGATED DOUBLE VERTICAL BAR DOUBLE RIGHT TURNSTILE]"
Multiple codepoints constituting a single character may be interpolated with a single \c by separating the names with comma:
"\c[LATIN CAPITAL LETTER A, COMBINING RING ABOVE]"
Whether that is regarded as one character or two depends on the Unicode support level of the current lexical scope. It is also possible to interpolate multiple codepoints that do not resolve to a single character:
"\c[LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B]"
[Note: none of the official Unicode character names contains comma.]
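The "one character or two" question can be seen with Python's standard unicodedata module, used here purely as an illustration of the Unicode behaviour and not of Perl 6 itself. The helper name by_names is hypothetical; it splits on commas, which is safe for the reason given in the note above.

```python
import unicodedata


def by_names(spec):
    """Build a string from comma-separated official Unicode character names."""
    return "".join(unicodedata.lookup(name.strip()) for name in spec.split(","))


ring_a = by_names("LATIN CAPITAL LETTER A, COMBINING RING ABOVE")
composed = unicodedata.normalize("NFC", ring_a)   # canonical composition
two_letters = by_names("LATIN CAPITAL LETTER A, LATIN CAPITAL LETTER B")
```

At the codepoint level ring_a has length 2, while its NFC form is the single codepoint U+00C5. The two_letters string stays at two characters under any normalization, since A and B do not combine.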
(Within a regex you may also use \C to match a character that is not the specified character.)
If the character following \c or \C is not a left square bracket, the single following character is turned into a control character by the usual trick of XORing the 64 bit. This allows \c@ for NULL and \c? for DELETE, but note that the ESCAPE character may not be represented that way; it must be represented something like:
    \e
    \c[ESCAPE]
    \x1B
    \o33
Obviously \e is preferred when brevity is needed.
There are no barewords in Perl 6. An undeclared bare identifier will always be taken to mean a subroutine name. (Class names (and other type names) are predeclared, or prefixed with the :: type sigil when you're declaring a new one.) A consequence of this is that there's no longer any "use strict 'subs'". Since the syntax for method calls is distinguished from sub calls, it is only unrecognized sub calls that must be treated specially.
You still must declare your subroutines, but a bareword with an unrecognized name is provisionally compiled as a subroutine call, on the assumption that such a declaration will occur by the end of the current compilation unit:
    foo;         # provisional call if neither &foo nor ::foo is defined so far
    foo();       # provisional call if &foo is not defined so far
    foo($x);     # provisional call if &foo is not defined so far
    foo($x, $y); # provisional call if &foo is not defined so far

    $x.foo;      # not a provisional call; it's a method call on $x
    foo $x:;     # not a provisional call; it's a method call on $x
    foo $x: $y;  # not a provisional call; it's a method call on $x
If a postdeclaration is not seen, the compile fails at CHECK time. (You are still free to predeclare subroutines explicitly, of course.) The postdeclaration may be in any lexical or package scope that could have made the declaration visible to the provisional call had the declaration occurred before rather than after the provisional call.
This fixup is done only for provisional calls. If there is any real predeclaration visible, it always takes precedence. In case of multiple ambiguous postdeclarations, either they must all be multis, or a compile-time error is declared and you must predeclare, even if one postdeclaration is obviously "closer". A single proto predeclaration may make all postdeclared multis work fine, since that's a run-time dispatch, and all multis are effectively visible at the point of the controlling proto declaration.
Parsing of a bareword function as a provisional call is always done the same way list operators are treated. If a postdeclaration bends the syntax to be inconsistent with that, it is an error of the inconsistent signature variety.
If the unrecognized subroutine name is followed by postcircumfix:<( )>, it is compiled as a provisional function call of the parenthesized form. If it is not, it is compiled as a provisional function call of the list operator form, which may or may not have an argument list. When in doubt, the attempt is made to parse an argument list. As with any list operator, an immediate postfix operator is illegal unless it is a form of parentheses, whereas anything following whitespace will be interpreted as an argument list if possible.
Based on the signature of the subroutine declaration, there are only four ways that an argument list can be parsed:
    Signature           # of expected args
    ()                  0
    ($x)                1
    ($x?)               0..1
    (anything else)     0..Inf
That is, a standard subroutine call may be parsed only as a 0-arg term (or function call), a 1-mandatory-arg prefix operator (or function call), a 1-optional-arg term or prefix operator (or function call), or an "infinite-arg" list operator (or function call). A given signature might only accept 2 arguments, but the only number distinctions the parser is allowed to make is between void, singular and plural; checking that number of arguments supplied matches some number larger than one must be done as a separate semantic constraint, not as a syntactic constraint. Perl functions never take N arguments off of a list and leave the rest for someone else, except for small values of N, where small is defined as not more than 1. You can get fancier using macros, but macros always require predeclaration. Since the non-infinite-list forms are essentially behaving as macros, those forms also require predeclaration. Only the infinite-list form may be postdeclared (and hence used provisionally).
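The four-way classification can be modelled as a small function from a signature to a (minimum, maximum) argument count. This Python sketch is an assumption-laden illustration: the name expected_args is hypothetical, a signature is represented as a list of parameter strings, and only the literal shapes in the table are recognized, with everything else (including two scalars) falling into the list-operator case.

```python
import math
import re


def expected_args(params):
    """Map a signature, given as a list of parameter strings, to (min, max).

    Only the shapes in the table are distinguished; anything else is
    treated as a list operator taking 0..Inf arguments.
    """
    if len(params) == 0:
        return (0, 0)                      # ()
    if len(params) == 1:
        if re.fullmatch(r"\$\w+", params[0]):
            return (1, 1)                  # ($x)
        if re.fullmatch(r"\$\w+\?", params[0]):
            return (0, 1)                  # ($x?)
    return (0, math.inf)                   # (anything else)
```

A two-scalar signature maps to (0, inf) here, reflecting the point that checking for exactly two arguments is a semantic constraint rather than something the parser distinguishes.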
It is illegal for a provisional subroutine call to be followed by a colon postfix, since such a colon is allowed only on an indirect object, or a method call in dot form. (It is also allowed on a label when a statement is expected.) So for any undeclared identifier "foo":
    foo.bar             # ILLEGAL           -- postfix must use foo().bar
    foo .bar            # foo($_.bar)       -- no postfix starts with whitespace
    foo\ .bar           # ILLEGAL           -- must use foo()\ .bar
    foo++               # ILLEGAL           -- postfix must use foo()++
    foo 1,2,3           # foo(1,2,3)        -- args always expected after listop
    foo + 1             # foo(+1)           -- term always expected after listop
    foo;                # foo();            -- no postfix, but no args either
    foo:                #   label           -- must be label at statement boundary.
                        #                   -- ILLEGAL otherwise
    foo: bar:           #   two labels in a row, okay
    .foo:               # $_.foo: 1         -- must be "dot" method with : args
    .foo(1)             # $_.foo(1)         -- must be "dot" method with () args
    .foo                # $_.foo()          -- must be "dot" method with no args
    .$foo:              # $_.$foo: 1        -- indirect "dot" method with : args
    foo bar: 1          # bar.foo(1)        -- bar must be predecl as class
                        #                   -- sub bar allowed here only if 0-ary
                        #                   -- otherwise you must say (bar):
    foo bar 1           # foo(bar(1))       -- both subject to postdeclaration
                        #                   -- never taken as indirect object
    foo $bar: 1         # $bar.foo(1)       -- indirect object even if declared sub
                        #                   -- $bar considered one token
    foo (bar()):        # bar().foo(1)      -- even if foo declared sub
    foo bar():          # ILLEGAL           -- bar() is two tokens.
    foo .bar:           # foo(.bar:)        -- colon chooses .bar to listopify
    foo bar baz: 1      # foo(baz.bar(1))   -- colon controls "bar", not foo.
    foo (bar baz): 1    # bar(baz()).foo(1) -- colon controls "foo"
    $foo $bar           # ILLEGAL           -- two terms in a row
    $foo $bar:          # ILLEGAL           -- use $bar.$foo for indirection
    (foo bar) baz: 1    # ILLEGAL           -- use $baz.$(foo bar) for indirection
The indirect object colon only ever dominates a simple term, where "simple" includes classes and variables and parenthesized expressions, but explicitly not method calls, because the colon will bind to a trailing method call in preference. An indirect object that parses as more than one token must be placed in parentheses, followed by the colon.
In short, only an identifier followed by a simple term followed by a postfix colon is ever parsed as an indirect object, but that form will always be parsed as an indirect object regardless of whether the identifier is otherwise declared.
There's also no "use strict 'refs'" because symbolic dereferences are now syntactically distinguished from hard dereferences. @($arrayref) must now provide an actual array object, while @::($string) is explicitly a symbolic reference. (Yes, this may give fits to the P5-to-P6 translator, but I think it's worth it to separate the concepts. Perhaps the symbolic ref form will admit real objects in a pinch.)
There is no hash subscript autoquoting in Perl 6. Use %x<foo> for constant hash subscripts, or the old standby %x{'foo'}. (It also works to say %x«foo» as long as you realize it's subject to interpolation.)
But => still autoquotes any bare identifier to its immediate left (horizontal whitespace allowed but not comments). The identifier is not subject to keyword or even macro interpretation. If you say
    $x = do {
        call_something();
        if => 1;
    }
then $x ends up containing the pair ("if" => 1). Always. (Unlike in Perl 5, where version numbers didn't autoquote.)
You can also use the :key($value) form to quote the keys of option pairs. To align values of option pairs, you may use the "unspace" postfix forms:
    :longkey\  .($value)
    :shortkey\ .<string>
    :fookey\   .{ $^a <=> $^b }
These will be interpreted as
    :longkey($value)
    :shortkey<string>
    :fookey{ $^a <=> $^b }
The double-underscore forms are going away:
    Old                 New
    ---                 ---
    __LINE__            $?LINE
    __FILE__            $?FILE
    __PACKAGE__         $?PACKAGE
    __END__             =begin END
    __DATA__            =begin DATA
The =begin END pod stream is special in that it assumes there's no corresponding =end END before end of file. The DATA stream is no longer special--any POD stream in the current file can be accessed via a filehandle, named as %=POD{'DATA'} and such. Alternately, you can treat a pod stream as a scalar via $=DATA or as an array via @=DATA. Presumably a module could read all its COMMENT blocks from @=COMMENT, for instance. Each chunk of pod comes as a separate array element. You have to split it into lines yourself. Each chunk has a .range property that indicates its line number range within the source file.
The lexical routine itself is &?ROUTINE; you can get its name with &?ROUTINE.name. The current block is &?BLOCK. If the block has any labels, those show up in &?BLOCK.labels. Within the lexical scope of a statement with a label, the label is a pseudo-object representing the dynamic context of that statement. (If inside multiple dynamic instances of that statement, the label represents the innermost one.) When you say:
    next LINE;
it is really a method on this pseudo-object, and
    LINE.next;
would work just as well. You can exit any labeled block early by saying
    MyLabel.leave(@results);
Heredocs are no longer written with <<, but with an adverb on any other quote construct:
    print qq:to/END/;
        Give $amount to the man behind curtain number $curtain.
        END
Other adverbs are also allowed, as are multiple heredocs within the same expression:
    print q:c:to/END/, q:to/END/;
        Give $100 to the man behind curtain number {$curtain}.
        END
        Here is a $non-interpolated string
        END
Heredocs allow optional whitespace both before and after the terminating delimiter. Leading whitespace equivalent to the indentation of the delimiter will be removed from all preceding lines. If a line is deemed to have less whitespace than the terminator, only whitespace is removed, and a warning may be issued. (Hard tabs will be assumed to be ($?TABSTOP // 8) spaces, but as long as tabs and spaces are used consistently that doesn't matter.) A null terminating delimiter terminates on the next line consisting only of whitespace, but such a terminator will be assumed to have no indentation. (That is, it's assumed to match at the beginning of any whitespace.)
There are two possible ways to parse heredocs. One is to look ahead for the newline and grab the lines corresponding to the heredoc, and then parse the rest of the original line. This is how Perl 5 does it. Unfortunately this suffers from the problem pervasive in Perl 5 of multi-pass parsing, which is masked somewhat because there's no way to hide a newline in Perl 5. In Perl 6, however, we can use "unspace" to hide a newline, which means that an algorithm looking ahead to find the newline must do a full parse (with possible untoward side effects) in order to locate the newline.
Instead, Perl 6 takes the one-pass approach, and just lazily queues up the heredocs it finds in a line, and waits until it sees a "real" newline to look for the text and attach it to the appropriate heredoc. The downside of this approach is a slight restriction--you may not use the actual text of the heredoc in code that must run before the line finishes parsing. Mostly that just means you can't write:
    BEGIN { say q:to/END/ }
        Say me!
        END
You must instead put the entire heredoc into the BEGIN:
    BEGIN {
        say q:to/END/;
            Say me!
            END
    }
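The indentation rule for heredocs described earlier (strip the terminator's indentation from every preceding line, and strip only whitespace from lines that are indented less) can be sketched in Python. This is an illustrative model under stated assumptions, not Perl's implementation: the names split_indent and dedent_heredoc are hypothetical, each hard tab is counted literally as tabstop spaces, any indentation left over is re-emitted as spaces, and blank lines are passed through without a warning.

```python
import warnings


def split_indent(line, tabstop=8):
    """Return (indent width in columns, text after leading whitespace)."""
    rest = line.lstrip(" \t")
    leading = line[: len(line) - len(rest)]
    # Assumption: each hard tab counts as `tabstop` spaces.
    return len(leading.replace("\t", " " * tabstop)), rest


def dedent_heredoc(lines, terminator, tabstop=8):
    """Return the heredoc body with the terminator's indentation removed."""
    for index, line in enumerate(lines):
        if line.strip() == terminator:
            margin, _ = split_indent(line, tabstop)
            break
    else:
        raise ValueError("heredoc terminator not found")

    body = []
    for line in lines[:index]:
        width, rest = split_indent(line, tabstop)
        if width < margin:
            # Less whitespace than the terminator: remove only whitespace.
            if rest:  # assumption: blank lines do not warrant a warning
                warnings.warn("heredoc line indented less than terminator")
            body.append(rest)
        else:
            body.append(" " * (width - margin) + rest)
    return body
```

For example, with a terminator indented by eight columns, a body line indented by ten columns keeps two leading spaces, while a line indented by only four columns loses all of its leading whitespace.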
A version literal is written with a 'v' followed by the version number in dotted form. This always constructs a Version object, not a string. Only integers and certain wildcards are allowed; for anything fancier you must coerce a string to a Version:
    v1.2.3                  # okay
    v1.2.*                  # okay, wildcard version
    v1.2.3+                 # okay, wildcard version
    v1.2.3beta              # illegal
    Version('1.2.3beta')    # okay
Note though that most places that take a version number in Perl accept it as a named argument, in which case saying :ver<1.2.3beta> is fine. See S11 for more on using versioned modules.
Version objects have a predefined sort order that follows most people's intuition about versioning: each sorting position sorts numerically between numbers, alphabetically between alphas, and alphabetics in a position sort before numerics. Missing final positions are assumed to be '.0'. Except for '0' itself, numbers ignore leading zeros. For splitting into sort positions, if any alphabetics (including underscore) are immediately adjacent to a number, a dot is assumed between them. Likewise any non-alphanumeric character is assumed to be equivalent to a dot. So these are all equivalent:
    1.2.1alpha1.0
    1.2.1alpha1
    1.2.1.alpha1
    1.2.1alpha.1
    1.2.1.alpha.1
    1.2-1+alpha/1
And these are also equivalent:
    1.2.1_01
    1.2.1_1
    1.2.1._1
    1.2.1_1
    1.2.1._.1
    001.0002.0000000001._.00000000001
    1.2.1._.1.0.0.0.0.0
So these are in sorted version order:
    1.2.0.999
    1.2.1_01
    1.2.1_2
    1.2.1_003
    1.2.1a1
    1.2.1.alpha1
    1.2.1b1
    1.2.1.beta1
    1.2.1.gamma
    1.2.1α1
    1.2.1β1
    1.2.1γ
    1.2.1
Note how the last pair assume that an implicit .0 sorts after anything alphabetic, and that alphabetic is defined according to Unicode, not just according to ASCII. The intent of all this is to make sure that prereleases sort before releases. Note also that this is still a subset of the versioning schemes seen in the real world. Modules with such strange versions can still be used by Perl since by default Perl imports external modules by exact version number. (See S11.) Only range operations will be compromised by an unknown foreign collation order, such as a system that sorts "delta" after "gamma".
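The splitting and ordering rules above can be checked against the listed examples with a small model. The following Python sketch is an approximation under stated assumptions, not the Version class itself: the names positions and compare_versions are hypothetical, alphabetic positions are compared by Unicode code point, and the wildcard forms (such as a trailing * or +) are not handled.

```python
import re
from functools import cmp_to_key
from itertools import zip_longest

# A missing final position is assumed to be numeric 0.
ZERO = (1, 0)


def positions(version):
    """Split a version string into tagged sort positions.

    Runs of digits and runs of alphabetics (including underscore) become
    separate positions; every other character acts as a dot. Alphabetic
    positions are tagged 0 and numeric ones 1, so that alphabetics sort
    before numerics in the same position.
    """
    result = []
    for run in re.findall(r"\d+|[^\W\d]+", version):
        if run.isdigit():
            result.append((1, int(run)))   # int() ignores leading zeros
        else:
            result.append((0, run))
    return result


def compare_versions(a, b):
    """Three-way comparison following the sort-position rules."""
    for x, y in zip_longest(positions(a), positions(b), fillvalue=ZERO):
        if x != y:
            return -1 if x < y else 1
    return 0


version_key = cmp_to_key(compare_versions)
```

Sorting the thirteen versions listed above with key=version_key reproduces the stated order, including 1.2.1γ sorting before the bare 1.2.1 release.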
Context
Perl still has the three main contexts: void, scalar, and list.
In addition to undifferentiated scalars, we also have these scalar contexts:
    Context     Type    OOtype  Operator
    -------     ----    ------  --------
    boolean     bit     Bit     ?
    integer     int     Int     int
    numeric     num     Num     +
    string      buf     Str     ~
There are also various container contexts that require particular kinds of containers.
Unlike in Perl 5, objects are no longer always considered true. It depends on the state of their .true property. Classes get to decide which of their values are true and which are false. Individual objects can override the class definition:
    return 0 but True;
Lists
List context in Perl 6 is by default lazy. This means a list can contain infinite generators without blowing up. No flattening happens to a lazy list until it is bound to the signature of a function or method at call time (and maybe not even then). We say that such an argument list is "lazily flattened", meaning that we promise to flatten the list on demand, but not before.
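As a rough analogy only, the idea of a lazily flattened list that contains an infinite generator can be modeled with Python iterators. The helper name lazily_flattened is hypothetical, and Python's iterator protocol is merely standing in for Perl 6 list semantics here.

```python
from itertools import chain, count, islice


def lazily_flattened(*parts):
    """Flatten one level on demand; nothing is consumed until iterated."""
    return chain.from_iterable(parts)


naturals = count(0)                                  # an infinite generator
args = lazily_flattened([10, 20], naturals, [99])    # builds without blowing up
first_five = list(islice(args, 5))                   # demand only five elements
```

Here first_five is [10, 20, 0, 1, 2]; the trailing [99] is never reached, because the infinite generator in the middle is only consumed as far as the consumer asks.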
There is a "list" operator which imposes a list context on its arguments even if list itself occurs in a scalar context. In list context, it flattens lazily. In a scalar context, it returns the resulting list as a single List object. (So the list operator really does exactly the same thing as putting a list in parentheses with at least one comma. But it's more readable in some situations.)
To force a non-flattening scalar context, use the "item" operator.
When evaluating chained operators, if an each() occurs anywhere in that chain, the chain will be transformed first into a grep. That is,
    for 0 <= each(@x) < all(@y) {...}
becomes
    for @x.grep:{ 0 <= $_ < all(@y) } {...}
Because of this, the original ordering of @x is guaranteed to be preserved in the returned list, and duplicate elements in @x are preserved as well. In particular,
    @result = each(@x) ~~ {...};
is equivalent to
    @result = @x.grep:{...};
However, this each() comprehension is strictly a syntactic transformation, so a list computed any other way will not trigger the rewrite:
    @result = (@x = each(@y)) ~~ {...};    # not a comprehension
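The effect of the each() rewrite on the earlier chained comparison can be illustrated with an ordinary filter. This Python sketch is only an analogy: the grep helper and the sample data are hypothetical, and the all(@y) junction is modeled with Python's built-in all().

```python
def grep(predicate, items):
    """Keep matching items, preserving order and duplicates."""
    return [item for item in items if predicate(item)]


xs = [3, -1, 3, 7, 2]
ys = [5, 6]

# Models: for 0 <= each(@x) < all(@y) {...}
result = grep(lambda x: 0 <= x and all(x < y for y in ys), xs)
```

With this data, result is [3, 3, 2]: the surviving elements appear in their original order, and the duplicate 3 is kept.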
The | prefix operator may be used to force "capture" context on its argument and also defeat any scalar argument checking imposed by subroutine signature declarations. Any resulting list arguments are then evaluated lazily.
To force non-lazy list flattening, use the eager list operator. Don't use it on an infinite generator unless you have a machine with infinite memory, and are willing to wait a long time. It may also be applied to a scalar iterator to force immediate iteration to completion.
A variant of eager is the hyper list operator, which declares not only that you want all the values generated now, but that you want them badly enough that you don't care what order they're generated in. That is, eager requires sequential evaluation of the list, while hyper requests (but does not require) parallel evaluation. In any case, it declares that you don't care about the evaluation order. (Conjecture: populating a hash from a hyper list of pairs could be done as the results come in, such that some keys can be seen even before the hyper is done. Thinking about Map-Reduce algorithms here...)
Signatures on non-multi subs can be checked at compile time, whereas multi sub and method call signatures can only be checked at run time (in the absence of special instructions to the optimizer).
This is not a problem for arguments that are arrays or hashes, since they don't have to care about their context, but just return themselves in any event, which may or may not be lazily flattened.
However, function calls in the argument list can't know their eventual context because the method hasn't been dispatched yet, so we don't know which signature to check against. As in Perl 5, list context is assumed unless you explicitly qualify the argument with a scalar context operator.
The => operator now constructs Pair objects rather than merely functioning as a comma. Both sides are in scalar context.
The .. operator now constructs Range objects rather than merely functioning as an operator. Both sides are in scalar context.
There is no such thing as a hash list context. Assignment to a hash produces an ordinary list context. You may assign alternating keys and values just as in Perl 5. You may also assign lists of Pair objects, in which case each pair provides a key and a value. You may, in fact, mix the two forms, as long as the pairs come when a key is expected. If you wish to supply a Pair as a key, you must compose an outer Pair in which the key is the inner Pair:
    %hash = (($keykey => $keyval) => $value);
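The mixing rule for hash assignment (a Pair is unpacked only where a key is expected) can be modeled as follows. This is a hedged Python sketch: Pair and assign_hash are hypothetical stand-ins, and raising an error for a dangling key is an assumption, since the text above does not say what happens in that case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pair:
    """Hypothetical stand-in for a Pair object (hashable, so usable as a key)."""
    key: object
    value: object


def assign_hash(items):
    """Build a dict from a list mixing key/value alternation and pairs."""
    result = {}
    iterator = iter(items)
    for item in iterator:
        if isinstance(item, Pair):
            # A pair arriving where a key is expected supplies both parts.
            result[item.key] = item.value
        else:
            try:
                result[item] = next(iterator)   # whatever follows is the value
            except StopIteration:
                # Assumption: a key without a value is treated as an error.
                raise ValueError("key %r has no value" % (item,))
    return result
```

To use a pair as a key in this model, wrap it in an outer pair, as in assign_hash([Pair(Pair("k", "v"), 9)]), which mirrors the nested form shown above.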
The anonymous enum function takes a list of keys or pairs, and adds values to any keys that are not already part of a pair. The value added is one more than the previous key or pair's value. This works nicely with the new qq:ww form:
    %hash = enum <<:Mon(1) Tue Wed Thu Fri Sat Sun>>;
    %hash = enum « :Mon(1) Tue Wed Thu Fri Sat Sun »;
are the same as:
    %hash = ();
    %hash<Mon Tue Wed Thu Fri Sat Sun> = 1..7;
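The numbering rule for enum can be sketched in Python. This is a model only: the function name enum is reused for illustration, pairs are represented as (key, value) tuples, and starting at 0 when the first item is a bare key is an assumption, since the text above only defines values relative to a previous item.

```python
def enum(items):
    """Assign each bare key one more than the previous item's value."""
    result = {}
    previous = -1   # assumption: a leading bare key would get 0
    for item in items:
        if isinstance(item, tuple):
            key, value = item            # explicit pair keeps its value
        else:
            key, value = item, previous + 1
        result[key] = value
        previous = value
    return result


week = enum([("Mon", 1), "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
```

Here week maps Mon through Sun to 1 through 7, matching the slice assignment shown above.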
In contrast to assignment, binding to a hash requires a Hash (or Pair) object. Binding to a "splat" hash requires a list of pairs or hashes, and stops processing the argument list when it runs out of pairs or hashes. See S06 for much more about parameter binding.
Files
Filename globs are no longer done with angle brackets. Use the glob function.
Input from a filehandle is no longer done with angle brackets. Instead of
    while (<HANDLE>) {...}
you now write
    for =$handle {...}
As a unary prefix operator, you may also apply adverbs to =:
    for =$handle :prompt('$ ') { say $_ + 1 }
or
    for =($handle):prompt('$ ') { say $_ + 1 }
or you may even write it in its functional form, passing the adverbs as ordinary named arguments.
    for prefix:<=>($handle, :prompt('$ ')) { say $_ + 1 }
Properties
Properties work as detailed in S12. They're actually object attributes provided by role mixins. Compile-time properties applied to containers and such still use the is keyword, but are now called "traits". On the other hand, run-time properties are attached to individual objects using the but keyword instead, but are still called "properties".
Properties are accessed just like attributes because they are in fact attributes of some class or other, even if it's an anonymous singleton class generated on the fly for that purpose. Since "rw" attributes behave in all respects as variables, properties may therefore also be temporized with temp, or hypotheticalized with let.
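The idea of attaching a run-time property through an anonymous singleton class generated on the fly can be illustrated in Python. This is a loose analogy, not how Perl 6 implements but: the helper name but and the generated class name are hypothetical, and the sketch only works for values whose type can be subclassed and re-constructed from the value (for example int or str, but not bool).

```python
def but(value, **properties):
    """Return a copy of value with extra properties mixed in.

    A one-off subclass of the value's type is created on the fly; the
    properties become attributes of that class. A `true` property, if
    given, overrides the truth value of this one object.
    """
    base = type(value)
    namespace = dict(properties)
    if "true" in properties:
        namespace["__bool__"] = lambda self: bool(self.true)
    singleton = type(base.__name__ + "_but", (base,), namespace)
    return singleton(value)


zero_but_true = but(0, true=True)   # models: 0 but True
```

In this model zero_but_true still compares equal to 0 and behaves as a number in arithmetic, yet it is true in a boolean test, while a plain 0 remains false.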
Grammatical Categories
Lexing in Perl 6 is controlled by a system of grammatical categories. At each point in the parse, the lexer knows which subset of the grammatical categories are possible at that point, and follows the longest-token rule across all the active grammatical categories. The grammatical categories that are active at any point are specified using a regex construct involving a set of magical hashes. For example, the matcher for the beginning of a statement might look like:
<%statement_control
| %scope_declarator
| %prefix
| %prefix_circumfix_meta_operator
| %circumfix
| %quote
| %term
>
(Ordering of grammatical categories within such a construct matters only in case of a "tie", in which case the grammatical category that is notionally "first" wins. For instance, given the example above, a statement_control is always going to win out over a prefix operator of the same name. And the reason you can't call a function named "if" directly as a list operator is because it would be hidden either by the statement_control category at the beginning of a statement or by the statement_modifier category elsewhere in the statement. Only the if(...) form unambiguously calls an "if" function, and even that works only because statement controls and statement modifiers require subsequent whitespace, as do list operators.)
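The longest-token rule with first-category tie-breaking can be sketched as follows. This Python model is illustrative only: next_token and the token sets are hypothetical, the real lexer works from regexes and magical hashes rather than plain string lists, and the prefix operator named "if" exists here purely to demonstrate a tie.

```python
def next_token(text, active_categories):
    """Pick the longest token over all active categories.

    active_categories is an ordered list of (category name, tokens).
    On a tie in length, the category listed first wins, because a later
    match must be strictly longer to replace the current best.
    """
    best = None
    for category, tokens in active_categories:
        for token in tokens:
            if text.startswith(token):
                if best is None or len(token) > len(best[1]):
                    best = (category, token)
    return best


# Hypothetical token sets for the start of a statement.
statement_start = [
    ("statement_control", ["if", "unless", "for"]),
    ("prefix", ["if", "+", "++", "!"]),
]
```

With these sets, next_token("if $x", statement_start) yields the statement_control reading of "if", while next_token("++$x", statement_start) yields the prefix "++" rather than "+", since the longer token wins regardless of ordering.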
Here are the current grammatical categories:
    category:<prefix>                         prefix:<+>
    circumfix:<[ ]>                           [ @x ]
    dotty:<.=>                                $obj.=method
    infix_circumfix_meta_operator:{'»','«'}   @a »+« @b
    infix_postfix_meta_operator:<=>           $x += 2;
    infix_prefix_meta_operator:<!>            $x !~~ 2;
    infix:<+>                                 $x + $y
    package_declarator:<role>                 role Foo;
    postcircumfix:<[ ]>                       $x[$y] or $x.[$y]
    postfix_prefix_meta_operator:{'»'}        @array »++
    postfix:<++>                              $x++
    prefix_circumfix_meta_operator:{'[',']'}  [*]
    prefix_postfix_meta_operator:{'«'}        -« @magnitudes
    prefix:<!>                                !$x (and $x.'!')
    q_backslash:<\\>                          '\\'
    qq_backslash:<n>                          "\n"
    quote_mod:<x>                             q:x/ ls /
    quote:<qq>                                qq/foo/
    regex_assertion:<!>                       /<!before \h>/
    regex_backslash:<w>                       /\w/ and /\W/
    regex_metachar:<.>                        /.*/
    regex_mod_internal:<P5>                   m:/ ... :P5 ... /
    routine_declarator:<sub>                  sub foo {...}
    scope_declarator:<has>                    has $.x;
    sigil:<%>                                 %hash
    special_variable:<$!>                     $!
    statement_control:<if>                    if $condition { 1 } else { 2 }
    statement_mod_cond:<if>                   .say if $condition
    statement_mod_loop:<for>                  .say for 1..10
    statement_prefix:<gather>                 gather for @foo { .take }
    term:<!!!>                                $x = { !!! }
    trait_auxiliary:<does>                    my $x does Freezable
    trait_verb:<handles>                      has $.tail handles <wag>
    twigil:<?>                                $?LINE
    type_declarator:<subset>                  subset Nybble of Int where ^16
    version:<v>                               v4.3.*
Any category containing "circumfix" requires two token arguments, supplied in slice notation. Note that many of these names do not represent real operators, and you wouldn't be able to call them even though you can name them.