=encoding utf8

=head1 TITLE

Synopsis 2: Bits and Pieces

=head1 AUTHORS

    Larry Wall <larry@wall.org>

=head1 VERSION

    Created: 10 Aug 2004

    Last Modified: 19 Nov 2010
    Version: 230

This document summarizes Apocalypse 2, which covers small-scale
lexical items and typological issues.  (These Synopses also contain
updates to reflect the evolving design of Perl 6 over time, unlike the
Apocalypses, which are frozen in time as "historical documents".
These updates are not marked--if a Synopsis disagrees with its
Apocalypse, assume the Synopsis is correct.)

=head1 One-pass parsing

To the extent allowed by sublanguages' parsers, Perl is parsed using a
one-pass, predictive parser.  That is, lookahead of more than one
"longest token" is discouraged.  The currently known exceptions to
this are where the parser must:

=over 4

=item *

Locate the end of interpolated expressions that begin with a sigil
and might or might not end with brackets.

=item *

Recognize that a reduce operator is not really beginning a C<[...]> composer.

=back

=head1 Lexical Conventions

=over 4

=item *

In the abstract, Perl is written in Unicode, and has consistent Unicode
semantics regardless of the underlying text representations.  By default
Perl presents Unicode in "NFG" formation, where each grapheme counts as
one character.  A grapheme is what the novice user would think of as a
character in their normal everyday life, including any diacritics.

=item *

Perl can count Unicode line and paragraph separators as line markers,
but that behavior had better be configurable so that Perl's idea of
line numbers matches what your editor thinks about Unicode lines.

=item *

Unicode horizontal whitespace is counted as whitespace, but it's better
not to use thin spaces where they will make adjoining tokens look like
a single token.  On the other hand, Perl doesn't use indentation as syntax,
so you are free to use any amount of whitespace anywhere that whitespace
makes sense. Comments always count as whitespace.

=item *

For some syntactic purposes, Perl distinguishes bracketing characters
from non-bracketing.  Bracketing characters are defined as any Unicode
characters with either bidirectional mirrorings or Ps/Pe/Pi/Pf properties.

In practice, though, you're safest using matching characters with
Ps/Pe/Pi/Pf properties, though ASCII angle brackets are a notable exception,
since they're bidirectional but not in the Ps/Pe/Pi/Pf sets.

Characters with no corresponding closing character do not qualify
as opening brackets.  This includes the second section of the Unicode
BidiMirroring data table.

If a character is already used in Ps/Pe/Pi/Pf mappings, then any entry
in BidiMirroring is ignored (both forward and backward mappings).
For any given Ps character, the next Pe codepoint (in numerical
order) is assumed to be its matching character even if that is not
what you might guess using left-right symmetry.  Therefore C<U+298D>
maps to C<U+298E>, not C<U+2990>, and C<U+298F> maps to C<U+2990>,
not C<U+298E>.  Neither C<U+298E> nor C<U+2990> is a valid bracket
opener, despite having reverse mappings in the BidiMirroring table.

The C<U+301D> codepoint has two closing alternatives, C<U+301E> and C<U+301F>;
Perl 6 recognizes only the one with the lower code point number, C<U+301E>,
as the closing brace.  This policy also applies to new one-to-many
mappings introduced in the future.

However, many-to-one mappings are fine; multiple opening characters
may map to the same closing character.  For instance, C<U+2018>, C<U+201A>,
and C<U+201B> may all be used as the opener for the C<U+2019> closer.
Constructs that count openers and closers assume that only the given
opener is special.  That is, if you open with one of the alternatives,
all other alternatives are treated as non-bracketing characters within
that construct.

=back

=head1 Whitespace and Comments

=over 4

=item *

Pod sections may be used reliably as multiline comments in Perl 6.
Unlike in Perl 5, Pod syntax now lets you use C<=begin comment>
and C<=end comment> to delimit a Pod block correctly without the need
for C<=cut>.  (In fact, C<=cut> is now gone.)  The format name does
not have to be C<comment> -- any unrecognized format name will do
to make it a comment.  (However, bare C<=begin> and C<=end> probably
aren't good enough, because all comments in them will show up in the
formatted output.)

We have single paragraph comments with C<=for comment> as well.
That lets C<=for> keep its meaning as the equivalent of a C<=begin>
and C<=end> combined.  As with C<=begin> and C<=end>, a comment started
in code reverts to code afterwards.

Since there is a newline before the first C<=>, the Pod form of comment
counts as whitespace equivalent to a newline.  See S26 for more on
embedded documentation.
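
As a rough sketch of the two Pod comment forms just described (the
C<say> statements are only illustrative filler, and the directives
are shown indented here only because this is a verbatim block):

    say "before";

    =begin comment
    Everything up to the matching =end line is skipped,
    including code such as: say "never runs";
    =end comment

    say "between";

    =for comment
    This single paragraph is a comment; it ends at the next blank line.

    say "after";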

=item *

Except within a quote literal, a C<#> character always introduces a comment in
Perl 6.  There are two forms of comment based on C<#>.  Embedded
comments require the C<#> to be followed by a backtick (C<`>) plus one
or more opening bracketing characters.

All other uses of C<#> are interpreted as single-line comments that
work just as in Perl 5, starting with a C<#> character and
ending at the subsequent newline.  They count as whitespace equivalent
to newline for purposes of separation.  Unlike in Perl 5, C<#>
may I<not> be used as the delimiter in quoting constructs.

=item *

Embedded comments are supported as a variant on quoting syntax, introduced
by C<#`> plus any user-selected bracket characters (as defined in
L</Lexical Conventions> above):

    say #`( embedded comment ) "hello, world!";

    $object\#`{ embedded comments }.say;

    $object\ #`「
        embedded comments
    」.say;

Brackets may be nested, following the same policy as ordinary quote brackets.

There must be no space between the C<#`> and the opening bracket character.
(There may be the I<visual appearance> of space for some double-wide
characters, however, such as the corner quotes above.)

For multiline comments it is recommended (but not required) to use two or
more brackets both for visual clarity and to avoid relying too much on
internal bracket counting heuristics when commenting code that may accidentally
miscount single brackets:

    #`{{
        say "here is an unmatched } character";
    }}

However, it's sometimes better to use Pod comments because they are
implicitly line-oriented.

=item *

For all quoting constructs that use user-selected brackets, you can open
with multiple identical bracket characters, which must be closed by the
same number of closing brackets.  Counting of nested brackets applies only
to pairs of brackets of the same length as the opening brackets:

    say #`{{
        This comment contains unmatched } and { { { {   (ignored)
        Plus a nested {{ ... }} pair                    (counted)
    }} q<< <<woot>> >>   # says " <<woot>> "

Note however that bare circumfix or postcircumfix C<<< <<...>> >>> is
not a user-selected bracket, but the ASCII variant of the C<< «...» >>
interpolating word list.  Only C<#`> and the C<q>-style quoters (including
C<m>, C<s>, C<tr>, and C<rx>) enable subsequent user-selected brackets.

=item *

Some languages such as C allow you to escape newline characters
to combine lines.  Other languages (such as regexes) allow you to
backslash a space character for various reasons.  Perl 6 generalizes
this notion to any kind of whitespace.  Any contiguous whitespace
(including comments) may be hidden from the parser by prefixing it
with C<\>.  This is known as the "unspace".  An unspace can suppress
any of several whitespace dependencies in Perl.  For example, since
Perl requires an absence of whitespace between a noun and a postfix
operator, using unspace lets you line up postfix operators:

    %hash\  {$key}
    @array\ [$ix]

As a special case to support the use above, a backslash where
a postfix is expected is considered a degenerate form of unspace.
Note that whitespace is not allowed before that, hence

    $subref \($arg)

is a syntax error (two terms in a row).  And

    foo \($arg)

will be parsed as a list operator with a C<Capture> argument:

    foo(\($arg))

However, other forms of unspace may usefully be preceded by whitespace.
(Unary uses of backslash may therefore never be followed by whitespace
or they would be taken as an unspace.)

Other postfix operators may also make use of unspace:

    $number\  ++;
    $number\  --;
    1+3\      i;
    $object\  .say();
    $object\#`{ your ad here }.say

Another normal use of a you-don't-see-this-space is typically to put
a dotted postfix on the next line:

    $object\ # comment
    .say

    $object\#`[ comment
    ].say

    $object\
    .say

But unspace is mainly about language extensibility: it lets you continue
the line in any situation where a newline might confuse the parser,
regardless of your currently installed parser.  (Unless, of course,
you override the unspace rule itself...)

Although we say that the unspace hides the whitespace from the parser,
it does not hide whitespace from the lexer.  As a result, unspace is not
allowed within a token.  Additionally, line numbers are still
counted if the unspace contains one or more newlines.
Since Pod chunks count as whitespace to the language, they are also
swallowed up by unspace.  Heredoc boundaries are suppressed, however,
so you can split excessively long heredoc intro lines like this:

    ok(q:to'CODE', q:to'OUTPUT', \
    "Here is a long description", \ # --more--
    todo(:parrøt<0.42>, :dötnet<1.2>));

To the heredoc parser that just looks like:

    ok(q:to'CODE', q:to'OUTPUT', "Here is a long description", todo(:parrøt<0.42>, :dötnet<1.2>));

Note that this is one of those cases in which it is fine to have
whitespace before the unspace, since we're only trying to suppress
the newline transition, not all whitespace as in the case of postfix
parsing.  (Note also that the example above is not meant to spec how
the test suite works.)

=item *

An unspace may contain a comment, but a comment may not contain an unspace.
In particular, end-of-line comments do not treat backslash as significant.
If you say:

    #`\ (...

or

    #\ `(...

it is an end-of-line comment, not an embedded comment.  Write:

    \ #`(

to mean the other thing.

=item *

In general, whitespace is optional in Perl 6 except where it is needed
to separate constructs that would be misconstrued as a single token or
other syntactic unit.  (In other words, Perl 6 follows the standard
I<longest-token> principle, or in the cases of large constructs, a
I<prefer shifting to reducing> principle.  See L</Grammatical Categories>
below for more on how a Perl program is analyzed into tokens.)

This is an unchanging deep rule, but the surface ramifications of it
change as various operators and macros are added to or removed from
the language, which we expect to happen because Perl 6 is designed to
be a mutable language.  In particular, there is a natural conflict
between postfix operators and infix operators, either of which
may occur after a term.  If a given token may be interpreted as
either a postfix operator or an infix operator, the infix operator
requires space before it.  Postfix operators may never have intervening
space, though they may have an intervening dot.  If further separation
is desired, an unspace or embedded comment may be used as described above, as long
as no whitespace occurs outside the unspace or embedded comment.

For instance, if you were to add your own C<< infix:<++> >> operator,
then it must have space before it. The normal autoincrementing
C<< postfix:<++> >> operator may never have space before it, but may
be written in any of these forms:

    $x++

    $x\++

    $x.++

    $x\ ++

    $x\ .++

    $x\#`( comment ).++
    $x\#`((( comment ))).++

    $x\
    .++

    $x\         # comment
                # inside unspace
    .++

    $x\         # comment
                # inside unspace
    ++          # (but without the optional postfix dot)

    $x\#`『      comment
                more comment
    』.++

    $x\#`[   comment 1
    comment 2
    =begin Podstuff
    whatever (Pod comments ignore current parser state)
    =end Podstuff
    comment 3
    ].++

A consequence of the postfix rule is that (except when delimiting a
quote or terminating an unspace) a dot with whitespace in front
of it is always considered a method call on C<$_> where a term is
expected.  If a term is not expected at this point, it is a syntax
error.  (Unless, of course, there is an infix operator of that name
beginning with dot.  You could, for instance, define a Fortranly
C<< infix:<.EQ.> >> if the fit took you.  But you'll have to be sure to
always put whitespace in front of it, or it would be interpreted as
a postfix method call instead.)

For example,

    foo .method

and

    foo
    .method

will always be interpreted as

    foo $_.method

but never as

    foo.method

Use some variant of

    foo\
    .method

if you mean the postfix method call.

One consequence of all this is that you may no longer write a Num as
C<42.> with just a trailing dot.  You must instead say either C<42>
or C<42.0>.  In other words, a dot following a number can only be a
decimal point if the following character is a digit.  Otherwise the
postfix dot will be taken to be the start of some kind of method call
syntax.  (The C<.123> form with a leading
dot is still allowed however when a term is expected, and is equivalent
to C<0.123> rather than C<$_.123>.)
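
A small sketch of how the dot rule plays out for numeric literals
(the values are arbitrary):

    say 42.0 + .5;      # 42.5 -- the leading-dot form is a term here
    say 42.abs;         # 42   -- dot not followed by a digit: method call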


=back

=head1 Built-In Data Types

=over 4

=item *

In support of OO encapsulation, there is a new fundamental datatype:
B<P6opaque>.  External access to opaque objects is always through method
calls, even for attributes.

=item *

Perl 6 has an optional type system that helps you write safer
code that performs better.  The compiler is free to infer what type
information it can from the types you supply, but will not complain
about missing type information unless you ask it to.

=item *

Types are officially compared using name equivalence rather than
structural equivalence.  However, we're rather liberal in what we
consider a name.  For example, the name includes the version and
authority associated with the module defining the type (even if
the type itself is "anonymous").  Beyond that, when you instantiate
a parametric type, the arguments are considered part of the "long
name" of the resulting type, so one C<Array of Int> is equivalent to
another C<Array of Int>.  (Another way to look at it is that the type
instantiation "factory" is memoized.)  Typename aliases are considered
equivalent to the original type.  In particular, the C<Array of Int> syntax
is just sugar for C<Array:of(Int)>, which is the canonical form of an
instantiated generic type.

This name equivalence of parametric types extends only to parameters
that can be considered immutable (or that at least can have an
immutable snapshot taken of them).  Two distinct classes are never
considered equivalent even if they have the same attributes because
classes are not considered immutable.

=item *

Perl 6 supports the notion of B<properties> on various kinds of
objects.  Properties are like object attributes, except that they're
managed by the individual object rather than by the object's class.

According to S12, properties are actually implemented by a
kind of mixin mechanism, and such mixins are accomplished by the
generation of an individual anonymous class for the object (unless
an identical anonymous class already exists and can safely be shared).
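
As a minimal sketch of a run-time property, assuming the C<but>
mixin operator described in S12 (it is not defined in this Synopsis):

    my $plain  = 0;
    my $marked = 0 but True;    # mixes a boolean property into this one value

    say ?$plain;                # False
    say ?$marked;               # True
    say $marked + 1;            # 1 -- still behaves as the integer 0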

=item *

Properties applied to objects constructed at compile-time, such as
variables and classes, are also called B<traits>.  Traits cannot be
changed at run-time.  Changes to run-time properties are done via
mixin instead, so that the compiler can optimize based on declared traits.

=item *

Perl 6 is an OO engine, but you're not generally required to think
in OO when that's inconvenient.  However, some built-in concepts such
as filehandles will be more object-oriented in a user-visible way
than in Perl 5.

=item *

A variable's type is a constraint indicating what sorts of values the
variable may contain.  More precisely, it's a promise that the object
or objects contained in the variable are capable of responding to the
methods of the indicated "role".  See S12 for more about roles.

    # $x can contain only Int objects
    my Int $x;

A variable may itself be bound to a container type that specifies how
the container works, without specifying what kinds of things it contains.

    # $x is implemented by the MyScalar class
    my $x is MyScalar;

Constraints and container types can be used together:

    # $x can contain only Int objects,
    # and is implemented by the MyScalar class
    my Int $x is MyScalar;

Note that C<$x> is also initialized to the C<Int> type object.  See below for
more on this.

=item *

C<my Dog $spot> by itself does not automatically call a C<Dog> constructor.
It merely assigns an undefined C<Dog> prototype object to C<$spot>:

    my Dog $spot;           # $spot is initialized with ::Dog
    my Dog $spot = Dog;     # same thing

    $spot.defined;          # False
    say $spot;              # "Dog"

Any type name used as a value is an undefined instance of that type's 
prototype object, or I<type object> for short.  See S12 for more on that.

Any type name in rvalue context is parsed as a single type value and
expects no arguments following it.  However, a type object responds to the 
function call interface, so you may use the name of a type with parentheses
as if it were a function, and any argument supplied to the call is coerced
to the type indicated by the type object.  If there is no argument
in the parentheses, the type object returns itself:

    my $type = Num;             # type object as a value
    my $num = $type($string);   # coerce to Num

To get a real C<Dog> object, call a constructor method such as C<new>:

    my Dog $spot .= new;
    my Dog $spot = $spot.new;   # .= is rewritten into this

You can pass in arguments to the constructor as well:

    my Dog $cerberus .= new(heads => 3);
    my Dog $cerberus = $cerberus.new(heads => 3);   # same thing

=item *

If you say

    my int @array is MyArray;

you are declaring that the elements of C<@array> are native integers,
but that the array itself is implemented by the C<MyArray> class.
Untyped arrays and hashes are still perfectly acceptable, but have
the same performance issues they have in Perl 5.

=item *

To get the number of elements in an array, use the C<.elems> method.  You can
also ask for the total string length of an array's elements in bytes,
codepoints, or graphemes, using the C<.bytes>, C<.codes>, or C<.graphs>
methods respectively on the array.  The same methods apply to strings as well.
(Note that C<.bytes> is not guaranteed to be well-defined when the encoding
is unknown.  Similarly, C<.codes> is not well-defined unless you know which
canonicalization is in effect.  Hence, both methods allow an optional argument
to specify the meaning exactly if it cannot be known from context.)

There is no C<.length> method for either arrays or strings, because C<length>
does not specify a unit.
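
A brief sketch using only C<.elems> and C<.codes> on plain ASCII
data, where the choice of canonicalization makes no difference (the
variable name is arbitrary):

    my @words = <alpha beta gamma>;
    say @words.elems;           # 3 -- number of elements
    say "gamma".codes;          # 5 -- codepoints in the string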

=item *

Built-in object types start with an uppercase letter. This includes
immutable types (e.g. C<Int>, C<Num>, C<Complex>, C<Rat>, C<Str>,
C<Bit>, C<Regex>, C<Set>, C<Block>, C<Iterator>,
C<Seq>), as well as mutable (container) types, such as C<Scalar>,
C<Array>, C<Hash>, C<Buf>, C<Routine>, C<Module>, and non-instantiable Roles
such as C<Callable>, C<Failure>, and C<Integral>.

Non-object (native) types are lowercase: C<int>, C<num>, C<complex>,
C<rat>, C<buf>, C<bit>.  Native types are primarily intended for
declaring compact array storage, that is, a sequence of storage
locations of the specified type laid out in memory contiguously
without pointer indirection.  However, Perl will try to make those
look like their corresponding uppercase types if you treat them that way.
(In other words, it does autoboxing.  Note, however, that sometimes
repeated autoboxing can slow your program more than the native type
can speed it up.)

Some object types can behave as value types.  Every object can produce
a "WHICH" value that uniquely identifies the
object for hashing and other value-based comparisons.  Normal objects
just use their location as their identity, but if a class wishes to behave as a
value type, it can define a C<.WHICH> method that makes different objects
look like the same object if they happen to have the same contents.
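
A small sketch of the distinction, assuming the C<===> identity
operator from S03, which compares C<WHICH> values rather than
contents:

    my @a = 1, 2, 3;
    my @b = 1, 2, 3;

    say 42 === 42;              # True  -- Int behaves as a value type
    say "abc" === "abc";        # True  -- so does Str
    say @a === @b;              # False -- two distinct objects
    say @a === @a;              # True  -- the same object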

When we say that a normal object uses its location as its identity,
we do I<not> mean that it returns its address as a number.  In the first
place, not all objects are in the same memory space (see the literature
on NUMA, for instance), and two objects should not accidentally have
the same identity merely because they were stored at the same offset in
two different memory spaces.  We also do not want to allow accidental
identity collisions with values that really are numbers (or strings,
or any other mundane value type).  Nor should we be encouraging people
to think of object locations that way in any case.  So C<WHICH> still
returns a value rather than another object, but that value must be of
a special C<ObjAt> type that prevents accidental confusion with normal
value types, and at least discourages trivial pointer arithmetic.

Certainly, it is difficult to give a unique name to every possible
address space, let alone every possible address within every such
space.  In the absence of a universal naming scheme, it can only
be made improbable that two addresses from two different spaces will
collide.  A sufficiently large random number may represent the current
address space on output of an C<ObjAt> to a different address space,
or if serialized to YAML or XML.  (This extra identity component
need not be output for debugging messages that assume the current
address space, since it will be the same big number consistently,
unless your process really is running under a NUMA.)

Alternately, if an object is being serialized to a form that does
not preserve object identity, there is no requirement to preserve
uniqueness, since the object in this case is really being translated
to a value type representation, and reconstituted on the other end
as a different unique object.

=item *

Variables with non-native types can always contain I<undefined> values,
such as C<Any>, C<Whatever> and C<Failure> objects.  See S04 for more
about failures (i.e. unthrown exceptions):

    my Int $x = Int;    # works

Variables with native types do not support undefinedness: it is an error
to assign an undefined value to them:

    my int $y = Int;    # dies

Since C<num> can support the value C<NaN> but not the general concept of
undefinedness, you can coerce an undefined value like this:

    my num $n = computation() // NaN;

Variables of non-native types start out containing an undefined value
unless explicitly initialized to a defined value.

=item *

Every object supports a C<HOW> function/method that returns the
metaclass instance managing it, regardless of whether the object
is defined:

    'x'.HOW.methods('x');   # get available methods for strings
    Str.HOW.methods(Str);   # same thing with the prototype object Str
    HOW(Str).methods(Str);  # same thing as function call

    'x'.methods;        # this is likely an error - not a meta object
    Str.methods;        # same thing

(For a prototype system (a non-class-based object system), all objects
are merely managed by the same meta object.)

=item *

Perl supports generic types through what are called "roles"
which represent capabilities or interfaces.  These roles
are generally not used directly as object types.  For instance
all the numeric types perform the C<Numeric> role, and all
string types perform the C<Stringy> role, but there's no
such thing as a "Numeric" object, since these are generic
types that must be instantiated with extra arguments to produce
normal object types.  Common roles include C<Numeric>, C<Stringy>,
C<Real>, C<Integral>, and C<Callable>, among others.


=item *

Perl 6 intrinsically supports big integers and rationals through its
system of type declarations.  C<Int> automatically supports promotion
to arbitrary precision, as well as holding C<Inf> and C<NaN> values.
Note that C<Int> assumes 2's complement arithmetic, so C<+^1 == -2>
is guaranteed.  (Native C<int> operations need not support this on
machines that are not natively 2's complement.  You must convert to
and from C<Int> to do portable bitops on such ancient hardware.)
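
For instance, a short sketch of both guarantees (the operands are
arbitrary):

    say 2 ** 100;               # 1267650600228229401496703205376
    say +^1;                    # -2
    say +^0;                    # -1
    say +^(2 ** 100) == -(2 ** 100) - 1;    # True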

C<Num> must support the largest native floating point format that
runs at full speed.  It may be bound to an arbitrary precision type,
but by default it is the same type as a native C<num>.  See below.

C<Rat> supports extended precision rational arithmetic.
Dividing two C<Integral> objects using C<< infix:</> >> produces
a C<Rat>, which is generally usable anywhere a C<Num> is usable, but
may also be explicitly cast to C<Num>.  (Also, if either side is
C<Num> already, C<< infix:</> >> gives you a C<Num> instead of a C<Rat>.)

C<Rat> and C<Num> both do the C<Real> role.
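
As a sketch of the division rules above, using smartmatch against
the type names to inspect the results (C<3e0> is assumed here to be
a C<Num> literal):

    say (1 / 3) ~~ Rat;         # True
    say (1 / 3e0) ~~ Num;       # True -- one side is already a Num
    say Num(1 / 3) ~~ Num;      # True -- explicit coercion
    say (1 / 3) ~~ Real;        # True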

Lower-case types like C<int> and C<num> imply the native
machine representation for integers and floating-point numbers,
respectively, and do not promote to arbitrary precision, though
larger representations are always allowed for temporary values.
Unless qualified with a number of bits, C<int> and C<num> types represent
the largest native integer and floating-point types that run at full speed.

Numeric values in untyped variables use C<Int> and C<Num> semantics
rather than C<int> and C<num>.

However, for pragmatic reasons, C<Rat> values are guaranteed to be
exact only up to a certain point.  By default, this is the precision
that would be represented by the C<Rat64> type, which is an alias for
C<Rational[Int,uint64]>, which has a numerator
of C<Int> but is limited to a denominator of C<uint64>.  A C<Rat64> that
would require more than 64 bits of storage in the denominator is
automatically converted either to a C<Num> or to a lesser-precision
C<Rat>, at the discretion of the implementation.  (Native types such
as C<rat64> limit the size of both numerator and denominator, though
not to the same size.  The numerator should in general be twice the
size of the denominator to support user expectations.  For instance,
a C<rat8> actually supports C<Rational[int16,uint8]>, allowing
numbers like C<100.01> to be represented, and a C<rat64>,
defined as C<Rational[int128,uint64]>, can hold the number of seconds since
the Big Bang with attosecond precision.  Though perhaps not with
attosecond accuracy...)

The limitation on C<Rat> values is intended to be enforced only on
user-visible types.  Intermediate values used internally in calculating
the values of C<Rat> operators may exceed this precision, or represent
negative denominators.  That is, the temporaries used in calculating
the new numerator and denominator are (at least in the abstract) of
C<Int> type.  After a new numerator and denominator are determined,
any sign is forced to be represented only by the numerator.  Then if
the denominator exceeds the storage size of the unsigned integer used,
the fraction is reduced via gcd.  If the resulting denominator is still
larger than the storage size, then and I<only> then may the precision
be reduced to fit into a C<Rat> or C<Num>.

C<Rat> addition and subtraction should attempt to preserve the
denominator of the more precise argument if that denominator is
an integral multiple of the less precise denominator.  That is,
in practical terms, adding a column of dollars and cents should
generally end up with a result that has a denominator of 100, even
if values like 42 and 3.5 were added in.  With other operators,
this guarantee cannot be made; in such cases, the user should probably
be explicitly rounding to a particular denominator anyway.
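
A minimal sketch of the dollars-and-cents case (the amounts are
made up).  It shows only that the sum stays exact; whether the
stored denominator is 100 at any given moment is up to the
implementation:

    my @amounts = 19.99, 42, 3.5;
    my $total = [+] @amounts;

    say $total;                 # 65.49
    say $total == 65.49;        # True
    say 0.1 + 0.2 == 0.3;       # True -- Rat arithmetic, not floating point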

For applications that really need arbitrary precision denominators as
well as numerators at the cost of performance, C<FatRat> may be used,
which is defined as C<Rational[Int,Int]>, that is, as arbitrary precision in
both parts.  There is no literal form for a C<FatRat>, so it must
be constructed using C<FatRat.new($nu,$de)>.  In general, only math
operators with at least one C<FatRat> argument will return another
C<FatRat>, to prevent accidental promotion of reasonably fast C<Rat>
values into arbitrarily slow C<FatRat> values.
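
A short sketch of that promotion rule, again using smartmatch to
inspect the result types (the variable name is arbitrary):

    my $fat = FatRat.new(1, 3);

    say ($fat + 1/3) ~~ FatRat;     # True  -- one FatRat argument
    say (1/3 + 1/3) ~~ FatRat;      # False -- plain Rat arithmetic stays Rat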

Although most rational implementations normalize or "reduce" fractions
to their smallest representation immediately through a gcd algorithm,
Perl allows a rational datatype to do so lazily at need, such as
whenever the denominator would run out of precision, but avoid the
overhead otherwise.  Hence, if you are adding a bunch of C<Rat>s that
represent, say, dollars and cents, the denominator may stay 100 the
entire way through.  The C<.nu> and C<.de> methods will return these
unreduced values.  You can use C<$rat.=norm> to normalize the fraction.
(This also forces the sign on the denominator to be positive.)
The C<.perl> method will produce a decimal number if the denominator is
a power of 10, or normalizable to a power of 10 (that is, having factors
of only 2 and 5 (and -1)).  Otherwise it will normalize and return a rational
literal of the form C<-47/3>.  Stringifying a rational does a similar
calculation, with the same result on decimal-normalizable fractions,
but where C<.perl> would produce the C<-47/3> form, stringification instead
converts to C<Num> and stringifies that, so the rational internal form is
somewhat hidden from the casual user, who would generally prefer
to see pure decimal notation.

    say 1/5;    # 0.2 exactly (not via Num)
    say 1/3;    # 0.333333333333333 via Num

    say <2/6>.perl
                # 1/3

    say 3.14159_26535_89793
                # 3.141592653589793 including last digit

    say 111111111111111111111111111111111111111111111.123
                # 111111111111111111111111111111111111111111111.123

    say 555555555555555555555555555555555555555555555/5
                # 111111111111111111111111111111111111111111111

    say <555555555555555555555555555555555555555555555/5>.perl
                # 111111111111111111111111111111111111111111111/1

=item *

Perl 6 should by default make standard IEEE floating point concepts
visible, such as C<Inf> (infinity) and C<NaN> (not a number).  Within a
lexical scope, pragmas may specify the nature of temporary values,
and how floating point is to behave under various circumstances.
All IEEE modes must be lexically available via pragma except in cases
where that would entail heroic efforts to bypass a braindead platform.

The default floating-point modes do not throw exceptions but rather
propagate Inf and NaN.  The boxed object types may carry more detailed
information on where overflow or underflow occurred.  Numerics in Perl
are not designed to give the identical answer everywhere.  They are
designed to give the typical programmer the tools to achieve a good
enough answer most of the time.  (Really good programmers may occasionally
do even better.)  Mostly this just involves using enough bits that the
stupidities of the algorithm don't matter much.

=item *

A C<Str> is a Unicode string object.  There is no corresponding native
C<str> type.  However, since a C<Str> object may fill multiple roles,
we say that a C<Str> keeps track of its minimum and maximum Unicode
abstraction levels, and plays along nicely with the current lexical
scope's idea of the ideal character, whether that is bytes, codepoints,
graphemes, or characters in some language.  For all builtin operations,
all C<Str> positions are reported as position objects, not integers.
These C<StrPos> objects point into a particular string at a particular
location independent of abstraction level, either by tracking the
string and position directly, or by generating an abstraction-level
independent representation of the offset from the beginning of the
string that will give the same results if applied to the same string
in any context.  This is assuming the string isn't modified in the
meanwhile; a C<StrPos> is not a "marker" and is not required to follow
changes to a mutable string.  For instance, if you ask for the positions
of matches done by a substitution, the answers are reported in terms of the
original string (which may now be inaccessible!), not as positions within
the modified string.

The subtraction of two C<StrPos> objects gives a C<StrLen> object,
which is also not an integer, because the string between two positions
also has multiple integer interpretations depending on the units.
A given C<StrLen> may know that it represents 18 bytes, 7 codepoints,
3 graphemes, and 1 letter in Malayalam, but it might only know this
lazily because it actually just hangs onto the two C<StrPos> endpoints
within the string that in turn may or may not just lazily point into
the string.  (The lazy implementation of C<StrLen> is much like a
C<Range> object in that respect.)
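
The idea of a length that knows several units at once can be sketched as follows.  This is a toy model in Python, not the Perl 6 implementation: the class name mirrors C<StrLen>, but the method names are invented here, the endpoints are plain codepoint offsets, and grapheme counting is crudely approximated by skipping combining marks.

```python
import unicodedata

class StrLen:
    """Toy model: the span between two positions, measured lazily in several units."""

    def __init__(self, text, start, end):
        # start and end are codepoint offsets (an assumption of this model)
        self.text, self.start, self.end = text, start, end

    def _slice(self):
        return self.text[self.start:self.end]

    def byte_count(self, encoding="utf-8"):
        return len(self._slice().encode(encoding))

    def codepoint_count(self):
        return len(self._slice())

    def grapheme_count(self):
        # Crude approximation: count codepoints that are not combining marks.
        return sum(1 for ch in self._slice() if unicodedata.combining(ch) == 0)

# "e" + COMBINING ACUTE ACCENT + "a"
span = StrLen("e\u0301a", 0, 3)
print(span.byte_count(), span.codepoint_count(), span.grapheme_count())
```

Nothing is computed until a particular unit is requested, which is the sense in which such an object only knows its lengths lazily.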

If you use integers as arguments where position objects are expected,
it will be assumed that you mean the units of the current lexically
scoped Unicode abstraction level.  (Which defaults to graphemes.)
Otherwise you'll need to coerce to the proper units:

    substr($string, Bytes(42), ArabicChars(1))

Of course, such a dimensional number will fail if used on a string
that doesn't provide the appropriate abstraction level.
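
A possible way to picture such dimensional numbers, sketched in Python under heavy assumptions: the wrapper classes C<Bytes> and C<Codepoints> and the helper C<to_codepoint_count> are hypothetical, and only UTF-8 bytes and codepoints are modeled.  A byte offset that lands in the middle of a character fails, loosely mirroring the failure described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Bytes:
    n: int

@dataclass(frozen=True)
class Codepoints:
    n: int

def to_codepoint_count(text, start, amount):
    """Convert a unit-tagged amount, measured from codepoint offset start, to codepoints."""
    if isinstance(amount, Codepoints):
        return amount.n
    if isinstance(amount, Bytes):
        raw = text[start:].encode("utf-8")[:amount.n]
        return len(raw.decode("utf-8"))   # raises if the cut splits a character
    raise TypeError("position needs explicit units in this model")

def substr(text, pos, length):
    start = to_codepoint_count(text, 0, pos)
    count = to_codepoint_count(text, start, length)
    return text[start:start + count]

result = substr("h\u00e9llo", Bytes(3), Codepoints(2))   # skip 3 bytes, take 2 codepoints
```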

If a C<StrPos> or C<StrLen> is forced into a numeric context, it will
assume the units of the current Unicode abstraction level.  It is
erroneous to pass such a non-dimensional number to a routine that
would interpret it with the wrong units.

Implementation note: since Perl 6 mandates that the default Unicode
processing level must view graphemes as the fundamental unit rather
than codepoints, this has some implications regarding efficient
implementation.  It is suggested that all graphemes be translated on
input to unique grapheme numbers and represented as integers within
some kind of uniform array for fast substr access.  For those graphemes
that have a precomposed form, use of that codepoint is suggested.
(Note that this means Latin-1 can still be represented internally
with 8-bit integers.)

For graphemes that have no precomposed form, a temporary private
id should be assigned that uniquely identifies the grapheme.
If such ids are assigned consistently throughout the process,
comparison of two graphemes is no more difficult than the comparison
of two integers, and comparison of base characters no more difficult
than a direct lookup into the id-to-NFD table.

Obviously, any temporary grapheme ids must be translated back to
some universal form (such as NFD) on output, and normal precomposed
graphemes may turn into either NFC or NFD forms depending on the
desired output.  Maintaining a particular grapheme/id mapping over the
life of the process may have some GC implications for long-running
processes, but most processes will likely see a limited number of
non-precomposed graphemes.
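
The suggested representation might look roughly like the following Python sketch.  It is an assumption-laden model, not a prescribed implementation: the class name C<GraphemeTable> is invented, temporary ids are arbitrarily chosen to be negative integers, and the caller is assumed to have already split the input into grapheme clusters.

```python
import unicodedata

class GraphemeTable:
    """Maps grapheme clusters to integers, one integer per grapheme."""

    def __init__(self):
        self._ids = {}   # NFD cluster -> temporary id
        self._nfd = {}   # temporary id -> NFD cluster

    def encode(self, cluster):
        nfc = unicodedata.normalize("NFC", cluster)
        if len(nfc) == 1:
            return ord(nfc)                    # a precomposed codepoint exists
        nfd = unicodedata.normalize("NFD", cluster)
        if nfd not in self._ids:
            new_id = -(len(self._ids) + 1)     # temporary private id
            self._ids[nfd] = new_id
            self._nfd[new_id] = nfd
        return self._ids[nfd]

    def decode(self, gid):
        """Translate an id back to a universal form for output."""
        return chr(gid) if gid >= 0 else self._nfd[gid]

table = GraphemeTable()
e_acute = table.encode("e\u0301")   # has a precomposed form, U+00E9
q_acute = table.encode("q\u0301")   # no precomposed form, gets a temporary id
```

Because ids are assigned consistently, comparing two graphemes is an integer comparison, and the table never shrinks, which is where the GC concern for long-running processes comes from.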

If the program has a scope that wants a codepoint view rather than
a grapheme view, the string visible to that lexical scope must also
be translated to universal form, just as with output translation.
Alternately, the temporary grapheme ids may be hidden behind an
abstraction layer.  In any case, codepoint scope should never see
any temporary grapheme ids.  (The lexical codepoint declaration
should probably specify which normalization form it prefers to
view strings under.  Such a declaration could be applied to input
translation as well.)

=item *

A C<Buf> is a stringish view of an array of
integers, and has no Unicode or character properties without explicit
conversion to some kind of C<Str>.  (The C<buf8>, C<buf16>, C<buf32>,
and C<buf64> types are the native counterparts; native buf types are
required to occupy contiguous memory for the entire buffer.)
Typically a C<Buf> is an array of bytes serving as a buffer.  Bitwise
operations on a C<Buf> treat the entire buffer as a single large
integer.  Bitwise operations on a C<Str> generally fail unless the
C<Str> in question can provide an abstract C<Buf> interface somehow.
Coercion to C<Buf> should generally invalidate the C<Str> interface.
As a generic role C<Buf> may be instantiated as any
of C<buf8>, C<buf16>, or C<buf32> (or as any type that provides the
appropriate C<Buf> interface), but when used to create a buffer C<Buf>
is punned to a class implementing C<buf8> (actually C<Buf[uint8]>).

Unlike C<Str> types, C<Buf> types prefer to deal with integer string
positions, and map these directly to the underlying compact array
as indices.  That is, these are not necessarily byte positions--an
integer position just counts over the number of underlying positions,
where one position means one cell of the underlying integer type.
Builtin string operations on C<Buf> types return integers and expect
integers when dealing with positions.  As a limiting case, C<buf8> is
just an old-school byte string, and the positions are byte positions.
Note, though, that if you remap a section of C<buf32> memory to be
C<buf8>, you'll have to multiply all your positions by 4.
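
The position arithmetic can be checked with a few lines of Python.  This is only an illustration: the helper name C<cell_to_byte_position> is invented, and little-endian packing is an arbitrary choice made so the example is deterministic.

```python
import struct

def cell_to_byte_position(pos, cell_bits):
    """Convert a position counted in cells to the matching byte position."""
    return pos * (cell_bits // 8)

cells = [0x11223344, 0xAABBCCDD, 0x01020304]
raw = struct.pack("<3I", *cells)           # the same memory viewed as 12 bytes
byte_pos = cell_to_byte_position(1, 32)    # cell 1 of the 32-bit view
(value,) = struct.unpack_from("<I", raw, byte_pos)
```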

These native types are defined based on the C<Buf> role, parameterized
by the native integer type it is composed of:

    Name        Is really
    ====        =========
    buf1        Buf[bit]
    buf8        Buf[uint8]
    buf16       Buf[uint16]
    buf32       Buf[uint32]
    buf64       Buf[uint64]

There are no signed buf types provided as built-ins, but you may say

    Buf[int8]
    Buf[int16]
    Buf[int32]
    Buf[int64]

to get buffers of signed integers.  It is also possible to define
a C<Buf> based on non-integers or on non-native types:

    Buf[complex64]
    Buf[FatRat]
    Buf[Int]

However, no guarantee of memory contiguity can be made for non-native types.

=item *

The C<utf8> type is derived from C<buf8>, with the additional constraint
that it may only contain validly encoded UTF-8.  Likewise, C<utf16> is
derived from C<buf16>, and C<utf32> from C<buf32>.
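
The "buffer plus a validity constraint" relationship can be modeled in Python as a subclass of the byte-string type that refuses invalid input.  The class name C<Utf8> is hypothetical and this is only a sketch of the idea, not of how Perl 6 derives the type.

```python
class Utf8(bytes):
    """A byte buffer that may only hold validly encoded UTF-8."""

    def __new__(cls, data):
        obj = super().__new__(cls, data)
        obj.decode("utf-8")   # raises UnicodeDecodeError if the bytes are invalid
        return obj

ok = Utf8(b"caf\xc3\xa9")     # valid: ends with the two-byte encoding of U+00E9
```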

Note that since these are type names, parentheses must always be
used to call them as coercers, since the listop form is not allowed
for coercions.  That is:

    utf8 op $x

is always parsed as

    (utf8) op $x

and never as

    utf8(op $x)

=item *

The C<*> character as a standalone term captures the notion of "Whatever",
the meaning of which can be decided lazily by whatever it is an argument to.
Alternately, for those unary and binary operators that don't care to handle
C<*> themselves, it is automatically curried at compile time into a closure
that takes one or two arguments.  (See below.)

Generally, when an operator handles C<*> itself, it can often
be thought of as a "glob" that gives you everything it can in that
argument position.  For instance, here are some operators that
choose to handle C<*> and give it special meaning:

    if $x ~~ 1..* {...}                 # if 1 <= $x <= +Inf
    my ($a,$b,$c) = "foo" xx *;         # an arbitrarily long list of "foo"
    if /foo/ ff * {...}                 # a latching flipflop
    @slice = @x[*;0;*];                 # all indexes for 1st and 3rd dimensions
    @slice = %x{*;'foo'};               # all keys in domain of 1st dimension
    @array[*]                           # list of all values, unlike @array[]
    (*, *, $x) = (1, 2, 3);             # skip first two elements
                                        # (same as lvalue "undef" in Perl 5)

C<Whatever> is an undefined prototype object derived from C<Any>.  As a
type it is abstract, and may not be instantiated as a defined object.
When it is used in a particular MMD dispatch and nothing in the MMD system
claims it, it dispatches as an C<Any> with an undefined value, and (we hope)
blows up constructively.

Since the C<Whatever> object is effectively immutable, the optimizer is
free to recognize C<*> and optimize in the context of what operator
it is being passed to.  An operator can declare that it wants to
handle C<*> either by declaring, for at least one of its candidates,
one or more parameters of type C<Whatever>, or
by marking the proto sub with the trait, C<is like-Whatever-and-stuff>.
[Conjecture: actually, this is negotiable--we might shorten it
to C<is like(Whatever)> or some such.  C<:-)>]

For any unary or binary operator (specifically, any prefix, postfix,
and infix operator), if the operator has not specifically requested
to handle C<*> itself, the compiler is required to translate directly
to an appropriately curried closure at compile time.  Most of the
built-in numeric operators fall into this category, so:

    * - 1
    '.' x *
    * + *

are internally curried into closures of one or two arguments:

    { $^x - 1 }
    { '.' x $^y }
    { $^x + $^y }

This rewrite happens after variables are looked up in their lexical scope,
and after declarators install any variables into the lexical scope,
with the result that

    * + (state $s = 0)

is effectively curried into:

    -> $x { $x + (state $OUTER::s = 0) }

rather than:

    -> $x { $x + (state $s = 0) }

In other words, C<*> currying does not create a useful lexical scope.
(Though it does have a dynamic scope when it runs.) This prevents the
semantics from changing drastically if the operator in question
suddenly decides to handle C<Whatever> itself.

As a postfix operator, a method call is one of those operators that is
automatically curried.  Something like:

    *.meth(1,2,3)

is rewritten as:

    { $^x.meth(1,2,3) }

In addition to currying a method call without an invocant, such
curried methods are handy anywhere a smartmatcher is expected:

    @primes = grep *.prime, 2..*;
    subset Duck where *.^can('quack');
    when !*.defined {...}

These returned closures are of type C<WhateverCode:($)> or C<WhateverCode:($,$)>
rather than type C<Whatever>, so constructs that do want to handle C<*>
or its derivative closures can distinguish them by type:

    @array[*]    # subscript is type Whatever, returns all elements
    @array[*-1]  # subscript is type WhateverCode:($), returns last element

    0, 1, *+1 ... *  # counting
    0, 1, *+* ... *  # fibonacci

For any prefix, postfix, or infix operator that would be curried by a
C<Whatever>, a C<WhateverCode> also autocurries it, such that any noun
phrase based on C<*> as a head noun autocurries transitively outward as
far as it makes sense, including outward through metaoperators.  Hence:

    * + 2 + 3   # { $^x + 2 + 3 }
    * + 2 + *   # { $^x + 2 + $^y }
    * + * + *   # { $^x + $^y + $^z }
    (-*.abs)i   # { (-$^x.abs)i }
    @a «+» *    # { @a «+» $^x }
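
The transitive currying rule can be imitated in Python with operator overloading.  This is a runtime model only, whereas the rewrite described here happens at compile time; the helper names C<_Curryable>, C<_lift>, and C<_combine> are invented, and just C<+> and C<-> are covered.

```python
import operator

class _Curryable:
    """Shared overloads: applying an operator to a star yields a closure."""
    def __add__(self, other):  return _combine(operator.add, self, other)
    def __radd__(self, other): return _combine(operator.add, other, self)
    def __sub__(self, other):  return _combine(operator.sub, self, other)
    def __rsub__(self, other): return _combine(operator.sub, other, self)

class WhateverCode(_Curryable):
    """A closure derived from one or more stars."""
    def __init__(self, fn, arity):
        self.fn, self.arity = fn, arity

    def __call__(self, *args):
        if len(args) != self.arity:
            raise TypeError(f"expected {self.arity} argument(s), got {len(args)}")
        return self.fn(*args)

class Whatever(_Curryable):
    """The bare star; when combined it acts as a one-argument identity."""
    arity = 1

    @staticmethod
    def fn(x):
        return x

def _lift(value):
    """Wrap a plain value as a zero-argument closure; pass stars through."""
    return value if isinstance(value, _Curryable) else WhateverCode(lambda: value, 0)

def _combine(op, left, right):
    left, right = _lift(left), _lift(right)
    n = left.arity   # the left operand consumes the first n arguments
    return WhateverCode(lambda *args: op(left.fn(*args[:n]), right.fn(*args[n:])),
                        left.arity + right.arity)

star = Whatever()
minus_one = star - 1          # models  * - 1
add_two = star + star         # models  * + *
chained = star + 2 + star     # models  * + 2 + *
```

In this model C<star> and the closures built from it have different types, so a consumer can tell a bare star from a derived closure, as the subscript examples above require.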

This is only for operators that are not C<Whatever>-aware.  There is no requirement
that a C<Whatever>-aware operator return a C<WhateverCode> when C<Whatever>
is used as an argument; that's just the I<typical> behavior for functions
that have no intrinsic "globbish" meaning for C<*>.  If you want to curry
one of these globbish operators, you'll need to write an explicit closure or do
an explicit curry on the operator with C<.assuming()>.  Operators in
this class, such as C<< infix:<..> >> and C<< infix:<xx> >>, typically I<do>
autocurry arguments of type C<WhateverCode> even though they do not
autocurry C<Whatever>, so we have:

    "foo" xx *          # infinite supply of "foo"
    "foo" xx *-1        # { "foo" xx $^a - 1 }
    0 .. *              # half the real number line
    0 .. * - 1          # { 0 .. $^a - 1 }
    * - 3 .. * - 1      # { $^a - 3 .. $^b - 1 }

(If the last is used as a subscript, the subscripter notices there are two
arguments and passes that dimension's size twice.)

Operators that are known to return non-closure values with C<*> include:

    0 .. *      # means 0 .. Inf
    0 ... *     # means 0 ... Inf
    'a' xx *    # means 'a' xx Inf
    1,*         # means 1,*  :)

    $a = *      # just assigns Whatever
    $a ~~ *     # just matches Whatever

Note that the last two also do not autocurry C<WhateverCode>, because
assignment and smartmatching are not really normal binary operators, but
syntactic sugar for underlying primitives.  (Such pseudo operators
may also place restrictions on which meta-operators work on them.)

Neither do the sequence operators C<< &infix:<...> >> and
C<< &infix:<...^> >> autocurry C<WhateverCode>, because we want to allow
WhateverCode closures as the stopper:

    0 ...^ *>5  # means 0, 1, 2, 3, 4, 5
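
To make the roles of generator closure and stopper closure concrete, here is a simplified Python generator.  It is a sketch under assumptions: the function name C<sequence> and its parameters are invented, the step closure and its arity must be passed explicitly (the real operator can deduce simple steps from the seed values), and only closure stoppers are handled.

```python
from itertools import islice

def sequence(seeds, step, arity, stop=None, exclude_end=False):
    """Yield the seeds, then keep applying step to the last `arity` values."""
    recent = []
    pending = iter(seeds)
    while True:
        try:
            value = next(pending)             # seeds come first
        except StopIteration:
            value = step(*recent[-arity:])    # then generated values
        if stop is not None and stop(value):
            if not exclude_end:
                yield value
            return
        yield value
        recent = (recent + [value])[-arity:]

fib = list(islice(sequence([0, 1], lambda a, b: a + b, 2), 8))
upto = list(sequence([0], lambda x: x + 1, 1, stop=lambda x: x > 5, exclude_end=True))
```

Here C<fib> models C<0, 1, *+* ... *> truncated to eight values, and C<upto> models C<< 0 ...^ *>5 >> with an explicit step of one.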

[Conjecture: it is possible that, for most of the above operators that
take C<*> to mean C<Inf>, we could still actually return a closure
that defaults that particular argument to C<Inf>.  However, this would
work only if we provide a "value list context" that forbids closures,
in the sense that it always calls any closure it finds in its list
and replaces the closure in the list with its return value or values,
and then rescans from that point (kinda like a text macro does), in case
the closure returned a list containing a closure.  So for example,
the closure returned by C<0..*> would interpolate a C<Range> object into
the list when called.  Alternately, it could return the C<0>, followed
by another closure that does C<1..*>.  Even the C<...> operator could
likely be redefined in terms of a closure that regenerates itself,
as long as we figure out some way of remembering the last N values
each time.]

In any case, array indexes must behave as such a 'value list context',
since you can't directly index an array with anything other than a number.
The final element of an array is subscripted as C<@a[*-1]>,
which means that when the subscripting operation discovers a C<Code:($)>
object for a subscript, it calls it and supplies an argument indicating
the number of elements in (that dimension of) the array.  See S09.
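
The subscripting behavior just described can be sketched in Python for the one-dimensional case.  The names C<STAR> and C<subscript> are invented for this illustration, and an ordinary Python callable stands in for the curried closure.

```python
class _Star:
    """Sentinel standing in for a bare Whatever subscript."""

STAR = _Star()

def subscript(array, index):
    if index is STAR:
        return list(array)                 # bare star: all elements
    if callable(index):
        return array[index(len(array))]    # closure: call it with the element count
    return array[index]                    # plain number

data = [10, 20, 30]
last = subscript(data, lambda n: n - 1)    # models @a[*-1]
```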

A variant of C<*> is the C<**> term, which is of type C<HyperWhatever>.
It is generally understood to be a multidimensional form of C<*> when
that makes sense.  When modified by an operator that would turn C<*>
into a function of one argument, C<WhateverCode:($)>, C<**> instead turns into
a function with one slurpy argument, C<Code(*@)>, such that multiple
arguments are distributed to some number of internal whatevers.
That is:

    * - 1    means                -> $x { $x - 1 }
    ** - 1   means   -> *@x { map -> $x { $x - 1 }, @x }

Therefore C<@array[^**]> represents C<< @array[{ map { ^* }, @_ }] >>,
that is to say, every element of the array, no matter how many dimensions.
(However, C<@array[**]> means the same thing because (as with C<...>
above), the subscript operator will interpret bare C<**> as meaning
all the subscripts, not the list of dimension sizes.  The meaning of
C<Whatever> is always controlled by the first context it is bound into.)
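
The relationship between the two curried forms can be written out in Python as a small higher-order function.  The name C<hyper> is made up for this sketch; it simply distributes its arguments over a one-argument closure.

```python
def hyper(fn):
    """Lift a one-argument closure to a slurpy one that maps over all arguments."""
    return lambda *xs: [fn(x) for x in xs]

minus_one = lambda x: x - 1           # models  * - 1
hyper_minus_one = hyper(minus_one)    # models ** - 1
```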

Other uses for C<*> and C<**> will doubtless suggest themselves
over time.  These can be given meaning via the MMD system, if not
the compiler.  In general a C<Whatever> should be interpreted as
maximizing the degrees of freedom in a dwimmy way, not as a nihilistic
"don't care anymore--just shoot me".


=head2 Native types

Values with these types autobox to their uppercase counterparts when
you treat them as objects:

    bit         single native bit
    int         native signed integer
    uint        native unsigned integer (autoboxes to Int)
    buf         native buffer (finite seq of native ints or uints, no Unicode)
    rat         native rational
    num         native floating point
    complex     native complex number
    bool        native boolean

Since native types cannot represent Perl's concept of undefined values,
in the absence of explicit initialization, native floating-point types
default to NaN, while integer types (including C<bit>) default to 0.
The complex type defaults to NaN + NaN\i.  A buf type of known size
defaults to a sequence of 0 values.  If any native type is explicitly
initialized to C<*> (the C<Whatever> type), no initialization is attempted
and you'll get whatever was already there when the memory was allocated.

If a buf type is initialized with a Unicode string value, the string
is decomposed into Unicode codepoints, and each codepoint shoved into
an integer element.  If the size of the buf type is not specified,
it takes its length from the initializing string.  If the size
is specified, the initializing string is truncated or 0-padded as
necessary.  If a codepoint doesn't fit into a buf's integer type,
a parse error is issued if this can be detected at compile time;
otherwise a warning is issued at run time and the overflowed buffer
element is filled with an appropriate replacement character, either
C<U+FFFD> (REPLACEMENT CHARACTER) if the element's integer type is at
least 16 bits, or C<U+007f> (DELETE) if the larger value would not fit.
If any other conversion is desired, it must be specified explicitly.
In particular, no conversion to UTF-8 or UTF-16 is attempted; that
must be specified explicitly.  (As it happens, conversion to a buf
type based on 32-bit integers produces valid UTF-32 in the native
endianness.)
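
The run-time half of these initialization rules can be modeled in Python as follows.  This is a sketch with invented names (C<init_buf>, C<bits>, C<size>); the compile-time parse error is not modeled, and a Python list of integers stands in for the buffer.

```python
import warnings

def init_buf(text, bits, size=None):
    """Fill integer cells of the given width from the codepoints of text."""
    limit = 1 << bits
    replacement = 0xFFFD if bits >= 16 else 0x7F
    cells = []
    for ch in text:
        codepoint = ord(ch)
        if codepoint >= limit:
            warnings.warn(f"U+{codepoint:04X} does not fit in {bits} bits")
            codepoint = replacement
        cells.append(codepoint)
    if size is not None:
        # truncate, or pad with zeros, to the declared size
        cells = cells[:size] + [0] * (size - len(cells))
    return cells

padded = init_buf("hi", 8, size=4)
truncated = init_buf("hello", 8, size=3)
```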

=head2 The C<Mu> type

Among other things, C<Mu> is named after the eastern concept of
"Mu" or 無 (see L<http://en.wikipedia.org/wiki/MU>, especially the
"Mu_(negative)" entry), so in Perl 6 it stands in for Perl 5's
concept of "undef" when that is used as a noun.  However, C<Mu> is also
the "nothing" from which everything else is derived via the undefined
type objects, so it stands in for the concept of "Object" as used in
languages like Java.  Or think of it as a "micro" or µ-object that
is the basis for all other objects, something atomic like a Muon.
Or if acronyms make you happy, there are a variety to pick from:

    Most Universal
    More Undefined
    Modern Undef
    Master Union
    Meta Ur
    Mega Up

Or just think of it as a sound a cow makes, which simultaneously
means everything and nothing.

=head2 Undefined types

Perl 6 does not have a single value representing undefinedness.
Instead, objects of various types can carry type information while
nevertheless remaining undefined themselves.  Whether an object is
defined is determined by whether C<.defined> returns true or not.
These typed objects typically represent uninitialized values.  Failure
objects are also officially undefined despite carrying exception
information; these may be created using the C<fail> function, or by
direct construction of an exception object of some sort.  (See S04
for how failures are handled.)

    Mu          Most Undefined
    Failure     Failure (lazy exceptions, thrown if not handled properly)

Whenever you declare any kind of type, class, module, or package, you're
automatically declaring an undefined prototype value with the same name, known
as the I<type object>.  The name itself returns that type object:

    Mu          Perl 6 object (default block parameter type, Any, Junction, or Each)
    Any         Perl 6 object (default routine parameter type, excludes junction)
    Cool        Perl 6 Convenient OO Loopbacks
    Whatever    Wildcard (like Any, but subject to do-what-I-mean via MMD)
    Int         Any Int object
    Widget      Any Widget object

Type objects stringify to their name with empty parens concatenated.
Note that type objects are not classes, but may be used to name
classes:

    Widget.new()        # create a new Widget

Whenever a C<Failure> value is put into a typed container, it takes
on the type specified by the container but continues to carry the
C<Failure> role.  Use C<fail> to return specific failures.  Use C<Mu>
for the most generic non-failure undefined value.  The C<Any> type,
derived from C<Mu>, is also undefined, but excludes C<Junction> and C<Each> types so
that autothreading may be dispatched using normal multiple dispatch
rules.  All user-defined classes derive from the C<Any> class by default.
The C<Whatever> type is derived from C<Any> but nothing else
is derived from it.

=head2 Immutable types

Objects with these types behave like values, i.e. C<$x === $y> is true
if and only if their types and contents are identical (that is, if
C<$x.WHICH> eqv C<$y.WHICH>).

    Str         Perl string (finite sequence of Unicode characters)
    Bit         Perl single bit (allows traits, aliasing, undefinedness, etc.)
    Int         Perl integer (allows Inf/NaN, arbitrary precision, etc.)
    Num         Perl number (approximate Real, generally via floating point)
    Rat         Perl rational (exact Real, limited denominator)
    FatRat      Perl rational (unlimited precision in both parts)
    Complex     Perl complex number
    Bool        Perl boolean
    Exception   Perl exception
    Block       Executable objects that have lexical scopes
    Seq         A list of values (can be generated lazily)
    Range       A pair of Ordered endpoints
    Set         Unordered collection of values that allows no duplicates
    Bag         Unordered collection of values that allows duplicates
    Enum        An immutable Pair
    EnumMap     A mapping of Enums with no duplicate keys
    Signature   Function parameters (left-hand side of a binding)
    Parcel      Arguments in a comma list
    LoL         Arguments in a semicolon list
    Capture     Function call arguments (right-hand side of a binding)
    Blob        An undifferentiated mass of ints, an immutable Buf
    Instant     A point on the continuous atomic timeline
    Duration    The difference between two Instants
    HardRoutine A routine that is committed to not changing

C<Set> values may be composed with the C<set> listop or method.
C<Bag> values may be composed with the C<bag> listop or method.

C<Instant>s and C<Duration>s are measured in atomic seconds with
fractions.  Notionally they are real numbers which may be implemented
in any C<Real> type of sufficient precision, preferably a C<Rat> or
C<FatRat>.  (Implementations that make fixed-point assumptions about
the available subsecond precision are discouraged; the user
interface must act like real numbers in any case.)  Interfaces that
take C<Duration> arguments, such as sleep(), may also take C<Real>
arguments, but C<Instant> arguments must be explicitly created
via any of various culturally aware time specification APIs.  A small
number of C<Instant> values that represent common epoch instant values
are also available.

In numeric context a C<Duration> happily returns a C<Rat> or C<FatRat>
representing the number of seconds.  C<Instant> values, on the other
hand, are largely opaque, numerically speaking, and in particular
are epoch agnostic.  (Any epoch is just a particular C<Instant>, and
all times related to that epoch are really C<Instant> ± C<Duration>,
which returns a new C<Instant>.)  In order to facilitate the writing of
culturally aware time modules, the C<Instant> type provides C<Instant>
values corresponding to various commonly used epochs, such as the
1958 TAI epoch, the POSIX epoch, the Mac epoch, and perhaps the
year 2000 epoch as UTC thinks of it.   There's no reason to exclude
any useful epoch that is well characterized in atomic seconds.
All normal times can be calculated from those epoch instants using
addition and subtraction of C<Duration> values.  Note that the
C<Duration> values are still just atomic time without any cultural
deformations; in particular, the C<Duration> formed by subtracting
C<Instant::Epoch::POSIX> from the current instant will contain more
seconds than the current POSIX C<time()> due to POSIX's abysmal ignorance
of leap seconds.  This is not the fault of the universe, which is
not fooled (neglecting relativistic considerations).  C<Instant>s and
C<Duration>s are always linear atomic seconds.  Systems which cannot
officially provide a steady time base, such as POSIX systems, will
simply have to make their best guess as to the correct atomic time
when asked to interconvert between cultural time and atomic time.
Alternately, they may use some other less-official time mechanism
to achieve steady clock behavior.  Most Unix systems can count clock
ticks, even if POSIX time types get confused.

Although the conceptual type of an C<Instant> resembles C<FatRat>,
with arbitrarily large size in either numerator or denominator, the
internal form may of course be optimized for "nearby" times,
so that, if we know the year as an integer, the instant within the
year can just be a C<Rat> representing the offset from the beginning
of the year.  Calculations that fall within the same year can then be
done in C<Rat> rather than C<FatRat>, or a table of yearly offsets
can find the difference in integer seconds between two years, since
(so far) nobody has had the nerve to propose fractional leap seconds.
Or whatever.  C<Instant> is opaque, so we can swap implementations
in and out without user-visible consequences.

The term C<now> returns the current time as an C<Instant>.  As with the
C<rand> and C<self> terms, it is not a function, so don't put parens after it.
It also never looks for arguments, so the next token should be an operator
or terminator.

    now + 300   # the instant five minutes from now

Basic math operations are defined for instants and durations such
that the sum of an instant and a duration is always an instant,
while the difference of two instants is always a duration.  Math on
instants may only be done with durations (or numbers that will
be taken as durations, as above); you may not add two instants.

    $instant + $instant      # WRONG
    $instant - $instant      # ok, returns a duration
    $instant + $duration     # ok, returns an instant
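
These typing rules can be modeled with two small Python classes.  This is a sketch only: exact fractions stand in for the C<Rat>/C<FatRat> representation, the field names are invented, and the epoch of the C<offset> field is arbitrary because instants are meant to be opaque.

```python
from dataclasses import dataclass
from fractions import Fraction
from numbers import Rational

@dataclass(frozen=True)
class Duration:
    seconds: Fraction

    def __add__(self, other):
        if isinstance(other, Duration):
            return Duration(self.seconds + other.seconds)
        return NotImplemented

@dataclass(frozen=True)
class Instant:
    offset: Fraction   # seconds from an arbitrary epoch (opaque in the real design)

    def __add__(self, other):
        if isinstance(other, Duration):
            return Instant(self.offset + other.seconds)
        if isinstance(other, Rational):       # plain numbers are taken as durations
            return Instant(self.offset + Fraction(other))
        return NotImplemented                 # in particular, Instant + Instant

    def __sub__(self, other):
        if isinstance(other, Instant):
            return Duration(self.offset - other.offset)
        if isinstance(other, Duration):
            return Instant(self.offset - other.seconds)
        return NotImplemented

start = Instant(Fraction(1000))
later = start + 300           # five minutes later, still an Instant
elapsed = later - start       # a Duration
```

Adding two instants raises C<TypeError> in this model, which plays the part of the line marked WRONG above.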

Numeric operations on durations return C<Duration> where that makes
sense (addition, subtraction, modulus).  The type returned for other
numeric operations is unspecified; they may return normal numeric
types or they may return other dimensional types that attempt to
assist in dimensional analysis.  (The latter approach should likely
require explicit declaration for now, until we can demonstrate that
it does not adversely impact the average programmer, and that it
plays well with the concept of gradual typing.)

The C<Blob> type is like an immutable buffer, and therefore
responds both to array and (some) stringy operations.  Note that,
like a C<Buf>, its size is measured in whatever the base unit is,
which is not always bytes.  If you have a C<my Blob[bit] $blob>,
then C<$blob.elems> returns the number of bits in it.  As with buffers,
various native types are automatically derived from native unsigned int types:

    blob1       Blob[bit], a bit string
    blob2       Blob[uint2], a DNA sequence?
    blob3       Blob[uint3], an octal string
    blob4       Blob[uint4], a hex string
    blob8       Blob[uint8], a byte string
    blob16      Blob[uint16]
    blob32      Blob[uint32]
    blob64      Blob[uint64]

These types do (at least) the following roles:

    Class       Roles
    =====       =====
    Str         Stringy
    Bit         Numeric Boolean Integral
    Int         Numeric Integral
    Num         Numeric Real
    Rat         Numeric Real Rational
    FatRat      Numeric Real Rational
    Complex     Numeric
    Bool        Boolean
    Exception   Failure
    Block       Callable
    Seq         Iterable
    Range       Iterable
    Set         Associative[Bool] Iterable
    Bag         Associative[UInt] Iterable
    Enum        Associative
    EnumMap     Associative Positional Iterable
    Parcel      Positional
    Capture     Positional Associative
    Blob        Stringy Positional
    Instant     Real
    Duration    Real
    HardRoutine Routine

[Conjecture:  C<Stringy> may best be split into 2 roles where both C<Str>
and C<Blob> compose the more general one and just C<Str> composes a less
general one.  The more general of those would apply to what is common to
any dense sequence ("string") that C<Str> and C<Blob> both are (either of
characters or bits or integers etc), and the string operators like
catenation (C<~>) and replication (C<x>, C<xx>) would be part of the more
general role.  The more specific role would apply to C<Str> but not C<Blob>
and includes any specific operators that are specific to I<characters> and
don't apply to bits or integers etc.  The other alternative is to more
clearly distance character strings from bit strings, keeping C<~>/etc for
character strings only and adding an analogy for bit strings.]

The C<Iterable> role indicates not that you can iterate the type
directly, but that you can request the type to return an iterator.
Iterable types may have multiple iterators (lists) running across them
simultaneously, but an iterator/list itself has only one thread of
consumption.  Every time you do C<get> on an iterator, a value
disappears from its list.
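
Python's iteration protocol happens to draw the same distinction, so it can serve as an analogy (it is not Perl 6 semantics, and C<next> here merely plays the part of C<get>).  Several iterators may run over one collection at once, while each iterator is consumed exactly once.

```python
values = [1, 2, 3]             # the iterable collection

first = iter(values)           # two independent iterators over it
second = iter(values)

a = next(first)                # consumes 1 from the first iterator only
b = next(first)                # consumes 2
c = next(second)               # the second iterator still starts at 1
remaining = list(first)        # whatever the first iterator has left
```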

Note that C<Set> and C<Bag> iterators return only keys, not values.  You
must explicitly use C<.pairs> to get key/value pairs.

=head2 Mutable types

Objects with these types have distinct C<.WHICH> values that do not change
even if the object's contents change.  (Routines are considered mutable
because they can be wrapped in place.)

    Iterator    Perl list
    SeqIter     Iterator over a Seq
    RangeIter   Iterator over a Range
    Scalar      Perl scalar
    Array       Perl array
    Hash        Perl hash
    KeySet      KeyHash of Bool (does Set in list/array context)
    KeyBag      KeyHash of UInt (does Bag in list/array context)
    Pair        A single key-to-value association
    PairSeq     A Seq of Pairs
    Buf         Perl buffer (array of integers with some stringy features)
    IO          Perl filehandle
    Routine     Base class for all wrappable executable objects
    Sub         Perl subroutine
    Method      Perl method
    Submethod   Perl subroutine acting like a method
    Macro       Perl compile-time subroutine
    Regex       Perl pattern
    Match       Perl match, usually produced by applying a pattern
    Stash       A symbol table hash (package, module, class, lexpad, etc)
    SoftRoutine A routine that is committed to staying mutable
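
The difference between value identity and container identity can be approximated in Python.  The function C<which> below is a hypothetical stand-in for C<.WHICH>, with a hand-picked list of Python types playing the immutable value types; it is meant only to show the contrast.

```python
def which(obj):
    """Identity key: value-based for immutable values, address-based for containers."""
    if isinstance(obj, (int, float, complex, str, bytes, frozenset)):
        return (type(obj).__name__, obj)     # type plus contents
    return (type(obj).__name__, id(obj))     # type plus object identity

box = [1, 2]
before = which(box)
box.append(3)                  # contents change, identity does not
after = which(box)
```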

The C<KeyHash> role differs from a normal C<Associative> hash in how it handles default
values.  If the value of a C<KeyHash> element is set to the default
value for the C<KeyHash>, the element is deleted.  If undeclared,
the default default for a C<KeyHash> is 0 for numeric types, C<False>
for boolean types, and the null string for string and buffer types.
A C<KeyHash> of an object type defaults to the undefined prototype
for that type.  More generally, the default default is whatever defined
value a C<Nil> would convert to for that value type.  A C<KeyHash>
of C<Scalar> deletes elements that go to either 0 or the null string.
A C<KeyHash> also autodeletes keys for normal undefined values (that is,
those undefined values that do not contain an unthrown exception).

A C<KeySet> is a C<KeyHash> of booleans with a default of C<False>.
If you use the C<Hash> interface and increment an element of a
C<KeySet> its value becomes true (creating the element if it doesn't
exist already).  If you decrement the element it becomes false and
is automatically deleted.  Decrementing a non-existing value results
in a C<False> value.  Incrementing an existing value results in C<True>.
When not used as a C<Hash> (that is,
when used as an C<Array> or list or C<Set> object) a C<KeySet>
behaves as a C<Set> of its keys.  (Since the only possible value of
a C<KeySet> is the C<True> value, it need not be represented in
the actual implementation with any bits at all.)

A C<KeyBag> is a C<KeyHash> of C<UInt> with default of 0.  If you
use the C<Hash> interface and increment an element of a C<KeyBag>
its value is increased by one (creating the element if it doesn't exist
already).  If you decrement the element the value is decreased by one;
if the value goes to 0 the element is automatically deleted.  An attempt
to decrement a non-existing value results in a C<Failure> value.  When not
used as a C<Hash> (that is, when used as an C<Array> or list or C<Bag>
object) a C<KeyBag> behaves as a C<Bag> of its keys, with each key
replicated the number of times specified by its corresponding value.
(Use C<.kv> or C<.pairs> to suppress this behavior in list context.)
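
The autodeleting behavior can be modeled in Python with a dictionary subclass.  This is a sketch, not the Perl 6 implementation: the method name C<elements> is invented, and the C<Failure> returned when decrementing a missing element is modeled by raising C<ValueError>.

```python
class KeyHash(dict):
    """A hash that deletes any element set to its default value."""

    def __init__(self, default):
        super().__init__()
        self.default = default

    def __missing__(self, key):
        return self.default            # reading a missing key does not create it

    def __setitem__(self, key, value):
        if value == self.default:
            self.pop(key, None)        # going back to the default deletes the element
        else:
            super().__setitem__(key, value)

class KeyBag(KeyHash):
    def __init__(self):
        super().__init__(default=0)

    def __setitem__(self, key, value):
        if value < 0:
            raise ValueError("count cannot go below zero")
        super().__setitem__(key, value)

    def elements(self):
        """Each key replicated as many times as its count."""
        return [key for key, count in self.items() for _ in range(count)]

bag = KeyBag()
bag["a"] += 1
bag["a"] += 1
bag["b"] += 1
bag["b"] -= 1                          # count reaches 0, so "b" is deleted
```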

As with C<Hash> types, C<Pair> and C<PairSeq> are mutable in their
values but not in their keys.  (A key can be a reference to a mutable
object, but cannot change its C<.WHICH> identity.  In contrast,
the value may be rebound to a different object, just as a hash
element may.)

The following roles are supported:

    Iterator    List
    Array       Positional Iterable
    Hash        Associative
    KeySet      KeyHash[Bool]
    KeyBag      KeyHash[UInt]
    KeyHash     Associative
    Pair        Associative
    PairSeq     Associative Positional Iterable
    Buf         Stringy
    Routine     Callable
    Sub         Callable
    Method      Callable
    Submethod   Callable
    Macro       Callable
    Regex       Callable
    Match       Positional Associative
    Stash       Associative
    SoftRoutine Routine

Types that do the C<List> role are generally hidden from casual
view, since iteration is typically triggered by context rather than
by explicit call to the iterator's C<.get> method.  Filehandles are
a notable exception.

See L<S06/"Wrapping"> for a discussion of soft vs. hard routines.

=head2 Value types

Explicit types are optional. Perl variables have two associated types:
their "value type" and their "implementation type".  (More generally, any
container has an implementation type, including subroutines and modules.)
The value type is stored as its C<of> property, while the implementation
type of the container is just the object type of the container itself.
The word C<returns> is allowed as an alias for C<of>.

The value type specifies what kinds of values may be stored in the
variable. A value type is given as a prefix or with the C<of> keyword:

    my Dog $spot;
    my $spot of Dog;

In either case this sets the C<of> property of the container to C<Dog>.

Subroutines have a variant of the C<of> property, C<as>, that sets
the C<as> property instead.  The C<as> property specifies a
constraint (or perhaps coercion) to be enforced on the return value (either
by explicit call to C<return> or by implicit fall-off-the-end return).
This constraint, unlike the C<of> property, is not advertised as the
type of the routine.  You can think of it as the implicit type signature of
the (possibly implicit) return statement.  It's therefore available for
type inferencing within the routine but not outside it.  If no C<as> type
is declared, it is assumed to be the same as the C<of> type, if declared.

    sub get_pet() of Animal {...}       # of type, obviously
    sub get_pet() returns Animal {...}  # of type
    our Animal sub get_pet() {...}      # of type
    sub get_pet() as Animal {...}       # as type

A value type on an array or hash specifies the type stored by each element:

    my Dog @pound;  # each element of the array stores a Dog

    my Rat %ship;   # the value of each entry stores a Rat

The key type of a hash may be specified as a shape trait--see S09.

=head2 Implementation types

The implementation type specifies how the variable itself is implemented. It is
given as a trait of the variable:

    my $spot is Scalar;             # this is the default
    my $spot is PersistentScalar;
    my $spot is DataBase;

Defining an implementation type is the Perl 6 equivalent to tying
a variable in Perl 5.  But Perl 6 variables are tied directly at
declaration time, and for performance reasons may not be tied with a
run-time C<tie> statement unless the variable is explicitly declared
with an implementation type that does the C<Tieable> role.

However, package variables are always considered C<Tieable> by default.
As a consequence, all named packages are also C<Tieable> by default.
Classes and modules may be viewed as differently tied packages.
Looking at it from the other direction, classes and modules that
wish to be bound to a global package name must be able to do the
C<Package> role.

=head2 Hierarchical types

A non-scalar type may be qualified, in order to specify what type of
value each of its elements stores:

    my Egg $cup;                       # the value is an Egg
    my Egg @carton;                    # each elem is an Egg
    my Array of Egg @box;              # each elem is an array of Eggs
    my Array of Array of Egg @crate;   # each elem is an array of arrays of Eggs
    my Hash of Array of Recipe %book;  # each value is a hash of arrays of Recipes

Each successive C<of> makes the type on its right a parameter of the
type on its left. Parametric types are named using square brackets, so:

    my Hash of Array of Recipe %book;

actually means:

    my Hash:of(Array:of(Recipe)) %book;

Because the actual variable can be hard to find when complex types are
specified, there is a postfix form as well:

    my Hash of Array of Recipe %book;           # HoHoAoRecipe
    my %book of Hash of Array of Recipe;        # same thing

The C<as> form may be used in subroutines:

    my sub get_book ($key) as Hash of Array of Recipe {...}

Alternately, the return type may be specified within the signature:

    my sub get_book ($key --> Hash of Array of Recipe) {...}

There is a slight difference, insofar as the type inferencer will
ignore an C<as> but pay attention to C<< --> >> or prefix type
declarations, also known as the C<of> type.  Only the inside of the
subroutine pays attention to C<as>, and essentially coerces the return
value to the indicated type, just as if you'd coerced each return expression.

You may also specify the C<of> type as the C<of> trait (with C<returns>
allowed as a synonym):

    my Hash of Array of Recipe sub get_book ($key) {...}
    my sub get_book ($key) of Hash of Array of Recipe {...}
    my sub get_book ($key) returns Hash of Array of Recipe {...}

=head2 Polymorphic types

Anywhere you can use a single type you can use a set of types, for convenience
specifiable as if it were an "or" junction:

    my Int|Str $error = $val;              # can assign if $val~~Int or $val~~Str

Fancier type constraints may be expressed through a subtype:

    subset Shinola of Any where {.does(DessertWax) and .does(FloorTopping)};
    if $shimmer ~~ Shinola {...}  # $shimmer must do both interfaces

Since the terms in a parameter could be viewed as a set of
constraints that are implicitly "anded" together (the variable itself
supplies type constraints, and C<where> clauses or tree matching just
add more constraints), we relax this to allow juxtaposition of
types to act like an "and" junction:

    # Anything assigned to the variable $mitsy must conform
    # to the type Fish and either the Squirrel or Dog type...
    my Squirrel|Dog Fish $mitsy = new Fish but { Bool.pick ?? .does Squirrel
                                                           !! .does Dog };

[Note: the above is a slight lie, insofar as parameters are currently
restricted for 6.0.0 to having only a single main type for the
formal variable until we understand MMD a bit better.]

=head2 Parameter types

Parameters may be given types, just like any other variable:

    sub max (int @array is rw) {...}
    sub max (@array of int is rw) {...}

=head2 Generic types

Within a declaration, a class variable (either by itself or
following an existing type name) declares a new type name and takes
its parametric value from the actual type of the parameter it is
associated with.  It declares the new type name in the same scope
as the associated declaration.

    sub max (Num ::X @array) {
        push @array, X.new();
    }

The new type name is introduced immediately, so two such types in
the same signature must unify compatibly if they have the same name:

    sub compare (Any ::T $x, T $y) {
        return $x eqv $y;
    }

=head2 Return types

On a scoped subroutine, a return type can be specified before or after
the name.  We call all return types "return types", but distinguish
two kinds of return types, the C<as> type and the C<of> type,
because the C<of> type is normally an "official" named type and
declares the official interface to the routine, while the C<as>
type is merely a constraint on what may be returned by the routine
from the routine's point of view.

    our sub lay as Egg {...}            # as type
    our Egg sub lay {...}               # of type
    our sub lay of Egg {...}            # of type
    our sub lay (--> Egg) {...}         # of type

    my sub hat as Rabbit {...}          # as type
    my Rabbit sub hat {...}             # of type
    my sub hat of Rabbit {...}          # of type
    my sub hat (--> Rabbit) {...}       # of type

If a subroutine is not explicitly scoped, it defaults to C<my> scoping.
Any return type must go after the name:

    sub lay as Egg {...}                # as type
    sub lay of Egg {...}                # of type
    sub lay (--> Egg) {...}             # of type

On an anonymous subroutine, any return type can only go after the C<sub>
keyword:

    $lay = sub as Egg {...};            # as type
    $lay = sub of Egg {...};            # of type
    $lay = sub (--> Egg) {...};         # of type

but you can use the C<anon> scope declarator to introduce an C<of> prefix type:

    $lay = anon Egg sub {...};            # of type
    $hat = anon Rabbit sub {...};         # of type

The return type may also be specified after a C<< --> >> token within
the signature.  This doesn't mean exactly the same thing as C<as>.
The C<of> type is the "official" return type, and may therefore be
used to do type inferencing outside the sub.  The C<as> type only
makes the return type available to the internals of the sub so that
the C<return> statement can know its context, but outside the sub we
don't know anything about the return value, as if no return type had
been declared.  The prefix form specifies the C<of> type rather than
the C<as> type, so the routine

    my Fish sub wanda ($x) { ... }

is known to return an object of type Fish, as if you'd said:

    my sub wanda ($x --> Fish) { ... }

I<not> as if you'd said

    my sub wanda ($x) as Fish { ... }

It is possible for the C<of> type to disagree with the C<as> type:

    my Squid sub wanda ($x) as Fish { ... }

or equivalently,

    my sub wanda ($x --> Squid) as Fish { ... }

This is not lying to yourself--it's lying to the world.  Having a
different inner type is useful if you wish to hold your routine to
a stricter standard than you let on to the outside world, for instance.

=head1 The Cool class (and package)

The C<Cool> type is derived from C<Any>, and contains all the methods
that are "cool" (as in, "I'm cool with an argument of that type.").

More specifically, these are the methods that are culturally universal,
insofar as the typical user will expect the name of the method to imply
conversion to a particular built-in type that understands the method in
question.  For instance, C<$x.abs> implies conversion to an appropriate
numeric type if C<$x> is "cool" but doesn't already support a method
of that name.  Conversely, C<$x.substr> implies conversion to a string
or buffer type.

The C<Cool> module also contains all multisubs of last resort;
these are automatically searched if normal multiple dispatch does not
find a viable candidate.  Note that the C<Cool> package is mutable,
and both single and multiple dispatch must take into account changes
there for the purposes of run-time monkey patching.  However, since
the multiple dispatcher uses the C<Cool> package only as a failover,
compile-time analysis of such dispatches is largely unaffected for any
arguments with an exact or close match.  Likewise, any single dispatch
to a method that is more specific than the C<Cool> class is not affected
by the mutability of C<Cool>.  User-defined classes don't derive from
C<Cool> by default, so such classes are also unaffected by changes
to C<Cool>.

=head1 Names and Variables

=over 4

=item *

The C<$Package'var> syntax is gone.  Use C<$Package::var> instead.
(Note, however, that identifiers may now contain an apostrophe or
hyphen if followed by an "idfirst" letter.)

=item *

Perl 6 includes a system of B<sigils> to mark the fundamental
structural type of a variable:

    $   scalar (object)
    @   ordered array
    %   unordered hash (associative array)
    &   code/rule/token/regex
    ::  package/module/class/role/subset/enum/type/grammar

Within a declaration, the C<&> sigil also declares the visibility of the
subroutine name without the sigil within the scope of the declaration:

    my &func := sub { say "Hi" };
    func;   # calls &func

Within a signature or other declaration, the C<::> pseudo-sigil followed by an
identifier marks a type variable that also declares the visibility
of a package/type name without the sigil within the scope of the
declaration.  The first such declaration within a scope is assumed
to be an unbound type, and takes the actual type of its associated
argument.  With subsequent declarations in the same scope the use of
the pseudo-sigil is optional, since the bare type name is also declared.

A declaration nested within must not use the sigil if it wishes to
refer to the same type, since the inner declaration would rebind
the type.  (Note that the signature of a pointy block counts as part
of the inner block, not the outer block.)

=item *

Sigils indicate overall interface, not the exact type of the bound
object.  Different sigils imply different minimal abilities.

C<$x> may be bound to any object, including any object that can be
bound to any other sigil.  Such a scalar variable is always treated as
a singular item in any kind of list context, regardless of whether the
object is essentially composite or unitary.  It will not automatically
dereference to its contents unless placed explicitly in some kind of
dereferencing context.  In particular, when interpolating into list
context, C<$x> never expands its object to anything other than the
object itself as a single item, even if the object is a container
object containing multiple items.

C<@x> may be bound to an object of the C<Array> class, but it may also
be bound to any object that does the C<Positional> role, such as a
C<Seq>, C<Range>, C<Buf>, C<Parcel>, or C<Capture>.  The C<Positional>
role implies the ability to support C<< postcircumfix:<[ ]> >>.

Likewise, C<%x> may be bound to any object that does the C<Associative>
role, such as C<Pair>, C<PairSeq>, C<Set>, C<Bag>, C<KeyHash>, or
C<Capture>.  The C<Associative> role implies the ability to support
C<< postcircumfix:<{ }> >>.

C<&x> may be bound to any object that does the C<Callable> role, such
as any C<Block> or C<Routine>.  The C<Callable> role implies the ability
to support C<< postcircumfix:<( )> >>.

C<::x> may be bound to any object that does the C<Abstraction> role,
such as a package, module, class, role, grammar, or any other
type object, or any immutable value object that can be used as a type.
This C<Abstraction> role implies the
ability to do various symbol table and/or typological manipulations which
may or may not be supported by any given abstraction.  Mostly though it
just means that you want to give some abstraction an official name that
you can then use later in the compilation without any sigil.

In any case, the minimal container role implied by the sigil is
checked at binding time at the latest, and may fail earlier (such
as at compile time) if a semantic error can be detected sooner.
If you wish to bind an object that doesn't yet do the appropriate
role, you must either stick with the generic C<$> sigil, or mix in
the appropriate role before binding to a more specific sigil.

An object is allowed to support both C<Positional> and C<Associative>.
An object that does not support C<Positional> may not be bound directly
to C<@x>.  However, any construct such as C<%x> that can interpolate
the contents of such an object into list context can automatically
construct a list value that may then be bound to an array variable.
Subscripting such a list does not imply subscripting back into the
original object.
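
As a rough sketch of binding by role rather than by exact class (the
names C<@r> and C<&double> are invented for this illustration, and the
behavior shown assumes an implementation that follows the roles
described above):

    my @r := 1..3;              # a Range does Positional
    say @r[1];                  # 2
    my &double := { $_ * 2 };   # a Block does Callable
    say double(21);             # 42

Neither object is an C<Array> or a C<Routine>; each merely does the
minimal role its sigil requires.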

=item *

Unlike in Perl 5, you may no longer put whitespace between a sigil
and its following name or construct.

=item *

Ordinary sigils indicate normally scoped variables, either lexical
or package scoped.  Oddly scoped variables include a secondary sigil
(a B<twigil>) that indicates what kind of strange scoping the variable
is subject to:

    $foo        ordinary scoping
    $.foo       object attribute public accessor
    $^foo       self-declared formal positional parameter
    $:foo       self-declared formal named parameter
    $*foo       dynamically overridable global variable
    $?foo       compiler hint variable
    $=foo       Pod variable
    $<foo>      match variable, short for $/{'foo'}
    $!foo       object attribute private storage
    $~foo       the foo sublanguage seen by the parser at this lexical spot

Most variables with twigils are implicitly declared or assumed to
be declared in some other scope, and don't need a "my" or "our".
Attribute variables are declared with C<has>, though.
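
A minimal sketch of the twigils most often met in user code follows.
The class C<Point>, its method C<doubled>, and the variable C<&add> are
hypothetical names chosen only for this example:

    class Point {
        has $.x;                      # public accessor; storage is $!x
        method doubled { $!x * 2 }    # $!x reads the private storage
    }
    my &add = { $^a + $^b };          # $^a and $^b declare themselves
    say Point.new(x => 4).doubled;    # 8
    say add(1, 2);                    # 3

Only the attribute needed a declarator (C<has>); the placeholder
parameters C<$^a> and C<$^b> were declared by their mere use.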

=item *

Normal names and variables are declared using a I<scope declarator>:

    my          # introduces lexically scoped names
    our         # introduces package-scoped names
    has         # introduces attribute names
    anon        # introduces names that aren't to be stored anywhere
    state       # introduces lexically scoped but persistent names
    augment     # adds definitions to an existing name
    supersede   # replaces definitions of an existing name

Names may also be declared in the signature of a function.  These are
equivalent to a C<my> declaration inside the block of the function,
except that such parameters default to readonly.

The C<anon> declarator allows a declaration to provide a name that can
be used in error messages, but that isn't put into any symbol table:

    my $secret = anon sub marine () {...};
    $secret(42);  # too many arguments to sub marine

=item *

Sigils are now invariant.  C<$> always means a scalar variable, C<@>
an array variable, and C<%> a hash variable, even when subscripting.
In item context, variables such as C<@array> and C<%hash> simply
return themselves as C<Array> and C<Hash> objects. (Item context was
formerly known as scalar context, but we now reserve the "scalar"
notion for talking about variables rather than contexts, much as
arrays are disassociated from list context.)

=item *

In string contexts, container objects automatically stringify to
appropriate (white-space separated) string values.  In numeric
contexts, the number of elements in the container is returned.
In boolean contexts, a true value is returned if and only if there
are any elements in the container.
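
For instance, assuming the usual single-character context prefixes
(C<~> for string, C<+> for numeric, and C<?> for boolean; the last is
not introduced in this item but is the standard boolean prefix), a
small array might behave like this:

    my @a = 1, 2, 3;
    say ~@a;        # 1 2 3
    say +@a;        # 3
    say ?@a;        # True
    my @empty;
    say ?@empty;    # False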

=item *

To get a Perlish representation of any object, use the C<.perl> method.
Like the C<Data::Dumper> module in Perl 5, the C<.perl> method will put
quotes around strings, square brackets around list values, curlies around
hash values, constructors around objects, etc., so that Perl can evaluate
the result back to the same object.  The C<.perl> method will return
a representation of the object on the assumption that, if the code is
reparsed at some point, it will be used to regenerate the object as a
scalar in item context.  If you wish to interpolate the regenerated
object in a list context, it may be necessary to use C<< prefix:<|> >>
to force interpolation.

Note that C<.perl> has a very specific definition, and it is expected that
some modules will rely on the ability to roundtrip values with C<eval>.  As
such, overriding C<.perl> with a different format (globally using
C<MONKEY_TYPING>, or for specific classes unless special care is taken to
maintain parsability) is unwise.  Code which does not depend on C<.perl>'s
definition should use C<.pretty> instead to allow more control.

=item *

C<.pretty>, by contrast with C<.perl>, returns a flexible form of an object
intended for human interpretation.  Specific user classes are encouraged to
override C<.pretty> to do something appropriate, and it is completely
acceptable to monkey patch C<.pretty> methods while doing debugging, without
risk of breaking any used module.  C<.pretty>, like any method, will accept
and ignore unrecognized named arguments; implementations of C<.pretty> are
encouraged to standardize on a set of flags.

[Some conjectural suggestions:

    :oneline        Do not indent or linebreak output
    :width($d)      Wrap output at $d chars
    :charset($obj)  Represent unrecognized characters as escapes
    :ascii          Short for some instantiation of :charset

Conjecturally, C<.pretty> on system-defined classes could redispatch to
C<&*PRETTYPRINTER> or some similar system, allowing for a more disciplined
way to change pretty formats.

It may also be desirable to use a richer format for intermediate strings than
simple C<Str>, for instance using an object format that can handle intelligent
line breaking.  However, that's probably overkill.]

=item *

To get a formatted representation of any scalar value, use the
C<.fmt('%03d')> method to do an implicit C<sprintf> on the value.

To format an array value separated by commas, supply a second argument:
C<.fmt('%03d', ', ')>.  To format a hash value or list of pairs, include
formats for both key and value in the first string: C<< .fmt('%s: %s', "\n") >>.
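
The three forms might be used as in the following sketch; the variable
C<%h> and the sample values are made up for illustration:

    say 7.fmt('%03d');                  # 007
    say (1, 2, 3).fmt('%03d', ', ');    # 001, 002, 003
    my %h = a => 1;
    say %h.fmt('%s: %s', "\n");         # a: 1

With more than one pair in the hash, the second argument would separate
the formatted entries, one per line in this case.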

=item *

Subscripts now consistently dereference the container produced by
whatever was to their left.  Whitespace is not allowed between a
variable name and its subscript.  However, there are two ways to
stretch the construct out visually.  Since a subscript is a kind
of postfix operator, there is a corresponding B<dot> form of each
subscript (C<@foo.[1]> and C<%bar.{'a'}>) that makes the dereference
a little more explicit. Constant string subscripts may be placed
in angles, so C<%bar.{'a'}> may also be written as C<< %bar<a> >>
or C<< %bar.<a> >>.  Additionally, you may insert extra whitespace
using the unspace.
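
As a short illustration (with C<@foo> and C<%bar> given arbitrary
sample contents), all of the following subscript spellings reach the
same elements:

    my @foo = <a b c>;
    my %bar = a => 1, b => 2;
    say @foo[1];      # b
    say @foo.[1];     # b, dot form
    say %bar{'a'};    # 1
    say %bar.{'a'};   # 1, dot form
    say %bar<a>;      # 1, constant string subscript in angles
    say %bar.<a>;     # 1, dot form of the same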

=item *

Slicing is specified by the nature of the subscript, not by
the sigil.
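
In other words, the sigil stays the same and only the subscript
changes, as in this sketch (sample data invented for the example):

    my @a = <a b c d>;
    my %h = a => 1, b => 2, c => 3;
    my $one  = @a[1];       # one index: a single element
    my @some = @a[1, 2];    # a list of indices: a slice
    my @vals = %h<a b>;     # several keys in angles: a hash slice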

=item *

The context in which a subscript is evaluated is no longer controlled
by the sigil either.  Subscripts are always evaluated in list context.
(More specifically, they are evaluated in a variant of list context
known as I<lol> context (List of List), which preserves dimensional information
so that you can do multi-dimensional slices using semicolons.  However,
each slice dimension evaluates its sublist in normal list context,
so functions called as part of a subscript don't see a lol context.
See S09 for more on slicing.)

If you need to force inner context to item (scalar), we now have convenient
single-character context specifiers such as + for numbers and ~ for strings:

    $x        =  g();       # item context for g()
    @x[f()]   =  g();       # list context for f() and g()
    @x[f()]   = +g();       # list context for f(), numeric item context for g()
    @x[+f()]  =  g();       # numeric item context for f(), list context for g()

    @x[f()]   =  @y[g()];   # list context for f() and g()
    @x[f()]   = +@y[g()];   # list context for f() and g()
    @x[+f()]  =  @y[g()];   # numeric item context for f(), list context for g()
    @x[f()]   =  @y[+g()];  # list context for f(), numeric item context for g()

    %x{~f()}  =  %y{g()};   # string item context for f(), list context for g()
    %x{f()}   =  %y{~g()};  # list context for f(), string item context for g()

Sigils used either as functions or as list prefix operators also
force context, so these also work:

    @x[$(g())]         # item context for g()
    %x{$(g())}         # item context for g()

But note that these don't do the same thing:

    @x[$g()]           # call function in $g
    %x{$g()}           # call function in $g

=item *

There is a need to distinguish list assignment from list binding.
List assignment works much like it does in Perl 5, copying the
values.  There's a new C<:=> binding operator that lets you bind
names to C<Array> and C<Hash> objects without copying, in the same way
as subroutine arguments are bound to formal parameters.  See S06
for more about binding.
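
The difference can be sketched as follows; C<@copy> and C<@alias> are
illustrative names only:

    my @a = 1, 2, 3;
    my @copy = @a;       # list assignment copies the values
    my @alias := @a;     # binding makes @alias another name for @a
    @a[0] = 42;
    say @copy[0];        # 1
    say @alias[0];       # 42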

=item *

A list of one or more comma-separated objects may be grouped together
by parentheses into a "parenthesis cell", or C<Parcel>.  This kind of
list should not be confused with the flattening list context.  Instead,
this is a raw syntactic list that has not yet committed to flattening;
no interpretation is made of the list inside without knowing what
context it will be evaluated in.  For example, when you say:

    (1,2,3,:mice<blind>)

the result is a C<Parcel> object containing three C<Int> objects
and a C<Pair> object, that is, four positional objects.  When, however,
you say something like:

    rhyme(1,2,3,:mice<blind>)

the syntactic C<Parcel> is translated (at compile time, in this case)
into a C<Capture> object with three positionals and one named argument
in preparation for binding.  More generally, a parcel is transmuted
to a capture any time it is bound to a complete signature.

You may force immediate conversion to a C<Capture> object by prefixing
the parcel composer with a backslash:

    $args = \(1,2,3,:mice<blind>)

Unlike C<Capture> objects, C<Parcel> objects are ephemeral, insofar as the
user almost never sees one as a real standalone object, since binding or
assignment always turns a parcel into something else.  A parcel may generally
only be preserved as a part of an outer parcel or capture object.

Individual arguments in a parcel or capture composer are parsed as ordinary
expressions, and any functions mentioned are called immediately, with
each function's results placed as an argument (often a subparcel, if the
function returns multiple values) within the outer parcel (or capture).
Whether any given argument is flattened will depend on its eventual binding,
and in general cannot be known at parcel/capture composition time.

We use "argument" here to mean anything that would be taken as a single
argument if bound to a positional or named parameter:

    rhyme(1,2,3,:mice<blind>)     # rhyme has 4 arguments
    rhyme((1,2),3,:mice<blind>)   # rhyme has 3 arguments
    rhyme((1,2,3),:mice<blind>)   # rhyme has 2 arguments
    rhyme((1,2),(3,:mice<blind>)) # rhyme has 2 arguments
    rhyme((1,2,3,:mice<blind>))   # rhyme has 1 argument

In these examples, the first argument to the function is
a parcel in all but the first case, where it is simply the
literal integer 1.  An argument is either of:

=over 4

=item *

A parcel that groups together a sublist, or

=item *

Any other object that can function as a single argument.

=back

Looking at it the other way, all arguments that don't actually need to be
wrapped up in a parcel are considered degenerate parcels in their
own right when it comes to binding.  Note that a capture is not
considered a kind of parcel, so does not flatten in flat context.

=item *

When a C<Parcel> is bound to a parameter, the behavior depends on whether
the parameter is "flattening" or "argumentative".  Positional parameters
and slice parameters are argumentative and call C<.getarg> on the internal
iterator and just return the next syntactic argument (parcel or other object)
without flattening.  (A slice differs from an ordinary positional parameter
in being "slurpy", that is, it is intended to fetch multiple values from
the variadic region of the surrounding capture.  Slurpy contexts come in
both flattening (C<*> parameters) and slicing (C<**> parameters) forms.)

The fact that a parameter is being bound implies that there is an outer
capture being bound to a signature.  The capture's iterator provides
a C<.get> and a C<.getarg> method to tell the iterator what context to
bind in.  For positional/slice parameters, the C<.getarg> method returns
the entire next argument from the iterator, but transmutes
any outer C<Parcel> to a C<Seq> object; it returns other objects unchanged.
In contrast, flat parameters call C<.get> on the capture's iterator, which
flattens any subparcels before pulling out the next item.  In either case,
no bare parcel object is seen as a normal bound argument.  (There is a way to
bind the underlying parcel using backslash, however.  This is how internal
routines can deal with parcels as real objects.)

In contrast to parameter binding, if a C<Parcel> is bound to an entire
signature (typically as part of a function or method call), it will be transformed
first into a capture object, which is much like a parcel but has its
arguments divvied up into positional and named subsets for faster
binding.  (Usually this transformation happens at compile time.)
If the first positional is followed by a colon instead of a comma,
it is marked as the invocant in case it finds itself in a context
that cares.  It's illegal to use the colon in place of the comma
anywhere except after the first argument.

Explicit binding to an individual variable is considered a form of signature
binding, which is to say a declarator puts implicit signature parens
around the unparenthesized form:

    my (*@x) := foo(); # signature binding
    my *@x := foo();   # same thing

The parens are, of course, required if there is more than one parameter.

C<Capture> objects are immutable in the abstract, but evaluate their
arguments lazily.  Before everything inside a C<Capture> is fully evaluated
(which happens at compile time when all the arguments are constants), the
eventual value may well be unknown.  All we know is that we have the promise
to make the bits of it immutable as they become known.

C<Capture> objects may contain multiple unresolved iterators such as feeds
or parcels or lists of parcels.  How these are resolved depends on what they are eventually
bound to.  Some bindings are sensitive to multiple dimensions while
others are not.  Binding to a list of lists is often known as "slicing",
because it's commonly used to index "slices" of a potentially multi-dimensional array.

You may retrieve parts from a C<Capture> object with a prefix sigil operator:

    $args = \3;     # same as "$args = \(3)"
    @$args;         # same as "Array($args)"
    %$args;         # same as "Hash($args)"

When cast into an array, you can access all the positional arguments; into a
hash, all named arguments.
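
A slightly larger sketch, reusing the capture composed earlier in this
section (the variables C<@pos> and C<%named> are hypothetical, and the
exact behavior may vary between implementations):

    my $args  = \(1, 2, 3, :mice<blind>);
    my @pos   = @$args;     # the positional arguments 1, 2, 3
    my %named = %$args;     # the named argument mice => 'blind'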

All prefix sigil operators accept one positional argument, evaluated in
item context as an rvalue.  They can interpolate in strings if called with
parentheses.  The special syntax form C<$()> translates into C<$( $/.ast // Str($/) )>
to operate on the current match object; similarly C<@()> and C<%()> can
extract positional and named submatches.

C<Parcel> and C<Capture> objects fill the ecological niche of references in Perl 6.
You can think of them as "fat" references, that is, references that
can capture not only the current identity of a single object, but
also the relative identities of several related objects.  Conversely,
you can think of Perl 5 references as a degenerate form of C<Capture>
when you want to refer only to a single item.

There is a special C<Parcel> value named C<Nil>.  It means "there
is no value here".  It is the undefined equivalent of the empty
C<()> list, except that the latter is defined and means "there are
0 arguments here".  The C<Nil> value returns itself if you iterate
it or try to get a positional value from it via subscripting, but
interpolates as a null list into flat context, and an empty C<Seq>
into a tree context.  In either case, a warning is issued.

Since method calls are performed directly on any object, C<Nil>
can respond to certain method calls.  C<Nil.defined> returns
C<False> (whereas C<().defined> returns C<True>).  C<Nil.so> also
returns C<False>.  C<Nil.ACCEPTS> matches only a C<Nil> value.  C<Nil.perl> and
C<Nil.Str> return C<"Nil">.  C<Nil.Stringy> returns '' with a warning.
C<Nil.Numeric> returns 0 with a warning.  Any undefined method call
on C<Nil> returns C<Nil>, so that C<Nil> propagates down method
call chains.

Assigning C<Nil> to any scalar container causes the
container to throw out any contents and restore itself to an
uninitialized state (after which it will contain a type object
appropriate to the declared type of the container, where C<Any>
is the default type).  Binding of C<Nil> has a similar result, except that binding
C<Nil> to a parameter with a default causes that parameter to be set to its
default value rather than an undefined value, as if the argument had not
been supplied.
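
For the scalar case, a small sketch (the variable name C<$count> is
only illustrative):

    my Int $count = 42;
    $count = Nil;           # discards 42 and resets the container
    say $count.defined;     # False
    say $count === Int;     # True, the type object of the declared type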

Assigning or binding C<Nil> to any composite container (such as an
C<Array> or C<Hash>) empties the container, resetting it back to an
uninitialized state.  The container object itself then becomes undefined.
(Assignment of C<()> leaves it defined.)

The C<sink> statement prefix will eagerly evaluate any block or
statement, throw away the results, and instead return the C<Nil> value.
This can be useful to peg some behavior to an empty list while still
returning an empty list:

    # Check that incoming argument list isn't null
    @inclist = map { $_ + 1 }, @list || sink warn 'Nil input!';

    @inclist = do for @list || sink { warn 'Nil input!'; $warnings++; } {
        $_ + 1;
    }

    # Check that outgoing result list isn't null
    @inclist = do map { $_ + 1 }, @list or sink warn 'Nil result!';

    @inclist = do for @list {
        $_ + 1;
    } or sink { warn 'Nil result'; $warnings++; }

Given C<sink>, there's no need for an "else" clause on Perl 6's loops,
and the C<sink> construct works in any list, not just C<for> loops.

=item *

A C<CaptureCursor> object is a view into another capture with an associated
start position.  Such a cursor is essentially a pattern-matching state.
Capture cursors are used for operations like C<grep> and C<map> and C<for>
loops that need to apply a short signature multiple times to a longer list
of values supplied by the base capture.  When we say "capture" we sometimes
mean either C<Capture> or C<CaptureCursor>.  C<CaptureCursor> objects are also
immutable.  When pattern matching a signature against a cursor, you
get a new cursor back which tells you the new position in the base capture.

=item *

A signature object (C<Signature>) may be created with colon-prefixed parens:

    my ::MySig ::= :(Int, Num, Complex, Status)

Expressions inside the signature are parsed as parameter declarations
rather than ordinary expressions.  See S06 for more details on the syntax
for parameters.

Declarators generally make the colon optional:

    my ($a,$b,$c);      # parsed as signature

Signature objects bound to type variables (as in the example above) may
be used within other signatures to apply additional type constraints.
When applied to a capture argument, the signature allows you to
take the types of the capture's arguments from C<MySig>, but declare
the (untyped) variable names yourself via an additional signature
in parentheses:

    sub foo (Num Dog|Squirrel $numdog, MySig $a ($i,$j,$k,$mousestatus)) {...}
    foo($mynumdog, \(1, 2.7182818, 1.0i, statmouse()));

=item *

Unlike in Perl 5, the notation C<&foo> merely stands for the C<foo>
function as a C<Routine> object without calling it.  You may call any Code
object by dereferencing it with parens (which may, of course, contain arguments):

    &foo($arg1, $arg2);

Whitespace is not allowed before the parens because it is parsed as
a postfix.  As with any postfix, there is also a corresponding C<.()>
operator, and you may use the "unspace" form to insert optional
whitespace and comments between the backslash and either of the
postfix forms:

    &foo\   ($arg1, $arg2);
    &foo\   .($arg1, $arg2);
    &foo\#`[
        embedded comment
    ].($arg1, $arg2);

Note however that the parentheses around arguments in the "normal"
named forms of function and method calls are not postfix operators, so do
not allow the C<.()> form, because the dot is indicative of an actual
dereferencing operation, which the named forms aren't doing.  You
may, however, use "unspace" to install extra space before the parens
in the forms:

    foo()       # okay
    foo\ ()     # okay
    foo.()      # means foo().()

    .foo()      # okay
    .foo\ ()    # okay
    .foo.()     # means .foo().()

    $.foo()     # okay
    $.foo\ ()   # okay
    $.foo.()    # means $.foo().()

If you I<do> use the dotty form on these special forms, it will
assume you wanted to call the named form without arguments, and
then dereference the result of that.

=item *

With multiple dispatch, C<&foo> is actually the name of a C<dispatch>
routine (instantiated from a C<proto>) controlling a set of candidate
functions (which you can use as if it were an ordinary function,
because a C<dispatch> is really an C<only> function with pretensions
to management of a dispatcher).  However, in that case C<&foo>
by itself is not sufficient to uniquely name a specific function.
To do that, the type may be refined by using a signature literal as
a postfix operator, as in C<&foo:(Int,Num)>.

Use of a signature that does not unambiguously select a single multi results in
failure.

It still just returns a C<Routine> object.  A call may also be partially
applied by using the C<.assuming> method:
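
A minimal sketch of partial application, assuming a hypothetical
three-argument routine C<volume>:

    sub volume ($l, $w, $h) { $l * $w * $h }

    # Prime the first two positional arguments; the result is itself callable.
    my &slab = &volume.assuming(10, 2);
    say slab(3);            # 60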


=item *

Slicing syntax is covered in S09.  A multidimensional
slice will be done with semicolons between individual slice sublists.
The semicolons imply one extra level of tree-ness, where the top
list is of type C<LoL> and sublists are C<List>s (or non-iterable
items that can function as single-item parcels).  So
when you say

    @matrix[1..*; 0]

it really means

    @matrix[LoL( (1..*), 0 )]

Each such slice sub-parcel is evaluated lazily.

=item *

To make a slice subscript return something other than values, append an
appropriate adverb to the subscript.

    @array = <A B>;
    @array[0,1,2];      # returns 'A', 'B', Nil
    @array[0,1,2] :p;   # returns 0 => 'A', 1 => 'B'
    @array[0,1,2] :kv;  # returns 0, 'A', 1, 'B'
    @array[0,1,2] :k;   # returns 0, 1
    @array[0,1,2] :v;   # returns 'A', 'B'

    %hash = (:a<A>, :b<B>);
    %hash<a b c>;       # returns 'A', 'B', Nil
    %hash<a b c> :p;    # returns a => 'A', b => 'B'
    %hash<a b c> :kv;   # returns 'a', 'A', 'b', 'B'
    %hash<a b c> :k;    # returns 'a', 'b'
    %hash<a b c> :v;    # returns 'A', 'B'

These adverbial forms all weed out non-existing entries.  You may also
perform an existence test, which will return true if all the elements
of the slice exist:

    if %hash<a b c> :exists {...}

Likewise,

    my ($a,$b,$c) = %hash<a b c> :delete;

deletes the entries "en passant" while returning them.  (Of course,
any of these forms also work in the degenerate case of a slice
containing a single index.)  Note that these forms work by virtue
of the fact that the subscript is the topmost previous operator.
You may have to parenthesize or force list context if some other
operator that is tighter than comma would appear to be topmost:

    1 + (%hash{$x} :delete);
    $x = (%hash{$x} :delete);
    ($x) = %hash{$x} :delete;

(The situation does not often arise for the slice modifiers above
because they are usually used in list context, which operates
at comma precedence.)

=item *

In numeric context (i.e. when cast into C<Int> or C<Num>), a C<Hash> object
becomes the number of pairs contained in the hash.  In a boolean context, a
Hash object is true if there are any pairs in the hash.  In either case,
any intrinsic iterator would be reset.  (If hashes do carry an intrinsic
iterator (as they do in Perl 5), there will be a C<.reset> method on the
hash object to reset the iterator explicitly.)
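
For illustration, using prefix C<+> and C<?> to force numeric and
boolean context on two hypothetical hashes, C<%color> and C<%empty>:

    my %color = red => 1, green => 2, blue => 3;
    my %empty;
    say +%color;    # 3, the number of pairs
    say ?%color;    # True
    say ?%empty;    # False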

=item *

Sorting a list of pairs should sort on their keys by default, then
on their values.  Sorting a list of lists should sort on the first
elements, then the second elements, etc.  For more on C<sort> see S29.
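
A short sketch of both defaults, with made-up data; the comments give
the ordering that the rule above implies:

    my @sorted-pairs = (b => 1, a => 2, a => 1).sort;
    say @sorted-pairs;      # a => 1, a => 2, b => 1 in that order

    my @sorted-lists = ((2, 1), (1, 9), (1, 3)).sort;
    say @sorted-lists;      # (1 3), (1 9), (2 1) in that order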

=item *

Many of the special variables of Perl 5 are going away.  Those that
apply to some object such as a filehandle will instead be attributes
of the appropriate object.  Those that are truly global will have
global alphabetic names, such as C<$*PID> or C<@*ARGS>.

=item *

Any remaining special variables will be lexically scoped.
This includes C<$_> and C<@_>, as well as the new C<$/>, which
is the return value of the last regex match.  C<$0>, C<$1>, C<$2>, etc.,
are aliases into the C<$/> object.

=item *

The C<$#foo> notation is dead.  Use C<@foo.end> or C<@foo[*-1]> instead.
(Or C<@foo.shape[$dimension]> for multidimensional arrays.)
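
For example, with a hypothetical three-element array:

    my @foo = <a b c>;
    say @foo.end;       # 2, the index of the last element
    say @foo[*-1];      # c, the last element itself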


=head1 Names

=over 4

=item *

An I<identifier> is composed of an alphabetic character followed by
any sequence of alphanumeric characters.  The definitions of alphabetic
and numeric include appropriate Unicode characters.  Underscore is
always considered alphabetic.  An identifier may also contain isolated
apostrophes or hyphens provided the next character is alphabetic.
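
A small sketch of identifiers that these rules admit (the variable
names are invented for the example):

    my $don't-panic = 40;   # apostrophe and hyphen, each followed by a letter
    my $café_2      = 2;    # Unicode letter, underscore, and digit
    say $don't-panic + $café_2;     # 42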

A I<name> is anything that is a legal part of a variable name (not counting
the sigil).  This includes

    $foo                # simple identifiers
    $Foo::Bar::baz      # compound identifiers separated by ::
    $Foo::($bar)::baz   # compound identifiers that perform interpolations
    $42                 # numeric names
    $!                  # certain punctuational variables

When not used as a sigil, the semantic function of C<::> within a
name is to force the preceding portion of the name to be considered
a package through which the subsequent portion of the name is to
be located.  If the preceding portion is null, it means the package
is unspecified and must be searched for according to the nature of
what follows.  Generally this means that an initial C<::> following the
main sigil is a no-op on names that are known at compile time, though
C<::()> can also be used to introduce an interpolation (see below).
Also, in the absence of another sigil, C<::> can serve as its own
sigil indicating intentional use of a not-yet-declared package name.

Unlike in Perl 5, if a sigil is followed by comma, semicolon, a colon
not followed by an identifier,
or any kind of bracket or whitespace (including Unicode brackets and
whitespace), it will be taken to be a sigil without a name rather
than a punctuational variable.  This allows you to use sigils as coercion
operators:

    print $( foo() )    # foo called in item context
    print %( foo() )    # foo called in hash context

In declarative constructs bare sigils may be used as placeholders for
anonymous variables:

    my ($a, $, $c) = 1..3;
    print unless (state $)++;

Outside of declarative constructs you may use C<*> for a placeholder:

    ($a, *, $c) = 1..3;

Attempts to say something like:

    ($a, $, $c) = 1..3;

will result in the message, "Anonymous variable requires declarator".

=item *

Ordinary package-qualified names look as they do in Perl 5:

    $Foo::Bar::baz      # the $baz variable in package Foo::Bar

Sometimes it's clearer to keep the sigil with the variable name, so an
alternate way to write this is:
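
Assuming the package-subscript notation described later in this
section, that spelling would be:

    Foo::Bar::<$baz>    # sigil stays with the variable name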


This is resolved at compile time because the variable name is a constant.

=item *

The following pseudo-package names are reserved at the front of a name:

    MY          # Symbols in the current lexical scope (aka $?SCOPE)
    OUR         # Symbols in the current package (aka $?PACKAGE)
    CORE        # Outermost lexical scope, definition of standard Perl
    GLOBAL      # Interpreter-wide package symbols, really UNIT::GLOBAL
    PROCESS     # Process-related globals (superglobals)
    COMPILING   # Lexical symbols in the scope being compiled
    DYNAMIC     # Contextual symbols in my or any caller's lexical scope

The following relative names are also reserved but may be used
anywhere in a name:

    CALLER      # Contextual symbols in the immediate caller's lexical scope
    OUTER       # Symbols in the next outer lexical scope
    UNIT        # Symbols in the outermost lexical scope of compilation unit
    SETTING     # Lexical symbols in the unit's DSL (usually CORE)
    PARENT      # Symbols in this package's parent package (or lexical scope)

The following is reserved at the beginning of method names in method calls:

    SUPER       # Package symbols declared in inherited classes

Other all-caps names are semi-reserved.  We may add more of them in
the future, so you can protect yourself from future collisions by using
mixed case on your top-level packages.  (We promise not to break
any existing top-level CPAN package, of course.  Except maybe C<ACME>,
and then only for coyotes.)

The file's scope is known as C<UNIT>, but there are one or more
lexical scopes outside of that corresponding to the linguistic setting
(often known as the prelude in other cultures).  Hence, the C<SETTING>
scope is equivalent to C<UNIT::OUTER>.  For a standard Perl program
C<SETTING> is the same as C<CORE>, but various startup options (such
as C<-n> or C<-p>) can put you into a domain specific language,
in which case C<CORE> remains the scope of the standard language,
while C<SETTING> represents the scope defining the DSL that functions
as the setting of the current file.  See also the C<-L>/C<--language>
switch described in L<S19-commandline>.  If a setting wishes
to gain control of the main execution, it merely needs to declare
a C<MAIN> routine as documented in S06.  In this case the ordinary
execution of the user's code is suppressed; instead, execution
of the user's code is entirely delegated to the setting's C<MAIN> routine,
which calls back to the user's lexically embedded code with C<{YOU_ARE_HERE}>.

The C<{YOU_ARE_HERE}> functions within the setting as a proxy for
the user's C<UNIT> block, so C<-n> and C<-p> may be implemented in
a setting with:

    for $*ARGFILES.lines {YOU_ARE_HERE}                 # -n
    map *.say, do for $*ARGFILES.lines {YOU_ARE_HERE}   # -p

or

    map {YOU_ARE_HERE}, $*ARGFILES.lines;               # -n
    map *.say, map {YOU_ARE_HERE}, $*ARGFILES.lines;    # -p

and the user may use loop control phasers as if they were directly in
the loop block.  Any C<OUTER> in the user's code refers to the block
outside of C<{YOU_ARE_HERE}>.  If used as a standalone statement,
C<{YOU_ARE_HERE}> runs as if it were a bare block.

Note that, since the C<UNIT> of an eval is the eval string itself,
the C<SETTING> of an eval is the language in effect at the point
of the eval, not the language in effect at the top of the file.
(You may, however, use C<OUTER::SETTING> to get the setting of the
code that is executing the eval.)  In more traditional terms, the
normal program is functioning as the "prelude" of the eval.

So the outermost lexical scopes nest like this, traversed via C<OUTER>:

    CORE <= SETTING < UNIT < (your_block_here)

The outermost package scopes nest like this, traversed via C<PARENT>:

    GLOBAL <  (your_package_here)

Your main program starts up in the C<GLOBAL> package and the C<UNIT>
lexical scope.  Whenever anything is declared with "our" semantics, it
inserts a name into both the current package and the current lexical
scope.  (And "my" semantics only insert into the current lexical
scope.)  Note that the standard setting, C<CORE>, is a lexical scope,
not a package; the various items that are defined within (or imported
into) C<CORE> are *not* in C<GLOBAL>, which is pretty much empty when
your program starts compiling, and mostly only contains things you
either put there yourself, or some other module put there because
you used that module.  In general things defined within (or imported
into) C<CORE> should only be declared or imported with "my" semantics.
All Perl code can see C<CORE> anyway as the outermost lexical scope,
so there's no need to also put such things into C<GLOBAL>.

The C<GLOBAL> package itself is accessible via C<UNIT::GLOBAL>.
The C<PROCESS> package is accessible via C<UNIT::PROCESS>.
The C<PROCESS> package is not the parent of C<GLOBAL>.  However, searching
up the dynamic stack for dynamic variables will look in all nested
dynamic scopes (mapped automatically to each call's lexical scope,
not package scope) out to the main dynamic scope; once all the dynamic scopes are
exhausted, it also looks in the C<GLOBAL> package and then in the
C<PROCESS> package, so C<$*OUT> typically finds the process's standard
output handle.  Hence, C<PROCESS> and C<GLOBAL> serve as extra outer
dynamic scopes, much like C<CORE> and C<SETTING> function as extra outer
lexical scopes.

Extra C<SETTING> scopes keep their identity and their nesting within C<CORE>,
so you may have to go to C<OUTER> several times from C<UNIT> before you get
to C<CORE>.  Normally, however, there is only the core setting, in which
case C<UNIT::OUTER> ends up meaning the same as C<SETTING> which is the same
as C<CORE>.

Extra C<GLOBAL> scopes are treated differently.  Every compilation unit has
its own associated C<UNIT::GLOBAL> package.  As the currently compiling
compilation unit expresses the need for various other compilation units,
the global names known to those other units must be merged into the new
unit's C<UNIT::GLOBAL>.  (This includes the names in all the packages
within the global package.)  If two different units use the same global
name, they must generally be taken to refer to the same item, but only if
the type signatures can be meshed (and augmentation rules followed, in the
case of package names).  If two units provide package names with
incompatible type signatures, the compilation of the unit fails.  In other
words, you may not use incompatible global types to provide a union type.
However, if one or the other unit underspecifies the type in a compatible
way, the underspecified type just takes on the extra type information as it
learns it.  (Presumably some combination of Liskov substitution, duck-typing,
and run-time checking will prevent tragedy in the unit that was compiled with
the underspecified type.  Alternately, the compiler is allowed to recompile or
re-examine the unit with the new type constraints to see if any issues are
certain to arise at run time, in which case the compiler is free to complain.)

Any dynamic variable declared with C<our> in the user's main program
(specifically, the part compiled with C<GLOBAL> as the current package)
is accessible (by virtue of being in C<GLOBAL>) as a dynamic variable
even if not directly in the dynamic call chain.  Note that dynamic
vars do *not* look in C<CORE> for anything.  (They I<might> look in
C<SETTING> if you're running under a setting distinct from C<CORE>,
if that setting defines a dynamic scope outside your main program,
such as for the C<-n> or C<-p> switch.)  Dynamic variables declared
with C<our> in the C<GLOBAL> or C<PROCESS> packages do not need to
use the C<*> twigil, since the twigil is stripped before searching
those packages.  Hence, your environment variables are effectively
declared without the twigil:

    augment package GLOBAL { our %ENV; }

=item *

You may interpolate a string into a package or variable name using
C<::($expr)> where you'd ordinarily put a package or variable name.
The string is allowed to contain additional instances of C<::>, which
will be interpreted as package nesting.  You may only interpolate
entire names, since the construct starts with C<::>, and either ends
immediately or is continued with another C<::> outside the parens.
Most symbolic references are done with this notation:

    $foo = "Bar";
    $foobar = "Foo::Bar";
    $::($foo)           # lexically-scoped $Bar
    $::("MY::$foo")     # lexically-scoped $Bar
    $::("OUR::$foo")    # package-scoped $Bar
    $::("GLOBAL::$foo") # global $Bar
    $::("PROCESS::$foo")# process $Bar
    $::("PARENT::$foo") # current package's parent's $Bar
    $::($foobar)        # $Foo::Bar
    $::($foobar)::baz   # $Foo::Bar::baz
    $::($foo)::Bar::baz # $Bar::Bar::baz
    $::($foobar)baz     # ILLEGAL at compile time (no operator baz)

Note that unlike in Perl 5, initial C<::> doesn't imply global.
Here as part of the interpolation syntax it doesn't even imply package.
After the interpolation of the C<::()> component, the indirect name
is looked up exactly as if it had been there in the original source
code, with priority given first to leading pseudo-package names,
then to names in the lexical scope (searching scopes outwards, ending
at C<CORE>). The current package is searched last.

Use the C<MY> pseudopackage to limit the lookup to the current lexical
scope, and C<OUR> to limit the scopes to the current package scope.

=item *

When "strict" is in effect (which is the default except for one-liners),
non-qualified variables (such as C<$x> and C<@y>) are only looked up from
lexical scopes, but never from package scopes.

To bind package variables into a lexical scope, simply say C<our ($x, @y)>.
To bind global variables into a lexical scope, predeclare them with C<use>:

    use PROCESS <$IN $OUT>;

Or just refer to them as C<$*IN> and C<$*OUT>.

=item *

To do direct lookup in a package's symbol table without scanning, treat
the package name as a hash:

    Foo::Bar::{'&baz'}  # same as &Foo::Bar::baz
    PROCESS::<$IN>      # Same as $*IN
    Foo::<::Bar><::Baz> # same as Foo::Bar::Baz

The C<::> before the subscript is required here, because the
C<Foo::Bar{...}> syntax is reserved for attaching a "WHENCE"
initialization closure to an autovivifiable type object.  (See S12.)

Unlike C<::()> symbolic references, this does not parse the argument
for C<::>, nor does it initiate a namespace scan from that initial
point.  In addition, for constant subscripts, it is guaranteed to
resolve the symbol at compile time.

The null pseudo-package is reserved to mean the same search list as an ordinary
name search.  That is, the following are all identical in meaning:
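
Taking a variable C<$foo> as the example, the equivalent spellings
under the subscripting rules above would be:

    $foo
    $::{'foo'}
    ::{'$foo'}
    $::<foo>
    ::<$foo>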


That is, each of them scans lexical scopes outward, and then the current package scope
(though the package scope is then disallowed when "strict" is in effect).

As a result of these rules, you can write any arbitrary variable name as either of:
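
For instance, with a deliberately unpronounceable name (the name
itself is only an illustration):

    $::{'!@#$#@'}
    ::{'$!@#$#@'}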


You can also use the C<< ::<> >> form as long as there are no spaces in the name.

=item *

The current lexical symbol table is now accessible through the
pseudo-package C<MY>.  The current package symbol table is visible as
pseudo-package C<OUR>.  The C<OUTER> name refers to the C<MY> symbol table
immediately surrounding the current C<MY>, and C<OUTER::OUTER> is the one
surrounding that one.

    our $foo = 41;
    say $::foo;         # prints 41, :: is no-op
    {
        my $foo = 42;
        say MY::<$foo>;         # prints "42"
        say $MY::foo;           # same thing
        say $::foo;             # same thing, :: is no-op here

        say OUR::<$foo>;        # prints "41"
        say $OUR::foo;          # same thing

        say OUTER::<$foo>;      # prints "41" (our $foo is also lexical)
        say $OUTER::foo;        # same thing
    }

You may not use any lexically scoped symbol table, either by name or
by reference, to add symbols to a lexical scope that is done compiling.
(We reserve the right to relax this if it turns out to be useful though.)

=item *

The C<CALLER> package refers to the lexical scope of the (dynamically
scoped) caller.  The caller's lexical scope is allowed to hide any
user-defined variable from you.  In fact, that's the default, and a
lexical variable must have the trait "C<is dynamic>" to be
visible via C<CALLER>.  (C<$_>, C<$!> and C<$/> are always
dynamic, as are any variables whose declared names contain a C<*> twigil.)
If the variable is not visible in the caller, it returns
failure.  Variables whose names are visible at the point of the call but that
come from outside that lexical scope are controlled by the scope
in which they were originally declared as dynamic.
Hence the visibility of C<< CALLER::<$*foo> >> is determined where
C<$*foo> is actually declared, not by the caller's scope (unless that's where
it happens to be declared).  Likewise C<< CALLER::CALLER::<$x> >>
depends only on the declaration of C<$x> visible in your caller's caller.
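
A minimal sketch of the visibility rule, using two hypothetical
routines C<inner> and C<outer>:

    sub inner { CALLER::<$x> }      # look in the caller's lexical scope
    sub outer {
        my $x is dynamic = 42;      # without "is dynamic", $x would stay hidden
        inner();
    }
    say outer();                    # 42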

User-defined dynamic variables should generally be initialized with
C<::=> unless it is necessary for the variable to be modified.  (Marking
dynamic variables as readonly is very helpful in terms of sharing
the same value among competing threads, since a readonly variable
need not be locked.)

=item *

The C<DYNAMIC> pseudo-package is just like C<CALLER> except that
it starts in the current dynamic scope and from there scans outward
through all dynamic scopes (frames) until it finds a dynamic variable of that
name in that dynamic frame's associated lexical pad.  (This search
is implied for variables with the C<*> twigil; hence C<$*FOO> is
equivalent to C<< DYNAMIC::<$*FOO> >>.)  If, after scanning outward
through all those dynamic scopes, there is no variable of that name
in any immediately associated lexical pad, it strips the C<*> twigil
out of the name and looks in the C<GLOBAL> package followed by the
C<PROCESS> package.  If the value is not found, it returns failure.
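
The outward scan can be sketched as follows; the names C<$*depth>,
C<show>, and C<nested> are invented for the example:

    sub show   { $*depth }                  # resolved by scanning callers at run time
    sub nested { my $*depth = 2; show() }   # a nearer dynamic scope wins

    my $*depth = 1;
    say show();         # 1
    say nested();       # 2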

Unlike C<CALLER>, C<DYNAMIC> will see a dynamic variable that is
declared in the current scope, since it starts its search 0 scopes up the
stack rather than 1.  You may, however, use C<< CALLER::<$*foo> >>
to bypass a dynamic definition of C<$*foo> in your current scope,
such as to initialize it with the outer dynamic value:

    my $*foo ::= CALLER::<$*foo>;

The C<temp> declarator may be used (without an initializer) on a
dynamic variable to perform a similar operation:

    temp $*foo;

The main difference is that by default it initializes the new
C<$*foo> with its current value, rather than the caller's value.
Also, it is allowed only on read/write dynamic variables, since
the only reason to make a copy of the outer value would be
because you'd want to override it later and then forget the
changes at the end of the current dynamic scope.

You may also use C<< OUTER::<$*foo> >> to mean you want to start the
search in your outer lexical scope, but this will succeed only if
that outer lexical scope also happens to be one of your current
I<dynamic> scopes.  That is, the same search is done as with the bare
C<$*foo>, but any "hits" are ignored until we've got to the C<OUTER>
scope in our traversal.

=item *

There is no longer any special package hash such as C<%Foo::>.  Just
subscript the package object itself as a hash object, the key of which
is the variable name, including any sigil.  The package object can
be derived from a type name by use of the C<::> postfix:
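
For a hypothetical type C<MyType> holding a package variable C<$foo>,
the lookup would be written as:

    MyType::<$foo>      # :: postfix yields the package, then subscript it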


(Directly subscripting the type with either square brackets or curlies
is reserved for various generic type-theoretic operations.  In most other
matters type names and package names are interchangeable.)

Typeglobs are gone.  Use binding (C<:=> or C<::=>) to do aliasing.
Individual variable objects are still accessible through the
hash representing each symbol table, but you have to include the
sigil in the variable name now: C<MyPackage::{'$foo'}> or the
equivalent C<< MyPackage::<$foo> >>.

=item *

Interpreter globals live in the C<GLOBAL> package.  The user's program
starts in the C<GLOBAL> package, so "our" declarations in the mainline
code go into that package by default.  Process-wide variables live in
the C<PROCESS> package.  Most predefined globals such as C<$*UID>
and C<$*PID> are actually process globals.

=item *

There is only ever a single C<PROCESS> package.
For an ordinary Perl program running by itself, there is only one C<GLOBAL>
package as well.  However, in certain
situations (such as shared hosting under a webserver), the actual
process may contain multiple virtual processes or interpreters, each running its own
"main" code.  In this case, the C<GLOBAL> namespace holds variables
that properly belong to the individual virtual process, while the
C<PROCESS> namespace holds variables that properly belong to the actual
process as a whole.  From the viewpoint of the program
there is little difference as long as all global variables are accessed
as if they were dynamic variables (by using the C<*> twigil).
The process as a whole may place restrictions on the
mutability of process variables as seen by the individual subprocesses.
Also, individual subprocesses may not create new process variables.
If the process wishes to grant subprocesses the ability to communicate
via the C<PROCESS> namespace, it must supply a writeable dynamic variable
to all the subprocesses granted that privilege.

=item *

It is illegal to assign or bind a dynamic variable that does not already exist.
It will not be created in C<GLOBAL> (or C<PROCESS>) automatically, nor is it
created in any lexical scope.  Instead, you must assign directly using the
package name to get that to work:

    GLOBAL::<$mynewvar> = $val;

=item *

The magic command-line input handle is C<$*ARGFILES>.
The arguments themselves come in C<@*ARGS>.  See also "Declaring a MAIN
subroutine" in S06.

=item *

Magical file-scoped values live in variables with a C<=> secondary
sigil.  C<$=DATA> is the name of your C<DATA> filehandle, for instance.
All Pod structures are available through C<%=POD> (or some such).
As with C<*>, the C<=> may also be used as a package name: C<$=::DATA>.

=item *

Magical lexically scoped values live in variables with a C<?> secondary
sigil.  These are all values that are known to the compiler, and may
in fact be dynamically scoped within the compiler itself, and only
appear to be lexically scoped because dynamic scopes of the compiler
resolve to lexical scopes of the program.  All C<$?> variables are considered
constants, and may not be modified after being compiled in.  The user
is also allowed to define (or redefine) such constants:

    constant $?TABSTOP = 4;     # assume heredoc tabs mean 4 spaces

(Note that the constant declarator always evaluates its initialization
expression at compile time.)

C<$?FILE> and C<$?LINE> are your current file and line number, for instance.
Instead of C<$?OUTER::FOO> you probably want to write C<< OUTER::<$?FOO> >>.
Within code that is being run during the compile, such as C<BEGIN> blocks, or
macro bodies, or constant initializers, the compiler variables must be referred
to as (for instance) C<< COMPILING::<$?LINE> >> if the bare C<$?LINE> would
be taken to be the value during the compilation of the currently running
code rather than the eventual code of the user's compilation unit.  For
instance, within a macro body C<$?LINE> is the line within the macro
body, but C<< COMPILING::<$?LINE> >> is the line where the macro was invoked.
See below for more about the C<COMPILING> pseudo package.

Here are some possibilities:

    $?FILE      Which file am I in?
    $?LINE      Which line am I at?
    &?ROUTINE   Which routine am I in?
    &?BLOCK     Which block am I in?
    %?LANG      What is the current set of interwoven languages?
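
As a brief sketch of how a few of these might be used (the routine
name C<whereami> is hypothetical):

    sub whereami { &?ROUTINE.name }
    say whereami();                 # whereami
    say $?FILE ~ ':' ~ $?LINE;      # file name and line number of this statement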

The following return objects that contain all pertinent info:

    $?KERNEL    Which kernel am I compiled for?
    $?DISTRO    Which OS distribution am I compiling under?
    $?VM        Which virtual machine am I compiling under?
    $?XVM       Which virtual machine am I cross-compiling for?
    $?PERL      Which Perl am I compiled for?
    $?SCOPE     Which lexical scope am I in?
    $?PACKAGE   Which package am I in?
    $?MODULE    Which module am I in?
    $?CLASS     Which class am I in? (as variable)
    $?ROLE      Which role am I in? (as variable)
    $?GRAMMAR   Which grammar am I in?

It is relatively easy to smartmatch these constant objects
against pairs to check various attributes such as name,
version, or authority:

    given $?VM {
        when :name<Parrot> :ver(v2) { ... }
        when :name<CLOS>            { ... }
        when :name<SpiderMonkey>    { ... }
        when :name<JVM> :ver(v6.*)  { ... }
    }

Matches of constant pairs on constant objects may all be resolved at
compile time, so dead code can be eliminated by the optimizer.

Note that some of these things have parallels in the C<*> space at run time:

    $*KERNEL    Which kernel I'm running under
    $*DISTRO    Which OS distribution I'm running under
    $*VM        Which VM I'm running under
    $*PERL      Which Perl I'm running under

You should not assume that these will have the same value as their
compile-time cousins.
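
A minimal sketch of inspecting the run-time objects, assuming each
one answers a C<.name> method as the smartmatch example above
suggests; the output depends entirely on where the program runs:

    say $*KERNEL.name;      # kernel the program is running under
    say $*DISTRO.name;      # OS distribution
    say $*VM.name;          # virtual machine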

=item *

While C<$?> variables are constant to the run time, the compiler
has to have a way of changing these values at compile time without
getting confused about its own C<$?> variables (which were frozen in
when the compile-time code was itself compiled).  The compiler can
talk about these compiler-dynamic values using the C<COMPILING> pseudopackage.

References to C<COMPILING> variables are automatically hoisted into the
lexical scope currently being compiled.  Setting or temporizing a C<COMPILING>
variable sets or temporizes the incipient C<$?> variable in the
surrounding lexical scope that is being compiled.  If nothing in
the context is being compiled, an exception is thrown.

    $?FOO // say "undefined";   # probably says undefined
    BEGIN { COMPILING::<$?FOO> = 42 }
    say $?FOO;                  # prints 42
    {
        say $?FOO;              # prints 42
        BEGIN { temp COMPILING::<$?FOO> = 43 } # temporizes to *compiling* block
        say $?FOO;              # prints 43
        BEGIN { COMPILING::<$?FOO> = 44 }
        say $?FOO;              # prints 44
        BEGIN { say COMPILING::<$?FOO> }        # prints 44, but $?FOO probably undefined
    }
    say $?FOO;                  # prints 42 (left scope of temp above)
    $?FOO = 45;                 # always an error
    COMPILING::<$?FOO> = 45;    # an error unless we are compiling something

Note that C<< CALLER::<$?FOO> >> might discover the same variable
as C<< COMPILING::<$?FOO> >>, but only if the compiling scope is the
immediate caller.  Likewise C<< OUTER::<$?FOO> >> might or might not
get you to the right place.  In the abstract, C<< COMPILING::<$?FOO> >>
goes outwards dynamically until it finds a compiling scope, and so is
guaranteed to find the "right" C<$?FOO>.  (In practice, the compiler
hopefully keeps track of its current compiling scope anyway, so no
scan is needed.)

Perceptive readers will note that this subsumes various "compiler hints"
proposals.  Crazy readers will wonder whether this means you could
set an initial value for other lexicals in the compiling scope.  The
answer is yes.  In fact, this mechanism is probably used by the
exporter to bind names into the importer's namespace.

=item *

The currently compiling Perl parser is switched by modifying
one of the braided languages in
C<< COMPILING::<%?LANG> >>.  Lexically scoped parser changes
should temporize the modification.  Changes from here to
end-of-compilation unit can just assign or bind it.  In general,
most parser changes involve deriving a new grammar and then pointing
one of the
C<< COMPILING::<%?LANG> >> entries at that new grammar.  Alternately, the
tables driving the current parser can be modified without derivation,
but at least one level of anonymous derivation must intervene from
the preceding Perl grammar, or you might be messing up someone else's
grammar.  Basically, the current set of grammars in C<%?LANG> has to belong only to the
current compiling scope.  It may not be shared, at least not without
explicit consent of all parties.  No magical syntax at a distance.
Consent of the governed, and all that.

=item *

Individual sublanguages ("slangs") may be referred to using the C<~> twigil.  The following
are useful:

    $~MAIN       the current main language (e.g. Perl statements)
    $~Q          the current root of quoting language
    $~Quasi      the current root of quasiquoting language
    $~Regex      the current root of regex language
    $~Trans      the current root of transliteration language
    $~P5Regex    the current root of the Perl 5 regex language

Hence, when you are defining a normal Perl macro, you're replacing
C<$~MAIN> with a derived language, but when you define a new regex
backslash sequence, you're replacing C<$~Regex> with a derived
language.  (There may or may not be a syntax in the main language
to do this.)  Note that such changes are automatically scoped
to the lexical scope; as with real slang, the definitions are
temporary and embedded in a larger language inherited from
the surrounding culture.

Instead of defining macros directly you may also mix in one or more
grammar rules by lexically scoped declaration of a new sublanguage:

    augment slang Regex {  # derive from $~Regex and then modify $~Regex
        token backslash:std<\Y> { YY };
    }

This tends to be more efficient since it only has to do one mixin
at the end of the block.  Note that the slang declaration has
nothing to do with package C<Regex>, but only with C<$~Regex>.
Sublanguages are in their own namespace (inside the current value
of C<%?LANG>, in fact).  Hence C<augment> is modifying one of the local
strands of a braided language, not a package somewhere else.

You may also supersede a slang entirely if, for example,
you just want to disable that sublanguage in the current lexical scope:

    supersede slang P5Regex {}
    m:P5/./;             # kaboom

If you supersede C<MAIN> then you're replacing the Perl parser entirely.
This might be done by, say, the "use COBOL" declaration. C<:-)>

=item *

It is often convenient to have names that contain arbitrary characters
or other data structures.  Typically these uses involve situations
where a set of entities shares a common "short" name, but still needs
for each of its elements to be identifiable individually.  For
example, you might use a module whose short name is C<ThatModule>,
but the complete long name of a module includes its version, naming
authority, and perhaps even its source language.  Similarly,
sets of operators work together in various syntactic categories
with names like C<prefix>, C<infix>, C<postfix>, etc.  The long
names of these operators, however, often contain characters that
are excluded from ordinary identifiers.

For all such uses, an identifier followed by a subscript-like adverbial
form (see below) is considered an I<extended identifier>:

    infix:<+>    # the official name of the operator in $a + $b
    infix:<*>    # the official name of the operator in $a * $b
    infix:«<=»   # the official name of the operator in $a <= $b
    prefix:<+>   # the official name of the operator in +$a
    postfix:<--> # the official name of the operator in $a--

This name is to be thought of semantically, not syntactically.  That is,
the bracketing characters used do not count as part of the name; only
the quoted data matters.  These are all the same name:
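
For illustration, assuming the bracket forms that the adverbial table
later in this section lists as name extensions, spellings such as these
would be interchangeable, since each quotes the same single string C<+>:

    infix:<+>
    infix:«+»
    infix:['+']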


Despite the appearance as a subscripting form, these names are resolved
not at run time but at compile time.  The pseudo-subscripts need not
be simple scalars.  These are extended with the same two-element list:

    infix:<?? !!>
    infix:['??','!!']

An identifier may be extended with multiple named identifier
extensions, in which case the names matter but their order does not.
These name the same module:

    use ThatModule:auth<Somebody>:ver<>
    use ThatModule:ver<>:auth<Somebody>

Adverbial syntax will be described more fully later.

=back


=head1 Literals

=over 4

=item *

A single underscore is allowed only between any two digits in a
literal number, where the definition of digit depends on the radix.
(A single underscore is also allowed between a radix prefix and a
following digit, as explained in the next section.)
Underscores are not allowed anywhere else in any numeric literal,
including next to the radix point or exponentiator, or at the beginning
or end.

=item *

Initial C<0> no longer indicates octal numbers by itself.  You must use
an explicit radix marker for that.  Pre-defined radix prefixes include:

    0b          base 2, digits 0..1
    0o          base 8, digits 0..7
    0d          base 10, digits 0..9
    0x          base 16, digits 0..9,a..f (case insensitive)

Each of these allows an optional underscore after the radix prefix
but before the first digit.  These all mean the same thing:
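
For illustration, under the underscore rules stated above, literals such
as the following would all denote the same number (the particular hex
digits are arbitrary):

    0xbadcafe
    0xbad_cafe
    0x_bad_cafe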


=item *

The general radix form of a number involves prefixing with the radix
in adverbial form:

    :10<42>             same as 0d42 or 42
    :16<DEAD_BEEF>      same as 0xDEADBEEF
    :8<177777>          same as 0o177777 (65535)
    :2<1.1>             same as 0b1.1 (0d1.5)

Extra digits are assumed to be represented by C<a>..C<z> and C<A>..C<Z>, so you
can go up to base 36.  (Use C<A> and C<B> for base twelve, not C<T> and C<E>.)
Alternately you can use a list of digits in decimal:

    :60[12,34,56]       # 12 * 3600 + 34 * 60 + 56
    :100[3,'.',14,16]   # pi

All numbers representing digits must be less than the radix, or an
error will result (at compile time if constant-folding can catch it,
or at run time otherwise).
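
As a small worked sketch of the list form (the variable name C<$seconds>
is only illustrative), a base-60 literal can express a time of day as a
count of seconds:

    my $seconds = :60[12,34,56];    # 12 * 3600 + 34 * 60 + 56, that is, 45296
    $seconds == 45296;              # true: same integer, different spelling
    :8<177777> == :16<FFFF>;        # true: both denote 65535
    :60[12,34,60];                  # error: the digit 60 is not less than the radix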

Any radix may include a fractional part.  A dot is never ambiguous
because you have to tell it where the number ends:

    :16<dead_beef.face> # fraction
    :16<dead_beef>.face # method call

=item *

Only base 10 (in any form) allows an additional exponentiator starting
with 'e' or 'E'.  All other radixes must either rely on the constant folding
properties of ordinary multiplication and exponentiation, or supply the
equivalent two numbers as part of the string, which will be interpreted
as they would outside the string, that is, as decimal numbers by default:

    :16<dead_beef> * 16**8

It's true that only radixes that define C<e> as a digit are ambiguous that
way, but with any radix it's not clear whether the exponentiator should
be 10 or the radix, and this makes it explicit:

    0b1.1e10                    ILLEGAL, could be read as any of:

    :2<1.1> * 2 ** 10           1536
    :2<1.1> * 10 ** 10          15,000,000,000
    :2<1.1> * :2<10> ** :2<10>  6

So we write those as

    :2<1.1*2**10>               1536
    :2<1.1*10**10>              15,000,000,000
    :2«1.1*:2<10>**:2<10>»      6

The generic string-to-number converter will recognize all of these
forms (including the * form, since constant folding is not available
to the run time).  Also allowed in strings are leading plus or minus,
and maybe a trailing Units type for an implied scaling.  Leading and
trailing whitespace is ignored.  Note also that leading C<0> by itself
I<never> implies octal in Perl 6.

Any of the adverbial forms may be used as a function:

    :2($x)      # "bin2num"
    :8($x)      # "oct2num"
    :10($x)     # "dec2num"
    :16($x)     # "hex2num"

Think of these as setting the default radix, not forcing it.  Like Perl
5's old C<oct()> function, any of these will recognize a number starting
with a different radix marker and switch to the other radix.  However,
note that the C<:16()> converter function will interpret leading C<0b>
or C<0d> as hex digits, not radix switchers.

Use of the functional form on anything that is not a string will throw
an exception explaining that the user has confused a number with the
textual representation of a number.  This is to catch errors such as
a C<:8(777)> that should have been C<< :8<777> >>, or the attempt to use
the function in reverse to produce a textual representation from a number.
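
The rules above can be sketched as follows (the variable C<$text> is
hypothetical; the values in the comments follow from the prose, not from
any particular implementation):

    my $text = 'ff';
    :16($text);      # 255: the string is read with a default radix of 16
    :10('0x1F');     # 31: the 0x marker switches the radix to 16
    :16('0b11');     # 0xb11 (2833): here 0 and b are hex digits, not a marker
    :8(777);         # exception: 777 is a number, not the text of a number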

=item *

Rational literals are indicated by separating two integer literals
(in any radix) with a slash, and enclosing the whole in angles:

    <1/2>       # one half literal Rat

Whitespace is not allowed on either side of the slash or it will
be split under normal quote-words semantics:

    < 1 / 2 >   # ('1', '/', '2')
    < 1/2 >     # okay, same as <1/2>

Because of constant folding, you may often get away with leaving
out the angles:

    1/2         # 1 divided by 2

However, in that case you have to pay attention to precedence and associativity.
The following does I<not> cube C<2/3>:

    2/3**3      # 2/(3**3), not (2/3)**3
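
A sketch of the contrast, assuming that the angle form of a rational
parses as a single term in the same way as is described for complex
literals:

    <2/3>**3    # (2/3)**3, that is, 8/27
    2/3**3      # 2/(3**3), that is, 2/27
    (2/3)**3    # 8/27 again, by explicit grouping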

Decimal fractions not using "e" notation are also treated as literal C<Rat> values:

    6.02e23.WHAT     # Num
    1.23456.WHAT     # Rat
    0.11 == 11/100   # True

=item *

Complex literals are similarly indicated by writing an addition or subtraction of
two real numbers (again, without spaces around the operators) inside angles:

    < -3-1i >

As with rational literals, constant folding would produce the same
complex number, but this form parses as a single term, ignoring
surrounding precedence.

(Note that these are not actually special syntactic forms: both
rational and complex literal forms fall out naturally from the semantic
rules of qw quotes described below.)

=item *

C<Blob> literals look similar to integer literals with radix markers, but use
curlies instead of angles:

    :2{0010_1110_1000_10}   a blob1, base 2, 1 bit per column
    :4{}                    a blob2, 2 bits per column
    :8{5235 0437 6}         a blob3, 3 bits per column
    :16{A705E}              a blob4, 4 bits per column

Whitespace and underscores are allowed but ignored.

=item *

Characters indexed by hex numbers can be interpolated into strings
by introducing with C<"\x">, followed by either a bare hex number
(C<"\x263a">) or a hex number in square brackets (C<"\x[263a]">).
Similarly, C<"\o12"> and C<"\o[12]"> interpolate octals--but generally
you should be using hex in the world of Unicode.  Multiple characters
may be specified within any of the bracketed forms by separating the
numbers with commas: C<"\x[41,42,43]">.  You must use the bracketed
form to disambiguate if the unbracketed form would "eat" too many
characters, because all of the unbracketed forms eat as many characters
as they think look like digits in the radix specified.  None of these
notations work in normal Perl code.  They work only in interpolations
and regexes and the like.
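
A few illustrative cases of the rules just given (the strings are
arbitrary examples):

    "\x263a"          # U+263A WHITE SMILING FACE
    "\x[263a]"        # the same character
    "\x[41,42,43]"    # "ABC"
    "\x[41]1"         # "A1"; unbracketed, "\x411" would instead mean U+0411
    "\o[101]"         # "A" (octal 101 is hex 41)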

Note that the inside of the brackets is not an expression, and you
may not interpolate there, since that would be a double interpolation.
Use curlies to interpolate the values of expressions.

The old C<\123> form is now illegal, as is the C<\0123> form.
Only C<\0> remains, and then only if the next character is not in
the range C<'0'..'7'>.  Octal characters must use C<\o> notation.
Note also that backreferences are no longer represented by C<\1>
and the like--see S05.

=item *

The C<qw/foo bar/> quote operator now has a bracketed form: C<< <foo bar> >>.
When used as a subscript it performs a slice equivalent to C<{'foo','bar'}>.
Elsewhere it is equivalent to a parenthesized list of strings:
C<< ('foo','bar') >>.  Since parentheses are generally reserved just for
precedence grouping, they merely autointerpolate in flat list context.  Therefore

    @a = 1, < x y >, 2;

is equivalent to:

    @a = 1, ('x', 'y'), 2;

which is the same as:

    @a = 1, 'x', 'y', 2;

In item context, the implied grouping parentheses are still there, so

    $a = < a b >;

is equivalent to:

    $a = ('a', 'b');

which, because the parcel is assigned to a scalar, is mostly-eagerly evaluated as a flat list and
turned into a C<Seq> object.  On the other hand, if you backslash the parcel:

    $a = \<a b>;

it is like:

    $a = \('a', 'b');

and ends up as a non-flattening C<Capture> object.

Binding is different from assignment.  If bound to a signature, the
C<< <a b> >> parcel will be promoted to a C<Capture> object, but if
bound to a parameter, it will make the flattening/slicing decision
based on the nature of the individual parameter.  That is, if you
pass C<< <a b> >> as an argument, it will bind as a single positional
or slice item, but two slurpy items.

But note that under the parenthesis-rewrite rule, a single value will
still act like a single value.  These are all the same:

    $a = < a >;
    $a = ('a');
    $a = 'a';

That is, a parcel is actually constructed by the comma, not by
the parens.  To force a single value to become a composite object in
item context, either add a comma inside parens, or use an appropriate
constructor or composer for clarity as well as correctness:

    $a = (< a >,);
    $a = ('a',);
    $a = Seq.new('a');
    $a = ['a'];

For any item in the list that appears to be numeric, the literal is
stored as an object with both a string and a numeric nature, where
the string nature always returns the original string.  It is as if
the item is converted to an appropriate numeric type, then a C<Str>
conversion is mixed in that reproduces the original string (if normal
stringification would produce something else).  Hence:

    < 1 1/2 6.02e23 1+2i > # Int/Str Rat/Str Num/Str Complex/Str
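
As a sketch of what the dual nature means in use (the names C<@n>,
C<$sum>, and C<$text> are hypothetical):

    my @n = < 1 1/2 6.02e23 1+2i >;
    my $sum  = @n[1] + 1;   # uses the numeric (Rat) nature: 3/2
    my $text = ~@n[1];      # uses the string nature: the original '1/2'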

The purpose of this would be to facilitate compile-time analysis of
multi-method dispatch, when the user prefers angle notation as the
most readable way to represent a list of numbers, which it often is.
The form with a single value serves as the literal form of numbers
such as C<Rat> and C<Complex> that would otherwise have to be constructed.
It also gives us a reasonable way of visually isolating any known
literal format as a single syntactic unit:

    <-1+2i>.polar
    (-1+2i).polar       # same, but only by constant folding

The degenerate case C<< <> >> is disallowed as a probable attempt to
do IO in the style of Perl 5; that is now written C<lines()>.  (C<<
<STDIN> >> is also disallowed.)  Empty lists are better written with
C<()> or C<Nil> in any case because C<< <> >> will often be misread
as meaning C<('')>.  (Likewise the subscript form C<< %foo<> >>
should be written C<%foo{}> to avoid misreading as C<@foo{''}>.)
If you really want the angle form for stylistic reasons, you can
suppress the error by putting a space inside: C<< < > >>.

Much like the relationship between single quotes and double quotes, single
angles do not interpolate while double angles do.  The double angles may
be written either with French quotes, C<«$foo @bar[]»>, or
with "Texas" quotes, C<<< <<$foo @bar[]>> >>>, as the ASCII workaround.
The implicit split is done after interpolation, but respects quotes
in a shell-like fashion, so that C<«'$foo' "@bar[]"»> is guaranteed to
produce a list of two "words" equivalent to C<< ('$foo', "@bar[]") >>.
C<Pair> notation is also recognized inside C<«...»> and such "words" are
returned as C<Pair> objects.
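
A sketch of the difference, with a hypothetical C<$foo> whose value
contains a space:

    my $foo = 'a b';
    < $foo c >        # no interpolation: ('$foo', 'c')
    « $foo c »        # interpolate, then split: ('a', 'b', 'c')
    « "$foo" c »      # quotes protect the space: ('a b', 'c')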

Colon pairs (but not arrow pairs) are recognized within double angles.
In addition, the double angles allow for comments beginning with C<#>.
These comments work exactly like ordinary comments in Perl code.
Unlike in the shells, any literal C<#> must be quoted, even
ones without whitespace in front of them, but note that this comes
more or less for free with a colon pair like C<< :char<#x263a> >>, since
comments only work in double angles, not single.

=item *

Generalizing the policy on literal numbers above, any literal number
that would overflow a C<Rat64> in the numerator is also stored as
a string.  If a coercion to a wider type, such as C<FatRat>, is
requested, the literal reconverts from the entire original
string, rather than just the value that would fit into a C<Rat64>.
(It may then cache that converted value for next time, of course.)
So if you declare a constant with excess precision, it does not
automatically become a C<FatRat>, which would force all calculations
into the pessimal C<FatRat> type.

    constant pi is export = 3.14159_26535_89793_23846_26433_83279_50288;
    say pi.perl;   # 3141592653589793238/1000000000000000000 (Rat64)
    say pi.Num;    # 3.14159265358979
    say pi.Str;    # 3.14159_26535_89793_23846_26433_83279_50288
    say pi.FatRat; # 3.14159265358979323846264338327950288

In this case it is not necessary to put angles around to get the allomorphism.
Merely exceeding the precision of C<Rat64> is sufficient to trigger the
behavior (but only for literals).

=item *

There is now a generalized adverbial form of Pair notation.  The
following table shows the correspondence to the "fatarrow" notation:

    Fat arrow           Adverbial pair  Paren form
    =========           ==============  ==========
    a => True           :a
    a => False          :!a
    a => 0              :a(0)
    a => $x             :a($x)
    a => 'foo'          :a<foo>         :a(<foo>)
    a => <foo bar>      :a<foo bar>     :a(<foo bar>)
    a => «$foo @bar»    :a«$foo @bar»   :a(«$foo @bar»)
    a => {...}          :a{...}         :a({...})
    a => [...]          :a[...]         :a([...])
    a => $a             :$a
    a => @a             :@a
    a => %a             :%a
    a => &a             :&a
    a => $$a            :$$a
    a => @$$a           :@$$a (etc.)
    a => %foo<a>        %foo<a>:p

The fatarrow construct may be used only where a term is expected
because it's considered an expression in its own right, since the
fatarrow itself is parsed as a normal infix operator (even when
autoquoting an identifier on its left).  Because the left side is a
general expression, the fatarrow form may be used to create a Pair
with I<any> value as the key.  On the other hand, when used as above
to generate C<Pair> objects, the adverbial forms are restricted to
the use of identifiers as keys.  You must use the fatarrow form to
generate a C<Pair> where the key is not an identifier.
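
As an illustrative sketch of the table above (the names C<%opts>,
C<%same>, and C<$x> are hypothetical), the two declarations below would
build the same set of pairs:

    my $x = 42;
    my %opts = (:a, :!b, :c(0), :$x, :name<foo>);
    my %same = (a => True, b => False, c => 0, x => $x, name => 'foo');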

Despite that restriction, it's possible for other things to
come between a colon and its brackets; however, all of the possible
non-identifier adverbial keys are reserved for special syntactical
forms.  Perl 6 currently recognizes decimal numbers and the null key.
In the following table the first and second columns do I<not> mean
the same thing:

    Simple pair         DIFFERS from    which means
    ===========         ============    ===========
    2 => <101010>       :2<101010>      radix literal 0b101010
    8 => <123>          :8<123>         radix literal 0o123
    16 => <deadbeef>    :16<deadbeef>   radix literal 0xdeadbeef
    16 => $somevalue    :16($somevalue) radix conversion function
    '' => $x            :($x)           signature literal
    '' => ($x,$y)       :($x,$y)        signature literal
    '' => <x>           :<x>            name extension
    '' => «x»           :«x»            name extension
    '' => [$x,$y]       :[$x,$y]        name extension
    '' => { .say }      :{ .say }       adverbial block (not allowed on names)

All of the adverbial forms (including the normal ones with
identifier keys) are considered special tokens and are recognized
in various positions in addition to term position.  In particular,
when used where an infix would be expected they modify the previous
topmost operator that is tighter in precedence than "loose unary"
(see S03):

    1 == 100 :fuzz(3)     # calls: infix:<==>(1, 100, fuzz => 3)

Within declarations the adverbial form is used to rename parameter declarations:

    sub foo ( :externalname($myname) ) {...}

Adverbs modify the meaning of various quoting forms:

    q:x 'cat /etc/passwd'

When appended to an identifier (that is, in postfix position),
the adverbial syntax is used to generate unique variants of that
identifier; this syntax is used for naming operators such as C<<
infix:<+> >> and multiply-dispatched grammatical rules such as
C<statement_control:if>.  When so used, the adverb is considered an
integral part of the name, so C<< infix:<+> >> and C<< infix:<-> >>
are two different operators.  Likewise C<< prefix:<+> >> is different
from C<< infix:<+> >>.  (The notation also has the benefit of grouping
distinct identifiers into easily accessible sets; this is how the
standard Perl 6 grammar knows the current set of infix operators,
for instance.)

Only adverbial forms that produce a list of one or more values (preferably
strings) are allowed as name extensions; closures do not qualify as
values, so the C<:{...}> form is not allowed as a name extender.  This
frees up the block form after a method name, which allows us to parse
a block as a method argument:

    @stuff.sort:{ +$_ }.map:{ $_ * 2 }

This might look like it is using pairs, but it is really equivalent to

    @stuff.sort: { +$_ }.map: { $_ * 2 }

and the colons are not introducing pairs, but rather introducing
the argument list of the method.  (In any other location, C<:{...}>
would be taken as a pair mapping the null key to a closure.)

Either fatarrow or adverbial pair notation may be used to pass
named arguments as terms to a function or method.  After a call with
parenthesized arguments, only the adverbial syntax may be used to pass
additional arguments.  This is typically used to pass an extra block:

    find($directory) :{ when not /^\./ }

This just naturally falls out from the preceding rules because the
adverbial block is in operator position, so it modifies the "find
operator".  (Parens aren't considered an operator.)

Note that (as usual) the C<{...}> form (either identifier-based
or special) can indicate either a closure or a hash depending on
the contents.  It does I<not> indicate a subscript, since C<:key{}> is
really equivalent to C<< key => {} >>, and the braces are not behaving
as a postfix at all.  (The function to which it is passed can I<use>
the value as a subscript if it chooses, however.)

Note also that the C<< <a b> >> form is not a subscript and is
therefore equivalent not to C<.{'a','b'}> but rather to C<('a','b')>.
Bare C<< <a> >> turns into C<('a')> rather than C<('a',)>.  (However,
as with the other bracketed forms, the value may end up being used
as a subscript depending on context.)

Two or more adverbs can always be strung together without intervening
punctuation anywhere a single adverb is acceptable.  When used as
named arguments in an argument list, you I<may> put commas between them,
because they're just ordinary named arguments to the function, and
a fatarrow pair would work the same.  However, this comma is allowed
only when the first pair occurs where a term is expected.  Where an
infix operator is expected, the adverb is always taken as modifying
the nearest preceding operator that is not hidden within parentheses,
and if you string together multiple such pairs, you may not put commas
between, since that would cause subsequent pairs to look like terms.
(The fatarrow form is not allowed at all in operator position.)
See S06 for the use of adverbs as named arguments.

The negated form (C<:!a>) and the sigiled forms (C<:$a>, C<:@a>,
C<:%a>) never take an argument and don't care what the next character
is.  They are considered complete.  These forms require an identifier
to serve as the key.

For identifiers that take a numeric argument, it is allowed to
abbreviate, for example, C<:sweet(16)> to C<:16sweet>.  (This is
distinguishable from the C<< :16<deadbeef> >> form, which never has an
alphabetic character following the number.)  Only literal decimal
numbers may be swapped this way.

The other forms of adverb (including the bare C<:a> form) I<always>
look for an immediate bracketed argument, and will slurp it up.
If that's not intended, you must use whitespace between the adverb and
the opening bracket.  The syntax of individual adverbs is the same
everywhere in Perl 6.  There are no exceptions based on whether an
argument is wanted or not.  (There is a minor exception for quote and
regex adverbs, which accept I<only> parentheses as their bracketing
operator, and ignore other brackets, which must be placed in parens
if desired.  See "Paren form" in the table above.)

Except as noted above, the parser always
looks for the brackets.  Despite not indicating a true subscript,
the brackets are similarly parsed as postfix operators.  As postfixes
the brackets may be separated from their initial C<:foo> with either
unspace or dot (or both), but nothing else.

Regardless of syntax, adverbs used as named arguments (in either term
or infix position) generally show up as optional named parameters to
the function in question--even if the function is an operator or macro.
The function in question neither knows nor cares how weird the original
syntax was.

=item *

In addition to C<q> and C<qq>, there is now the base form C<Q> which does
I<no> interpolation unless explicitly modified to do so.  So C<q> is really
short for C<Q:q> and C<qq> is short for C<Q:qq>.  In fact, all quote-like
forms derive from C<Q> with adverbs:

    q//         Q :q //
    qq//        Q :qq //
    rx//        Q :regex //
    s///        Q :subst ///
    tr///       Q :trans ///

Adverbs such as C<:regex> change the language to be parsed by switching
to a different parser.  This can completely change the interpretation
of any subsequent adverbs as well as the quoted material itself.

    q:s//       Q :q :scalar //
    rx:s//      Q :regex :sigspace //

=item *

Generalized quotes may now take adverbs:

    Short       Long            Meaning
    =====       ====            =======
    :x          :exec           Execute as command and return results
    :w          :words          Split result on words (no quote protection)
    :ww         :quotewords     Split result on words (with quote protection)
    :q          :single         Interpolate \\, \q and \' (or whatever)
    :qq         :double         Interpolate with :s, :a, :h, :f, :c, :b
    :s          :scalar         Interpolate $ vars
    :a          :array          Interpolate @ vars
    :h          :hash           Interpolate % vars
    :f          :function       Interpolate & calls
    :c          :closure        Interpolate {...} expressions
    :b          :backslash      Interpolate \n, \t, etc. (implies :q at least)
    :to         :heredoc        Parse result as heredoc terminator
                :regex          Parse as regex
                :subst          Parse as substitution
                :trans          Parse as transliteration
                :code           Quasiquoting
    :p          :path           Return a Path object (see S16 for more options)

You may omit the first colon by joining an initial C<Q>, C<q>, or C<qq> with
a single short form adverb, which produces forms like:

    qw /a b c/;                         # P5-esque qw// meaning q:w
    Qc '...{$x}...';                    # Q:c//, interpolate only closures
    qqx/$cmd @args[]/                   # equivalent to P5's qx//

(Note that C<qx//> doesn't interpolate.)
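
To make the layering concrete, here is a sketch of one string under four
of these forms (C<$x> is a hypothetical variable; the comments follow
from the adverb table above):

    my $x = 'world';
    Q/hello $x\n/     # no interpolation at all: literal $x and literal \n
    q/hello $x\n/     # Q:q, still a literal $x and a literal \n
    Qs/hello $x\n/    # Q:s, interpolates $x but leaves \n alone
    qq/hello $x\n/    # Q:qq, interpolates $x and turns \n into a newline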

If you want to abbreviate further, just define a macro:

    macro qx { 'qq:x ' }          # equivalent to P5's qx//
    macro qTO { 'qq:x:w:to ' }    # qq:x:w:to//
    macro quote:<❰ ❱> ($text) { quasi { $text.quoteharder } }

All the uppercase adverbs are reserved for user-defined quotes.
All Unicode delimiters above Latin-1 are reserved for user-defined quotes.

=item *

A consequence of the previous item is that we can now say:

    %hash = qw:c/a b c d {@array} {%hash}/;

or

    %hash = qq:w/a b c d {@array} {%hash}/;

to interpolate items into a C<qw>.  Conveniently, arrays and hashes
interpolate with only whitespace separators by default, so the subsequent
split on whitespace still works out.  (But the built-in C<«...»> quoter
automatically does interpolation equivalent to C<qq:ww/.../>.  The
built-in C<< <...> >> is equivalent to C<q:w/.../>.)

=item *

Whitespace is allowed between the "q" and its adverb: C<q :w /.../>.

=item *

For these "q" forms the choice of delimiters has no influence on the
semantics.  That is, C<''>, C<"">, C<< <> >>, C<«»>, C<``>, C<()>,
C<[]>, and C<{}> have no special significance when used in place of
C<//> as delimiters.  There may be whitespace before the
opening delimiter.  (This whitespace is mandatory for parens, because C<q()> is
a subroutine call and C<q:w(0)> is an adverb with arguments.)  Other
brackets may also require whitespace when they would be understood as
an argument to an adverb in something like C<< q:z<foo>// >>.
A colon may never be used as the delimiter since it will always be
taken to mean another adverb regardless of what's in front of it.
Nor may a C<#> character be used as the delimiter since it is always
taken as whitespace (specifically, as a comment).
You may not use whitespace or alphanumerics for delimiters.
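
A brief sketch of these delimiter rules (the quoted text is arbitrary):

    q/a b c/        # the string 'a b c'
    q<a b c>        # the same string; the angles here do not split into words
    q[a b c]        # likewise
    q (a b c)       # likewise; the space keeps this from being a call to q()
    q:w (a b c)     # word splitting comes from the adverb, not the delimiter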

=item *

New quoting constructs may be declared as macros:

    macro quote:<qX> (*%adverbs) {...}

Note: macro adverbs are automatically evaluated at macro call time if
the adverbs are included in the parse.  If an adverb needs to affect
the parsing of the quoted text of the macro, then an explicit named
parameter may be passed on as a parameter to the C<is parsed> subrule,
or used to select which subrule to invoke.

=item *

You may interpolate double-quotish text into a single-quoted string
using the C<\qq[...]> construct.  Other "q" forms also work, including
user-defined ones, as long as they start with "q".  Otherwise you'll
just have to embed your construct inside a C<\qq[...]>.

=item *

Bare scalar variables always interpolate in double-quotish
strings.  Bare array, hash, and subroutine variables may I<never> be
interpolated.  However, any scalar, array, hash or subroutine variable may
start an interpolation if it is followed by a sequence of one or more bracketed
dereferencers: that is, any of:

=over 4

=item 1. An array subscript

=item 2. A hash subscript

=item 3. A set of parentheses indicating a function call

=item 4. Any of 1 through 3 in their B<dot> form

=item 5. A method call that includes argument parentheses

=item 6. A sequence of one or more unparenthesized method calls, followed by any of 1 through 5

=back

In other words, this is legal:

    "Val = $a.ord.fmt('%x')\n"

and is equivalent to

    "Val = { $a.ord.fmt('%x') }\n"

However, no interpolated postfix may start with a backslash, so any
backslash or unspace is not recognized, but instead will be assumed
to be part of the string outside of the interpolation, and subject
to the normal backslashing rules of that quote context:

    my $a = 42;
    "Val = $a\[junk\]";  # Val = 42[junk]
    "Val = $a\[junk]";   # Val = 42[junk]
    "Val = $a\ [junk]";  # Val = 42 [junk]
    "Val = $a\.[junk]";  # Val = 42.[junk]

=item *

In order to interpolate an entire array, it's necessary now to subscript
with empty brackets:

    print "The answers are @foo[]\n"

Note that this fixes the spurious "C<@>" problem in double-quoted email addresses.

As with Perl 5 array interpolation, the elements are separated by a space.
(Except that a space is not added if the element already ends in some kind
of whitespace.  In particular, a list of pairs will interpolate with a
tab between the key and value, and a newline after the pair.)
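
The separator rule can be sketched in a few lines.  This is an
illustrative model in Python, not Perl 6; the names C<interpolate_list>
and C<pair_str> are invented for the example, and the pair rendering
simply follows the tab-and-newline description above.

```python
def pair_str(key, value):
    # Assumed rendering of a pair: key, tab, value, newline.
    return f"{key}\t{value}\n"


def interpolate_list(elems):
    """Join stringified elements with a space, unless an element
    already ends in some kind of whitespace."""
    out = []
    last = len(elems) - 1
    for i, elem in enumerate(elems):
        text = str(elem)
        out.append(text)
        # The separator goes only between elements, and only when the
        # element just emitted does not already end in whitespace.
        if i < last and not (text and text[-1].isspace()):
            out.append(" ")
    return "".join(out)
```

Under these assumptions C<interpolate_list([1, 2, 3])> yields C<1 2 3>,
while a list of rendered pairs gains no extra spaces after each newline.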

=item *

In order to interpolate an entire hash, it's necessary to subscript
with empty braces or angles:

    print "The associations are:\n%bar{}"
    print "The associations are:\n%bar<>"

Note that this avoids the spurious "C<%>" problem in double-quoted printf formats.

By default, keys and values are separated by tab characters, and pairs
are terminated by newlines.  (This is almost never what you want, but
if you want something polished, you can be more specific.)

=item *

In order to interpolate the result of a sub call, it's necessary to include
both the sigil and parentheses:

    print "The results are &baz().\n"

The function is called in item context.  (If it returns a list anyway,
that list is interpolated as if it were an array in string context.)

=item *

In order to interpolate the result of a method call without arguments,
it's necessary to include parentheses or extend the call with something
ending in brackets:

    print "The attribute is $obj.attr().\n"
    print "The attribute is $obj.attr<Jan>.\n"

The method is called in item context.  (If it returns a list,
that list is interpolated as if it were an array.)

It is allowed to have a cascade of argumentless methods as long as
the last one ends with parens:

    print "The attribute is %obj.keys.sort.reverse().\n"

(The cascade is basically counted as a single method call for the
end-bracket rule.)

=item *

Multiple dereferencers may be stacked as long as each one ends in
some kind of bracket or is a bare method:

    print "The attribute is @baz[3](1, 2, 3).gethash.{$xyz}<blurfl>.attr().\n"

Note that the final period above is not taken as part of the expression since
it doesn't introduce a bracketed dereferencer.  The parens are not required
on the C<.gethash>, but they are required on the C<.attr()>, since that
terminates the entire interpolation.

In no case may any of the top-level components be separated by
whitespace or unspace.  (These are allowed, though, inside any
bracketing constructs, such as in the C<(1, 2, 3)> above.)

=item *

A bare closure also interpolates in double-quotish context.  It may
not be followed by any dereferencers, since you can always put them
inside the closure.  The expression inside is evaluated in string item
context.  You can force list context on the expression using
the C<list> operator if necessary.  A closure in a string establishes
its own lexical scope.  (Expressions that sneak in without curlies,
such as C<$(...)>, do not establish their own lexical scope, but use
the outer scope, and may even declare variables in the outer scope, since
all the code inside (that isn't in an eval) is seen at compile time.)

The following means the same as the previous example.

    print "The attribute is { @baz[3](1,2,3).gethash.{$xyz}<blurfl>.attr }.\n"

The final parens are unnecessary since we're providing "real" code in
the curlies.  If you need to have double quotes that don't interpolate
curlies, you can explicitly remove the capability:

    qq:c(0) "Here are { $two uninterpolated } curlies";

or equivalently:

    qq:!c "Here are { $two uninterpolated } curlies";

Alternately, you can build up capabilities from single quote to tell
it exactly what you I<do> want to interpolate:

    q:s 'Here are { $two uninterpolated } curlies';

=item *

Secondary sigils (twigils) have no influence over whether the primary sigil
interpolates.  That is, if C<$a> interpolates, so do C<$^a>, C<$*a>,
C<$=a>, C<$?a>, C<$.a>, etc.  It only depends on the C<$>.

=item *

No other expressions interpolate.  Use curlies.

=item *

A class method may not be directly interpolated.  Use curlies:

    print "The dog bark is {Dog.bark}.\n"

=item *

The old disambiguation syntax:


is dead.  Use closure curlies instead:


(You may be detecting a trend here...)

=item *

To interpolate a topical method, use curlies: C<"{.bark}">.

=item *

To interpolate a function call without a sigil, use curlies: C<"{abs $var}">.

=item *

And so on.

=item *

Backslash sequences still interpolate, but there's no longer any C<\v>
to mean I<vertical tab>, whatever that is...  (C<\v> now matches vertical
whitespace in a regex.)  Literal character representations are:

    \a          BELL
    \b          BACKSPACE
    \t          TAB
    \n          LINE FEED
    \f          FORM FEED
    \r          CARRIAGE RETURN
    \e          ESCAPE

=item *

There's also no longer any C<\L>, C<\U>, C<\l>, C<\u>, or C<\Q>.
Use curlies with the appropriate function instead: C<"{ucfirst $word}">.

=item *

You may interpolate any Unicode codepoint by name using C<\c> and
square brackets:


Multiple codepoints constituting a single character may be interpolated
with a single C<\c> by separating the names with a comma:


Whether that is regarded as one character or two depends on the
Unicode support level of the current lexical scope.  It is also
possible to interpolate multiple codepoints that do not resolve to
a single character:


[Note: none of the official Unicode character names contains a comma.]

You may also put one or more decimal numbers inside the square brackets:

    "\c[13,10]" # CRLF

Any single decimal number may omit the brackets:

    "\c8" # backspace

(Within a regex you may also use C<\C> to match a character that is
not the specified character.)

If the character following C<\c> or C<\C> is neither a left square bracket
nor a decimal digit,
the single following character is turned into a control character by
the usual trick of XORing the 64 bit.  This allows C<\c@> for NULL
and C<\c?> for DELETE, but note that the ESCAPE character may not be
represented that way; it must be represented something like:


Obviously C<\e> is preferred when brevity is needed.
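
The arithmetic behind the control-character trick can be checked with
a small model.  This is a Python sketch rather than Perl 6, and the
function names are hypothetical.  It also shows why ESCAPE is awkward:
ESCAPE is code point 27, and 27 XOR 64 is 91, the left square bracket,
which is already taken by the bracketed form.

```python
def control_char(ch):
    """Model the single-character control form: flip the bit worth 64."""
    return chr(ord(ch) ^ 64)


def from_decimals(spec):
    """Model the bracketed decimal form, e.g. the text '13,10'."""
    return "".join(chr(int(n)) for n in spec.split(","))


# '@' is 64, so it maps to NULL (0); '?' is 63, so it maps to DELETE (127).
NUL = control_char("@")
DEL = control_char("?")
# '[' is 91, which would map to ESCAPE (27) if it were not reserved.
ESC = control_char("[")
CRLF = from_decimals("13,10")
```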

=item *

Any character that I<would> start an interpolation in the current
quote context may be protected from such interpolation by prefixing with
backslash.  The backslash is always removed in this case.

The treatment of backslashed characters that would I<not> have
introduced an interpolation varies depending on the type of quote:

=over 4

=item 1.

Any quoting form that includes C<qq> or C<:qq> in its semantic
derivation (including the normal double quote form) assumes that all
backslashes are to be considered meaningful.  The meaning depends
on whether the following character is alphanumeric; if it is, the
non-interpolating sequence produces a compile-time error.  If the
character is non-alphanumeric, the backslash is silently removed, on
the assumption that the string was backslashed using C<quotemeta()>
or some such.

=item 2.

All other quoting forms (including standard single quotes)
assume that non-interpolating sequences are to be left unaltered
because they are probably intended to pass through to the result.
Backslashes are removed I<only> for the terminating quote or for
characters that would interpolate if unbackslashed.  (In either case,
a special exception is made for brackets; if the left bracket would
interpolate, the right bracket may optionally also be backslashed,
and if so, the backslash will be removed.  If brackets are used as
the delimiters, both left and right I<must> be backslashed the same,
since they would otherwise be counted wrong in the bracket count.)

=back

As a consequence, these all produce the same literal string:

    " \{ this is not a closure } "
    " \{ this is not a closure \} "
    q:c / \{ this is not a closure } /
    q:c / \{ this is not a closure \} /
    q:c { \{ this is not a closure \} }
    q { { this is not a closure } }
    q { \{ this is not a closure \} }

(Of course, matching backslashes is likely to make your syntax
highlighter a bit happier, along with any other naïve bracket
counting algorithms...)
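
The two policies can be modeled roughly as follows.  This is a Python
sketch, not the Perl 6 parser: C<unbackslash> and its C<special>
parameter (the characters that would interpolate or end the quote in
the current context) are hypothetical, and only the literal character
escapes listed earlier are recognized, so numeric and named escapes
are left out.

```python
# Escapes from the "literal character representations" table only.
KNOWN_ESCAPES = {"a": "\a", "b": "\b", "t": "\t", "n": "\n",
                 "f": "\f", "r": "\r", "e": "\x1b"}
CLOSERS = {"{": "}", "[": "]", "(": ")", "<": ">"}


def unbackslash(text, double_quotish, special=""):
    """Resolve backslashes in quote text under either policy."""
    # Bracket exception: if a left bracket is special, its right partner
    # may be backslashed too, and that backslash is removed.
    protected = set(special) | {CLOSERS[c] for c in special if c in CLOSERS}
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        if ch != "\\" or i + 1 == len(text):
            out.append(ch)
            i += 1
            continue
        nxt = text[i + 1]
        if nxt in protected:
            out.append(nxt)                  # backslash always removed
        elif not double_quotish:
            out.append("\\" + nxt)           # q-like: pass through unaltered
        elif nxt in KNOWN_ESCAPES:
            out.append(KNOWN_ESCAPES[nxt])   # qq-like: a meaningful escape
        elif nxt.isalnum():
            raise ValueError("unrecognized backslash sequence: \\" + nxt)
        else:
            out.append(nxt)                  # qq-like: silently removed
        i += 2
    return "".join(out)
```

With C<special> set to the left curly, the backslashed and
unbackslashed closing curlies give the same result under both
policies, which is the equivalence the list above relies on.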

=item *

There are no barewords in Perl 6.  An undeclared bare identifier will
always be taken to mean a subroutine name.  (Class names
(and other type names) are predeclared, or prefixed with the C<::>
type sigil when you're declaring a new one.)  A consequence of this
is that there's no longer any "C<use strict 'subs'>".  Since the syntax
for method calls is distinguished from sub calls, it is only unrecognized
sub calls that must be treated specially.

You still must declare your subroutines, but a bareword with an unrecognized
name is provisionally compiled as a subroutine call, on the assumption that
such a declaration will occur by the end of the current compilation unit:

    foo;         # provisional call if neither &foo nor ::foo is defined so far
    foo();       # provisional call if &foo is not defined so far
    foo($x);     # provisional call if &foo is not defined so far
    foo($x, $y); # provisional call if &foo is not defined so far

    $x.foo;      # not a provisional call; it's a method call on $x
    foo $x:;     # not a provisional call; it's a method call on $x
    foo $x: $y;  # not a provisional call; it's a method call on $x

If a postdeclaration is not seen, the compile fails at C<CHECK> time.
(You are still free to predeclare subroutines explicitly, of course.)
The postdeclaration may be in any lexical or package scope that
could have made the declaration visible to the provisional call had the
declaration occurred before rather than after the provisional call.

This fixup is done only for provisional calls.  If there
is I<any> real predeclaration visible, it always takes precedence.
In case of multiple ambiguous postdeclarations, either they must all
be multis, or a compile-time error is declared and you must predeclare,
even if one postdeclaration is obviously "closer".  A single
C<proto> predeclaration may make all postdeclared C<multi> work fine,
since that's a run-time dispatch, and all multis are effectively
visible by the time a C<dispatch>'s candidate list is generated.

Parsing of a bareword function as a provisional call is always done
the same way list operators are treated.  If a postdeclaration
bends the syntax to be inconsistent with that, it is an error of
the inconsistent signature variety.

If the unrecognized subroutine name is followed by C<< postcircumfix:<( )> >>,
it is compiled as a provisional function call of the parenthesized form.
If it is not, it is compiled as a provisional function call of
the list operator form, which may or may not have an argument list.
When in doubt, the attempt is made to parse an argument list.  As with
any list operator, an immediate postfix operator is illegal unless it is a
form of parentheses, whereas anything following whitespace will be interpreted
as an argument list if possible.

Based on the signature of the subroutine declaration, there are only
four ways that an argument list can be parsed:

    Signature           # of expected args
    ()                  0
    ($x)                1
    ($x?)               0..1
    (anything else)     0..Inf

That is, a standard subroutine call may be parsed only as a 0-arg term
(or function call), a 1-mandatory-arg prefix operator (or function
call), a 1-optional-arg term or prefix operator (or function call), or
an "infinite-arg" list operator (or function call).  A given signature
might only accept 2 arguments, but the only number distinctions the
parser is allowed to make are between void, singular and plural;
checking that the number of arguments supplied matches some number
larger than one must be done as a separate semantic constraint, not
as a syntactic constraint.  Perl functions never take N arguments
off of a list and leave the rest for someone else, except for small
values of N, where small is defined as not more than 1.  You can get
fancier using macros, but macros I<always> require predeclaration.
Since the non-infinite-list forms are essentially behaving as macros,
those forms also require predeclaration.  Only the infinite-list form
may be postdeclared (and hence used provisionally).
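
The four-way classification in the table above can be written down as
a toy function.  This is a Python sketch under loud assumptions: the
name C<expected_args> is invented, signatures are plain strings, and
only untyped scalar parameters are recognized, so anything with a type,
a slurpy, or a named parameter falls through to the list operator case.

```python
import math
import re


def expected_args(signature):
    """Return (min, max) argument counts the parser may assume."""
    inner = signature.strip().strip("()")
    params = [p.strip() for p in inner.split(",") if p.strip()]
    if not params:
        return (0, 0)                      # ()     : 0-arg term
    if len(params) == 1:
        if re.fullmatch(r"\$\w+", params[0]):
            return (1, 1)                  # ($x)   : prefix operator
        if re.fullmatch(r"\$\w+\?", params[0]):
            return (0, 1)                  # ($x?)  : term or prefix operator
    return (0, math.inf)                   # anything else: list operator


# Only the last class may be postdeclared, per the paragraph above.
def may_be_postdeclared(signature):
    return expected_args(signature) == (0, math.inf)
```

Note that a two-parameter signature lands in the 0..Inf class here:
the exact count is a semantic check, not a parsing decision.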

It is illegal for a provisional subroutine call to be followed by a
colon postfix, since such a colon is allowed only on an indirect object,
or a method call in dot form.  (It is also allowed on a label when a
statement is expected.) So for any undeclared identifier "C<foo>":

    foo.bar             # ILLEGAL       -- postfix must use foo().bar
    foo .bar            # foo($_.bar)   -- no postfix starts with whitespace
    foo\ .bar           # ILLEGAL       -- must use foo()\ .bar
    foo++               # ILLEGAL       -- postfix must use foo()++
    foo 1,2,3           # foo(1,2,3)    -- args always expected after listop
    foo + 1             # foo(+1)       -- term always expected after listop
    foo;                # foo();        -- no postfix, but no args either
    foo:                #   label       -- must be label at statement boundary.
                                        -- ILLEGAL otherwise
    foo: bar:           #   two labels in a row, okay
    .foo: 1             # $_.foo: 1     -- must be "dot" method with : args
    .foo(1)             # $_.foo(1)     -- must be "dot" method with () args
    .foo                # $_.foo()      -- must be "dot" method with no args
    .$foo: 1            # $_.$foo: 1    -- indirect "dot" method with : args
    foo bar: 1          # bar.foo(1)    -- bar must be predecl as class
                                        -- sub bar allowed here only if 0-ary
                                        -- otherwise you must say (bar):
    foo bar 1           # foo(bar(1))   -- both subject to postdeclaration
                                        -- never taken as indirect object
    foo $bar: 1         # $bar.foo(1)   -- indirect object even if declared sub
                                        -- $bar considered one token
    foo (bar()): 1      # bar().foo(1)  -- even if foo declared sub
    foo bar():          # ILLEGAL       -- bar() is two tokens.
    foo .bar:           # foo(.bar:)    -- colon chooses .bar to listopify
    foo bar baz: 1      # foo(baz.bar(1)) -- colon controls "bar", not foo.
    foo (bar baz): 1    # bar(baz()).foo(1) -- colon controls "foo"
    $foo $bar           # ILLEGAL       -- two terms in a row
    $foo $bar:          # ILLEGAL       -- use $bar.$foo for indirection
    (foo bar) baz: 1    # ILLEGAL       -- use $baz.$(foo bar) for indirection

The indirect object colon only ever dominates a simple term, where
"simple" includes classes and variables and parenthesized expressions,
but explicitly not method calls, because the colon will bind to a
trailing method call in preference.  An indirect object that parses as
more than one token must be placed in parentheses, followed by the colon.

In short, only an identifier followed by a simple term followed by a
postfix colon is I<ever> parsed as an indirect object, but that form
will I<always> be parsed as an indirect object regardless of whether
the identifier is otherwise declared.

=item *

There's also no "C<use strict 'refs'>" because symbolic dereferences
are now syntactically distinguished from hard dereferences.
C<@($arrayref)> must now provide an actual array object, while
C<@::($string)> is explicitly a symbolic reference.  (Yes, this may
give fits to the P5-to-P6 translator, but I think it's worth it to
separate the concepts.  Perhaps the symbolic ref form will admit real
objects in a pinch.)

=item *

There is no hash subscript autoquoting in Perl 6.  Use C<< %x<foo> >>
for constant hash subscripts, or the old standby C<< %x{'foo'} >>.  (It
also works to say C<%x«foo»> as long as you realize it's subject to
interpolation.)

But C<< => >> still autoquotes any bare identifier to its immediate
left (horizontal whitespace allowed but not comments).  The identifier is not
subject to keyword or even macro interpretation.  If you say

    $x = do {
        if => 1;
    }

then C<$x> ends up containing the pair C<< ("if" => 1) >>.  Always.
(Unlike in Perl 5, where version numbers didn't autoquote.)

You can also use the :key($value) form to quote the keys of option
pairs.  To align values of option pairs, you may use the
"unspace" postfix forms:

    :longkey\  ($value)
    :shortkey\ <string>
    :fookey\   { $^a <=> $^b }

These will be interpreted as

    :longkey($value)
    :shortkey<string>
    :fookey{ $^a <=> $^b }

=item *

The double-underscore forms are going away:

    Old                 New
    ---                 ---
    __LINE__            $?LINE
    __FILE__            $?FILE
    __PACKAGE__         $?PACKAGE
    __END__             =begin END
    __DATA__            =begin DATA

[Note: this paragraph is speculative and subject to drastic change
as S26 evolves.]
The C<=begin END> Pod stream is special in that it assumes there's
no corresponding C<=end END> before end of file.  The C<DATA>
stream is no longer special--any Pod stream in the current file
can be accessed via a filehandle, named as C<< %=POD{'DATA'} >> and such.
Alternately, you can treat a Pod stream as a scalar via C<$=DATA>
or as an array via C<@=DATA>.  Presumably a module could read all
its COMMENT blocks from C<@=COMMENT>, for instance.  Each chunk of
Pod comes as a separate array element.  You have to split it into lines
yourself.  Each chunk has a C<.range> property that indicates its
line number range within the source file.

The lexical routine itself is C<&?ROUTINE>; you can get its name with
C<&?ROUTINE.name>.  The current block is C<&?BLOCK>.  If the block has any
labels, those show up in C<&?BLOCK.labels>.  Within the lexical scope of
a statement with a label, the label is a pseudo-object representing
the I<dynamically> visible instance of that statement.  (If inside multiple
dynamic instances of that statement, the label represents the innermost one.)
This is known as I<lexotic> semantics.

When you say:

    next LINE;

it is really a method on this pseudo-object, and

    LINE.next;

would work just as well.  You can exit any labeled block early by saying


=item *

Heredocs are no longer written with C<<< << >>>, but with an adverb on
any other quote construct:

    print qq:to/END/;
        Give $amount to the man behind curtain number $curtain.
        END

Other adverbs are also allowed, as are multiple heredocs within the same
expression:

    print q:c:to/END/, q:to/END/;
        Give $100 to the man behind curtain number {$curtain}.
        END
        Here is a $non-interpolated string
        END

=item *

Heredocs allow optional whitespace both before and after the terminating
delimiter.  Leading whitespace equivalent to the indentation of the
delimiter will be removed from all preceding lines.  If a line is
deemed to have less whitespace than the terminator, only whitespace
is removed, and a warning may be issued.  (Hard tabs will be assumed
to be C<< ($?TABSTOP // 8) >> spaces, but as long as tabs and spaces are used consistently
that doesn't matter.)  A null terminating delimiter terminates on
the next line consisting only of whitespace, but such a terminator
will be assumed to have no indentation.  (That is, it's assumed to
match at the beginning of any whitespace.)
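
The indentation rule can be modeled like this.  It is a Python sketch,
not the Perl 6 implementation: C<dedent_heredoc> and its return shape
are invented, the C<tabstop> default of 8 mirrors the fallback
mentioned above, and "a warning may be issued" is modeled by returning
the indexes of the short lines instead of warning.

```python
def split_indent(line, tabstop=8):
    """Return (leading whitespace with tabs expanded, rest of line)."""
    body = line.lstrip(" \t")
    lead = line[: len(line) - len(body)].expandtabs(tabstop)
    return lead, body


def dedent_heredoc(lines, terminator_line, tabstop=8):
    """Strip the terminator's indentation from every body line."""
    indent = len(split_indent(terminator_line, tabstop)[0])
    result, short = [], []
    for i, line in enumerate(lines):
        lead, body = split_indent(line, tabstop)
        if body and len(lead) < indent:
            short.append(i)        # less whitespace than the terminator
        # Never remove more than the whitespace that is actually there.
        result.append(lead[min(indent, len(lead)):] + body)
    return result, short
```

A line indented further than the terminator keeps its extra
indentation; a line indented less simply loses all of its leading
whitespace and is reported in the second return value.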

=item *

There are two possible ways to parse heredocs.  One is to look ahead
for the newline and grab the lines corresponding to the heredoc, and
then parse the rest of the original line.  This is how Perl 5 does it.
Unfortunately this suffers from the problem pervasive in Perl 5 of
multi-pass parsing, which is masked somewhat because there's no way
to hide a newline in Perl 5.  In Perl 6, however, we can use "unspace"
to hide a newline, which means that an algorithm looking ahead to find
the newline must do a full parse (with possible untoward side effects)
in order to locate the newline.

Instead, Perl 6 takes the one-pass approach, and just lazily queues
up the heredocs it finds in a line, and waits until it sees a "real"
newline to look for the text and attach it to the appropriate heredoc.
The downside of this approach is a slight restriction--you may not use
the actual text of the heredoc in code that must run before the line
finishes parsing.  Mostly that just means you can't write:

    BEGIN { say q:to/END/ }
        Say me!
        END

You must instead put the entire heredoc into the C<BEGIN>:

    BEGIN {
        say q:to/END/;
        Say me!
        END
    }
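
The queue-then-attach strategy can be sketched on a list of source
lines.  This is a deliberately naive Python model: C<attach_heredocs>
is a made-up name, heredoc introducers are found with a regex that
only understands the C<:to/NAME/> spelling, and unspace is not handled
at all.

```python
import re

# Assumption: only the slash-delimited spelling with a word terminator.
HEREDOC = re.compile(r":to/(\w+)/")


def attach_heredocs(source_lines):
    """Return [(code_line, [(terminator, body_lines), ...]), ...]."""
    results = []
    i = 0
    while i < len(source_lines):
        code = source_lines[i]
        i += 1
        # Queue every heredoc seen on this line, in order of appearance.
        pending = HEREDOC.findall(code)
        bodies = []
        # Only after the newline do we go looking for the text.
        for term in pending:
            body = []
            while i < len(source_lines) and source_lines[i].strip() != term:
                body.append(source_lines[i])
                i += 1
            i += 1  # skip the terminator line itself
            bodies.append((term, body))
        results.append((code, bodies))
    return results
```

Because the bodies are attached only after the introducing line has
been consumed, nothing on that line can see the heredoc text while it
is being parsed, which is the restriction described above.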

=item *

A version literal is written with a 'v' followed by the version
number in dotted form.  This always constructs a C<Version> object,
not a string.  Only integers and certain wildcards are allowed;
for anything fancier you must coerce a string to a C<Version>:

    v1.2.3                      # okay
    v1.2.*                      # okay, wildcard version
    v1.2.3+                     # okay, wildcard version
    v1.2.3beta                  # illegal
    Version('1.2.3beta')        # okay

Note though that most places that take a version number in Perl accept
it as a named argument, in which case saying C<< :ver<1.2.3beta> >> is fine.
See S11 for more on using versioned modules.

Version objects have a predefined sort order that follows most people's
intuition about versioning: each sorting position sorts numerically
between numbers, alphabetically between alphas, and alphabetics in a
position before numerics.  Missing final positions are assumed to be '.0'.
Except for '0' itself, numbers ignore leading zeros.  For splitting
into sort positions, if any alphabetics (including underscore) are
immediately adjacent to a number, a dot is assumed between them.
Likewise any non-alphanumeric character is assumed to be equivalent
to a dot.  So these are all equivalent:


And these are also equivalent:


So these are in sorted version order:

Note how the last pair assumes that an implicit .0 sorts after anything
alphabetic, and that alphabetic is defined according to Unicode, not just
according to ASCII.  The intent of all this is to make sure that prereleases
sort before releases.  Note also that this is still a subset of the
versioning schemes seen in the real world.  Modules with such strange
versions can still be used by Perl since by default Perl imports
external modules by exact version number.  (See S11.)  Only range
operations will be compromised by an unknown foreign collation order,
such as a system that sorts "delta" after "gamma".
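
The splitting and ordering rules can be modeled as follows.  This is a
Python sketch, not the C<Version> class: the function names are
invented, wildcards are not handled, and "alphabetically" is
approximated by code point comparison.

```python
import re
from functools import cmp_to_key

# A position is a run of digits or a run of letters/underscores; any
# other character, and any digit/letter boundary, acts like a dot.
TOKEN = re.compile(r"(\d+)|([^\W\d]+)")


def positions(version):
    parts = []
    for m in TOKEN.finditer(version):
        if m.group(1) is not None:
            parts.append((1, int(m.group(1)), ""))   # numeric position
        else:
            parts.append((0, 0, m.group(2)))         # alphabetic sorts first
    return parts


def compare(a, b):
    pa, pb = positions(a), positions(b)
    width = max(len(pa), len(pb))
    zero = (1, 0, "")                # missing final positions count as .0
    pa += [zero] * (width - len(pa))
    pb += [zero] * (width - len(pb))
    return (pa > pb) - (pa < pb)


def version_sorted(versions):
    return sorted(versions, key=cmp_to_key(compare))
```

Under this model C<1.2-1+alpha/1> and C<1.2.1.alpha.1> compare equal,
and C<1.2.1alpha1> sorts before C<1.2.1>, so a prerelease sorts before
its release.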


=head1 Context

=over 4

=item *

Perl still has the three main contexts: sink (aka void), item (aka scalar), and list.

=item *

In addition to undifferentiated items, we also have these item contexts:

    Context     Type    OOtype      Operator
    -------     ----    ------      --------
    boolean     bit     Bit         ?
    integer     int     Integral    int
    numeric     num     Num         +
    string      buf     Str         ~

There are also various container contexts that require particular kinds of
containers (such as slice and hash context; see S03 for details).

=item *

Unlike in Perl 5, objects are no longer always considered true.
It depends on the state of their C<.Bool> property.  Classes get to decide
which of their values are true and which are false.  Individual objects
can override the class definition:

    return 0 but True;

This overrides the C<.Bool> method of the C<0> without changing its
official type (by mixing the method into an anonymous derived type).

=item *

The definition of C<.Bool> for the most ancestral type (that is, the
C<Mu> type) is equivalent to C<.defined>.  Since type objects are
considered undefined, all type objects (including C<Mu> itself)
are false unless the type overrides the definition of C<.Bool>
to include undefined values.  Instantiated objects default to true
unless the class overrides the definition.  Note that if you could
instantiate a C<Mu> it would be considered defined, and thus true.
(It is not clear that this is allowed, however.)

=item *

In general any container types should return false if they are empty,
and true otherwise.  This is true of all the standard container types
except Scalar, which always defers the definition of truth to its
contents.  Non-container types define truthiness much as Perl 5 does.

Just as with the standard types, user-defined types should feel free
to partition their defined values into true and false values if such
a partition makes sense in control flow using boolean contexts, since
the separate C<.defined> method is always there if you need it.

=back

=head1 Lists

=over 4

=item *

List context in Perl 6 is by default lazy.  This means a list can
contain infinite generators without blowing up.  No flattening happens
to a lazy list until it is bound to the signature of a function or
method at call time (and maybe not even then).  We say that such
an argument list is "lazily flattened", meaning that we promise to
flatten the list on demand, but not before.
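
As a loose analogy only, here is the same idea in Python, whose
generators are also evaluated on demand.  Nothing here is Perl 6
semantics; it merely shows why an infinite generator is harmless as
long as nobody asks for all of it.

```python
from itertools import count, islice


def squares():
    """An infinite generator: values are produced only when requested."""
    for n in count(1):
        yield n * n


# Taking five values does not try to compute the whole (endless) list.
first_five = list(islice(squares(), 5))
```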

=item *

There is a "C<list>" operator which imposes a list context on
its arguments even if C<list> itself occurs in an item context.

To force explicit flattening, use the C<flat> contextualizer.
This recursively flattens all parcels into a 1-dimensional list.
When bound to a slurpy parameter, a capture flattens the rest of its positional arguments.

To reform a list so that sub-parcels turn into tree nodes, use the C<.tree>
method, which is essentially a level-sensitive map, with one closure provided
for remapping the parcels at each level:

    $p.tree(*.Seq)        # force level 1 parcels to Seq
    $p.tree(1)            # same thing
    $p.tree               # same thing, defaults to 1 level
    $p.tree(*.Seq,*.list) # force level 1 parcels to Seq, level 2 to list
    $p.tree(*.Seq xx *)   # Turn all subparcels into Seq recursively
    $p.tree(*)            # same thing

When bound to a slice parameter (indicated with C<**>), a capture reforms the rest of its
positional arguments with one level of "treeness", equivalent to
C<@args.tree(1)>, that is, a list of lists, or C<LoL>.  The sublists are not
automatically flattened; that is, if a sublist is a C<Parcel>, it
remains a list until subsequent processing decides how flat or
treelike the sublist should be.

When bound to an item parameter that is not an invocant, a list is
turned into a lazy C<Seq> object, that is, an array that can
extend itself on demand, using the iterators of the list to supply its new values.
(Once determined, the values are readonly, however.  To create an anonymous
mutable array, use explicit square brackets around the list.)

To force a non-flattening item context, use the "C<item>" operator.

=item *

The C<|> prefix operator may be used to force "capture" context on its
argument and I<also> defeat any scalar argument checking imposed by
subroutine signature declarations.  Any resulting list arguments are
then evaluated lazily.

=item *

To force non-lazy list processing, use the C<eager> list operator.
List assignment is also implicitly eager.  (Actually, when we say
"eager" we usually mean "mostly eager" as defined in L<S07>.)

    eager $filehandle.lines;    # read all remaining lines

By contrast,


makes no guarantee about how many lines ahead the iterator has read.
Iterators feeding a list are allowed to process in batches, even
when stored within an array.  The array knows that it is extensible,
and calls the iterator as it needs more elements.  (Counting the elements
in the array will also force eager completion.)

This operator is agnostic towards flattening or slicing.  It merely changes
the work-ahead policy for the value generator.

=item *

A variant of C<eager> is the C<hyper> list operator, which declares
not only that you want all the values generated now, but that you want
them badly enough that you don't care what order they're generated in.
That is, C<eager> requires sequential evaluation of the list, while
C<hyper> requests (but does not require) parallel evaluation.  In any
case, it declares that you don't care about the evaluation order.
(Conjecture: populating a hash from a hyper list of pairs could be done
as the results come in, such that some keys can be seen even before
the hyper is done.  Thinking about Map-Reduce algorithms here...)

This operator is agnostic towards flattening or slicing.  It merely changes
the work-ahead policy for the value generator.

=item *

Signatures on non-multi subs can be checked at compile time, whereas
multi sub and method call signatures can only be checked at run time
(in the absence of special instructions to the optimizer).

This is not a problem for arguments that are arrays or hashes,
since they don't have to care about their context, but just return
themselves in any event, which may or may not be lazily flattened.

However, function calls in the argument list can't know their eventual
context because the method hasn't been dispatched yet, so we don't
know which signature to check against.  Such return values are
bundled up into a "parcel" for later delivery to a context that
will determine its context lazily.

=item *

The C<< => >> operator now constructs C<Pair> objects rather than merely
functioning as a comma.  Both sides are in item context.

=item *

The C<< .. >> operator now constructs a C<Range> object rather than merely
functioning as an operator.  Both sides are in item context.  Semantically,
the C<Range> acts like a list of its values to the extent possible, but
does so lazily, unlike Perl 5's eager range operator.

=item *

There is no such thing as a hash list context.  Assignment to a hash
produces an ordinary list context.  You may assign alternating keys
and values just as in Perl 5.  You may also assign lists of C<Pair> objects, in
which case each pair provides a key and a value.  You may, in fact,
mix the two forms, as long as the pairs come when a key is expected.
If you wish to supply a C<Pair> as a key, you must compose an outer C<Pair>
in which the key is the inner C<Pair>:

    %hash = (($keykey => $keyval) => $value);
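
The mixing rule can be modeled in a few lines.  This is a Python
sketch with invented names (C<Pair>, C<assign_hash>); raising an error
on a dangling key is an assumption of the sketch, not something stated
above.

```python
from collections import namedtuple

Pair = namedtuple("Pair", ["key", "value"])


def assign_hash(items):
    """Build a dict from alternating keys/values and/or pairs."""
    result = {}
    it = iter(items)
    for item in it:
        if isinstance(item, Pair):
            # A pair arriving where a key is expected supplies both halves.
            result[item.key] = item.value
        else:
            try:
                result[item] = next(it)
            except StopIteration:
                # Assumption: a key with no value is treated as an error.
                raise ValueError("key without a value") from None
    return result


mixed = assign_hash(["a", 1, Pair("b", 2), "c", 3])
pair_as_key = assign_hash([Pair(Pair("k", "kv"), "v")])
```

The second call mirrors the example above: the outer pair is consumed
as a key/value unit, so the inner pair ends up as the key.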

=item *

The anonymous C<enum> function takes a list of keys or pairs, and adds
values to any keys that are not already part of a pair.  The value added
is one more than the previous key or pair's value.  This works nicely with
the new C<qq:ww> form:

    %hash = enum <<:Mon(1) Tue Wed Thu Fri Sat Sun>>;
    %hash = enum « :Mon(1) Tue Wed Thu Fri Sat Sun »;

are the same as:

    %hash = ();
    %hash<Mon Tue Wed Thu Fri Sat Sun> = 1..7;
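
The value-filling rule is simple enough to model directly.  This is a
Python sketch in which explicit pairs are written as tuples; the
function name is invented, and starting from 0 when the first key has
no value is an assumption of the sketch.

```python
def enum_values(items, start=0):
    """Give each bare key a value one more than the previous entry's."""
    result = {}
    previous = start - 1
    for item in items:
        if isinstance(item, tuple):
            key, value = item              # an explicit pair keeps its value
        else:
            key, value = item, previous + 1
        result[key] = value
        previous = value
    return result


week = enum_values([("Mon", 1), "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"])
```

Here C<week> maps the seven day names to 1 through 7, matching the
slice assignment shown above.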

=item *

In contrast to assignment, binding to a hash requires a C<Hash> (or
C<Pair>) object.  Binding to a "splat" hash requires a list of pairs
or hashes, and stops processing the argument list when it runs out
of pairs or hashes.  See S06 for much more about parameter binding.

=back

=head1 Files

=over 4

=item *

Filename globs are no longer done with angle brackets.  Use the C<glob>
function.

=item *

Input from a filehandle is no longer done with angle brackets.  Instead
of

    while (<HANDLE>) {...}

you now write

    for @$handle {...}

or

    for $handle.lines {...}

=back
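
A small sketch of line-by-line input, assuming a current Rakudo, might
look like the following.  The scratch file name is hypothetical, and
C<spurt> and C<unlink> are used only to make the example self-contained.
Only the C<.lines> form is shown.

    my $path = 'lines-demo.txt';            # hypothetical scratch file
    spurt $path, "alpha\nbeta\ngamma\n";    # create something to read

    my $handle = open $path;                # read mode is the default
    my @seen;
    for $handle.lines -> $line {
        @seen.push($line);                  # each line arrives without its newline
    }
    $handle.close;
    unlink $path;                           # clean up the scratch file

    say @seen.elems;                        # 3
    say @seen[1];                           # beta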


=head1 Properties

=over 4

=item *

Properties work as detailed in S12.  They're actually object
attributes provided by role mixins.  Compile-time properties applied
to containers and such still use the C<is> keyword, but are now called
"traits".  On the other hand, run-time properties are attached to
individual objects using the C<but> keyword instead, but are still
called "properties".

=item *

Properties are accessed just like attributes because they are in fact
attributes of some class or other, even if it's an anonymous singleton
class generated on the fly for that purpose.  Since "C<rw>" attributes
behave in all respects as variables, properties may therefore also
be temporized with C<temp>, or hypotheticalized with C<let>.

=back
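
To illustrate a run-time property, here is a minimal sketch assuming a
current Rakudo.  The role C<Flagged> and its C<flag> attribute are
hypothetical names chosen for the example.

    role Flagged { has $.flag is rw }    # hypothetical role supplying one attribute

    my $n = 42 but Flagged;              # mix the role into this one object
    $n.flag = 'checked';                 # accessed like any other rw attribute
    say $n.flag;                         # checked
    say $n + 1;                          # 43
    say $n ~~ Flagged;                   # True
    say 42 ~~ Flagged;                   # False

The mixin affects only the object it was applied to.  Other instances
of the same value do not acquire the property.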


=head1 Grammatical Categories

Lexing in Perl 6 is controlled by a system of grammatical categories.
At each point in the parse, the lexer knows which subset of the
grammatical categories are possible at that point, and follows the
longest-token rule across all the active alternatives, including those
representing any grammatical categories that are ready to match.  
See L<S05> for a detailed description of this process.

To get a list of the current categories, grep 'token category:' from STD.pm6.

Category names serve as the short names both of the various operators
and of the rules that parse them, though the latter include an extra "sym":

    infix:<cmp>           # the infix cmp operator
    infix:sym<cmp>        # the rule that parses cmp

As you can see, the extension of the name uses colon pair notation.
The C<:sym> typically takes an argument giving the string name of the
operator; some of the "circumfix" categories require two arguments
for the opening and closing strings.  Since there are so many match
rules whose symbol is an identifier, we allow a shorthand:

    infix:cmp             # same as infix:sym<cmp> (not infix:<cmp>)
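
On the operator side of this naming scheme, a small sketch (assuming a
current Rakudo) shows that the category name plus a colon pair is simply
the name of a routine.  The operator C<avg> is hypothetical.

    # Declaring a sub in the infix category makes a new operator available.
    sub infix:<avg>($a, $b) { ($a + $b) / 2 }

    say 4 avg 8;                  # 6
    say &infix:<avg>(4, 8);       # 6, the same routine called by its full name
    say &infix:<+>(2, 3);         # 5, built-in operators are named the same way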

Conjecturally, we might also have other kinds of rules, such as tree rewrite rules:

    infix:match<cmp>      # rewrite a match node after reducing its arguments
    infix:ast<cmp>        # rewrite an ast node after reducing its arguments

Within a grammar, matching the proto subrule C<< <infix> >> will match all
visible rules in the infix category as parallel alternatives, as if they
were separated by 'C<|>'.
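
The proto mechanism can be sketched with a toy grammar, assuming a
current Rakudo.  The grammar C<Tiny> and its C<term> token are
hypothetical, and the C<{*}> proto body is the current Rakudo spelling,
which may differ from what STD uses.

    grammar Tiny {                           # hypothetical grammar
        token TOP  { <term> <.ws> <infix> <.ws> <term> }
        token term { \d+ }

        proto token infix {*}                # one proto for the whole category
        token infix:sym<+>   { <sym> }       # <sym> matches the literal '+'
        token infix:sym<cmp> { <sym> }       # <sym> matches the literal 'cmp'
    }

    my $match = Tiny.parse('3 cmp 4');
    say ~$match<infix>;                      # cmp
    say ~$match<infix><sym>;                 # cmp

Calling C<< <infix> >> in C<TOP> tries both C<infix:sym> rules as
alternatives.  Adding another operator needs only one more
C<infix:sym> rule, with no change to C<TOP>.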

Here are some of the names of parse rules in STD:

    category:sym<prefix>                           prefix:<+>
    circumfix:sym<[ ]>                             [ @x ]
    dotty:sym<.=>                                  $obj.=method
    infix_circumfix_meta_operator:sym['»','«']     @a »+« @b
    infix_postfix_meta_operator:sym<=>             $x += 2;
    infix_prefix_meta_operator:sym<!>              $x !~~ 2;
    infix:sym<+>                                   $x + $y
    package_declarator:sym<role>                   role Foo;
    postcircumfix:sym<[ ]>                         $x[$y] or $x.[$y]
    postfix_prefix_meta_operator:sym('»')          @array »++
    postfix:sym<++>                                $x++
    prefix_circumfix_meta_operator:sym<[ ]>        [*]
    prefix_postfix_meta_operator:sym('«')          -« @magnitudes
    prefix:sym<!>                                  !$x (and $x.'!')
    quote:sym<qq>                                  qq/foo/
    routine_declarator:sym<sub>                    sub foo {...}
    scope_declarator:sym<has>                      has $.x;
    sigil:sym<%>                                   %hash
    special_variable:sym<$!>                       $!
    statement_control:sym<if>                      if $condition { 1 } else { 2 }
    statement_mod_cond:sym<if>                     .say if $condition
    statement_mod_loop:sym<for>                    .say for 1..10
    statement_prefix:sym<gather>                   gather for @foo { .take }
    term:sym<!!!>                                  $x = { !!! }
    trait_mod:sym<does>                            my $x does Freezable
    twigil:sym<?>                                  $?LINE
    type_declarator:sym<subset>                    subset Nybble of Int where ^16

Note that some of these produce correspondingly named operators,
but not all of them.  When they do correspond (such as in the C<cmp>
example above), this is by convention, not by enforcement.  (However,
matching C<< <sym> >> within one of these rules instead of the literal
operator makes it easier to set up this correspondence in subsequent
rules.)

The STD::Regex grammar also adds these:

    assertion:sym<!>                         /<!before \h>/
    backslash:sym<w>                         /\w/ and /\W/
    metachar:sym<.>                          /.*/
    mod_internal:sym<P5>                     m:/ ... :P5 ... /
    quantifier:sym<*>                        /.*/

=for vim:set expandtab sw=4: