2.3 Interpretation of Tokens

2.3.1 Numbers as Tokens

When a token is read, it is interpreted as a number or symbol. The token is interpreted as a number if it satisfies the syntax for numbers specified in Figure 2–9.

numeric-token ::= integer | ratio | float
integer ::= [sign] {decimal-digit}+ decimal-point | [sign] {digit}+
ratio ::= [sign] {digit}+ slash {digit}+
float ::= [sign] {decimal-digit}* decimal-point {decimal-digit}+ [exponent]
| [sign] {decimal-digit}+ [decimal-point {decimal-digit}*] exponent
exponent ::= exponent-marker [sign] {digit}+
sign — a sign.
slash — a slash.
decimal-point — a dot.
exponent-marker — an exponent marker.
decimal-digit — a digit in radix 10.
digit — a digit in the current input radix.
Figure 2–9. Syntax for Numeric Tokens

Potential Numbers as Tokens

To allow implementors and future Common Lisp standards to extend the syntax of numbers, a syntax for potential numbers is defined that is more general than the syntax for numbers. A token is a potential number if it satisfies all of the following requirements:

  1. The token consists entirely of digits, signs, ratio markers, decimal points (.), extension characters (^ or _), and number markers. A number marker is a letter. Whether a letter may be treated as a number marker depends on context, but no letter that is adjacent to another letter may ever be treated as a number marker. Exponent markers are number markers.

  2. The token contains at least one digit. Letters may be considered to be digits, depending on the current input base, but only in tokens containing no decimal points.

  3. The token begins with a digit, sign, decimal point, or extension character, but not a package marker. The syntax involving a leading package marker followed by a potential number is not well-defined. The consequences of the use of notation such as :1, :1/2, and :2^3 in a position where an expression appropriate for read is expected are unspecified.

  4. The token does not end with a sign.

If a potential number has number syntax and the number is representable in the implementation, a number of the appropriate type is constructed and returned. A number is not representable in an implementation if it is outside the boundaries set by the implementation-dependent constants for numbers; for example, specifying too large or too small an exponent for a float may make the number impossible to represent. A ratio with denominator zero (such as -35/000) cannot be represented in any implementation. When a token with the syntax of a number cannot be converted to an internal number, an error of type reader-error is signaled. An error must not be signaled for specifying too many significant digits for a float; a truncated or rounded value should be produced instead.
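As a sketch of the two error rules above, the following code reads numeric tokens under standard syntax and reports whether the reader signaled reader-error. It assumes a conforming implementation; read-token-outcome is a hypothetical helper, not a standard function.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-token-outcome (string)
  "Read STRING under standard syntax. Return (:VALUE object) on success,
or (:READER-ERROR) if the reader signals READER-ERROR."
  (with-standard-io-syntax
    (handler-case (list :value (read-from-string string))
      (reader-error () (list :reader-error)))))

;; Number syntax, but a zero denominator cannot be represented:
(read-token-outcome "-35/000")
;; => (:READER-ERROR)

;; Too many significant digits is not an error: the value is rounded to
;; the default float format (SINGLE-FLOAT under standard syntax).
(read-token-outcome "3.14159265358979323846264338327950288")
```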

If there is an ambiguity as to whether a letter should be treated as a digit or as a number marker, the letter is treated as a digit.

Escape Characters and Potential Numbers

A potential number cannot contain any escape characters. An escape character robs the following character of all syntactic qualities, forcing it to be strictly alphabetic and therefore unsuitable for use in a potential number. For example, all of the following representations are interpreted as symbols, not numbers:

\256   25\64   1.0\E6   |100|   3\.14159   |3/4|   3\/4   5||
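The symbol readings listed above can be checked with a short sketch. It assumes a conforming implementation with the standard readtable (bound here by with-standard-io-syntax); the helper read-as-symbol-names and the variable *escaped-tokens* are hypothetical names. Each token is written as a Lisp string, so every backslash is doubled.

```lisp
;; Hypothetical names for this sketch; assumes a conforming implementation.
;; The tokens from the list above, written as Lisp strings.
(defparameter *escaped-tokens*
  '("\\256" "25\\64" "1.0\\E6" "|100|" "3\\.14159" "|3/4|" "3\\/4" "5||"))

(defun read-as-symbol-names (tokens)
  "Read each token under standard syntax and return the symbol names,
signaling an error if any token does not read as a symbol."
  (with-standard-io-syntax
    (mapcar (lambda (token)
              (let ((object (read-from-string token)))
                (assert (symbolp object) () "~S did not read as a symbol." token)
                (symbol-name object)))
            tokens)))

(read-as-symbol-names *escaped-tokens*)
;; => ("256" "2564" "1.0E6" "100" "3.14159" "3/4" "3/4" "5")
```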

In each case, removing the escape character (or characters) would cause the token to be a potential number.

Examples of Potential Numbers

As examples, the tokens in Figure 2–10 are potential numbers, but they are not actually numbers, and so are reserved tokens; a conforming implementation is permitted, but not required, to define their meaning.

1b5000 777777q 1.7J -3/4+6.7J 12/25/83
27^19 3^4/5 6//7 ^-43^
3.141_592_653_589_793_238_4 -3.7+2.6i-6.17j+19.6k
Figure 2–10. Examples of reserved tokens

The tokens in Figure 2–11 are not potential numbers; they are always treated as symbols:

/ /5 + 1+ 1-
foo+ ab.cd _ ^ ^/-
Figure 2–11. Examples of symbols
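A sketch that checks Figure 2–11, assuming a conforming implementation and the standard readtable; reads-as-symbol-p is a hypothetical helper.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun reads-as-symbol-p (token)
  "True if TOKEN reads as a symbol under standard syntax."
  (with-standard-io-syntax
    (symbolp (read-from-string token))))

(every #'reads-as-symbol-p
       '("/" "/5" "+" "1+" "1-" "foo+" "ab.cd" "_" "^" "^/-"))
;; => T

;; By contrast, +1 is a potential number with integer syntax.
(reads-as-symbol-p "+1")
;; => NIL
```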

The tokens in Figure 2–12 are potential numbers if the current input base is 16, but they are always treated as symbols if the current input base is 10.

bad-face 25-dec-83 a/b fad_cafe f^
Figure 2–12. Examples of symbols or potential numbers
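The effect of the input base on tokens like those in Figure 2–12 can be sketched by binding *read-base*. The example also exercises the rule stated earlier that an ambiguous letter is treated as a digit: in base 16 the e in 1e5 is a digit, not an exponent marker. read-in-base is a hypothetical helper, and the sketch assumes a conforming implementation.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-in-base (token base)
  "Read TOKEN under standard syntax with *READ-BASE* bound to BASE."
  (with-standard-io-syntax
    (let ((*read-base* base))
      (read-from-string token))))

(read-in-base "a/b" 10)  ; => the symbol A/B
(read-in-base "a/b" 16)  ; => 10/11, ratio syntax with hexadecimal digits
(read-in-base "1e5" 10)  ; => 100000.0, e is an exponent marker
(read-in-base "1e5" 16)  ; => 485, e is treated as a digit (#x1E5)
```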

2.3.2 Constructing Numbers from Tokens

A real is constructed directly from a corresponding numeric token; see Figure 2–9.

A complex is notated as a #C (or #c) followed by a list of two reals; see Section (Sharpsign C).

The reader macros #B, #O, #X, and #R may also be useful in controlling the input radix in which rationals are parsed; see Section (Sharpsign B), Section (Sharpsign O), Section (Sharpsign X), and Section (Sharpsign R).
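A minimal sketch of the radix reader macros, assuming a conforming implementation and the standard readtable; read-with-radix-macros is a hypothetical helper.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-with-radix-macros (tokens)
  "Read each string in TOKENS under standard syntax."
  (with-standard-io-syntax
    (mapcar #'read-from-string tokens)))

(read-with-radix-macros '("#b101" "#o17" "#xFF" "#3r120" "#b-101/11"))
;; => (5 15 255 15 -5/3)
```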

This section summarizes the full syntax for numbers.

Syntax of a Rational

Syntax of an Integer

Integers can be written as a sequence of digits, optionally preceded by a sign and optionally followed by a decimal point; see Figure 2–9. When a decimal point is used, the digits are taken to be in radix 10; when no decimal point is used, the digits are taken to be in the radix given by the current input base.
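The difference between a trailing decimal point and the current input base can be sketched by binding *read-base*. read-integer-tokens is a hypothetical helper, and a conforming implementation is assumed.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-integer-tokens (base)
  "Read the tokens 10 and 10. with *READ-BASE* bound to BASE."
  (with-standard-io-syntax
    (let ((*read-base* base))
      (list (read-from-string "10")     ; digits in the current input base
            (read-from-string "10.")))))  ; trailing decimal point: radix 10

(read-integer-tokens 8)   ; => (8 10)
(read-integer-tokens 16)  ; => (16 10)
```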

For information on how integers are printed, see Section (Printing Integers).

Syntax of a Ratio

Ratios can be written as an optional sign followed by two non-empty sequences of digits separated by a slash; see Figure 2–9. The second sequence may not consist entirely of zeros. Examples of ratios are in Figure 2–13.

2/3 ;This is in canonical form
4/6 ;A non-canonical form for 2/3
-17/23 ;A ratio preceded by a sign
-30517578125/32768 ;This is (-5/2)^15
10/5 ;The canonical form for this is 2
#o-101/75 ;Octal notation for -65/61
#3r120/21 ;Ternary notation for 15/7
#Xbc/ad ;Hexadecimal notation for 188/173
#xFADED/FACADE ;Hexadecimal notation for 1027565/16435934
Figure 2–13. Examples of Ratios
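Reading some of the tokens in Figure 2–13 shows that the reader returns ratios in canonical form, and an integer when the denominator divides the numerator. This is a sketch assuming a conforming implementation; read-rational-token is a hypothetical helper.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-rational-token (token)
  "Read TOKEN under standard syntax."
  (with-standard-io-syntax
    (read-from-string token)))

(read-rational-token "4/6")             ; => 2/3
(read-rational-token "10/5")            ; => 2, an integer rather than a ratio
(read-rational-token "#o-101/75")       ; => -65/61
(read-rational-token "#xFADED/FACADE")  ; => 1027565/16435934
```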

For information on how ratios are printed, see Section (Printing Ratios).

Syntax of a Float

Floats can be written in either decimal fraction or computerized scientific notation: an optional sign, then a non-empty sequence of digits with an embedded decimal point, then an optional decimal exponent specification. If there is no exponent specifier, then the decimal point is required, and there must be digits after it. The exponent specifier consists of an exponent marker, an optional sign, and a non-empty sequence of digits. If no exponent specifier is present, or if the exponent marker e (or E) is used, then the format specified by *read-default-float-format* is used. See Figure 2–9.

An implementation may provide one or more kinds of float that collectively make up the type float. The letters s, f, d, and l (or their respective uppercase equivalents) explicitly specify the use of the types short-float, single-float, double-float, and long-float, respectively.

The internal format used for an external representation depends only on the exponent marker, and not on the number of decimal digits in the external representation.
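The following sketch checks which float subtype a token produces. It uses typep rather than type-of, because an implementation may provide fewer distinct float formats (for example, short-float may be the same type as single-float). token-reads-as-p is a hypothetical helper, and a conforming implementation is assumed.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun token-reads-as-p (token type &key (default-format 'single-float))
  "True if TOKEN reads as an object of TYPE while
*READ-DEFAULT-FLOAT-FORMAT* is bound to DEFAULT-FORMAT."
  (with-standard-io-syntax
    (let ((*read-default-float-format* default-format))
      (typep (read-from-string token) type))))

;; Explicit exponent markers select the format directly.
(token-reads-as-p "1.5d0" 'double-float)  ; => T
(token-reads-as-p "1.5s0" 'short-float)   ; => T
;; No marker, or the marker e, follows *READ-DEFAULT-FLOAT-FORMAT*.
(token-reads-as-p "1.5e0" 'double-float :default-format 'double-float)  ; => T
(token-reads-as-p "1.5" 'double-float :default-format 'double-float)    ; => T
;; Extra digits do not promote the result to a wider format.
(token-reads-as-p "1.000000000000000000000000001" 'single-float)        ; => T
```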

Figure 2–14 contains examples of notations for floats:

0.0 ;Floating-point zero in default format
0E0 ;As input, this is also floating-point zero in default format.
;As output, this would appear as 0.0.
0e0 ;As input, this is also floating-point zero in default format.
;As output, this would appear as 0.0.
-.0 ;As input, this might be a zero or a minus zero,
; depending on whether the implementation supports
; a distinct minus zero.
;As output, 0.0 is zero and -0.0 is minus zero.
0. ;On input, the integer zero — not a floating-point number!
;Whether this appears as 0 or 0. on output depends
;on the value of *print-radix*.
0.0s0 ;A floating-point zero in short format
0s0 ;As input, this is a floating-point zero in short format.
;As output, such a zero would appear as 0.0s0
; (or as 0.0 if short-float was the default format).
6.02E+23 ;Avogadro’s number, in default format
602E+21 ;Also Avogadro’s number, in default format
Figure 2–14. Examples of Floating-point numbers
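Several rows of Figure 2–14 can be checked by reading each token and asking whether the result is an integer or a float. zero-token-kind is a hypothetical helper; the sketch assumes a conforming implementation with *read-default-float-format* at its standard value, single-float, as bound by with-standard-io-syntax.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun zero-token-kind (token)
  "Return :INTEGER or :FLOAT for TOKEN, which must read as a zero."
  (with-standard-io-syntax
    (let ((x (read-from-string token)))
      (assert (zerop x))
      (etypecase x
        (integer :integer)
        (float :float)))))

(mapcar #'zero-token-kind '("0.0" "0E0" "0e0" "-.0" "0." "0.0s0" "0s0"))
;; => (:FLOAT :FLOAT :FLOAT :FLOAT :INTEGER :FLOAT :FLOAT)

;; In the default format, 0E0 prints back as 0.0.
(with-standard-io-syntax (prin1-to-string (read-from-string "0E0")))
;; => "0.0"
```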

For information on how floats are printed, see Section (Printing Floats).

Syntax of a Complex

A complex has a Cartesian structure, with a real part and an imaginary part each of which is a real. The parts of a complex are not necessarily floats, but both parts must be of the same type: either both are rationals, or both are of the same float subtype. When constructing a complex, if the specified parts are not the same type, the parts are converted to be the same type internally (i.e., the rational part is converted to a float). An object of type (complex rational) is converted internally and represented thereafter as a rational if its imaginary part is an integer whose value is 0.
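A sketch of the conversion rules above, assuming a conforming implementation; read-complex-token is a hypothetical helper.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-complex-token (token)
  "Read TOKEN under standard syntax."
  (with-standard-io-syntax
    (read-from-string token)))

(read-complex-token "#C(1 2)")      ; => #C(1 2), of type (COMPLEX RATIONAL)
(read-complex-token "#c(1 2.0)")    ; => #C(1.0 2.0), the rational part became a float
(read-complex-token "#C(5 0)")      ; => 5, rational parts with a zero imaginary part
(read-complex-token "#C(5.0 0.0)")  ; => #C(5.0 0.0), float parts stay complex
```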

For further information, see Section (Sharpsign C) and Section (Printing Complexes).

2.3.3 The Consing Dot

If a token consists solely of dots (with no escape characters), then an error of type reader-error is signaled, except in one circumstance: if the token is a single dot and appears in a situation where dotted pair notation permits a dot, then it is accepted as part of such syntax and no error is signaled. See Section 2.4.1 (Left-Parenthesis).
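A sketch of the dot rule, assuming a conforming implementation; read-or-reader-error is a hypothetical helper that maps a reader-error to the keyword :reader-error.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun read-or-reader-error (string)
  "Read STRING under standard syntax; return :READER-ERROR if the
reader signals READER-ERROR."
  (with-standard-io-syntax
    (handler-case (read-from-string string)
      (reader-error () :reader-error))))

(read-or-reader-error "(a . b)")   ; => (A . B), a single dot in dotted pair notation
(read-or-reader-error "..")        ; => :READER-ERROR
(read-or-reader-error "(a .. b)")  ; => :READER-ERROR
(read-or-reader-error ".")         ; => :READER-ERROR, a lone dot outside a list
```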

2.3.4 Symbols as Tokens

Any token that is not a potential number, does not contain a package marker, and does not consist entirely of dots will always be interpreted as a symbol. Any token that is a potential number but does not fit the number syntax is a reserved token and has an implementation-dependent interpretation. In all other cases, the token is construed to be the name of a symbol.

Examples of the printed representation of symbols are in Figure 2–15. For presentational simplicity, these examples assume that the readtable case of the current readtable is :upcase.

FROBBOZ The symbol whose name is FROBBOZ.
frobboz Another way to notate the same symbol.
fRObBoz Yet another way to notate it.
unwind-protect A symbol with a hyphen in its name.
+$ The symbol named +$.
1+ The symbol named 1+.
+1 This is the integer 1, not a symbol.
pascal_style This symbol has an underscore in its name.
file.rel.43 This symbol has periods in its name.
\( The symbol whose name is (.
\+1 The symbol whose name is +1.
+\1 Also the symbol whose name is +1.
\frobboz The symbol whose name is fROBBOZ.
3.14159265\s0 The symbol whose name is 3.14159265s0.
3.14159265\S0 A different symbol, whose name is 3.14159265S0.
3.14159265s0 A possible short float approximation to π.
Figure 2–15. Examples of the printed representation of symbols (Part 1 of 2)
APL\\360 The symbol whose name is APL\360.
apl\\360 Also the symbol whose name is APL\360.
\(b^2\)\ -\ 4*a*c The name is (B^2) - 4*A*C. Parentheses and two spaces in it.
\(\b^2\)\ -\ 4*\a*\c The name is (b^2) - 4*a*c. Letters explicitly lowercase.
|"| The same as writing \".
|(b^2) - 4*a*c| The name is (b^2) - 4*a*c.
|frobboz| The name is frobboz, not FROBBOZ.
|APL\360| The name is APL360.
|APL\\360| The name is APL\360.
|apl\\360| The name is apl\360.
|\|\|| Same as \|\| — the name is ||.
|(B^2) - 4*A*C| The name is (B^2) - 4*A*C. Parentheses and two spaces in it.
|(b^2) - 4*a*c| The name is (b^2) - 4*a*c.
Figure 2–16. Examples of the printed representation of symbols (Part 2 of 2)
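Several rows of Figures 2–15 and 2–16 can be checked by reading each token and taking its symbol-name. The tokens are written as Lisp strings, so each backslash is doubled. token-symbol-name is a hypothetical helper, and the sketch assumes a conforming implementation with the standard readtable, whose readtable case is :upcase.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun token-symbol-name (token)
  "Read TOKEN under standard syntax and return its symbol name."
  (with-standard-io-syntax
    (symbol-name (read-from-string token))))

(mapcar #'token-symbol-name
        '("FROBBOZ" "frobboz" "fRObBoz" "\\frobboz" "|frobboz|"
          "1+" "\\+1" "+\\1" "|APL\\360|" "|APL\\\\360|" "apl\\\\360"))
;; => ("FROBBOZ" "FROBBOZ" "FROBBOZ" "fROBBOZ" "frobboz"
;;     "1+" "+1" "+1" "APL360" "APL\\360" "APL\\360")

;; Without an escape, +1 reads as an integer, not a symbol.
(with-standard-io-syntax (read-from-string "+1"))  ; => 1
```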

In the process of parsing a symbol, it is implementation-dependent which implementation-defined attributes are removed from the characters forming a token that represents a symbol.

When parsing the syntax for a symbol, the Lisp reader looks up the name of that symbol in the current package. This lookup may involve looking in other packages whose external symbols are inherited by the current package. If the name is found, the corresponding symbol is returned. If the name is not found (that is, there is no symbol of that name accessible in the current package), a new symbol is created and is placed in the current package as an internal symbol. The current package becomes the owner (home package) of the symbol, and the symbol becomes interned in the current package. If the name is later read again while this same package is current, the same symbol will be found and returned.
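The lookup-then-intern behavior can be sketched in a fresh package that uses no other packages, so nothing is inherited. The package name TOKEN-DEMO and the helper demo-implicit-interning are hypothetical; a conforming implementation is assumed.

```lisp
;; Hypothetical package and helper for this sketch; assumes a conforming
;; implementation.
(defun demo-implicit-interning ()
  "Read the token widget twice in a fresh package and report what happened."
  (let ((package (or (find-package "TOKEN-DEMO")
                     (make-package "TOKEN-DEMO" :use '()))))
    (with-standard-io-syntax
      (let* ((*package* package)
             (s1 (read-from-string "widget"))
             (s2 (read-from-string "widget")))
        (list (eq s1 s2)                         ; same symbol both times
              (eq (symbol-package s1) package)   ; home package is current one
              (nth-value 1 (find-symbol "WIDGET" package)))))))

(demo-implicit-interning)
;; => (T T :INTERNAL)
```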

2.3.5 Valid Patterns for Tokens

The valid patterns for tokens are summarized in Figure 2–17.

nnnnn a number
xxxxx a symbol in the current package
:xxxxx a symbol in the KEYWORD package
ppppp:xxxxx an external symbol in the ppppp package
ppppp::xxxxx a (possibly internal) symbol in the ppppp package
:nnnnn undefined
ppppp:nnnnn undefined
ppppp::nnnnn undefined
::aaaaa undefined
aaaaa: undefined
aaaaa:aaaaa:aaaaa undefined
Figure 2–17. Valid patterns for tokens

Note that nnnnn has number syntax, neither xxxxx nor ppppp has number syntax, and aaaaa has any syntax.

A summary of rules concerning package markers follows. In each case, examples are offered to illustrate the case; for presentational simplicity, the examples assume that the readtable case of the current readtable is :upcase.

  1. If there is a single package marker, and it occurs at the beginning of the token, then the token is interpreted as a symbol in the KEYWORD package. When the symbol is newly created, its symbol-value is also set to the symbol itself, so that the symbol self-evaluates.

    For example, :bar, when read, interns BAR as an external symbol in the KEYWORD package.

  2. If there is a single package marker not at the beginning or end of the token, then it divides the token into two parts. The first part specifies a package; the second part is the name of an external symbol available in that package.

    For example, foo:bar, when read, looks up BAR among the external symbols of the package named FOO.

  3. If there are two adjacent package markers not at the beginning or end of the token, then they divide the token into two parts. The first part specifies a package; the second part is the name of a symbol within that package (possibly an internal symbol).

    For example, foo::bar, when read, interns BAR in the package named FOO.

  4. If the token contains no package markers, and does not have potential number syntax, then the entire token is the name of the symbol. The symbol is looked up in the current package.

    For example, bar, when read, interns BAR in the current package.

  5. The consequences are unspecified if any other pattern of package markers in a token is used. All other uses of package markers within names of symbols are not defined by this standard but are reserved for implementation-dependent use.
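Rules 1 and 3 can be sketched as follows. The package name TOKEN-DEMO-FOO and both helpers are hypothetical, standing in for the foo of the examples above; a conforming implementation is assumed.

```lisp
;; Hypothetical helpers and package for this sketch; assumes a conforming
;; implementation.
(defun demo-keyword-token ()
  "Rule 1: a leading package marker yields a self-evaluating keyword."
  (with-standard-io-syntax
    (let ((k (read-from-string ":bar")))
      (list (eq (symbol-package k) (find-package "KEYWORD"))
            (eq (symbol-value k) k)))))

(defun demo-double-colon-token ()
  "Rule 3: pkg::name interns the name in pkg, possibly as an internal symbol."
  (let ((foo (or (find-package "TOKEN-DEMO-FOO")
                 (make-package "TOKEN-DEMO-FOO" :use '()))))
    (with-standard-io-syntax
      (let ((sym (read-from-string "token-demo-foo::bar")))
        (list (eq (symbol-package sym) foo)
              (nth-value 1 (find-symbol "BAR" foo)))))))

(demo-keyword-token)       ; => (T T)
(demo-double-colon-token)  ; => (T :INTERNAL)
```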

For example, assuming the readtable case of the current readtable is :upcase, editor:buffer refers to the external symbol named BUFFER present in the package named EDITOR, regardless of whether there is a symbol named BUFFER in the current package. If there is no package named EDITOR, or if no symbol named BUFFER is present in EDITOR, or if BUFFER is not exported by EDITOR, the reader signals a correctable error. If editor::buffer is seen, the effect is exactly the same as reading buffer with the EDITOR package being the current package.
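A sketch of the editor:buffer example, using a hypothetical package DEMO-EDITOR (so as not to collide with any EDITOR package an implementation might already provide) that exports only BUFFER. The helper names are hypothetical too. The handler catches any error, because the text above only promises a correctable error, not a particular condition type.

```lisp
;; Hypothetical package and helpers for this sketch; assumes a conforming
;; implementation.
(defun ensure-demo-editor-package ()
  "Create DEMO-EDITOR, exporting BUFFER, unless it already exists."
  (or (find-package "DEMO-EDITOR")
      (let ((package (make-package "DEMO-EDITOR" :use '())))
        (export (intern "BUFFER" package) package)
        package)))

(defun read-qualified-token (token)
  "Read TOKEN under standard syntax; return :ERROR if reading signals an error."
  (ensure-demo-editor-package)
  (with-standard-io-syntax
    (handler-case (read-from-string token)
      (error () :error))))

(read-qualified-token "demo-editor:buffer")      ; => DEMO-EDITOR:BUFFER
(read-qualified-token "demo-editor::buffer")     ; => the same symbol
(read-qualified-token "demo-editor:window")      ; => :ERROR, WINDOW is not external
(read-qualified-token "no-such-package:buffer")  ; => :ERROR, no such package
```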

2.3.6 Package System Consistency Rules

The following rules apply to the package system as long as the value of *package* is not changed:

Read-read consistency

Reading the same symbol name always results in the same symbol.

Print-read consistency

An interned symbol always prints as a sequence of characters that, when read back in, yields the same symbol.

For information about how the Lisp printer treats symbols, see Section (Printing Symbols).
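Print-read consistency can be sketched by printing interned symbols with prin1 and reading the text back. print-read-round-trip-p is a hypothetical helper; symbols whose names need escaping are included to show that the printer supplies the escapes the reader needs. A conforming implementation is assumed.

```lisp
;; Hypothetical helper for this sketch; assumes a conforming implementation.
(defun print-read-round-trip-p (symbol)
  "True if printing SYMBOL with PRIN1 and reading the text back,
both under standard syntax, yields the same (EQ) symbol."
  (with-standard-io-syntax
    (eq symbol (read-from-string (prin1-to-string symbol)))))

(every #'print-read-round-trip-p
       (list 'car
             :test
             (intern "foo bar" "CL-USER")   ; lowercase and a space need escaping
             (intern "+1" "CL-USER")        ; would otherwise read as the integer 1
             (intern "(" "CL-USER")))       ; a macro character needs escaping
;; => T
```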

Print-print consistency

If two interned symbols are not the same, then their printed representations will be different sequences of characters.

These rules are true regardless of any implicit interning. As long as the current package is not changed, results are reproducible regardless of the order of loading files or the exact history of what symbols were typed in when. If the value of *package* is changed and then changed back to the previous value, consistency is maintained. The rules can be violated by changing the value of *package*, by forcing a change to symbols or to packages or to both by continuing from an error, or by calling one of the following functions: unintern, unexport, shadow, shadowing-import, or unuse-package.

An inconsistency only applies if one of the restrictions is violated between two of the named symbols. shadow, unexport, unintern, and shadowing-import can only affect the consistency of symbols with the same names (under string=) as the ones supplied as arguments.
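A sketch of how unintern breaks read-read consistency: the same name read before and after the call yields two different symbols, and the old one is left without a home package. The package TOKEN-DEMO-2 and the helper are hypothetical; a conforming implementation is assumed.

```lisp
;; Hypothetical package and helper for this sketch; assumes a conforming
;; implementation.
(defun demo-unintern-breaks-read-read ()
  "Read gadget, unintern it, read gadget again, and compare."
  (let ((package (or (find-package "TOKEN-DEMO-2")
                     (make-package "TOKEN-DEMO-2" :use '()))))
    (with-standard-io-syntax
      (let* ((*package* package)
             (before (read-from-string "gadget")))
        (unintern before package)
        (let ((after (read-from-string "gadget")))
          (list (eq before after)                          ; different symbols now
                (symbol-package before)                    ; old symbol is homeless
                (eq (symbol-package after) package)))))))

(demo-unintern-breaks-read-read)
;; => (NIL NIL T)
```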