The Lisp reader takes characters from a stream, interprets them as a printed representation of an object, constructs that object, and returns it.
The syntax described by this chapter is called the standard syntax. Operations are provided by Common Lisp so that various aspects of the syntax information represented by a readtable can be modified under program control; see Chapter 23 (Reader). Except as explicitly stated otherwise, the syntax used throughout this document is standard syntax.
Syntax information for use by the Lisp reader is embodied in an object called a readtable. Among other things, the readtable contains the association between characters and syntax types.
Figure 2–1 lists some defined names that are applicable to readtables.
Several readtables describing different syntaxes can exist, but at any given time only one, called the current readtable, affects the way in which expressions₂ are parsed into objects by the Lisp reader. The current readtable in a given dynamic environment is the value of *readtable* in that environment. To make a different readtable become the current readtable, *readtable* can be assigned or bound.
The standard readtable conforms to standard syntax. The consequences are undefined if an attempt is made to modify the standard readtable. To achieve the effect of altering or extending standard syntax, a copy of the standard readtable can be created; see the function copy-readtable.
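As an informal illustration of binding *readtable* to a modified copy, the following sketch gives the character ! the syntax of the quote character inside a fresh copy of the standard readtable. The helper name read-in-modified-copy is hypothetical; copy-readtable, set-syntax-from-char, and read-from-string are standardized functions.

```lisp
;; Hypothetical helper: read one object from STRING under a private
;; copy of the standard readtable in which ! behaves like '.
(defun read-in-modified-copy (string)
  ;; NIL as the argument designates the standard readtable.
  (let ((*readtable* (copy-readtable nil)))
    ;; The change affects only the copy bound above.
    (set-syntax-from-char #\! #\')
    (read-from-string string)))

(read-in-modified-copy "!x")   ; → (QUOTE X)
```

Because *readtable* is bound rather than assigned, the previous current readtable is in effect again once the helper returns.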
The readtable case of the standard readtable is :upcase.
The initial readtable is the readtable that is the current readtable at the time when the Lisp image starts. At that time, it conforms to standard syntax. The initial readtable is distinct from the standard readtable. It is permissible for a conforming program to modify the initial readtable.
The Lisp reader is influenced not only by the current readtable, but also by various dynamic variables. Figure 2–2 lists the variables that influence the behavior of the Lisp reader.
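The following sketch shows two of the standardized variables that influence the reader, *read-base* and *read-default-float-format*, being bound around a call to read-from-string. The helper names read-in-base and read-float-as are hypothetical.

```lisp
;; Hypothetical helper: read STRING with BASE as the current input base.
(defun read-in-base (base string)
  (let ((*read-base* base))
    (read-from-string string)))

;; Hypothetical helper: read STRING with FORMAT as the default float format.
(defun read-float-as (format string)
  (let ((*read-default-float-format* format))
    (read-from-string string)))

(read-in-base 2 "101")                              ; → 5
(read-in-base 10 "101")                             ; → 101
(typep (read-float-as 'double-float "1.5") 'double-float)  ; → true
(typep (read-float-as 'single-float "1.5") 'single-float)  ; → true
```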
All implementations must support a character repertoire called standard-char; characters that are members of that repertoire are called standard characters.
The standard-char repertoire consists of the non-graphic character newline, the graphic character space, and the following additional ninety-four graphic characters or their equivalents:
|Graphic ID||Glyph||Description||Graphic ID||Glyph||Description|
|LA01||a||small a||LN01||n||small n|
|LA02||A||capital A||LN02||N||capital N|
|LB01||b||small b||LO01||o||small o|
|LB02||B||capital B||LO02||O||capital O|
|LC01||c||small c||LP01||p||small p|
|LC02||C||capital C||LP02||P||capital P|
|LD01||d||small d||LQ01||q||small q|
|LD02||D||capital D||LQ02||Q||capital Q|
|LE01||e||small e||LR01||r||small r|
|LE02||E||capital E||LR02||R||capital R|
|LF01||f||small f||LS01||s||small s|
|LF02||F||capital F||LS02||S||capital S|
|LG01||g||small g||LT01||t||small t|
|LG02||G||capital G||LT02||T||capital T|
|LH01||h||small h||LU01||u||small u|
|LH02||H||capital H||LU02||U||capital U|
|LI01||i||small i||LV01||v||small v|
|LI02||I||capital I||LV02||V||capital V|
|LJ01||j||small j||LW01||w||small w|
|LJ02||J||capital J||LW02||W||capital W|
|LK01||k||small k||LX01||x||small x|
|LK02||K||capital K||LX02||X||capital X|
|LL01||l||small l||LY01||y||small y|
|LL02||L||capital L||LY02||Y||capital Y|
|LM01||m||small m||LZ01||z||small z|
|LM02||M||capital M||LZ02||Z||capital Z|
|Graphic ID||Glyph||Description||Graphic ID||Glyph||Description|
|ND01||1||digit 1||ND06||6||digit 6|
|ND02||2||digit 2||ND07||7||digit 7|
|ND03||3||digit 3||ND08||8||digit 8|
|ND04||4||digit 4||ND09||9||digit 9|
|ND05||5||digit 5||ND10||0||digit 0|
|Graphic ID||Glyph||Description|
|SP02||!||exclamation mark|
|SC03||$||dollar sign|
|SP04||"||quotation mark, or double quote|
|SP05||'||apostrophe, or [single] quote|
|SP06||(||left parenthesis, or open parenthesis|
|SP07||)||right parenthesis, or close parenthesis|
|SP09||_||low line, or underscore|
|SP10||-||hyphen, or minus [sign]|
|SP11||.||full stop, period, or dot|
|SP12||/||solidus, or slash|
|SP15||?||question mark|
|SA01||+||plus [sign]|
|SA03||<||less-than [sign]|
|SA04||=||equals [sign]|
|SA05||>||greater-than [sign]|
|SM01||#||number sign, or sharp [sign]|
|SM02||%||percent [sign]|
|SM04||*||asterisk, or star|
|SM05||@||commercial at, or at-sign|
|SM06||[||left [square] bracket|
|SM07||\||reverse solidus, or backslash|
|SM08||]||right [square] bracket|
|SM11||{||left curly bracket, or left brace|
|SM13|| | ||vertical bar|
|SM14||}||right curly bracket, or right brace|
|SD13||`||grave accent, or backquote|
|SD15||^||circumflex accent|
The graphic IDs are not used within Common Lisp, but are provided for cross reference purposes with ISO 6937/2. Note that the first letter of the graphic ID categorizes the character as follows: L — Latin, N — Numeric, S — Special.
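Membership in the standard-char repertoire can be tested with the standardized predicate standard-char-p or with the type specifier standard-char. The sketch below is illustrative only; the helper name standard-string-p is hypothetical, and the character name #\Tab is semi-standard, so its availability is an assumption about the implementation.

```lisp
(standard-char-p #\a)         ; → true
(standard-char-p #\Newline)   ; → true (the one non-graphic standard character)
(standard-char-p #\Space)     ; → true
(typep #\^ 'standard-char)    ; → true
(standard-char-p #\Tab)       ; → false (Tab is only semi-standard)

;; Hypothetical helper: true if every character of STRING is a
;; standard character.
(defun standard-string-p (string)
  (every #'standard-char-p string))

(standard-string-p "(defun f (x) x)")   ; → true
```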
The Lisp reader constructs an object from the input text by interpreting each character according to its syntax type. The Lisp reader cannot accept as input everything that the Lisp printer produces, and the Lisp reader has features that are not used by the Lisp printer. The Lisp reader can be used as a lexical analyzer for a more general user-written parser.
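As a small sketch of using the Lisp reader as a lexical analyzer, the following hypothetical helper read-all-objects reads every object in a string and collects the results. It assumes the current readtable conforms to standard syntax.

```lisp
;; Hypothetical helper: return a list of all objects whose printed
;; representations appear in STRING.
(defun read-all-objects (string)
  (with-input-from-string (in string)
    ;; The stream object itself serves as a unique end-of-file marker,
    ;; since it cannot be the result of reading from the string.
    (loop for object = (read in nil in)
          until (eq object in)
          collect object)))

(read-all-objects "foo 42 (a b) \"str\"")   ; → (FOO 42 (A B) "str")
```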
When the Lisp reader is invoked, it reads a single character from the input stream and dispatches according to the syntax type of that character. Every character that can appear in the input stream is of one of the syntax types shown in Figure 2–6.
|constituent||macro character||single escape|
|invalid||multiple escape||whitespace₂|
The syntax type of a character in a readtable determines how that character is interpreted by the Lisp reader while that readtable is the current readtable. At any given time, every character has exactly one syntax type.
Figure 2–7 lists the syntax type of each character in standard syntax.
|character||syntax type||character||syntax type|
|Newline||whitespace₂||;||terminating macro char|
|"||terminating macro char||A–Z||constituent|
|#||non-terminating macro char||[||constituent*|
|$||constituent||\||single escape|
|'||terminating macro char||_||constituent|
|(||terminating macro char||`||terminating macro char|
|)||terminating macro char||a–z||constituent|
|+||constituent|| | ||multiple escape|
|,||terminating macro char||}||constituent*|
The characters marked with an asterisk (*) are initially constituents, but they are not used in any standard Common Lisp notations. These characters are explicitly reserved to the programmer.
~ is not used in Common Lisp, and is reserved to implementors.
$ and % are alphabetic₂ characters, but are not used in the names of any standard Common Lisp defined names.
Whitespace₂ characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions. Macro characters are divided into two kinds, terminating and non-terminating, depending on whether or not they terminate a token. The following are descriptions of each kind of syntax type.
Constituent characters are used in tokens. A token is a representation of a number or a symbol. Examples of constituent characters are letters and digits.
Letters in symbol names are sometimes converted to letters in the opposite case when the name is read; see Section 23.1.2 (Effect of Readtable Case on the Lisp Reader). Case conversion can be suppressed by the use of single escape or multiple escape characters.
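The effect of the readtable case, and its suppression by escape characters, can be observed with a sketch such as the following. The helper name symbol-name-as-read and its parameter mode are hypothetical; readtable-case is the standardized accessor.

```lisp
;; Hypothetical helper: read a symbol from STRING under a copy of the
;; standard readtable whose readtable case is MODE, and return its name.
(defun symbol-name-as-read (string &optional (mode :upcase))
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) mode)
    (symbol-name (read-from-string string))))

(symbol-name-as-read "Abc")             ; → "ABC"
(symbol-name-as-read "Abc" :preserve)   ; → "Abc"
(symbol-name-as-read "Abc" :downcase)   ; → "abc"
(symbol-name-as-read "|Abc|")           ; → "Abc"  (multiple escape)
(symbol-name-as-read "\\Abc")           ; → "ABC"  (only the escaped A is exempt)
```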
Every character has one or more constituent traits that define how the character is to be interpreted by the Lisp reader when the character is a constituent character. These constituent traits are alphabetic₂, digit, package marker, plus sign, minus sign, dot, decimal point, ratio marker, exponent marker, and invalid. Figure 2–8 shows the constituent traits of the standard characters and of certain semi-standard characters; no mechanism is provided for changing the constituent trait of a character. Any character with the alphadigit constituent trait in that figure is a digit if the current input base is greater than that character's digit value, otherwise the character is alphabetic₂. Any character quoted by a single escape is treated as an alphabetic₂ constituent, regardless of its normal syntax.
|constituent character||traits||constituent character||traits|
|Newline||invalid*||+||alphabetic₂, plus sign|
|Linefeed||invalid*||-||alphabetic₂, minus sign|
|Page||invalid*||.||alphabetic₂, dot, decimal point|
|Return||invalid*||/||alphabetic₂, ratio marker|
|#||alphabetic₂*||D, d||alphadigit, double-float exponent marker|
|$||alphabetic₂||E, e||alphadigit, float exponent marker|
|%||alphabetic₂||F, f||alphadigit, single-float exponent marker|
|,||alphabetic₂*||L, l||alphadigit, long-float exponent marker|
|:||package marker||N, n||alphadigit|
|?||alphabetic₂||S, s||alphadigit, short-float exponent marker|
|@||alphabetic₂||T, t||alphadigit|
|[||alphabetic₂||U, u||alphadigit|
|\||alphabetic₂*||V, v||alphadigit|
|]||alphabetic₂||W, w||alphadigit|
|_||alphabetic₂||Y, y||alphadigit|
The interpretations in this table apply only to characters whose syntax type is constituent. Entries marked with an asterisk (*) are normally shadowed₂ because the indicated characters are of syntax type whitespace₂, macro character, single escape, or multiple escape; these constituent traits apply to them only if their syntax types are changed to constituent.
Characters with the constituent trait invalid cannot ever appear in a token except under the control of a single escape character. If an invalid character is encountered while an object is being read, an error of type reader-error is signaled. If an invalid character is preceded by a single escape character, it is treated as an alphabetic₂ constituent instead.
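The alphadigit rule and the effect of a single escape on constituent traits can be checked with a sketch like the one below. The helper name read-token-in-base is hypothetical; it binds *read-base* around a read under a copy of the standard readtable.

```lisp
;; Hypothetical helper: read one token from STRING with BASE as the
;; current input base, under a copy of the standard readtable.
(defun read-token-in-base (base string)
  (let ((*readtable* (copy-readtable nil))
        (*read-base* base))
    (read-from-string string)))

;; In base 16 the letters f, a, and d are digits, so the token is an integer.
(read-token-in-base 16 "fad")                   ; → 4013
;; In base 10 the same letters are alphabetic, so the token is a symbol.
(symbolp (read-token-in-base 10 "fad"))         ; → true
;; An escaped digit is alphabetic, so \123 is a symbol named "123".
(symbol-name (read-token-in-base 10 "\\123"))   ; → "123"
```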
When the Lisp reader encounters a macro character on an input stream, special parsing of subsequent characters on the input stream is performed.
A macro character has an associated function called a reader macro function that implements its specialized parsing behavior. An association of this kind can be established or modified under control of a conforming program by using the functions set-macro-character and set-dispatch-macro-character.
Upon encountering a macro character, the Lisp reader calls its reader macro function, which parses one specially formatted object from the input stream. The function either returns the parsed object, or else it returns no values to indicate that the characters scanned by the function are being ignored (e.g., in the case of a comment). Examples of macro characters are backquote, single-quote, left-parenthesis, and right-parenthesis.
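A minimal sketch of both outcomes described above, assuming ! and % are free to be redefined in a private copy of the standard readtable: one reader macro function returns a parsed object, the other returns no values so that the characters it scanned are ignored. The helper name read-with-demo-macros is hypothetical.

```lisp
;; Hypothetical helper: read one object from STRING with two
;; illustrative macro characters installed in a private readtable.
(defun read-with-demo-macros (string)
  (let ((*readtable* (copy-readtable nil)))
    ;; !form reads the following object and returns (NOT form).
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore char))
                           (list 'not (read stream t nil t))))
    ;; %form reads the following object and discards it by
    ;; returning no values, much as a comment would.
    (set-macro-character #\%
                         (lambda (stream char)
                           (declare (ignore char))
                           (read stream t nil t)
                           (values)))
    (read-from-string string)))

(read-with-demo-macros "!x")         ; → (NOT X)
(read-with-demo-macros "(a %b c)")   ; → (A C)
```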
A macro character is either terminating or non-terminating. The difference between terminating and non-terminating macro characters lies in what happens when such characters occur in the middle of a token. If a non-terminating macro character occurs in the middle of a token, the function associated with the non-terminating macro character is not called, and the non-terminating macro character does not terminate the token’s name; it becomes part of the name as if the macro character were really a constituent character. A terminating macro character terminates any token, and its associated reader macro function is called no matter where the character appears. The only non-terminating macro character in standard syntax is sharpsign.
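The difference can be observed by reading a token that contains a macro character in the middle. In the sketch below, the hypothetical helper read-prefix returns the name of the symbol that was read together with the index of the first character not read.

```lisp
;; Hypothetical helper: read one symbol from STRING under a copy of the
;; standard readtable; return its name and the position where reading stopped.
(defun read-prefix (string)
  (let ((*readtable* (copy-readtable nil)))
    (multiple-value-bind (object position) (read-from-string string)
      (list (symbol-name object) position))))

;; Sharpsign is non-terminating: it becomes part of the token.
(read-prefix "a#b")   ; → ("A#B" 3)
;; Single-quote is terminating: the token ends before it.
(read-prefix "a'b")   ; → ("A" 1)

;; The second value of get-macro-character reports the same distinction.
(nth-value 1 (get-macro-character #\# nil))   ; → true
(nth-value 1 (get-macro-character #\' nil))   ; → false
```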
If a character is a dispatching macro character C1, its reader macro function is a function supplied by the implementation. This function reads decimal digit characters until a non-digit C2 is read. If any digits were read, they are converted into a corresponding integer infix parameter P; otherwise, the infix parameter P is nil. The terminating non-digit C2 is a character (sometimes called a “sub-character” to emphasize its subordinate role in the dispatching) that is looked up in the dispatch table associated with the dispatching macro character C1. The reader macro function associated with the sub-character C2 is invoked with three arguments: the stream, the sub-character C2, and the infix parameter P. For more information about dispatch characters, see the function set-dispatch-macro-character.
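The three arguments passed to the function associated with a sub-character can be made visible with a sketch such as the following, which installs a function for the sub-character ! under sharpsign in a private copy of the standard readtable. The helper name read-with-sharp-bang is hypothetical, and the choice of ! as the sub-character is only an assumption made for illustration.

```lisp
;; Hypothetical helper: read one object from STRING with #! defined to
;; return a list of the sub-character, the infix parameter, and the
;; next object on the stream.
(defun read-with-sharp-bang (string)
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character
     #\# #\!
     (lambda (stream sub-char parameter)
       (list sub-char parameter (read stream t nil t))))
    (read-from-string string)))

(read-with-sharp-bang "#3!foo")   ; → (#\! 3 FOO)
(read-with-sharp-bang "#!foo")    ; → (#\! NIL FOO)
```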
For information about the macro characters that are available in standard syntax, see Section 2.4 (Standard Macro Characters).
A pair of multiple escape characters is used to indicate that an enclosed sequence of characters, including possible macro characters and whitespace₂ characters, are to be treated as alphabetic₂ characters with case preserved. Any single escape and multiple escape characters that are to appear in the sequence must be preceded by a single escape character.
Vertical-bar is a multiple escape character in standard syntax.
 ;; The following examples assume the readtable case of *readtable*
 ;; and *print-case* are both :upcase.
 (eq 'abc 'ABC) → true
 (eq 'abc '|ABC|) → true
 (eq 'abc 'a|B|c) → true
 (eq 'abc '|abc|) → false
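As a further sketch, the names of symbols read with multiple escape characters show that enclosed whitespace and macro characters become part of the token, and that an enclosed vertical-bar must itself be escaped. The helper name name-as-read is hypothetical; it reads under a copy of the standard readtable.

```lisp
;; Hypothetical helper: read a symbol from STRING under a copy of the
;; standard readtable and return its name.
(defun name-as-read (string)
  (let ((*readtable* (copy-readtable nil)))
    (symbol-name (read-from-string string))))

(name-as-read "|two words|")   ; → "two words"
(name-as-read "a|(B)|c")       ; → "A(B)C"
;; The Lisp string below contains the characters |a\|b|
(name-as-read "|a\\|b|")       ; → "a|b"
```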
A single escape is used to indicate that the next character is to be treated as an alphabetic₂ character with its case preserved, no matter what the character is or which constituent traits it has.
Backslash is a single escape character in standard syntax.
 ;; The following examples assume the readtable case of *readtable*
 ;; and *print-case* are both :upcase.
 (eq 'abc '\A\B\C) → true
 (eq 'abc 'a\Bc) → true
 (eq 'abc '\ABC) → true
 (eq 'abc '\abc) → false
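A single escape also applies to characters that would otherwise end the token or change its interpretation. The sketch below uses a hypothetical helper, escaped-name, that reads a symbol under a copy of the standard readtable and returns its name; note that each backslash is doubled inside a Lisp string.

```lisp
;; Hypothetical helper: read a symbol from STRING under a copy of the
;; standard readtable and return its name.
(defun escaped-name (string)
  (let ((*readtable* (copy-readtable nil)))
    (symbol-name (read-from-string string))))

(escaped-name "a\\(b")   ; the characters a\(b  → "A(B"
(escaped-name "a\\ b")   ; the characters a\ b  → "A B"
(escaped-name "\\1")     ; the characters \1    → "1"
```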
Whitespace₂ characters are used to separate tokens.
Space and newline are whitespace₂ characters in standard syntax.
 (length '(this-that)) → 1
 (length '(this - that)) → 3
 (length '(a b)) → 2
 (+ 34) → 34
 (+ 3 4) → 7