2.1 Character Syntax

The Lisp reader takes characters from a stream, interprets them as a printed representation of an object, constructs that object, and returns it.

The syntax described by this chapter is called the standard syntax. Operations are provided by Common Lisp so that various aspects of the syntax information represented by a readtable can be modified under program control; see Chapter 23 (Reader). Except as explicitly stated otherwise, the syntax used throughout this document is standard syntax.

2.1.1 Readtables

Syntax information for use by the Lisp reader is embodied in an object called a readtable. Among other things, the readtable contains the association between characters and syntax types.

Figure 2–1 lists some defined names that are applicable to readtables.

The Current Readtable

Several readtables describing different syntaxes can exist, but at any given time only one, called the current readtable, affects the way in which expressions[2] are parsed into objects by the Lisp reader. The current readtable in a given dynamic environment is the value of *readtable* in that environment. To make a different readtable become the current readtable, *readtable* can be assigned or bound.

The Standard Readtable

The standard readtable conforms to standard syntax. The consequences are undefined if an attempt is made to modify the standard readtable. To achieve the effect of altering or extending standard syntax, a copy of the standard readtable can be created; see the function copy-readtable.
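The following Common Lisp sketch illustrates these rules. The helper names read-with-standard-copy and read-preserving-case are hypothetical. Each helper binds *readtable* to a fresh copy of the standard readtable; passing nil to copy-readtable designates the standard readtable. Any change is therefore made to the copy and lasts only for the dynamic extent of the let.

```lisp
;; Hypothetical helper: read STRING while a fresh copy of the standard
;; readtable is the current readtable.
(defun read-with-standard-copy (string)
  (let ((*readtable* (copy-readtable nil)))   ; NIL designates the standard readtable
    (read-from-string string)))

;; Hypothetical helper: alter a copy, never the standard readtable itself.
(defun read-preserving-case (string)
  (let ((*readtable* (copy-readtable nil)))
    (setf (readtable-case *readtable*) :preserve) ; modifies only the copy
    (read-from-string string)))
```

With these definitions, (read-preserving-case "Foo") should return a symbol named "Foo". Meanwhile (readtable-case (copy-readtable nil)) still returns :UPCASE, because the standard readtable was never modified.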

The readtable case of the standard readtable is :upcase.

The Initial Readtable

The initial readtable is the readtable that is the current readtable at the time when the Lisp image starts. At that time, it conforms to standard syntax. The initial readtable is distinct from the standard readtable. It is permissible for a conforming program to modify the initial readtable.

2.1.2 Variables that affect the Lisp Reader

The Lisp reader is influenced not only by the current readtable, but also by various dynamic variables. Figure 2–2 lists the variables that influence the behavior of the Lisp reader.

Figure 2–2. Variables that influence the Lisp reader.
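The sketch below illustrates the idea; it is not a reproduction of Figure 2–2. It binds two standard reader variables, *read-base* and *read-default-float-format*, around calls to read-from-string. The helper names are hypothetical.

```lisp
;; Hypothetical helper: with *READ-BASE* bound to 16, a token such as
;; "ff" is read as an integer rather than as a symbol.
(defun read-hex (string)
  (let ((*read-base* 16))
    (read-from-string string)))

;; Hypothetical helper: a float written without an exponent marker is read
;; in the format named by *READ-DEFAULT-FLOAT-FORMAT*.
(defun read-with-double-default (string)
  (let ((*read-default-float-format* 'double-float))
    (read-from-string string)))
```

For instance, (read-hex "ff") should return 255, and (read-with-double-default "1.5") should return a double-float.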

2.1.3 Standard Characters

All implementations must support a character repertoire called standard-char; characters that are members of that repertoire are called standard characters.

The standard-char repertoire consists of the non-graphic character newline, the graphic character space, and the following additional ninety-four graphic characters or their equivalents:

Graphic ID  Glyph  Description  Graphic ID  Glyph  Description
LA01        a      small a      LN01        n      small n
LA02        A      capital A    LN02        N      capital N
LB01        b      small b      LO01        o      small o
LB02        B      capital B    LO02        O      capital O
LC01        c      small c      LP01        p      small p
LC02        C      capital C    LP02        P      capital P
LD01        d      small d      LQ01        q      small q
LD02        D      capital D    LQ02        Q      capital Q
LE01        e      small e      LR01        r      small r
LE02        E      capital E    LR02        R      capital R
LF01        f      small f      LS01        s      small s
LF02        F      capital F    LS02        S      capital S
LG01        g      small g      LT01        t      small t
LG02        G      capital G    LT02        T      capital T
LH01        h      small h      LU01        u      small u
LH02        H      capital H    LU02        U      capital U
LI01        i      small i      LV01        v      small v
LI02        I      capital I    LV02        V      capital V
LJ01        j      small j      LW01        w      small w
LJ02        J      capital J    LW02        W      capital W
LK01        k      small k      LX01        x      small x
LK02        K      capital K    LX02        X      capital X
LL01        l      small l      LY01        y      small y
LL02        L      capital L    LY02        Y      capital Y
LM01        m      small m      LZ01        z      small z
LM02        M      capital M    LZ02        Z      capital Z
Figure 2–3. Standard Character Subrepertoire (Part 1 of 3: Latin Characters)
Graphic ID  Glyph  Description  Graphic ID  Glyph  Description
ND01        1      digit 1      ND06        6      digit 6
ND02        2      digit 2      ND07        7      digit 7
ND03        3      digit 3      ND08        8      digit 8
ND04        4      digit 4      ND09        9      digit 9
ND05        5      digit 5      ND10        0      digit 0
Figure 2–4. Standard Character Subrepertoire (Part 2 of 3: Numeric Characters)
Graphic ID  Glyph  Description
SP02        !      exclamation mark
SC03        $      dollar sign
SP04        "      quotation mark, or double quote
SP05        '      apostrophe, or [single] quote
SP06        (      left parenthesis, or open parenthesis
SP07        )      right parenthesis, or close parenthesis
SP08        ,      comma
SP09        _      low line, or underscore
SP10        -      hyphen, or minus [sign]
SP11        .      full stop, period, or dot
SP12        /      solidus, or slash
SP13        :      colon
SP14        ;      semicolon
SP15        ?      question mark
SA01        +      plus [sign]
SA03        <      less-than [sign]
SA04        =      equals [sign]
SA05        >      greater-than [sign]
SM01        #      number sign, or sharp[sign]
SM02        %      percent [sign]
SM03        &      ampersand
SM04        *      asterisk, or star
SM05        @      commercial at, or at-sign
SM06        [      left [square] bracket
SM07        \      reverse solidus, or backslash
SM08        ]      right [square] bracket
SM11        {      left curly bracket, or left brace
SM13        |      vertical bar
SM14        }      right curly bracket, or right brace
SD13        `      grave accent, or backquote
SD15        ^      circumflex accent
SD19        ~      tilde
Figure 2–5. Standard Character Subrepertoire (Part 3 of 3: Special Characters)

The graphic IDs are not used within Common Lisp, but are provided for cross reference purposes with ISO 6937/2. Note that the first letter of the graphic ID categorizes the character as follows: L — Latin, N — Numeric, S — Special.
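One way to check the count given above is to spell out the graphic standard characters as a string and test them with standard-char-p. The variable *graphic-standard-chars* and the function standard-repertoire are hypothetical. Only standard-char-p and graphic-char-p are standard functions.

```lisp
;; Hypothetical variable: the 94 graphic standard characters other than space.
(defparameter *graphic-standard-chars*
  (concatenate 'string
               "abcdefghijklmnopqrstuvwxyz"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
               "0123456789"
               ;; Inside a string literal, " and \ must each be preceded by \.
               "!$\"'(),_-./:;?+<=>#%&*@[\\]{|}`^~"))

;; Hypothetical function: space and newline complete the 96-character
;; standard-char repertoire.
(defun standard-repertoire ()
  (concatenate 'string *graphic-standard-chars* (list #\Space #\Newline)))
```

Every character of (standard-repertoire) should satisfy standard-char-p, and the string should contain 96 distinct characters.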

2.1.4 Character Syntax Types

The Lisp reader constructs an object from the input text by interpreting each character according to its syntax type. The Lisp reader cannot accept as input everything that the Lisp printer produces, and the Lisp reader has features that are not used by the Lisp printer. The Lisp reader can be used as a lexical analyzer for a more general user-written parser.
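Here is a rough sketch of using the reader as a lexical analyzer. The hypothetical function below calls read repeatedly on a string stream and collects each object it returns. It assumes the input consists only of complete printed representations of objects.

```lisp
;; Hypothetical helper: return a list of all objects read from STRING,
;; using a copy of the standard readtable.
(defun read-all-from-string (string)
  (let ((*readtable* (copy-readtable nil)))
    (with-input-from-string (stream string)
      ;; The stream object itself serves as a unique end-of-file marker.
      (loop for object = (read stream nil stream)
            until (eq object stream)
            collect object))))
```

For example, reading the text (a b) 42 "str" this way should yield a three-element list: the list (A B), the integer 42, and the string "str".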

When the Lisp reader is invoked, it reads a single character from the input stream and dispatches according to the syntax type of that character. Every character that can appear in the input stream is of one of the syntax types shown in Figure 2–6.

Figure 2–6. Possible Character Syntax Types

The syntax type of a character in a readtable determines how that character is interpreted by the Lisp reader while that readtable is the current readtable. At any given time, every character has exactly one syntax type.
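Standard Common Lisp does not offer a function that reports a character's syntax type directly. However, get-macro-character distinguishes macro characters from the rest, and it also reports whether a macro character is non-terminating. The helper macro-status and its keyword results are hypothetical.

```lisp
;; Hypothetical helper: classify CHAR as a macro character (or not) in a
;; copy of the standard readtable. GET-MACRO-CHARACTER returns the reader
;; macro function (or NIL) and a flag that is true for non-terminating ones.
(defun macro-status (char)
  (multiple-value-bind (reader-fn non-terminating-p)
      (get-macro-character char (copy-readtable nil))
    (cond ((null reader-fn) :not-a-macro-char)
          (non-terminating-p :non-terminating)
          (t :terminating))))
```

Consistent with Figure 2–7, #\( and #\; should be reported as terminating and #\# as non-terminating. #\a and #\Space are not macro characters at all.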

Figure 2–7 lists the syntax type of each character in standard syntax.

character  syntax type                 character  syntax type
Backspace  constituent                 0–9        constituent
Tab        whitespace[2]               :          constituent
Newline    whitespace[2]               ;          terminating macro char
Linefeed   whitespace[2]               <          constituent
Page       whitespace[2]               =          constituent
Return     whitespace[2]               >          constituent
Space      whitespace[2]               ?          constituent*
!          constituent*                @          constituent
"          terminating macro char      A–Z        constituent
#          non-terminating macro char  [          constituent*
$          constituent                 \          single escape
%          constituent                 ]          constituent*
&          constituent                 ^          constituent
'          terminating macro char      _          constituent
(          terminating macro char      `          terminating macro char
)          terminating macro char      a–z        constituent
*          constituent                 {          constituent*
+          constituent                 |          multiple escape
,          terminating macro char      }          constituent*
-          constituent                 ~          constituent
.          constituent                 Rubout     constituent
/          constituent
Figure 2–7. Character Syntax Types in Standard Syntax

The characters marked with an asterisk (*) are initially constituents, but they are not used in any standard Common Lisp notations. These characters are explicitly reserved to the programmer. ~ is not used in Common Lisp and is reserved to implementors. $ and % are alphabetic[2] characters, but they are not used in the names of any standard Common Lisp defined names.

Whitespace[2] characters serve as separators but are otherwise ignored. Constituent and escape characters are accumulated to make a token, which is then interpreted as a number or symbol. Macro characters trigger the invocation of functions (possibly user-supplied) that can perform arbitrary parsing actions. Macro characters are divided into two kinds, terminating and non-terminating, depending on whether or not they terminate a token. The following are descriptions of each kind of syntax type.

Constituent Characters

Constituent characters are used in tokens. A token is a representation of a number or a symbol. Examples of constituent characters are letters and digits.
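The following sketch reads single tokens under a copy of the standard readtable and reports whether each one became a number or a symbol. The function token-kind is hypothetical.

```lisp
;; Hypothetical helper: report what kind of object a single token reads as.
(defun token-kind (token)
  (let ((*readtable* (copy-readtable nil))
        (*read-base* 10))
    (let ((object (read-from-string token)))
      (cond ((numberp object) :number)
            ((symbolp object) :symbol)
            (t :other)))))
```

Under standard syntax, tokens such as 42, -3/4, and 1.5e3 should read as numbers. Tokens such as foo and 1+ should read as symbols; 1+ is not numeric syntax because it ends with a sign.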

Letters in symbol names are sometimes converted to letters in the opposite case when the name is read; see Section 23.1.2 (Effect of Readtable Case on the Lisp Reader). Case conversion can be suppressed by the use of single escape or multiple escape characters.

Constituent Traits

Every character has one or more constituent traits that define how the character is to be interpreted by the Lisp reader when the character is a constituent character. These constituent traits are alphabetic[2], digit, package marker, plus sign, minus sign, dot, decimal point, ratio marker, exponent marker, and invalid. Figure 2–8 shows the constituent traits of the standard characters and of certain semi-standard characters; no mechanism is provided for changing the constituent trait of a character. Any character with the alphadigit constituent trait in that figure is a digit if the current input base is greater than that character's digit value; otherwise the character is alphabetic[2]. Any character quoted by a single escape is treated as an alphabetic[2] constituent, regardless of its normal syntax.
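To see the alphadigit and exponent-marker traits at work, the hypothetical helper below reads a string with *read-base* bound to a chosen radix. The expected results assume the standard readtable.

```lisp
;; Hypothetical helper: read STRING with *READ-BASE* bound to BASE.
;; An alphadigit such as z is a digit only when BASE exceeds its digit value.
(defun read-in-base (string base)
  (let ((*readtable* (copy-readtable nil))
        (*read-base* base))
    (read-from-string string)))
```

Under these assumptions, (read-in-base "z" 36) should return 35, while (read-in-base "z" 10) returns a symbol. Likewise, (read-in-base "abc" 16) should return 2748. The exponent marker d in "1.5d0" should produce a double-float regardless of the default float format.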

character  constituent traits  character  constituent traits
Backspace  invalid             {          alphabetic[2]
Tab        invalid*            }          alphabetic[2]
Newline    invalid*            +          alphabetic[2], plus sign
Linefeed   invalid*            -          alphabetic[2], minus sign
Page       invalid*            .          alphabetic[2], dot, decimal point
Return     invalid*            /          alphabetic[2], ratio marker
Space      invalid*            A, a       alphadigit
!          alphabetic[2]       B, b       alphadigit
"          alphabetic[2]*      C, c       alphadigit
#          alphabetic[2]*      D, d       alphadigit, double-float exponent marker
$          alphabetic[2]       E, e       alphadigit, float exponent marker
%          alphabetic[2]       F, f       alphadigit, single-float exponent marker
&          alphabetic[2]       G, g       alphadigit
'          alphabetic[2]*      H, h       alphadigit
(          alphabetic[2]*      I, i       alphadigit
)          alphabetic[2]*      J, j       alphadigit
*          alphabetic[2]       K, k       alphadigit
,          alphabetic[2]*      L, l       alphadigit, long-float exponent marker
0-9        alphadigit          M, m       alphadigit
:          package marker      N, n       alphadigit
;          alphabetic[2]*      O, o       alphadigit
<          alphabetic[2]       P, p       alphadigit
=          alphabetic[2]       Q, q       alphadigit
>          alphabetic[2]       R, r       alphadigit
?          alphabetic[2]       S, s       alphadigit, short-float exponent marker
@          alphabetic[2]       T, t       alphadigit
[          alphabetic[2]       U, u       alphadigit
\          alphabetic[2]*      V, v       alphadigit
]          alphabetic[2]       W, w       alphadigit
^          alphabetic[2]       X, x       alphadigit
_          alphabetic[2]       Y, y       alphadigit
`          alphabetic[2]*      Z, z       alphadigit
|          alphabetic[2]*      Rubout     invalid
~          alphabetic[2]
Figure 2–8. Constituent Traits of Standard Characters and Semi-Standard Characters

The interpretations in this table apply only to characters whose syntax type is constituent. Entries marked with an asterisk (*) are normally shadowed[2] because the indicated characters are of syntax type whitespace[2], macro character, single escape, or multiple escape; these constituent traits apply to them only if their syntax types are changed to constituent.

Invalid Characters

Characters with the constituent trait invalid cannot ever appear in a token except under the control of a single escape character. If an invalid character is encountered while an object is being read, an error of type reader-error is signaled. If an invalid character is preceded by a single escape character, it is treated as an alphabetic[2] constituent instead.

Macro Characters

When the Lisp reader encounters a macro character on an input stream, special parsing of subsequent characters on the input stream is performed.

A macro character has an associated function called a reader macro function that implements its specialized parsing behavior. An association of this kind can be established or modified under control of a conforming program by using the functions set-macro-character and set-dispatch-macro-character.
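Here is a minimal sketch of establishing a reader macro function with set-macro-character. The code below makes [ ... ] read as a list, using [ and ], which standard syntax reserves to the programmer (they are marked with an asterisk in Figure 2–7). The change is made only in a copy of the standard readtable, and the function name read-with-brackets is hypothetical.

```lisp
;; Hypothetical helper: read STRING with [ ... ] denoting a list.
(defun read-with-brackets (string)
  (let ((*readtable* (copy-readtable nil)))
    ;; [ reads objects up to the matching ] and returns them as a list.
    (set-macro-character #\[
                         (lambda (stream char)
                           (declare (ignore char))
                           (read-delimited-list #\] stream t)))
    ;; Give ] the (terminating) definition of ), so that ] ends tokens.
    (set-macro-character #\] (get-macro-character #\)))
    (read-from-string string)))
```

With this definition, "[1 [2 3] 4]" should read as the list (1 (2 3) 4). Making ] terminating matters here: otherwise the text 3] would be read as a single token.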

Upon encountering a macro character, the Lisp reader calls its reader macro function, which parses one specially formatted object from the input stream. The function either returns the parsed object, or else it returns no values to indicate that the characters scanned by the function are being ignored (e.g., in the case of a comment). Examples of macro characters are backquote, single-quote, left-parenthesis, and right-parenthesis.
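The sketch below illustrates the zero-values case. A hypothetical reader macro on ! (another character reserved to the programmer) discards the rest of the line and returns no values. The reader then behaves as if that text were a comment.

```lisp
;; Hypothetical helper: read STRING with ! starting a to-end-of-line comment.
(defun read-with-bang-comments (string)
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore char))
                           (read-line stream nil) ; discard the rest of the line
                           (values)))             ; no values: nothing was read
    (read-from-string string)))
```

For example, reading a list whose middle line begins with ! should produce only the elements outside the comment.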

A macro character is either terminating or non-terminating. The difference between terminating and non-terminating macro characters lies in what happens when such characters occur in the middle of a token. If a non-terminating macro character occurs in the middle of a token, the function associated with the non-terminating macro character is not called, and the non-terminating macro character does not terminate the token’s name; it becomes part of the name as if the macro character were really a constituent character. A terminating macro character terminates any token, and its associated reader macro function is called no matter where the character appears. The only non-terminating macro character in standard syntax is sharpsign.
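The difference between terminating and non-terminating macro characters can be observed with read-from-string, whose second value is the index of the first character not read. The helper names are hypothetical. The use of ! as a non-terminating macro character is an assumption made for illustration.

```lisp
;; Hypothetical helper: return the object read and the index of the first
;; character that was not read, using a copy of the standard readtable.
(defun read-token-and-position (string)
  (let ((*readtable* (copy-readtable nil)))
    (multiple-value-list (read-from-string string))))

;; Hypothetical helper: make ! a non-terminating macro character whose
;; reader macro function simply returns the keyword :BANG.
(defun read-with-nonterminating-bang (string)
  (let ((*readtable* (copy-readtable nil)))
    (set-macro-character #\!
                         (lambda (stream char)
                           (declare (ignore stream char))
                           :bang)
                         t)           ; T means non-terminating
    (read-from-string string)))
```

Reading "a#b" should yield a single symbol named "A#B", because sharpsign is non-terminating. Reading "a(b)" should stop after A at index 1, because ( terminates the token. Likewise, "!" alone reads as :BANG, but "a!b" reads as one symbol.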

If a character is a dispatching macro character C1, its reader macro function is a function supplied by the implementation. This function reads decimal digit characters until a non-digit C2 is read. If any digits were read, they are converted into a corresponding integer infix parameter P; otherwise, the infix parameter P is nil. The terminating non-digit C2 is a character (sometimes called a “sub-character” to emphasize its subordinate role in the dispatching) that is looked up in the dispatch table associated with the dispatching macro character C1. The reader macro function associated with the sub-character C2 is invoked with three arguments: the stream, the sub-character C2, and the infix parameter P. For more information about dispatch characters, see the function set-dispatch-macro-character.
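As a sketch of the dispatching protocol, the code below installs a function for the sub-character ? under # and returns the infix parameter P it receives. The sub-character ? is assumed here to be free for programmer use. The helper name read-with-sharp-question is hypothetical.

```lisp
;; Hypothetical helper: read STRING with #? defined as a dispatch macro.
(defun read-with-sharp-question (string)
  (let ((*readtable* (copy-readtable nil)))
    (set-dispatch-macro-character
     #\# #\?
     (lambda (stream sub-char p)
       (declare (ignore sub-char))
       ;; P is the decimal infix parameter, or NIL if no digits appeared.
       (list :param p :object (read stream t nil t))))
    (read-from-string string)))
```

Reading "#3?foo" should produce (:PARAM 3 :OBJECT FOO), whereas "#?foo" should produce (:PARAM NIL :OBJECT FOO).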

For information about the macro characters that are available in standard syntax, see Section 2.4 (Standard Macro Characters).

Multiple Escape Characters

A pair of multiple escape characters is used to indicate that an enclosed sequence of characters, including possible macro characters and whitespace[2] characters, are to be treated as alphabetic[2] characters with case preserved. Any single escape and multiple escape characters that are to appear in the sequence must be preceded by a single escape character.
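The hypothetical function below reads a few tokens that use vertical bars and returns the resulting symbol names. It shows that whitespace and macro characters inside the bars become part of the name, and that case is preserved there.

```lisp
;; Hypothetical helper: names of symbols read from tokens using | escapes.
(defun multiple-escape-names ()
  (let ((*readtable* (copy-readtable nil)))
    (mapcar (lambda (text) (symbol-name (read-from-string text)))
            ;; In these string literals, \\ stands for one backslash.
            '("|foo bar|"    ; whitespace inside the bars is kept
              "a|b c|d"      ; only the unescaped a and d are upcased
              "|(x)|"        ; ( and ) are treated as alphabetic
              "|a\\|b|"))))  ; a single escape lets | appear inside
```

The expected result is the list of names "foo bar", "Ab cD", "(x)", and "a|b".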

Vertical-bar is a multiple escape character in standard syntax.

Examples of Multiple Escape Characters
;; The following examples assume the readtable case of *readtable*
;; and *print-case* are both :upcase.
(eq 'abc 'ABC) → true
(eq 'abc '|ABC|) → true
(eq 'abc 'a|B|c) → true
(eq 'abc '|abc|) → false

Single Escape Character

A single escape is used to indicate that the next character is to be treated as an alphabetic[2] character with its case preserved, no matter what the character is or which constituent traits it has.
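Similarly, this hypothetical function returns the names of symbols read from tokens containing backslashes. Inside a Lisp string literal, each backslash of the token text must itself be written as \\.

```lisp
;; Hypothetical helper: names of symbols read from tokens using \ escapes.
(defun single-escape-names ()
  (let ((*readtable* (copy-readtable nil)))
    (mapcar (lambda (text) (symbol-name (read-from-string text)))
            '("a\\bc"      ; token a\bc: only b keeps its lowercase
              "\\(x"       ; token \(x: ( becomes part of the name
              "a\\ b"))))  ; token a\ b: the escaped space does not end the token
```

The expected result is the list of names "AbC", "(X", and "A B".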

Backslash is a single escape character in standard syntax.

Examples of Single Escape Characters
;; The following examples assume the readtable case of *readtable*
;; and *print-case* are both :upcase.
(eq 'abc '\A\B\C) → true
(eq 'abc 'a\Bc) → true
(eq 'abc '\ABC) → true
(eq 'abc '\abc) → false

Whitespace Characters

Whitespace[2] characters are used to separate tokens.

Space and newline are whitespace[2] characters in standard syntax.

Examples of Whitespace Characters
(length '(this-that)) → 1
(length '(this - that)) → 3
(length '(a
          b)) → 2
(+ 34) → 34
(+ 3 4) → 7