Commit 560b147

Merge pull request #2123 from ehuss/valid-unicode-escape
Clarify UNICODE_ESCAPE valid token value
2 parents 6a29736 + a85f11e commit 560b147

1 file changed: +4 -2 lines changed
src/tokens.md

Lines changed: 4 additions & 2 deletions
@@ -157,9 +157,11 @@ ASCII_ESCAPE ->
 | `\n` | `\r` | `\t` | `\\` | `\0`

 UNICODE_ESCAPE ->
-`\u{` ( HEX_DIGIT `_`* ){1..6} `}`
+`\u{` ( HEX_DIGIT `_`* ){1..6} _valid hex char value_ `}`[^valid-hex-char]
 ```

+[^valid-hex-char]: See [lex.token.literal.char-escape.unicode].
+
 r[lex.token.literal.char.intro]
 A _character literal_ is a single Unicode character enclosed within two `U+0027` (single-quote) characters, with the exception of `U+0027` itself, which must be _escaped_ by a preceding `U+005C` character (`\`).

@@ -196,7 +198,7 @@ r[lex.token.literal.char-escape.ascii]
 * A _7-bit code point escape_ starts with `U+0078` (`x`) and is followed by exactly two _hex digits_ with value up to `0x7F`. It denotes the ASCII character with value equal to the provided hex value. Higher values are not permitted because it is ambiguous whether they mean Unicode code points or byte values.

 r[lex.token.literal.char-escape.unicode]
-* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D` (`}`). It denotes the Unicode code point equal to the provided hex value.
+* A _24-bit code point escape_ starts with `U+0075` (`u`) and is followed by up to six _hex digits_ surrounded by braces `U+007B` (`{`) and `U+007D` (`}`). It denotes the Unicode code point equal to the provided hex value. The value must be a valid Unicode scalar value.

 r[lex.token.literal.char-escape.whitespace]
 * A _whitespace escape_ is one of the characters `U+006E` (`n`), `U+0072` (`r`), or `U+0074` (`t`), denoting the Unicode values `U+000A` (LF), `U+000D` (CR) or `U+0009` (HT) respectively.
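
The restriction added in this change can be illustrated with a small sketch. Six hex digits can encode values up to `0xFFFFFF`, but only Unicode scalar values (`0x0` to `0x10FFFF`, excluding the surrogate range `0xD800` to `0xDFFF`) are accepted. The function below is a hypothetical helper written for illustration, not the compiler's lexer; its name and its `Option<char>` return type are assumptions. It follows the `UNICODE_ESCAPE` production shown in the diff and uses the standard library's `char::from_u32` for the scalar-value check.

```rust
/// Hypothetical helper (not part of rustc or the Reference): decodes a
/// complete `\u{...}` escape and returns `None` if it does not match the
/// UNICODE_ESCAPE production or is not a valid Unicode scalar value.
fn decode_unicode_escape(escape: &str) -> Option<char> {
    // Strip the `\u{` prefix and the `}` suffix.
    let body = escape.strip_prefix("\\u{")?.strip_suffix('}')?;

    let mut value: u32 = 0;
    let mut digits = 0;
    for c in body.chars() {
        if c == '_' {
            // ( HEX_DIGIT `_`* ){1..6}: underscores may only follow a hex digit.
            if digits == 0 {
                return None;
            }
            continue;
        }
        // Any character that is not a hex digit makes the escape invalid.
        let d = c.to_digit(16)?;
        digits += 1;
        if digits > 6 {
            return None; // at most six hex digits
        }
        value = value * 16 + d;
    }
    if digits == 0 {
        return None; // at least one hex digit is required
    }

    // `char::from_u32` rejects surrogates (0xD800..=0xDFFF) and values
    // above 0x10FFFF, which is the "valid Unicode scalar value" rule.
    char::from_u32(value)
}
```

Under these assumptions, `decode_unicode_escape("\\u{1F600}")` yields `Some('😀')`, while `decode_unicode_escape("\\u{D800}")` and `decode_unicode_escape("\\u{110000}")` both yield `None` even though each matches the digit-count part of the grammar.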