Clarify UNICODE_ESCAPE valid token value #2123
Merged
This clarifies in the UNICODE_ESCAPE rule that the hex value must be a
valid Unicode scalar value. This resolves the problem that a string like
`"\u{ffffff}"` is not a valid token, but the grammar did not reflect
that.
I don't see a practical way to express this restriction with character
ranges in the grammar; the resulting expression would be huge.
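
For illustration, the restriction is easy to state as a check even though it is awkward as a character-range expression. The sketch below is not the grammar text from this PR; the helper names `is_unicode_scalar_value` and `is_valid_escape_hex` are hypothetical, and the 1 to 6 hex digit limit is an assumption about the existing rule (underscores inside the braces are ignored here). It relies on `char::from_u32`, which returns `None` for surrogates (0xD800..=0xDFFF) and for values above 0x10FFFF.

```rust
/// Returns true if `value` is a Unicode scalar value, i.e. it lies in
/// 0x0..=0xD7FF or 0xE000..=0x10FFFF (surrogate code points are excluded).
fn is_unicode_scalar_value(value: u32) -> bool {
    char::from_u32(value).is_some()
}

/// Hypothetical helper: checks the hex digits found between the braces of
/// a `\u{...}` escape. Assumes 1 to 6 hex digits and no underscores.
fn is_valid_escape_hex(hex: &str) -> bool {
    // Reject anything that is not purely hex digits. This also rules out a
    // leading `+`, which `u32::from_str_radix` would otherwise accept.
    if hex.is_empty() || hex.len() > 6 || !hex.chars().all(|c| c.is_ascii_hexdigit()) {
        return false;
    }
    match u32::from_str_radix(hex, 16) {
        Ok(value) => is_unicode_scalar_value(value),
        Err(_) => false,
    }
}
```

Under these assumptions, `is_valid_escape_hex("10ffff")` accepts the largest scalar value, while `"ffffff"` (out of range) and `"d800"` (a surrogate) are rejected.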
Note that with this restriction the UNICODE_ESCAPE rule will not match
an invalid value. In every rule where UNICODE_ESCAPE is used, the
alternative for ordinary characters must *not* match `\`, which forces
those rules to fail their match. In turn, the only rules that contain
UNICODE_ESCAPE are delimited by `'` or `"` characters, which won't match
any other rule in the grammar, forcing the parse to fail.
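
The chain of reasoning above can be sketched with a tiny hand-written matcher. This is a simplified stand-in, not the Reference grammar: the functions `match_escape` and `is_string_literal` are hypothetical, only the `\n` and `\u{...}` escapes are modeled, and the 1 to 6 hex digit limit is an assumption. The point is that once the Unicode escape fails on an invalid value, the ordinary-character alternative also refuses the `\`, so the whole literal fails to match.

```rust
/// Tries to match one escape at the start of `s` and returns the rest of
/// the input on success. Simplified: only `\n` and `\u{...}` are handled.
fn match_escape(s: &str) -> Option<&str> {
    if let Some(rest) = s.strip_prefix("\\n") {
        return Some(rest);
    }
    let rest = s.strip_prefix("\\u{")?;
    let end = rest.find('}')?;
    let hex = &rest[..end];
    // The escape only matches when the digits name a Unicode scalar value.
    let ok = (1..=6).contains(&hex.len())
        && hex.chars().all(|c| c.is_ascii_hexdigit())
        && u32::from_str_radix(hex, 16)
            .ok()
            .and_then(char::from_u32)
            .is_some();
    if ok { Some(&rest[end + 1..]) } else { None }
}

/// Matches a whole (simplified) string literal, including both quotes.
fn is_string_literal(src: &str) -> bool {
    let Some(mut rest) = src.strip_prefix('"') else {
        return false;
    };
    loop {
        // Closing quote: the literal matches only if nothing follows it.
        if let Some(after) = rest.strip_prefix('"') {
            return after.is_empty();
        }
        if let Some(after) = match_escape(rest) {
            rest = after;
            continue;
        }
        // Ordinary-character alternative: anything except `"` and `\`.
        match rest.chars().next() {
            Some(c) if c != '\\' => rest = &rest[c.len_utf8()..],
            // A `\` that no escape matched, or end of input: no match.
            _ => return false,
        }
    }
}
```

With this sketch, `"\u{41}"` matches as a literal, while `"\u{ffffff}"` does not, because no alternative is left to consume the `\`.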
If all those assumptions seem too fragile, then we can consider adding
the [cut operator](rust-lang#2104)
just after the `\u`. That would make it explicit that a failure to match
the rest of the escape, from the opening brace onward, is an immediate
parse failure.
traviscross approved these changes on Dec 22, 2025
JonathanBrouwer added a commit to JonathanBrouwer/rust that referenced this pull request on Jan 1, 2026:

Update books

## rust-lang/reference

21 commits in ec78de0ffe2f8344bd0e222b17ac7a7d32dc7a26..6363385ac4ebe1763f1e6fb2063c0b1db681a072
2025-12-15 16:17:43 +0000 to 2025-12-31 21:12:35 +0000
- Remove cargo workspace inheritance (rust-lang/reference#2129)
- identifiers: bump Unicode from 16 to 17 (rust-lang/reference#2071)
- Fix alternation order of lexical rules (rust-lang/reference#2126)
- Fix overly greedy digits (rust-lang/reference#2124)
- Clarify UNICODE_ESCAPE valid token value (rust-lang/reference#2123)
- Fix ambiguity of RESERVED_RAW_IDENTIFIER (rust-lang/reference#2122)
- Document how closure capturing interacts with discriminant reads (rust-lang/reference#1837)
- operator-expr: remove stray word in footnote (rust-lang/reference#2118)
- await-expr: add a missing space (rust-lang/reference#2120)
- attributes: add missing punctuation to instruction_set (rust-lang/reference#2117)
- associated-items: add missing periods (rust-lang/reference#2116)
- Move tools into a consolidated cargo workspace (rust-lang/reference#2115)
- Unwrap all of the lexical chapters (rust-lang/reference#2113)
- Unwrap const_eval.md (rust-lang/reference#2112)
- Add section on expansion-time (early) name resolution (rust-lang/reference#2055)
- const_eval.md: add missing word (rust-lang/reference#2068)
- path-expr.md: use a more suitable punctuation (rust-lang/reference#2082)
- items: clarify label for type-aliases documentation (rust-lang/reference#2110)
- do not mix singular and plural (rust-lang/reference#2101)
- external-blocks: add missing "and" in list (rust-lang/reference#2111)
- conditional-compilation: add a space in `cfg.cfg_attr.attribute-list` (rust-lang/reference#2109)

## rust-lang/rust-by-example

2 commits in 7d21279e40e8f0e91c2a22c5148dd2d745aef8b6..2e02f22a10e7eeb758e6aba484f13d0f1988a3e5
2025-12-21 08:47:57 UTC to 2025-12-21 08:46:33 UTC
- docs(comments): improve readability and formatting (rust-lang/rust-by-example#1981)
- Fix HOF.MD sum of odd squares algorithm (rust-lang/rust-by-example#1980)
rust-timer added a commit to rust-lang/rust that referenced this pull request on Jan 1, 2026:

Rollup merge of #150529 - rustbot:docs-update, r=ehuss

Update books (same commit message as above: 21 commits in rust-lang/reference, ec78de0ffe2f8344bd0e222b17ac7a7d32dc7a26..6363385ac4ebe1763f1e6fb2063c0b1db681a072, and 2 commits in rust-lang/rust-by-example, 7d21279e40e8f0e91c2a22c5148dd2d745aef8b6..2e02f22a10e7eeb758e6aba484f13d0f1988a3e5)