Fix off-by-one excluding U+10FFFF from valid Unicode range in emitter by bysiber · Pull Request #918 · yaml/pyyaml

bysiber · 2026-02-20T09:52:08Z

Summary

The emitter's analyze_scalar method has an off-by-one error that excludes U+10FFFF from the valid Unicode range, causing it to be treated as a "special character" and forcing double-quoted scalar style.

Problem

The supplementary plane check uses strict less-than:

or '\U00010000' <= ch < '\U0010ffff'

This excludes U+10FFFF, which is a valid Unicode code point per the YAML spec's c-printable production (YAML 1.1 section 4.1):

c-printable ::= ... | [#x10000-#x10FFFF]

Meanwhile, reader.py correctly uses an inclusive upper bound in its character validation regex:

'\U00010000-\U0010ffff'

This creates an inconsistency: the reader accepts U+10FFFF, but the emitter treats it as special, unnecessarily forcing double-quoted style when the scalar could use plain, single-quoted, or block styles.
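The boundary behaviour can be seen in isolation with a minimal sketch. The two predicates below are hypothetical stand-ins written for illustration, not PyYAML code; they reproduce only the supplementary-plane comparison quoted above, once with the exclusive upper bound and once with the inclusive one.

```python
# Stand-in predicates (assumed names, not part of PyYAML): they isolate the
# single comparison from the emitter so the off-by-one is easy to observe.
LOW = '\U00010000'
HIGH = '\U0010ffff'


def in_range_exclusive(ch: str) -> bool:
    """Upper bound excluded, as in the comparison quoted above."""
    return LOW <= ch < HIGH


def in_range_inclusive(ch: str) -> bool:
    """Upper bound included, matching [#x10000-#x10FFFF]."""
    return LOW <= ch <= HIGH


# Check the first, second-to-last, and last supplementary code points.
results = {
    cp: (in_range_exclusive(chr(cp)), in_range_inclusive(chr(cp)))
    for cp in (0x10000, 0x10FFFE, 0x10FFFF)
}

for cp, (exclusive, inclusive) in results.items():
    print(f"U+{cp:X}: exclusive={exclusive} inclusive={inclusive}")
```

Only U+10FFFF gets a different answer from the two predicates; every other code point in the range is unaffected.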

Demonstration

import io
from yaml.emitter import Emitter

e = Emitter(io.StringIO(), allow_unicode=True)

# U+10FFFE: correctly treated as valid unicode
a1 = e.analyze_scalar('\U0010fffe')
print(a1.allow_single_quoted)  # True

# U+10FFFF: incorrectly treated as special character
a2 = e.analyze_scalar('\U0010ffff')
print(a2.allow_single_quoted)  # False (expected: True)

Fix

Change < to <= to include U+10FFFF in the valid range, matching both the YAML spec and the reader's validation.
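One possible way to guard against this kind of drift is a boundary check that compares a reader-style character class with the emitter-style comparison. The sketch below is an assumption-laden illustration: `READER_STYLE` contains only the supplementary-plane part of the range quoted from reader.py, and the helper names are hypothetical rather than taken from PyYAML.

```python
import re

# Simplified stand-in for the reader's validation: only the
# supplementary-plane range, not the full character class from reader.py.
READER_STYLE = re.compile('[\U00010000-\U0010ffff]')


def emitter_style_original(ch: str) -> bool:
    """Comparison with the exclusive upper bound (before the change)."""
    return '\U00010000' <= ch < '\U0010ffff'


def emitter_style_fixed(ch: str) -> bool:
    """Comparison with the inclusive upper bound (after the change)."""
    return '\U00010000' <= ch <= '\U0010ffff'


# Code points on either side of both ends of the range.
BOUNDARIES = (0xFFFF, 0x10000, 0x10FFFE, 0x10FFFF)


def disagreements(predicate):
    """Return the boundary code points where regex and predicate differ."""
    return [
        cp for cp in BOUNDARIES
        if bool(READER_STYLE.match(chr(cp))) != predicate(chr(cp))
    ]


print([hex(cp) for cp in disagreements(emitter_style_original)])
print([hex(cp) for cp in disagreements(emitter_style_fixed)])
```

With the exclusive bound the two sides disagree only at U+10FFFF; with the inclusive bound the list of disagreements is empty.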


bysiber wants to merge 1 commit into yaml:main from bysiber:fix/emitter-unicode-range-off-by-one
