

fix: replace invalid codepoints with U+FFFD in entity parser #353

Merged
kjk merged 1 commit into gomarkdown:master from vnykmshr:fix-entity-null-byte on Feb 17, 2026


@vnykmshr (Contributor)

Fixes #352.

entity() converts numeric character references to runes without checking for invalid codepoints. A reference to codepoint 0 (such as &#0;) produces a null byte, surrogates produce invalid UTF-8, and values above 0x10FFFF are silently truncated.

What changed

  • parser/inline.go: added a guard for the three invalid codepoint categories (0, surrogates, >0x10FFFF) that substitutes U+FFFD per CommonMark spec section 6.2
  • inline_test.go: added TestEntityNullByte verifying that &#0; produces U+FFFD
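The guard described above can be sketched as a standalone Go function. The name replaceInvalidCodepoint and its int64 parameter are assumptions made for illustration; the actual change in parser/inline.go may be structured differently.

```go
package main

// replacementChar is U+FFFD, the Unicode REPLACEMENT CHARACTER.
const replacementChar rune = 0xFFFD

// replaceInvalidCodepoint is a hypothetical helper, not part of gomarkdown.
// It maps the three invalid categories named in the PR to U+FFFD:
// codepoint 0, UTF-16 surrogates (0xD800-0xDFFF), and values above 0x10FFFF.
// Every other value is returned unchanged as a rune.
func replaceInvalidCodepoint(cp int64) rune {
	switch {
	case cp <= 0: // codepoint 0 (negative values cannot come from digits, but are rejected too)
		return replacementChar
	case cp >= 0xD800 && cp <= 0xDFFF: // surrogates are not Unicode scalar values
		return replacementChar
	case cp > 0x10FFFF: // beyond the Unicode codespace
		return replacementChar
	}
	// Safe to narrow: cp is now within 1..0x10FFFF.
	return rune(cp)
}
```

Comparing on the wide int64 value before narrowing to rune means an out-of-range input such as 0x100000041 cannot wrap around into a valid-looking codepoint during the conversion.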

entity() emits a literal null byte for &#0; instead of the Unicode
replacement character. The CommonMark spec (section 6.2) requires
U+FFFD for codepoint 0, surrogates (0xD800-0xDFFF), and values
above 0x10FFFF.

Check for all three invalid codepoint categories before converting
to a rune.
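For context, here is a sketch of where such a check sits when a numeric character reference is decoded end to end. It follows the CommonMark rules for numeric references (&# plus 1 to 7 decimal digits, or &#x / &#X plus 1 to 6 hex digits, then a semicolon). The function names decodeNumericRef and isRefDigit and their signatures are hypothetical, and the validity test uses the standard library's utf8.ValidRune (which rejects surrogates and values above utf8.MaxRune) plus an explicit zero check, rather than reproducing gomarkdown's code.

```go
package main

import (
	"strconv"
	"unicode/utf8"
)

// decodeNumericRef is a hypothetical helper, not part of gomarkdown.
// Given input starting at '&', it decodes a numeric character reference
// such as "&#65;" or "&#x41;". It returns the UTF-8 text and the number of
// input bytes consumed, or ok=false if the input is not a valid reference.
func decodeNumericRef(s string) (text string, n int, ok bool) {
	if len(s) < 4 || s[0] != '&' || s[1] != '#' {
		return "", 0, false
	}
	i, base, maxDigits := 2, 10, 7
	if s[2] == 'x' || s[2] == 'X' {
		i, base, maxDigits = 3, 16, 6
	}
	start := i
	for i < len(s) && i-start < maxDigits && isRefDigit(s[i], base) {
		i++
	}
	// Require at least one digit, followed immediately by ';'.
	if i == start || i >= len(s) || s[i] != ';' {
		return "", 0, false
	}
	// At most 7 decimal or 6 hex digits, so the value fits in 32 bits.
	cp, err := strconv.ParseInt(s[start:i], base, 32)
	if err != nil {
		return "", 0, false
	}
	r := rune(cp)
	// Check validity before encoding: NUL, surrogates, and values above
	// utf8.MaxRune all become U+FFFD.
	if r == 0 || !utf8.ValidRune(r) {
		r = utf8.RuneError // U+FFFD
	}
	return string(r), i + 1, true
}

// isRefDigit reports whether c is a digit in the given base (10 or 16).
func isRefDigit(c byte, base int) bool {
	switch {
	case c >= '0' && c <= '9':
		return true
	case base == 16 && (c >= 'a' && c <= 'f' || c >= 'A' && c <= 'F'):
		return true
	}
	return false
}
```

Either form of the check works; utf8.ValidRune covers the surrogate and upper-bound cases in one call, while the explicit ranges in the PR description make each category visible.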
kjk merged commit c901e06 into gomarkdown:master on Feb 17, 2026
1 check passed
@kjk (Contributor) commented Feb 17, 2026

thanks!



Development

Linked issue: entity() emits null byte for &#0; instead of U+FFFD
