Skip to content

Conversation

@lcawl
Copy link
Contributor

@lcawl lcawl commented Jan 26, 2026

Fixes #2538

Problem Description

When running docs-builder changelog bundle with the --resolve option, the output file contained corrupted characters like "&o0" and "o0" instead of the correct "&" and "" characters.

Example command that exhibited the issue:

docs-builder changelog bundle \
--directory ~/Documents/GitHub/elasticsearch/docs/changelog/new \
--output ~/Documents/GitHub/elasticsearch/docs/release-notes/changelog-bundles/9.3.0.yaml \
--prs ~/Documents/GitHub/elasticsearch/docs/elasticsearch-9.3.0.txt \
--output-products "elasticsearch 9.3.0 ga" \
--resolve --owner elastic --repo elasticsearch

These test files can be found in elastic/elasticsearch#140795

Root Causes

The issue had two root causes:

  1. Missing UTF-8 Encoding Specification: File write operations throughout the changelog service were not explicitly specifying UTF-8 encoding, leading to potential character corruption on some systems.

  2. YAML Special Character Handling: The YamlDotNet serializer was not configured to properly quote strings containing special YAML characters (& for anchors, * for aliases), which could cause corruption during serialization.

Solutions Implemented

1. Added Explicit UTF-8 Encoding to File Operations

Modified all file write operations in the changelog service to explicitly specify UTF-8 encoding:

Files Modified:

src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cs

  • Added Encoding.UTF8 parameter to WriteAllTextAsync when writing bundle files
  • Added comment explaining the encoding requirement

src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs

  • Added System.Text using statement
  • Added Encoding.UTF8 parameter to WriteAllTextAsync when writing changelog files

src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs

  • Added Encoding.UTF8 parameter to both changelog file writes (2 locations)

src/services/Elastic.Changelog/Rendering/Markdown/MarkdownRendererBase.cs

  • Added Encoding.UTF8 parameter to WriteAllTextAsync for markdown output

src/services/Elastic.Changelog/Rendering/Asciidoc/ChangelogAsciidocRenderer.cs

  • Added Encoding.UTF8 parameter to WriteAllTextAsync for asciidoc output

2. Configured YamlDotNet to Quote Special Characters

src/services/Elastic.Changelog/Serialization/ChangelogYamlSerialization.cs

  • Added .WithQuotingNecessaryStrings() to the YamlSerializer configuration
  • This ensures that strings containing special YAML characters (&, *, etc.) are properly quoted in the output

3. Added Comprehensive Test

tests/Elastic.Changelog.Tests/Changelogs/BundleChangelogsTests.cs

  • Added new test: BundleChangelogs_WithResolve_PreservesSpecialCharactersInUtf8
  • Tests that special characters (&, *, <, >, ") and Unicode characters (©, ®, ™, €) are preserved correctly
  • Verifies that corruption patterns like "&o0" and "*o0" do not appear in the output

Code Changes Summary

Modified Files (6):

  1. src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cs
  2. src/services/Elastic.Changelog/Creation/ChangelogFileWriter.cs
  3. src/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cs
  4. src/services/Elastic.Changelog/Rendering/Markdown/MarkdownRendererBase.cs
  5. src/services/Elastic.Changelog/Rendering/Asciidoc/ChangelogAsciidocRenderer.cs
  6. src/services/Elastic.Changelog/Serialization/ChangelogYamlSerialization.cs
  7. tests/Elastic.Changelog.Tests/Changelogs/BundleChangelogsTests.cs

Testing

Test Results:

  • All 116 existing tests pass
  • New test BundleChangelogs_WithResolve_PreservesSpecialCharactersInUtf8 passes
  • No linting errors
  • Clean build with no warnings

Test Coverage:

The new test verifies:

  • Special characters (&, *, <, >, ", /, \) are preserved
  • Unicode characters (©, ®, ™, €, £, ¥) are preserved
  • No corruption patterns ("&o0", "*o0") appear in output
  • UTF-8 encoding is maintained throughout the process
  • Content structure remains correct

Impact

These fixes ensure that:

  1. Character encoding is consistent across all changelog file operations
  2. Special YAML characters are properly handled during serialization
  3. Unicode characters are preserved correctly
  4. The --resolve option works correctly without character corruption
  5. Cross-platform compatibility is improved by explicit encoding specification

Technical Details

Why UTF-8 Encoding Was Needed

The .NET File.WriteAllTextAsync method uses the system default encoding when no encoding is specified. On some systems, this could be an encoding other than UTF-8, leading to character corruption for special characters and Unicode symbols.

Why YamlDotNet Configuration Was Needed

YAML uses & for anchors and * for aliases. Without proper configuration, YamlDotNet may not quote strings containing these characters, potentially leading to parsing issues or corruption. The .WithQuotingNecessaryStrings() configuration ensures that strings requiring quotes (those containing special YAML characters) are automatically quoted in the output.

Verification Steps

To verify the fix works:

  1. Build the project:

    dotnet build
  2. Run the changelog tests:

    dotnet test tests/Elastic.Changelog.Tests/
  3. Test with real data containing special characters:

    docs-builder changelog bundle \
    --directory /path/to/changelogs \
    --output /path/to/output.yaml \
    --all \
    --resolve
  4. Verify the output file contains properly encoded special characters and no "&o0" or "*o0" patterns.

Conclusion

The encoding issue has been comprehensively fixed by:

  • Adding explicit UTF-8 encoding to all file write operations
  • Configuring YamlDotNet to properly quote special characters
  • Adding tests to prevent regressions

This ensures that changelog bundles with the --resolve option will correctly preserve all special characters and Unicode symbols without corruption.

Generative AI disclosure

  1. Did you use a generative AI (GenAI) tool to assist in creating this contribution?
  • Yes
  • No
  1. If you answered "Yes" to the previous question, please specify the tool(s) and model(s) used (e.g., Google Gemini, OpenAI ChatGPT-4, etc.).

Tool(s) and model(s) used: composer-1, claude-4.5-sonnet

@lcawl lcawl added the bug label Jan 26, 2026
@lcawl lcawl marked this pull request as ready for review January 26, 2026 20:28
@lcawl lcawl requested a review from a team as a code owner January 26, 2026 20:28
@lcawl lcawl requested a review from Mpdreamz January 26, 2026 20:28
@lcawl lcawl merged commit 31ee802 into main Jan 26, 2026
32 of 33 checks passed
@lcawl lcawl deleted the changelog-resolve-bug branch January 26, 2026 21:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Unexpected characters in resolved changelog bundle

3 participants