Fix changelog bundle encoding characters #2565
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Fixes #2538
Problem Description
When running
docs-builder changelog bundlewith the--resolveoption, the output file contained corrupted characters like "&o0" and "o0" instead of the correct "&" and "" characters.Example command that exhibited the issue:
These test files can be found in elastic/elasticsearch#140795
Root Causes
The issue had two root causes:
Missing UTF-8 Encoding Specification: File write operations throughout the changelog service were not explicitly specifying UTF-8 encoding, leading to potential character corruption on some systems.
YAML Special Character Handling: The YamlDotNet serializer was not configured to properly quote strings containing special YAML characters (
&for anchors,*for aliases), which could cause corruption during serialization.Solutions Implemented
1. Added Explicit UTF-8 Encoding to File Operations
Modified all file write operations in the changelog service to explicitly specify UTF-8 encoding:
Files Modified:
src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.csEncoding.UTF8parameter toWriteAllTextAsyncwhen writing bundle filessrc/services/Elastic.Changelog/Creation/ChangelogFileWriter.csSystem.Textusing statementEncoding.UTF8parameter toWriteAllTextAsyncwhen writing changelog filessrc/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.csEncoding.UTF8parameter to both changelog file writes (2 locations)src/services/Elastic.Changelog/Rendering/Markdown/MarkdownRendererBase.csEncoding.UTF8parameter toWriteAllTextAsyncfor markdown outputsrc/services/Elastic.Changelog/Rendering/Asciidoc/ChangelogAsciidocRenderer.csEncoding.UTF8parameter toWriteAllTextAsyncfor asciidoc output2. Configured YamlDotNet to Quote Special Characters
src/services/Elastic.Changelog/Serialization/ChangelogYamlSerialization.cs.WithQuotingNecessaryStrings()to theYamlSerializerconfiguration&,*, etc.) are properly quoted in the output3. Added Comprehensive Test
tests/Elastic.Changelog.Tests/Changelogs/BundleChangelogsTests.csBundleChangelogs_WithResolve_PreservesSpecialCharactersInUtf8&,*,<,>,") and Unicode characters (©, ®, ™, €) are preserved correctlyCode Changes Summary
Modified Files (6):
src/services/Elastic.Changelog/Bundling/ChangelogBundlingService.cssrc/services/Elastic.Changelog/Creation/ChangelogFileWriter.cssrc/services/Elastic.Changelog/GithubRelease/GitHubReleaseChangelogService.cssrc/services/Elastic.Changelog/Rendering/Markdown/MarkdownRendererBase.cssrc/services/Elastic.Changelog/Rendering/Asciidoc/ChangelogAsciidocRenderer.cssrc/services/Elastic.Changelog/Serialization/ChangelogYamlSerialization.cstests/Elastic.Changelog.Tests/Changelogs/BundleChangelogsTests.csTesting
Test Results:
BundleChangelogs_WithResolve_PreservesSpecialCharactersInUtf8passesTest Coverage:
The new test verifies:
&,*,<,>,",/,\) are preservedImpact
These fixes ensure that:
--resolveoption works correctly without character corruptionTechnical Details
Why UTF-8 Encoding Was Needed
The .NET
File.WriteAllTextAsyncmethod uses the system default encoding when no encoding is specified. On some systems, this could be an encoding other than UTF-8, leading to character corruption for special characters and Unicode symbols.Why YamlDotNet Configuration Was Needed
YAML uses
&for anchors and*for aliases. Without proper configuration, YamlDotNet may not quote strings containing these characters, potentially leading to parsing issues or corruption. The.WithQuotingNecessaryStrings()configuration ensures that strings requiring quotes (those containing special YAML characters) are automatically quoted in the output.Verification Steps
To verify the fix works:
Build the project:
Run the changelog tests:
dotnet test tests/Elastic.Changelog.Tests/Test with real data containing special characters:
Verify the output file contains properly encoded special characters and no "&o0" or "*o0" patterns.
Conclusion
The encoding issue has been comprehensively fixed by:
This ensures that changelog bundles with the
--resolveoption will correctly preserve all special characters and Unicode symbols without corruption.Generative AI disclosure
Tool(s) and model(s) used: composer-1, claude-4.5-sonnet