Byte-based string slicing in generateContextFromPrompts and TruncateDescription split multi-byte UTF-8 sequences at arbitrary byte boundaries, producing mojibake in context.md and commit messages. Replace byte slicing with rune-based truncation using the existing stringutil.TruncateRunes helper throughout.

Closes #419

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Entire-Checkpoint: 3d5fd4d46a16
Pull request overview
Fixes UTF-8 corruption (mojibake) caused by byte-based truncation when generating human-readable prompt/context summaries, by switching truncation to rune-safe logic and adding regression tests (CJK/emoji/ASCII). This aligns with the CLI’s goal of producing readable, searchable session metadata without mangling user content.
Changes:
- Replace byte slicing with rune-based truncation in `generateContextFromPrompts`.
- Update `TruncateDescription` to truncate by runes (UTF-8 safe) via `stringutil.TruncateRunes`.
- Add tests covering CJK, emoji, ASCII truncation and UTF-8 validity.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| cmd/entire/cli/strategy/messages.go | Switch description truncation to rune-safe truncation helper to prevent UTF-8 splitting. |
| cmd/entire/cli/strategy/manual_commit_condensation.go | Use rune-based truncation for prompt rendering in generated context.md output. |
| cmd/entire/cli/strategy/manual_commit_condensation_test.go | Add regression tests ensuring truncation remains valid UTF-8 for CJK/emoji/ASCII. |
```go
// Excerpt from TruncateDescription in cmd/entire/cli/strategy/messages.go
runes := []rune(s)
if len(runes) <= maxLen {
	return s
}
if maxLen < 3 {
```
TruncateDescription converts s to []rune up front, but the truncated-path then calls stringutil.TruncateRunes, which converts to []rune again. This adds extra allocations and work on every call (including the non-truncation case). Consider delegating to stringutil.TruncateRunes directly (e.g., use an empty suffix when maxLen < 3) and/or using utf8.RuneCountInString to avoid allocating unless truncation is required.
```go
// Excerpt from generateContextFromPrompts in cmd/entire/cli/strategy/manual_commit_condensation.go
// Truncate very long prompts for readability.
// Use rune-based truncation to avoid splitting multi-byte UTF-8 characters (e.g. CJK).
displayPrompt := stringutil.TruncateRunes(prompt, 500, "...")
```
stringutil.TruncateRunes always allocates a full []rune for prompt, even when it doesn't actually need truncation. If prompts can be large, consider first checking utf8.RuneCountInString(prompt) > 500 (or similar) and only calling TruncateRunes for long prompts to avoid unnecessary allocations.
Summary
- Replace byte slicing in `generateContextFromPrompts` and `TruncateDescription` with rune-based truncation using the existing `stringutil.TruncateRunes` helper

Closes #419
Test plan
- `TestTruncateDescription` and `TestFormatIncrementalMessage` tests pass unchanged
- `mise run fmt && mise run lint`: no new issues
- `go test -race ./cmd/entire/cli/strategy/`: all tests pass

🤖 Generated with Claude Code