Description
What happened?
When a session prompt contains CJK characters (Japanese, Chinese, Korean) or other non-ASCII multi-byte characters, the generated context.md on the
entire/checkpoints/v1 branch contains garbled text (mojibake).
Root cause: generateContextFromPrompts in manual_commit_condensation.go truncates long prompts with byte slicing (displayPrompt[:500]). Because most CJK characters
occupy 3 bytes in UTF-8, a cut at byte offset 500 can land in the middle of a multi-byte sequence, leaving an incomplete trailing character and corrupting the output.
The corruption can be verified locally with:
prompt := strings.Repeat("あ", 200) // 200 CJK chars = 600 bytes
truncated := prompt[:500] // splits mid-character → invalid UTF-8
The prompt.txt and full.jsonl files in the same checkpoint are unaffected; the bug is isolated to the truncation logic in generateContextFromPrompts.
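One possible direction for a fix is to keep the 500-byte budget but back the cut point up to the nearest rune boundary. The sketch below is only an illustration: truncateUTF8 is a hypothetical helper name, not a function from the project, and the real fix may prefer to count runes instead of bytes.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// truncateUTF8 is a hypothetical helper (not the project's actual code).
// It returns s cut to at most maxBytes bytes, moving the cut point back
// to the nearest rune start so no multi-byte sequence is split.
func truncateUTF8(s string, maxBytes int) string {
	if maxBytes <= 0 {
		return ""
	}
	if len(s) <= maxBytes {
		return s
	}
	cut := maxBytes
	// Step back while the byte at the cut is a UTF-8 continuation byte.
	for cut > 0 && !utf8.RuneStart(s[cut]) {
		cut--
	}
	return s[:cut]
}

func main() {
	prompt := strings.Repeat("あ", 200) // 200 chars, 600 bytes

	naive := prompt[:500]             // byte slice, as in the report
	safe := truncateUTF8(prompt, 500) // rune-boundary-aware cut

	fmt.Println(utf8.ValidString(naive))           // false
	fmt.Println(utf8.ValidString(safe), len(safe)) // true 498
}
```

With the 200-character test prompt, the safe cut lands at byte 498 (166 whole characters), since 500 is not a multiple of 3.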
Steps to reproduce
- entire enable
- Start a Claude Code session and send a prompt longer than ~167 CJK characters (≥ 500 bytes)
- Make a git commit to trigger condense
- Inspect entire/checkpoints/v1//0/context.md — the truncated prompt will contain garbled characters
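To confirm the corruption without relying on how a terminal renders it, the file can be scanned for the first invalid UTF-8 sequence. This is a minimal sketch: firstInvalidOffset is a hypothetical helper, and the file path is taken from the command line, so it assumes you have already checked out or exported context.md locally.

```go
package main

import (
	"fmt"
	"os"
	"unicode/utf8"
)

// firstInvalidOffset is a hypothetical helper. It returns the byte offset
// of the first invalid UTF-8 sequence in data, or -1 if data is valid.
func firstInvalidOffset(data []byte) int {
	for i := 0; i < len(data); {
		r, size := utf8.DecodeRune(data[i:])
		// DecodeRune reports an invalid encoding as (RuneError, 1);
		// a genuine U+FFFD in the input decodes with size 3.
		if r == utf8.RuneError && size == 1 {
			return i
		}
		i += size
	}
	return -1
}

func main() {
	if len(os.Args) < 2 {
		fmt.Fprintln(os.Stderr, "usage: checkutf8 <path-to-context.md>")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if off := firstInvalidOffset(data); off >= 0 {
		fmt.Printf("invalid UTF-8 at byte offset %d\n", off)
		os.Exit(1)
	}
	fmt.Println("valid UTF-8")
}
```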
Entire CLI version
v0.4.5
OS and architecture
Darwin 24.5.0 arm64
Agent
Claude Code
Strategy
manual-commit (default)
Terminal
WezTerm
Logs / debug output
Additional context
No response