
context.md contains garbled text (mojibake) when prompts include CJK or other multi-byte characters #419

@wasabeef

Description

What happened?

When a session prompt contains CJK characters (Japanese, Chinese, Korean) or other non-ASCII multi-byte characters, the generated context.md on the entire/checkpoints/v1 branch contains garbled text (mojibake).

Root cause: generateContextFromPrompts in manual_commit_condensation.go truncates long prompts by byte slicing (displayPrompt[:500]). Most CJK characters occupy 3 bytes in UTF-8, so cutting at byte offset 500 can split a multi-byte sequence mid-character and leave an incomplete character at the end of the prompt, which then renders as mojibake.

The corruption can be verified locally with:

prompt := strings.Repeat("あ", 200)      // 200 CJK chars = 600 bytes
truncated := prompt[:500]                // splits mid-character → invalid UTF-8
fmt.Println(utf8.ValidString(truncated)) // false

The prompt.txt and full.jsonl files in the same checkpoint are unaffected — the bug is isolated to the truncation logic in generateContextFromPrompts.

Steps to reproduce

  1. entire enable
  2. Start a Claude Code session and send a prompt of at least 167 CJK characters (more than 500 bytes in UTF-8)
  3. Make a git commit to trigger condense
  4. Inspect entire/checkpoints/v1//0/context.md — the truncated prompt will contain garbled characters
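Step 4 can also be checked mechanically instead of by eye. The program below is a hypothetical helper, not part of the Entire CLI: it reads a file from stdin and lists the line numbers that are not valid UTF-8. It assumes context.md is fed in from the checkpoint branch, for example with `git show entire/checkpoints/v1:<path-to-context.md> | go run checkutf8.go`, where `<path-to-context.md>` is a placeholder for the checkpoint's path.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"os"
	"unicode/utf8"
)

// invalidUTF8Lines returns the 1-based numbers of the lines in data
// that are not valid UTF-8.
func invalidUTF8Lines(data []byte) []int {
	var bad []int
	for i, line := range bytes.Split(data, []byte("\n")) {
		if !utf8.Valid(line) {
			bad = append(bad, i+1)
		}
	}
	return bad
}

func main() {
	data, err := io.ReadAll(os.Stdin)
	if err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
		os.Exit(2)
	}
	bad := invalidUTF8Lines(data)
	if len(bad) == 0 {
		fmt.Println("valid UTF-8")
		return
	}
	fmt.Println("invalid UTF-8 on lines:", bad)
	os.Exit(1)
}
```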

Entire CLI version

v0.4.5

OS and architecture

Darwin 24.5.0 arm64

Agent

Claude Code

Strategy

manual-commit (default)

Terminal

WezTerm

Logs / debug output

Additional context

No response

Metadata

Assignees

No one assigned

Labels

bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests