Fix agent CLI stuck with no timeout on stalled LLM streams #147
Summary
Fixes #146 - Agent CLI hangs indefinitely when the LLM API stream stalls (TCP connection alive but no data flowing).
Root Cause
The `streamText()` call in `js/src/session/prompt.ts` had no timeout configuration. When the upstream LLM API connection stalled (TCP alive, no data), the process waited indefinitely; in the reported incident, it hung for 2 hours and 10 minutes until a manual CTRL+C.
The Vercel AI SDK (v6) provides a built-in `timeout` option with `chunkMs` (timeout between stream chunks) and `stepMs` (timeout per LLM step), but these were not being used.
Changes
- `js/src/session/prompt.ts`: Add `timeout: { chunkMs, stepMs }` to the `streamText()` call
  - `chunkMs`: 2 minutes (detects stalled streams; no new chunk within this duration aborts the call)
  - `stepMs`: 10 minutes (max time for each individual LLM step)
- `js/src/flag/flag.ts`: Add configurable environment variables:
  - `AGENT_STREAM_CHUNK_TIMEOUT_MS` / `LINK_ASSISTANT_AGENT_STREAM_CHUNK_TIMEOUT_MS` (default: 120000)
  - `AGENT_STREAM_STEP_TIMEOUT_MS` / `LINK_ASSISTANT_AGENT_STREAM_STEP_TIMEOUT_MS` (default: 600000)
- `js/src/cli/continuous-mode.js`: Add `.unref()` to `setInterval` polling loops to allow natural process exit
- `js/tests/stream-timeout.test.js`: 6 tests for timeout configuration (defaults, env var overrides, prefix priority)
- `docs/case-studies/issue-146/`: Complete case study with timeline reconstruction, root cause analysis, and proposed solutions

How It Prevents the Issue
When a stream stalls, the `chunkMs` timeout will trigger after 2 minutes of inactivity. This causes the AI SDK to abort the stream, which surfaces as an error in the processor. The existing retry logic (for `TimeoutError`, added in PR #143) will then automatically retry the request.
Test Results
All 44 timeout-related tests pass (6 new + 22 existing MCP timeout + 16 timeout retry tests).
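
For reviewers who want a concrete picture of the configuration described under Changes, here is a minimal sketch of how the two timeouts could be resolved from the environment. It is not the code in `js/src/flag/flag.ts`: the helper names `resolveTimeoutMs` and `streamTimeouts` are hypothetical, and the sketch assumes that the `LINK_ASSISTANT_`-prefixed variable takes priority over the unprefixed one and that invalid values fall back to the default. Only the variable names and default values come from this PR.

```typescript
type Env = Record<string, string | undefined>;

// Hypothetical helper for illustration; not the actual flag.ts implementation.
function resolveTimeoutMs(env: Env, name: string, defaultMs: number): number {
  // Assumption: the LINK_ASSISTANT_-prefixed variable wins over the bare one.
  const raw = env[`LINK_ASSISTANT_${name}`] ?? env[name];
  if (raw === undefined) return defaultMs;
  const parsed = Number.parseInt(raw, 10);
  // Assumption: non-numeric or non-positive input falls back to the default.
  return Number.isFinite(parsed) && parsed > 0 ? parsed : defaultMs;
}

// Builds the { chunkMs, stepMs } pair using the defaults stated in this PR.
function streamTimeouts(env: Env): { chunkMs: number; stepMs: number } {
  return {
    chunkMs: resolveTimeoutMs(env, "AGENT_STREAM_CHUNK_TIMEOUT_MS", 120_000),
    stepMs: resolveTimeoutMs(env, "AGENT_STREAM_STEP_TIMEOUT_MS", 600_000),
  };
}
```

Per the description above, the resulting object is what gets passed as the `timeout` option of `streamText()`. Taking the environment as a parameter, rather than reading a global, keeps the lookup easy to test.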
Fixes #146
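
To illustrate what a per-chunk timeout does in principle, the sketch below wraps any async iterable and fails if the gap between two chunks exceeds `chunkMs`. This is not the AI SDK's implementation; `withChunkTimeout` and `ChunkTimeoutError` are hypothetical names used only to show the idea behind the stall detection described in How It Prevents the Issue.

```typescript
// Hypothetical error type for this sketch only.
class ChunkTimeoutError extends Error {}

// Re-yields chunks from `source`, throwing if no chunk arrives within `chunkMs`.
async function* withChunkTimeout<T>(
  source: AsyncIterable<T>,
  chunkMs: number,
): AsyncGenerator<T> {
  const it = source[Symbol.asyncIterator]();
  try {
    while (true) {
      let timer: ReturnType<typeof setTimeout> | undefined;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(
          () => reject(new ChunkTimeoutError(`no chunk within ${chunkMs} ms`)),
          chunkMs,
        );
      });
      // Whichever settles first wins; the timer is cleared either way.
      const result = await Promise.race([it.next(), timeout]).finally(() =>
        clearTimeout(timer),
      );
      if (result.done) return;
      yield result.value;
    }
  } finally {
    // Ask the source to clean up, but do not await it: a stalled source may never answer.
    void it.return?.();
  }
}
```

The timer restarts for every chunk, so a long but steadily flowing stream is never interrupted; only a silent gap longer than `chunkMs` raises the error, which a caller can then treat as retryable.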