
Conversation

@konard (Contributor) commented Jan 30, 2026

Summary

Fixes #146 - Agent CLI hangs indefinitely when the LLM API stream stalls (TCP connection alive but no data flowing).

Root Cause

The streamText() call in js/src/session/prompt.ts had no timeout configuration. When the upstream LLM API connection stalled (TCP alive, no data), the process waited indefinitely — in the reported incident, for 2 hours and 10 minutes until manual CTRL+C.

The Vercel AI SDK (v6) provides a built-in timeout option with chunkMs (timeout between stream chunks) and stepMs (timeout per LLM step), but these were not being used.
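
For illustration, a minimal sketch of what wiring that option in looks like, assuming the `timeout: { chunkMs, stepMs }` shape this PR describes for AI SDK v6; the function name and parameters below are placeholders, not the actual js/src/session/prompt.ts code:

```ts
import { streamText } from 'ai';

type StreamTextOptions = Parameters<typeof streamText>[0];

// Sketch of the change described in this PR. The timeout option shape
// (chunkMs / stepMs) follows the PR's description of AI SDK v6; the function
// name and parameters are illustrative only.
export function streamWithStallProtection(
  model: StreamTextOptions['model'],
  messages: StreamTextOptions['messages'],
) {
  return streamText({
    model,
    messages,
    timeout: {
      chunkMs: 120_000, // abort when no new chunk arrives for 2 minutes
      stepMs: 600_000,  // cap each individual LLM step at 10 minutes
    },
  });
}
```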

Changes

  1. js/src/session/prompt.ts: Add timeout: { chunkMs, stepMs } to the streamText() call

    • chunkMs: 2 minutes (detects stalled streams — no new chunk within this duration aborts the call)
    • stepMs: 10 minutes (max time for each individual LLM step)
  2. js/src/flag/flag.ts: Add configurable environment variables (see the sketch after this list):

    • AGENT_STREAM_CHUNK_TIMEOUT_MS / LINK_ASSISTANT_AGENT_STREAM_CHUNK_TIMEOUT_MS (default: 120000)
    • AGENT_STREAM_STEP_TIMEOUT_MS / LINK_ASSISTANT_AGENT_STREAM_STEP_TIMEOUT_MS (default: 600000)
  3. js/src/cli/continuous-mode.js: Add .unref() to setInterval polling loops to allow natural process exit (also sketched below)

  4. js/tests/stream-timeout.test.js: 6 tests for timeout configuration (defaults, env var overrides, prefix priority)

  5. docs/case-studies/issue-146/: Complete case study with timeline reconstruction, root cause analysis, and proposed solutions
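
A minimal sketch of items 2 and 3 above, under these assumptions: the helper name readTimeoutMs, the rule that the LINK_ASSISTANT_-prefixed variable wins when both are set, and the polling callback are all illustrative, not the actual flag.ts or continuous-mode.js code:

```ts
// Hypothetical flag helper (name and parsing rules are assumptions): resolve a
// timeout in milliseconds from env vars, preferring the LINK_ASSISTANT_ prefix.
function readTimeoutMs(name: string, defaultMs: number): number {
  const raw = process.env[`LINK_ASSISTANT_${name}`] ?? process.env[name];
  const parsed = raw === undefined ? NaN : Number(raw);
  return Number.isFinite(parsed) && parsed > 0 ? parsed : defaultMs;
}

export const streamChunkTimeoutMs = readTimeoutMs('AGENT_STREAM_CHUNK_TIMEOUT_MS', 120_000); // 2 min
export const streamStepTimeoutMs = readTimeoutMs('AGENT_STREAM_STEP_TIMEOUT_MS', 600_000);   // 10 min

// Continuous-mode polling: unref() the interval so it no longer keeps the
// Node.js event loop (and therefore the process) alive once real work is done.
function checkForPendingWork(): void {
  // placeholder for the actual continuous-mode polling body
}
const poll = setInterval(checkForPendingWork, 5_000);
poll.unref();
```

For example, launching the CLI with AGENT_STREAM_CHUNK_TIMEOUT_MS=300000 would widen the stall-detection window to 5 minutes while leaving the 10-minute step cap at its default.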

How It Prevents the Issue

When a stream stalls, the chunkMs timeout will trigger after 2 minutes of inactivity. This causes the AI SDK to abort the stream, which surfaces as an error in the processor. The existing retry logic (for TimeoutError added in PR #143) will then automatically retry the request.
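
Roughly, that recovery path looks like the sketch below; the TimeoutError name check, the attempt limit, and the backoff are assumptions for illustration, not the actual retry logic from PR #143:

```ts
// Hypothetical recovery path: the chunk timeout aborts a stalled stream, the
// abort surfaces as an error, and timeout errors are retried a bounded number
// of times before giving up.
async function runWithRetry<T>(run: () => Promise<T>, maxAttempts = 3): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await run();
    } catch (err) {
      const isTimeout = err instanceof Error && err.name === 'TimeoutError';
      if (!isTimeout || attempt >= maxAttempts) throw err;
      // Brief linear backoff before retrying the stalled request.
      await new Promise((resolve) => setTimeout(resolve, 1_000 * attempt));
    }
  }
}
```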

Test Results

All 44 timeout-related tests pass (6 new + 22 existing MCP timeout + 16 timeout retry tests).

Fixes #146

Adding CLAUDE.md with task information for AI processing.
This file will be removed when the task is complete.

Issue: #146
@konard self-assigned this on Jan 30, 2026
@konard (Contributor, Author) commented Jan 30, 2026

🚨 Solution Draft Failed

The automated solution draft encountered an error:

CLAUDE execution failed

📎 Failure log uploaded as Gist (790KB)
🔗 View complete failure log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard (Contributor, Author) commented Jan 30, 2026

🤖 AI Work Session Started

Starting automated work session at 2026-01-30T23:08:57.801Z

The PR has been converted to draft mode while work is in progress.

This comment marks the beginning of an AI work session. Please wait for the working session to finish before providing your feedback.

konard and others added 4 commits January 31, 2026 00:15
Deep analysis of the 2h10m hang incident including:
- Complete timeline reconstruction from logs
- Root cause: missing streamText timeout configuration
- Contributing factors: no session inactivity watchdog, polling intervals that were never unref'd
- Proposed solutions using AI SDK's built-in timeout options

Refs #146

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Root cause: The streamText() call had no timeout configuration, so when
the upstream LLM API connection stalled (TCP alive but no data), the
process waited indefinitely with no recovery mechanism.

Changes:
- Add chunkMs (2min) and stepMs (10min) timeouts to streamText() call
  using the AI SDK's built-in timeout option
- Add configurable env vars: AGENT_STREAM_CHUNK_TIMEOUT_MS (default 120s)
  and AGENT_STREAM_STEP_TIMEOUT_MS (default 600s)
- Add .unref() to setInterval in continuous mode to allow natural exit
- Add tests for stream timeout configuration

Fixes #146

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@konard changed the title from "[WIP] Agent CLI is stuck, there was not timeout" to "Fix agent CLI stuck with no timeout on stalled LLM streams" on Jan 30, 2026
@konard marked this pull request as ready for review on January 30, 2026 at 23:21
@konard (Contributor, Author) commented Jan 30, 2026

🤖 Solution Draft Log

This log file contains the complete execution trace of the AI solution draft process.

💰 Cost estimation:

  • Public pricing estimate: $6.762504 USD
  • Calculated by Anthropic: $6.928730 USD
  • Difference: $0.166226 (+2.46%)
📎 Log file uploaded as Gist (1416KB)
🔗 View complete solution draft log

The working session has now ended; feel free to review and add any feedback on the solution draft.

@konard merged commit 41bb5b9 into main on Jan 30, 2026
8 checks passed
