fix(daemon): throttle OpenCode capture retries for dead server ports by proboscis · Pull Request #404 · proboscis/orch

proboscis · 2026-02-09T14:01:39Z

Summary

- Implement per-run exponential backoff (10s → 60s cap) for message-capture failures against dead OpenCode server endpoints
- Apply a stricter backoff (2x multiplier) to ECONNREFUSED errors specifically
- Rate-limit and deduplicate log messages to avoid spam

Motivation

When the daemon's monitor attempted message capture for older runs whose OpenCode servers had already stopped, the resulting connection-refused errors from those stale ports produced high-volume log noise and unnecessary retry load.

Changes

`internal/daemon/daemon.go`

Extended the `RunState` struct with capture-backoff state fields:
- `CaptureFailCount` - number of consecutive failures
- `LastCaptureFailAt` - time of the most recent failure
- `NextCaptureRetryAt` - backoff deadline; no capture is attempted before it
- `LastCaptureError` - last error seen, used for log deduplication

`internal/daemon/monitor.go`

- Added backoff constants: initial = 10s, max = 60s, factor = 2x
- `shouldSkipCapture()` - skips capture during the backoff period
- `handleCaptureError()` - sets backoff and logs only the first error or a new (different) error
- `resetCaptureBackoff()` - clears state on successful capture
- `calculateCaptureBackoff()` - exponential backoff with an ECONNREFUSED boost
- `isConnectionRefused()` - detects connection refused via `net.OpError` or string matching

`internal/daemon/monitor_test.go`

Tests for `shouldSkipCapture`, `calculateCaptureBackoff`, `handleCaptureError`, `resetCaptureBackoff`, and `isConnectionRefused`

Acceptance Criteria Evidence

| Criterion | Evidence |
|---|---|
| Connection-refused spam reduced to bounded, periodic logs | The first error, and any error that differs from the previous one, is logged; identical repeats are logged at debug level only. Backoff caps at 60s. |
| Capture loop avoids tight retries | `shouldSkipCapture()` enforces the `NextCaptureRetryAt` deadline. |
| Normal live OpenCode capture still works | `resetCaptureBackoff()` clears state on success; the successful path is otherwise unchanged. |
| Tests cover retry/backoff behavior | 5 new test functions, with subtests, covering the skip check, backoff calculation, error handling, reset, and connection-refused detection. |

```
=== RUN   TestShouldSkipCapture
=== RUN   TestShouldSkipCapture/no_backoff_state_allows_capture
=== RUN   TestShouldSkipCapture/future_NextCaptureRetryAt_skips_capture
=== RUN   TestShouldSkipCapture/past_NextCaptureRetryAt_allows_capture
--- PASS: TestShouldSkipCapture (0.00s)
=== RUN   TestCalculateCaptureBackoff
=== RUN   TestCalculateCaptureBackoff/first_failure_uses_initial_backoff
=== RUN   TestCalculateCaptureBackoff/exponential_backoff_increases
=== RUN   TestCalculateCaptureBackoff/backoff_caps_at_max
--- PASS: TestCalculateCaptureBackoff (0.00s)
=== RUN   TestHandleCaptureError
=== RUN   TestHandleCaptureError/first_error_sets_backoff_state
=== RUN   TestHandleCaptureError/consecutive_errors_increase_fail_count
--- PASS: TestHandleCaptureError (0.00s)
=== RUN   TestResetCaptureBackoff
--- PASS: TestResetCaptureBackoff (0.00s)
=== RUN   TestIsConnectionRefused
--- PASS: TestIsConnectionRefused (0.00s)
```

Fixes: orch-427

Implement per-run backoff for message capture failures, especially connection refused errors from stale OpenCode server endpoints.

Key changes:
- Add exponential backoff (10s -> 60s cap) for capture failures
- Apply stricter backoff (2x multiplier) for ECONNREFUSED specifically
- Track capture failure state per run (count, timestamps, last error)
- Rate-limit/deduplicate logs (first error + different errors only)
- Reset backoff on successful capture

This reduces connection refused spam to bounded, periodic logs and avoids tight retry loops against known-dead endpoints.

Fixes: orch-427


proboscis wants to merge 1 commit into `main` from `issue/orch-427/run-20260209-225545`