[Bug]: Race condition causes UI to not show running agents and false 'agent limit reached' errors #722

@JasonBroderick

Operating System

Linux (WSL2)

Run Mode

Docker / Web

App Version

v0.14.0rc

Bug Description

When starting features (especially from Plan mode), there's a race condition that causes:

  1. UI doesn't show running agents - The board looks empty even though an agent is executing
  2. False "agent limit reached" errors - When trying to start another feature, you get "Failed to start feature, agent limit reached for worktree..." even though the UI shows nothing running
  3. State desync - The server knows an agent is running, but the UI doesn't reflect it

Root Cause Analysis

The race condition is in how runningFeatures is populated in auto-mode-service.ts.

Problem: When a feature starts, it's added to runningFeatures with branchName: null. The branchName is only populated later after several async operations. But getRunningCountForWorktree() uses branchName for counting agents per worktree.

In apps/server/src/services/auto-mode-service.ts lines 355-365:

const entry: RunningFeature = {
  featureId: params.featureId,
  projectPath: params.projectPath,
  worktreePath: null,
  branchName: null,  // <-- ALWAYS NULL AT START
  // ...
};
this.runningFeatures.set(params.featureId, entry);

The branchName is only set much later (line 1225), after several async operations:

tempRunningFeature.branchName = branchName ?? null;  // <-- SET MUCH LATER

Timeline of the race:

  1. User clicks "Start" on a feature
  2. run-feature.ts calls checkWorktreeCapacity() then executeFeature()
  3. Feature is added to runningFeatures with branchName: null
  4. Server returns { success: true } immediately (line 59 in run-feature.ts)
  5. auto_mode_feature_start event is emitted much later (line 1232) after:
    • Feature load
    • Context existence check
    • Worktree lookup
    • Status update
  6. UI doesn't invalidate queries until event arrives (~100-500ms later)
  7. User tries to start another feature
  8. Server correctly sees something IS running → "agent limit reached"
  9. But UI still shows nothing running
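The timeline above can be reproduced in miniature. The following is only a sketch with hypothetical stand-in names (`serverRunning`, `uiRunning`, `prepareFeature`), not the project's actual code. It relies on the fact that an async function runs synchronously up to its first `await`, so the server-side entry exists before the start event is ever emitted.

```typescript
// Hypothetical stand-ins; none of these names come from the real codebase.
type StartListener = (featureId: string) => void;

const serverRunning = new Map<string, { branchName: string | null }>();
const uiRunning = new Set<string>();
const startListeners: StartListener[] = [];

// The UI only learns about a running feature when the start event arrives.
startListeners.push((featureId) => uiRunning.add(featureId));

// Stand-in for feature load, context check, worktree lookup, status update.
async function prepareFeature(featureId: string): Promise<string> {
  return `feature/${featureId}`;
}

async function executeFeature(featureId: string): Promise<void> {
  const entry = { branchName: null as string | null };
  serverRunning.set(featureId, entry); // step 3: registered immediately
  entry.branchName = await prepareFeature(featureId); // async gap
  startListeners.forEach((notify) => notify(featureId)); // step 5: late event
}

// Step 4: the route handler does not wait for executeFeature to finish.
const pending = executeFeature("feature-1");

// State at the moment the caller gets control back.
const snapshot = { server: serverRunning.size, ui: uiRunning.size };
console.log(snapshot);

// Once the async work completes, the UI catches up.
pending.then(() => console.log({ server: serverRunning.size, ui: uiRunning.size }));
```

The first snapshot reports one entry on the server and none in the UI, which is the window in which a second start attempt is rejected while the board still looks empty.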

Steps to Reproduce

  1. Set auto-mode agent limit to 1 for main worktree
  2. Use Plan mode to add multiple features to backlog
  3. Start the first feature
  4. Observe: UI may not immediately show the feature as running
  5. Quickly try to start a second feature from the backlog
  6. Get error: "Failed to start feature, agent limit reached for worktree..."
  7. But the UI shows no running features

Expected Behavior

  • UI should immediately show features as running when started
  • Agent counting should be accurate from the moment a feature starts
  • UI state should match server state

Actual Behavior

  • UI doesn't show running features until the auto_mode_feature_start WebSocket event arrives (delayed)
  • Agent counting may be incorrect during the race window because branchName is null
  • Users see "agent limit reached" when UI shows nothing running

Recommended Fix

Option: Load branchName BEFORE adding to runningFeatures

Modify executeFeature() to load the feature's branchName before calling acquireRunningFeature():

async executeFeature(projectPath, featureId, useWorktrees, isAutoMode, options) {
  // Load feature FIRST to get branchName
  const feature = await this.loadFeature(projectPath, featureId);
  if (!feature) throw new Error(`Feature ${featureId} not found`);
  
  // Now acquire with correct branchName immediately
  const tempRunningFeature = this.acquireRunningFeature({
    featureId,
    projectPath,
    isAutoMode,
    branchName: feature.branchName ?? null,  // Pass it here
    allowReuse: options?._calledInternally,
  });
  
  // ... rest of execution
}

Files to modify:

  • apps/server/src/services/auto-mode-service.ts:
    • acquireRunningFeature() - Add branchName parameter
    • executeFeature() - Load feature before acquire, pass branchName
    • resumePipelineFromStep() - Same pattern
    • followUpFeature() - Same pattern
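As a rough illustration of the `acquireRunningFeature()` change, the sketch below accepts an optional `branchName` and stores it at registration time. The `RunningFeatureRegistry` class, the `AcquireParams` shape, and the reduced `RunningFeature` fields are assumptions made for the example; the real method also handles options such as `allowReuse`, which are omitted here.

```typescript
// Hypothetical, simplified shapes; the real RunningFeature has more fields.
interface RunningFeature {
  featureId: string;
  projectPath: string;
  branchName: string | null;
}

interface AcquireParams {
  featureId: string;
  projectPath: string;
  branchName?: string | null; // proposed new parameter
}

class RunningFeatureRegistry {
  private readonly runningFeatures = new Map<string, RunningFeature>();

  // Registers the feature with its branchName known up front,
  // falling back to null only when the caller has none.
  acquire(params: AcquireParams): RunningFeature {
    const entry: RunningFeature = {
      featureId: params.featureId,
      projectPath: params.projectPath,
      branchName: params.branchName ?? null,
    };
    this.runningFeatures.set(params.featureId, entry);
    return entry;
  }
}

const registry = new RunningFeatureRegistry();
const acquired = registry.acquire({
  featureId: "feature-1",
  projectPath: "/repo",
  branchName: "feature/login",
});
const withoutBranch = registry.acquire({ featureId: "feature-2", projectPath: "/repo" });
```

Keeping the parameter optional means existing callers that have no branch information continue to register with `null`, as they do today.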

Also consider:

  • Emitting an early "feature_starting" event from run-feature.ts before returning success, so UI can optimistically show the feature
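One possible shape for the early event is sketched below. Everything here is hypothetical: the in-memory subscriber list stands in for the WebSocket channel, and `handleRunFeature` stands in for the route in run-feature.ts. The point is only the ordering: the event is emitted before the handler returns.

```typescript
// Hypothetical names throughout; "feature_starting" is the event proposed above.
type FeatureEvent = {
  type: "feature_starting" | "auto_mode_feature_start";
  featureId: string;
};
type Subscriber = (featureEvent: FeatureEvent) => void;

const subscribers: Subscriber[] = [];
const emitEvent = (featureEvent: FeatureEvent): void => {
  subscribers.forEach((notify) => notify(featureEvent));
};

// UI side: treat either event as "this feature is running".
const shownAsRunning = new Set<string>();
subscribers.push((featureEvent) => shownAsRunning.add(featureEvent.featureId));

async function executeFeature(featureId: string): Promise<void> {
  await Promise.resolve(); // stand-in for the slow setup work
  emitEvent({ type: "auto_mode_feature_start", featureId });
}

// Route handler: announce the start, kick off execution, return immediately.
function handleRunFeature(featureId: string): { success: boolean } {
  emitEvent({ type: "feature_starting", featureId });
  void executeFeature(featureId);
  return { success: true };
}

const response = handleRunFeature("feature-1");
const visibleOnResponse = shownAsRunning.has("feature-1");
```

With this ordering the feature is already marked as running on the client by the time the success response is returned, and the later auto_mode_feature_start event simply confirms it.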

Screenshots

No response

Relevant Logs

# User starts feature, UI shows empty
# User tries to start another feature:
Failed to start feature, agent limit reached for worktree...

Additional Context

The getRunningCountForWorktree() function (lines 802-828) relies on branchName being set correctly:

  • Features with branchName: null are counted toward the main worktree limit
  • This means a feature intended for a feature branch gets miscounted during the race window
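The miscount can be illustrated with a simplified reconstruction of the counting rule. This is a sketch based only on the behavior described above; the function name `countRunningForWorktree` and its parameters are assumptions, and the real getRunningCountForWorktree() may differ in details.

```typescript
// Hypothetical reconstruction of the counting rule described above.
interface RunningFeature {
  featureId: string;
  projectPath: string;
  branchName: string | null;
}

function countRunningForWorktree(
  running: Iterable<RunningFeature>,
  projectPath: string,
  branchName: string | null, // null means the main worktree
): number {
  let count = 0;
  for (const feature of running) {
    if (feature.projectPath === projectPath && feature.branchName === branchName) {
      count++;
    }
  }
  return count;
}

const inRaceWindow: RunningFeature[] = [
  // Intended for "feature/login", but branchName has not been filled in yet.
  { featureId: "feature-1", projectPath: "/repo", branchName: null },
];

const mainCount = countRunningForWorktree(inRaceWindow, "/repo", null);
const branchCount = countRunningForWorktree(inRaceWindow, "/repo", "feature/login");
```

During the race window the entry counts against the main worktree (`mainCount` is 1) and not against its intended branch (`branchCount` is 0), so a limit of 1 on main blocks an unrelated start.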

Related code paths that also need the fix:

  • resumePipelineFromStep()
  • followUpFeature()

Checklist

  • I have searched existing issues to ensure this bug hasn't been reported already
  • I have provided all required information above
