
add automatic 3-attempt retry for LLM parse failures #239

Open
thanay-sisir wants to merge 4 commits into web-arena-x:main from thanay-sisir:auto-retry-llm-parse-fails

@thanay-sisir

Feature: Automatic Retry for LLM Parse Failures

1. Why This Matters

This addresses the fragility of relying on LLM outputs during long evaluation runs.

  • The Reality: LLMs are nondeterministic. Occasionally they output malformed JSON, hit a transient glitch, or hallucinate an output format the parser does not recognize, and parsing then raises a ValueError.
  • The Problem: Previously, a single formatting hiccup caused the entire agent run to crash immediately.
  • The Fix: We implemented a "Try Again" mechanism. If the agent fails to parse the LLM's output, it queries the LLM again, making at most 3 attempts in total, before giving up.
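To make the failure mode concrete: in Python, `json.JSONDecodeError` is a subclass of `ValueError`, so a parser built on `json.loads` reports malformed JSON as exactly the exception type this retry catches. The `parse_action` helper below is hypothetical, a minimal sketch assuming the LLM is asked to return a JSON object with an `"action"` key; the actual parser in the repository may work differently.

```python
import json


def parse_action(llm_output: str) -> dict:
    """Hypothetical parser: expects a JSON object containing an "action" key."""
    # json.loads raises json.JSONDecodeError (a ValueError subclass) on malformed JSON.
    data = json.loads(llm_output)
    if not isinstance(data, dict) or "action" not in data:
        # Well-formed JSON in an unexpected shape is also reported as a ValueError.
        raise ValueError(f"unexpected output format: {llm_output!r}")
    return data


if __name__ == "__main__":
    print(parse_action('{"action": "click [42]"}'))  # {'action': 'click [42]'}
    try:
        parse_action('{"action": "click [42]"')  # missing closing brace
    except ValueError as exc:
        print(type(exc).__name__)  # JSONDecodeError
```

Because both the decode error and the shape check raise `ValueError`, a single `except ValueError` clause in the caller covers both kinds of formatting failure.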

2. Impact on Codebase

  • File Modified: run.py
  • The Logic: Wrapped the agent.next_action call in a retry loop (Max 3 attempts).
  • The Flow:
    1. Attempt to get action.
    2. If ValueError occurs -> Log "Retrying..." -> Loop.
    3. If success -> Continue normal flow.
    4. If all 3 attempts fail -> Stop and log the final error.
  • Performance: No overhead when the first attempt succeeds. Extra time (one more LLM call per retry) is spent only when a parse error actually occurs, and that time is what saves the run.
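The four-step flow above can be sketched as a small wrapper. This is a hypothetical sketch, not the code in `run.py`: `next_action_with_retry`, `MAX_ATTEMPTS`, and the `FlakyAgent` stand-in are illustrative names, and `agent` is assumed to expose a `next_action(...)` method that raises `ValueError` when the LLM output cannot be parsed. It also assumes each call re-queries the LLM; with fully deterministic decoding a retry may simply reproduce the same bad output.

```python
import logging

logger = logging.getLogger(__name__)

MAX_ATTEMPTS = 3  # hypothetical constant mirroring the "max 3 attempts" above


def next_action_with_retry(agent, *args, max_attempts: int = MAX_ATTEMPTS, **kwargs):
    """Call agent.next_action, retrying on ValueError; at most max_attempts calls in total."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            # Steps 1 and 3: attempt to get an action and return it on success.
            return agent.next_action(*args, **kwargs)
        except ValueError as exc:
            last_error = exc
            if attempt < max_attempts:
                # Step 2: log and loop to query the LLM again.
                logger.warning("Parse failed (attempt %d/%d): %s. Retrying...",
                               attempt, max_attempts, exc)
    # Step 4: every attempt failed, so log the final error and re-raise it.
    logger.error("Giving up after %d attempts: %s", max_attempts, last_error)
    raise last_error


class FlakyAgent:
    """Hypothetical stand-in agent that fails to parse `failures` times, then succeeds."""

    def __init__(self, failures: int):
        self.failures = failures
        self.calls = 0

    def next_action(self, trajectory, intent):
        self.calls += 1
        if self.calls <= self.failures:
            raise ValueError("malformed LLM output")
        return {"action": "stop"}


if __name__ == "__main__":
    logging.basicConfig(level=logging.INFO)
    agent = FlakyAgent(failures=2)
    print(next_action_with_retry(agent, [], "find the price"), agent.calls)  # {'action': 'stop'} 3
```

Re-raising the last `ValueError` after the final attempt leaves whatever error handling the caller already has around `agent.next_action` unchanged. On the success path the wrapper adds only a function call and a `try` block, which is in line with the performance point above.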

3. Consequences of Ignoring It

  • False Failures: ~10-20% of runs were failing not because the agent couldn't solve the task, but because the LLM output contained a syntax error.
  • Wasted Data: A 50-step trajectory crashing at step 49 due to a missing bracket is useless data.
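  • Lower Benchmarks: Scores were artificially suppressed. Adding this recovery lets the agent finish its job, which can raise eval scores by up to 10-20%; a recovered run can still fail the task, so the gain is bounded by the share of runs that previously crashed.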
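A rough model shows why per-step retries matter most for long trajectories. If each `next_action` call independently fails to parse with probability q, a step is lost only when all 3 attempts fail (probability q^3), so an n-step run survives with probability (1 - q^3)^n instead of (1 - q)^n. The independence assumption and the example values (q = 0.005, n = 50) are illustrative only, not measurements from this PR; in practice, repeated queries on the same prompt may fail in correlated ways.

```python
def survival_probability(q: float, steps: int, attempts: int = 1) -> float:
    """Probability that no step exhausts its attempts, assuming independent parse failures."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be a probability")
    if steps < 0 or attempts < 1:
        raise ValueError("steps must be >= 0 and attempts >= 1")
    per_step_failure = q ** attempts  # a step fails only if every attempt fails
    return (1.0 - per_step_failure) ** steps


if __name__ == "__main__":
    # Illustrative numbers only: 0.5% parse-failure rate per call, 50-step trajectory.
    without_retry = survival_probability(0.005, 50, attempts=1)
    with_retry = survival_probability(0.005, 50, attempts=3)
    print(f"{without_retry:.3f} vs {with_retry:.6f}")  # 0.778 vs 0.999994
```

Under these assumptions, a per-call failure rate that looks negligible still ends roughly one run in five at 50 steps, while three attempts per step make a crash very unlikely.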
