
Conversation

@enyst (Collaborator) commented Jan 20, 2026

Follow-up to #1682.

  • Allow LLM.responses() to run with stream=True when requested.
  • When LiteLLM returns a Responses streaming iterator, forward best-effort text deltas to on_token and return the completed ResponsesAPIResponse so agent execution can continue (see the sketch below).

Tested locally by running examples/01_standalone_sdk/34_subscription_login.py in subscription mode.
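
A minimal sketch of the drain-and-forward behavior described above. The helper name and the import paths are assumptions, the event classes are the ones named in the commit notes below, and the delta/response attributes are assumed to mirror the OpenAI Responses event shapes:

from litellm.types.llms.openai import (  # assumed import path
    OutputTextDeltaEvent,
    ResponseCompletedEvent,
)
from litellm.types.utils import Delta, ModelResponseStream, StreamingChoices


def _drain_responses_stream(stream, on_token=None):
    """Forward best-effort text deltas to on_token; return the completed response."""
    completed = None
    for event in stream:
        if isinstance(event, OutputTextDeltaEvent) and on_token is not None:
            # Re-wrap the text delta as a ModelResponseStream chunk so the
            # callback sees the same shape as the Chat Completions path.
            on_token(
                ModelResponseStream(
                    choices=[StreamingChoices(delta=Delta(content=event.delta))]
                )
            )
        elif isinstance(event, ResponseCompletedEvent):
            # The completed ResponsesAPIResponse rides on the final event.
            completed = event.response
    return completed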

enyst added 2 commits January 20, 2026 10:43
Enable streaming for LLM.responses() when requested (stream or LLM.stream).

When LiteLLM returns a Responses streaming iterator, drain it, forward best-effort text deltas to on_token (as ModelResponseStream chunks), and return the completed ResponsesAPIResponse.
Avoid getattr() by relying on LiteLLM's ResponsesAPIStreamingResponse event types (e.g., OutputTextDeltaEvent) and ResponseCompletedEvent.
@all-hands-bot (Collaborator) left a comment


The streaming implementation looks solid overall! I found three areas worth discussing: a redundant condition check, handling of non-streaming responses when streaming is requested, and error handling for the callback. Details in inline comments below.

Comment on lines 796 to 804
on_token(
ModelResponseStream(
choices=[
StreamingChoices(
delta=Delta(content=delta)
)
]
)
)

🟡 Suggestion: The on_token callback is invoked without error handling. If the callback raises an exception, the entire streaming response would fail and the completed response would be lost.

Consider wrapping the on_token call in a try-except block to make the system more robust:

try:
    on_token(
        ModelResponseStream(
            choices=[
                StreamingChoices(
                    delta=Delta(content=delta)
                )
            ]
        )
    )
except Exception as e:
    # Log the error but don't fail the entire request
    logger.warning(f"on_token callback failed: {e}")

This way, a faulty callback won't prevent the agent from receiving the completed response.

@enyst (Collaborator, Author) replied:

Good point. For now I’m intentionally not swallowing on_token callback exceptions so behavior matches the existing Chat Completions streaming path (where callback exceptions currently propagate).

If we want “best-effort streaming” (callback failures don’t abort the request), I’d prefer to implement that consistently for both LLM.completion() and LLM.responses() (and log a warning / rate-limit), rather than making Responses special-cased.
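
For reference, a best-effort variant shared by both paths could look something like this hypothetical helper (nothing like it exists in this PR):

import logging

logger = logging.getLogger(__name__)


def _safe_on_token(on_token, chunk) -> None:
    """Hypothetical shared wrapper: invoke the callback, log instead of raising on failure."""
    try:
        on_token(chunk)
    except Exception:
        logger.warning("on_token callback failed; continuing stream", exc_info=True)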

enyst added 2 commits January 20, 2026 10:50
Avoid per-call imports in the Responses streaming path; rely on LiteLLM's typed event classes.
In subscription mode, Responses stream=True can be forced by options even when the caller didn’t request streaming, so keep gating on user_enable_streaming but avoid redundant checks inside the loop.
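
Roughly, the gate described here (user_enable_streaming is the name from the commit note; the function itself is illustrative):

def should_forward_tokens(user_enable_streaming: bool, on_token) -> bool:
    """Decide once, before draining the stream, whether deltas go to on_token.

    In subscription mode the Responses call may be forced to stream=True by
    provider options, so token forwarding stays tied to the user's setting
    rather than to whether a streaming iterator happens to come back.
    """
    return user_enable_streaming and on_token is not None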
@enyst (Collaborator, Author) commented Jan 20, 2026

It works with a Plus subscription and streaming enabled (apparently streaming has to be enabled).

enyst added 2 commits January 20, 2026 10:56
If streaming was requested but LiteLLM returns a non-streaming ResponsesAPIResponse, emit a warning so behavior is explicit.
Enable LLM stream mode and wire a simple token callback so users can see incremental output when using ChatGPT subscription login.
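
A token callback along the lines described could be as simple as the sketch below; how it is actually wired into examples/01_standalone_sdk/34_subscription_login.py is not shown in this conversation, so the wiring is an assumption:

from litellm.types.utils import ModelResponseStream


def on_token(chunk: ModelResponseStream) -> None:
    """Print text deltas as they arrive; chunks have the shape forwarded by LLM.responses()."""
    for choice in chunk.choices:
        content = choice.delta.content if choice.delta else None
        if content:
            print(content, end="", flush=True)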
@openhands-ai (bot) commented Jan 20, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • [Optional] Docs example

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #1761 at branch `fix/responses-streaming-on-token`

Feel free to include any additional details that might help me get this PR into a better state.


@xingyaoww merged commit fef7d02 into feat/openai-subscription-auth on Jan 20, 2026 (8 of 9 checks passed).
@xingyaoww deleted the fix/responses-streaming-on-token branch on January 20, 2026 at 14:04.