shreymodi1 (Contributor) commented Dec 18, 2025

Note

Adds feature-flagged reasoning and multi-tool support, introduces tool_choice (none/required) tests, and uses raw_output with conditional reasoning handling across streaming and non-streaming benchmarks.

  • Feature Flags & Infra:

    • Add env-driven flags SUPPORTS_MULTIPLE_TOOL_CALLS and SUPPORTS_REASONING with pytest.mark.skipif gating.
    • Replace ChatCompletionContentPartTextParam with ChatCompletionContentPartParam.
    • Introduce _maybe_add_reasoning_effort and pass reasoning_effort conditionally; extend passthrough to tool_choice.
    • Enable raw_output=True broadly to inspect prompts; add checks against raw_output.prompt_fragments.
  • Tests Added/Expanded:

    • Tool choice: tool_choice=none and tool_choice=required (streaming and non-streaming), asserting that tool calls are absent for none and present for required, and that the prompt stays clean.
    • Reasoning + structured JSON and multiple tools (both streaming and non-streaming) with conditional reasoning assertions.
    • Multi-tool-call tests gated by capability flag for streaming and non-streaming.
  • Validation/Metrics Adjustments:

    • Normalize tool calls via helpers; improve argument validation.
    • Conditionally include reasoning metrics/checks only when supported; tighten forbidden/XML tag and leakage checks.
    • Minor parameter tweaks (e.g., adding raw_output, refining finish_reason expectations).

Written by Cursor Bugbot for commit 982eba2. This will update automatically on new commits.

"stream": True,
"temperature": 0.0,
"max_tokens": DEFAULT_MAX_TOKENS,
"reasoning_effort": "none",
Bug: New tool_choice tests hardcode reasoning_effort unconditionally

The new tool_choice tests (test_streaming_tool_choice_none, test_non_streaming_tool_choice_none, test_streaming_tool_choice_required, test_non_streaming_tool_choice_required) hardcode "reasoning_effort": "none" in their completion_params. This contradicts the SUPPORTS_REASONING feature flag design, which states that when EP_SUPPORTS_REASONING=0, the reasoning_effort parameter should NOT be passed at all. These tests should use _maybe_add_reasoning_effort to conditionally include the parameter, matching the pattern used by other tests in this file.

Additional Locations (2)
