Skip to content

Conversation

@Liberatedwinner
Copy link

@Liberatedwinner Liberatedwinner commented Jan 27, 2026

Hello,
this PR fixes the action of auto_drop_analysis.

Problem

In multi-turn conversations, auto_drop_analysis only drops analysis before the first final message, leaving intermediate analysis messages.

Changes

  • first_final_idxlast_final_idx in src/encoding.rs
  • Added multi-turn test case

Verification with HuggingFace Jinja Template

Click to expand test script
"""
GPT-OSS Harmony Chat Template Test
Verify analysis(thinking) drop behavior in multi-turn conversations

Requirements:
    pip install transformers

Usage:
    python test_jinja_template_comparison.py
"""

from transformers import AutoTokenizer


def main():
    tokenizer = AutoTokenizer.from_pretrained("openai/gpt-oss-20b")

    messages = [
        {
            "role": "developer",
            "content": "You are a helpful assistant that analyzes code and provides detailed feedback.",
        },
        {
            "role": "user",
            "content": "Can you help me optimize this Python function?\n\ndef fibonacci(n):\n    if n <= 1:\n        return n\n    return fibonacci(n-1) + fibonacci(n-2)",
        },
        {
            "role": "assistant",
            "thinking": "The user provided a recursive Fibonacci implementation. O(2^n) complexity.",
            "content": "This recursive Fibonacci has exponential time complexity. Would you like me to show optimized versions?",
        },
        {"role": "user", "content": "Yes, and benchmark them"},
        {
            "role": "assistant",
            "thinking": "User wants benchmarks. I should run some Python code to compare performance.",
            "content": "I'll benchmark both versions for you.",
        },
        {"role": "user", "content": "Run the benchmark for n=30"},
        {
            "role": "assistant",
            "thinking": "I need to execute Python code to run the benchmark for n=30.",
            "tool_calls": [
                {
                    "id": "call_001",
                    "type": "function",
                    "function": {
                        "name": "python",
                        "arguments": '{"code": "import timeit\\n\\ndef fib_recursive(n):\\n    if n <= 1: return n\\n    return fib_recursive(n-1) + fib_recursive(n-2)\\n\\ndef fib_iter(n):\\n    if n <= 1: return n\\n    a, b = 0, 1\\n    for _ in range(2, n+1): a, b = b, a+b\\n    return b\\n\\nprint(timeit.timeit(lambda: fib_recursive(30), number=1))\\nprint(timeit.timeit(lambda: fib_iter(30), number=1000))"}',
                    },
                }
            ],
        },
    ]

    tools = [
        {
            "type": "function",
            "function": {
                "name": "python",
                "description": "Execute Python code",
                "parameters": {
                    "type": "object",
                    "properties": {"code": {"type": "string", "description": "Python code to execute"}},
                    "required": ["code"],
                },
            },
        }
    ]

    rendered = tokenizer.apply_chat_template(messages, tools=tools, tokenize=False, add_generation_prompt=True)
    print(rendered)

    # Expected:
    # - "O(2^n) complexity" should NOT be in output (Turn 2 analysis dropped)
    # - "I should run some Python code" should NOT be in output (Turn 4 analysis dropped)
    # - "I need to execute Python code" should be in output (Turn 6 analysis kept)

    assert "O(2^n) complexity" not in rendered
    assert "I should run some Python code" not in rendered
    assert "I need to execute Python code" in rendered
    print("All checks passed!")


if __name__ == "__main__":
    main()

Example

  • 3 assistant turns with thinking + final, last one with tool_call

As-is (bug)

<|start|>developer<|message|>You are a helpful assistant that analyzes code and provides detailed feedback.<|end|><|start|>user<|message|>Can you help me optimize this Python function?

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)<|end|><|start|>assistant<|channel|>analysis<|message|>The user provided a recursive Fibonacci implementation. O(2^n) complexity.<|end|><|start|>assistant<|channel|>final<|message|>This recursive Fibonacci has exponential time complexity. Would you like me to show optimized versions?<|end|><|start|>user<|message|>Yes, and benchmark them<|end|><|start|>assistant<|channel|>analysis<|message|>User wants benchmarks. I should run some Python code to compare performance.<|end|><|start|>assistant<|channel|>final<|message|>I'll benchmark both versions for you.<|end|><|start|>user<|message|>Run the benchmark for n=30<|end|><|start|>assistant<|channel|>analysis<|message|>I need to execute Python code to run the benchmark for n=30.<|end|><|start|>assistant to=functions.python<|channel|>commentary<|message|>{"code": "import timeit\n\ndef fib_recursive(n):\n    if n <= 1: return n\n    return fib_recursive(n-1) + fib_recursive(n-2)\n\ndef fib_iter(n):\n    if n <= 1: return n\n    a, b = 0, 1\n    for _ in range(2, n+1): a, b = b, a+b\n    return b\n\nprint(timeit.timeit(lambda: fib_recursive(30), number=1))\nprint(timeit.timeit(lambda: fib_iter(30), number=1000))"}<|call|><|start|>assistant
  • Turn 2 analysis: kept (should be dropped)
  • Turn 4 analysis: kept (should be dropped)
  • Turn 6 analysis: kept

To-be (fix)

<|start|>developer<|message|>You are a helpful assistant that analyzes code and provides detailed feedback.<|end|><|start|>user<|message|>Can you help me optimize this Python function?

def fibonacci(n):
    if n <= 1:
        return n
    return fibonacci(n-1) + fibonacci(n-2)<|end|><|start|>assistant<|channel|>final<|message|>This recursive Fibonacci has exponential time complexity. Would you like me to show optimized versions?<|end|><|start|>user<|message|>Yes, and benchmark them<|end|><|start|>assistant<|channel|>final<|message|>I'll benchmark both versions for you.<|end|><|start|>user<|message|>Run the benchmark for n=30<|end|><|start|>assistant<|channel|>analysis<|message|>I need to execute Python code to run the benchmark for n=30.<|end|><|start|>assistant to=functions.python<|channel|>commentary<|message|>{"code": "import timeit\n\ndef fib_recursive(n):\n    if n <= 1: return n\n    return fib_recursive(n-1) + fib_recursive(n-2)\n\ndef fib_iter(n):\n    if n <= 1: return n\n    a, b = 0, 1\n    for _ in range(2, n+1): a, b = b, a+b\n    return b\n\nprint(timeit.timeit(lambda: fib_recursive(30), number=1))\nprint(timeit.timeit(lambda: fib_iter(30), number=1000))"}<|call|><|start|>assistant
  • Turn 2 analysis: dropped
  • Turn 4 analysis: dropped
  • Turn 6 analysis: kept

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant