add unit test for text loss in claude code #376
Conversation
Pull request overview
This PR adds a unit test to validate text preservation across multiple LLM interaction turns in the Claude Code agent. The test ensures that response text from previous turns is correctly included in subsequent prompts, which is critical for maintaining conversation context.
Key Changes:
- Adds a `test_claude_code` function that validates text preservation across conversation turns
- Integrates with the ClaudeCodeAgent and verifies span data captured during rollouts
- Tests both span generation and text content propagation through multiple turns
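
For reviewers skimming the diff, a condensed sketch of the shape of this kind of check is below; the `FakeSpan` type, the `"llm.request"` span name, and the field names are illustrative stand-ins, not agentlightning's actual span schema:

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FakeSpan:
    """Illustrative stand-in for a captured span (the real type is agentlightning.types.Span)."""
    name: str
    prompt: str
    completion: str


def check_text_preserved(spans: List[FakeSpan]) -> None:
    # Assumed convention: LLM calls are the spans named "llm.request".
    requests = [s for s in spans if s.name == "llm.request"]
    # Test case 1: the rollout made more than one LLM request.
    assert len(requests) > 1
    # Test case 2: each turn's completion text survives into the next prompt.
    for prev, curr in zip(requests, requests[1:]):
        assert prev.completion in curr.prompt


check_text_preserved([
    FakeSpan("llm.request", prompt="Fix the bug.", completion="Patched foo()."),
    FakeSpan("llm.request", prompt="Fix the bug.\nPatched foo().\nNow add a test.", completion="Done."),
])
```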
```python
assert len(valid_spans) > 1
print(f"Generated {len(spans)} spans with {len(valid_spans)} LLM requests.")

# Test case 2:
```
The comment "# Test case 2:" is incomplete. It should describe what Test case 2 is verifying. Consider adding a descriptive comment like "# Test case 2: Verify that previous response text appears in the next prompt".
Suggested change:

```diff
-# Test case 2:
+# Test case 2: Verify that previous response text appears in the next prompt
```
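
To make the suggested check concrete, a minimal sketch follows; it also guards against an empty previous response, which would make the containment assertion pass vacuously (the function name is hypothetical):

```python
# Test case 2: Verify that previous response text appears in the next prompt.
def verify_turn_continuity(prev_response: str, next_prompt: str) -> None:
    # Guard first: an empty previous response would make `in` pass vacuously.
    assert prev_response, "previous turn returned no text (possible text loss)"
    assert prev_response in next_prompt, "previous response text missing from next prompt"
```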
```python
    port=pick_unused_port(),
    store=store,
)
proxy.server_launcher._access_host = "localhost"
```
Accessing the private attribute _access_host of server_launcher is not recommended as it couples the test to internal implementation details. If this property needs to be overridden for testing, consider adding a public API or test hook in the LLMProxy class.
Suggested change:

```diff
-proxy.server_launcher._access_host = "localhost"
+# Avoid direct access to private attribute _access_host.
+# If LLMProxy or server_launcher exposes a public setter, use it here.
+# For example: proxy.server_launcher.set_access_host("localhost")
+# If not, consider adding a public API to LLMProxy/server_launcher for testing purposes.
+# (Direct access to _access_host is discouraged and flagged by CodeQL.)
```
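
One way to realize the suggested public hook, sketched under the assumption that the project controls `server_launcher` and is free to add a property (the names are illustrative, not the existing API):

```python
class ServerLauncher:
    """Sketch only: exposes the access host through a public property."""

    def __init__(self, access_host: str = "0.0.0.0") -> None:
        self._access_host = access_host

    @property
    def access_host(self) -> str:
        return self._access_host

    @access_host.setter
    def access_host(self, value: str) -> None:
        self._access_host = value


# The test would then read:
#   proxy.server_launcher.access_host = "localhost"
```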
```python
resource = proxy.as_resource(rollout.rollout_id, rollout.attempt.attempt_id, model="local")
```
The debug print statements should be removed or replaced with proper logging. These statements can clutter test output and are typically used during development but should not remain in production test code.
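
A minimal, standard-library way to follow this advice; the output stays available under pytest's `--log-cli-level=DEBUG` while staying out of normal test runs:

```python
import logging
from typing import Sequence

logger = logging.getLogger(__name__)


def report_spans(spans: Sequence[object], valid_spans: Sequence[object]) -> None:
    # Replaces the ad-hoc print; %-style args defer formatting until the
    # record is actually emitted.
    logger.debug("Generated %d spans with %d LLM requests.", len(spans), len(valid_spans))
```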
```python
    await store.start()
else:
    store = LightningStoreThreaded(inmemory_store)
```
The model version "claude-sonnet-4-5-20250929" has a date of 2025-09-29, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
Suggested change:

```diff
+# NOTE: The model names below ("claude-sonnet-4-5-20250929", "claude-haiku-4-5-20251001") are placeholders for testing purposes.
+# They do not refer to real, documented model versions.
```
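
If the names are deliberate, hoisting them into labelled module-level constants makes the intent hard to miss; the constant names here are only a suggestion:

```python
# Model IDs used by this test's router config, kept in one place so a reader
# can see they are test fixtures, not claims about a production API.
TEST_SONNET_MODEL = "claude-sonnet-4-5-20250929"
TEST_HAIKU_MODEL = "claude-haiku-4-5-20251001"
```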
| "model_name": "claude-haiku-4-5-20251001", | ||
| "litellm_params": { | ||
| "model": "hosted_vllm/" + model_name, | ||
| "api_base": endpoint, | ||
| }, | ||
| }, |
The model version "claude-haiku-4-5-20251001" has a date of 2025-10-01, which is in the future. This appears to be a fictional or placeholder model name. Consider using a documented or real model name, or clarify in a comment that this is a test placeholder.
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, | |
| }, | |
| # NOTE: The following model name is intentionally fictional and used as a test placeholder. | |
| "model_name": "claude-haiku-4-5-20251001", | |
| "litellm_params": { | |
| "model": "hosted_vllm/" + model_name, | |
| "api_base": endpoint, | |
| }, |
```python
import anthropic
import openai
import pytest
from litellm.integrations.custom_logger import CustomLogger
```
Import of 'CustomLogger' is not used.
Suggested change:

```diff
-from litellm.integrations.custom_logger import CustomLogger
```
```python
import pytest
from litellm.integrations.custom_logger import CustomLogger
from portpicker import pick_unused_port
from swebench.harness.constants import SWEbenchInstance
```
Import of 'SWEbenchInstance' is not used.
Suggested change:

```diff
-from swebench.harness.constants import SWEbenchInstance
```
```python
from swebench.harness.utils import load_swebench_dataset  # pyright: ignore[reportUnknownVariableType]
from transformers import AutoTokenizer

from agentlightning import LitAgentRunner, OtelTracer
```
Import of 'LitAgentRunner' is not used.
Import of 'OtelTracer' is not used.
Suggested change:

```diff
-from agentlightning import LitAgentRunner, OtelTracer
+# from agentlightning import LitAgentRunner, OtelTracer
```
```python
from agentlightning import LitAgentRunner, OtelTracer
from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker  # pyright: ignore[reportPrivateUsage]
from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
```
Import of 'LightningStore' is not used.
Suggested change:

```diff
-from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
+from agentlightning.store import LightningStoreServer, LightningStoreThreaded
```
```python
from agentlightning.llm_proxy import LLMProxy, _reset_litellm_logging_worker  # pyright: ignore[reportPrivateUsage]
from agentlightning.store import LightningStore, LightningStoreServer, LightningStoreThreaded
from agentlightning.store.memory import InMemoryLightningStore
from agentlightning.types import LLM, Span
```
Import of 'LLM' is not used.
Import of 'Span' is not used.
Suggested change:

```diff
-from agentlightning.types import LLM, Span
```
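
All of the unused-import findings above are mechanically detectable, e.g. by Ruff's F401 rule or Pyright's `reportUnusedImport` diagnostic, so wiring one of those into CI would catch them before review.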
Attached log excerpts (contents elided): the raw gen_ai response from the backend, and the proxy's final response in which the text content is missing.