LCORE-956: Bump llama stack to 0.3.0 #866
Conversation
Walkthrough

Adds Conversations v3 endpoints; integrates conversation lifecycle (normalize/convert IDs, persist) with the Responses API for query and streaming flows; introduces `ToolCallSummary` and `ToolResultSummary`, moves `RAGChunk` to utils, enriches SSE end-events with quotas and referenced documents; bumps llama-stack to 0.3.0 and updates configs/tests/OpenAPI.
Sequence Diagram(s)

```mermaid
sequenceDiagram
    autonumber
    participant Client
    participant App as Lightspeed App
    participant Llama as Llama Stack (Responses API)
    participant DB as Local DB
    Client->>App: POST /v1/query or /v1/streaming_query (may include conversation_id)
    alt no conversation_id
        App->>Llama: POST /conversations.create
        Llama-->>App: { id: "conv_<hex>" }
        App->>DB: store mapping (normalized id)
        DB-->>App: ack
    else conversation_id provided
        App->>App: normalize / to_llama_stack_conversation_id
    end
    App->>Llama: Call Responses API (include "conversation": llama_conv_id)
    alt streaming
        Llama-->>App: streaming chunks (tokens, tool events...)
        App->>Client: SSE chunks (START with normalized id, token/tool events...)
        Note over App: compute referenced_documents & available_quotas before end
        App->>Client: SSE end-event (token usage, available_quotas, referenced_documents)
    else non-streaming
        Llama-->>App: full response (tool_calls, tool_results, references)
        App->>Client: JSON response (tool_calls, tool_results, referenced_documents, token usage, quotas)
    end
    App->>DB: persist transcript (includes tool_results)
    DB-->>App: ack
```
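The normalize/convert step in the diagram can be sketched with two small helpers. This is a hypothetical sketch; the real functions live in `src/utils/suid.py` and may differ in detail:

```python
def normalize_conversation_id(conversation_id: str) -> str:
    """Strip the llama-stack 'conv_' prefix, yielding the bare hex ID stored locally."""
    return conversation_id.removeprefix("conv_")


def to_llama_stack_conversation_id(conversation_id: str) -> str:
    """Re-add the 'conv_' prefix expected by the llama-stack Conversations API."""
    if conversation_id.startswith("conv_"):
        return conversation_id
    return f"conv_{conversation_id}"
```

The two helpers are inverses for well-formed IDs, so the app can store only the normalized form and convert on the way out.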
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45–75 minutes
Possibly related PRs
Suggested labels
Suggested reviewers
Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
✨ Finishing touches
Force-pushed from 5099065 to 3d4e2c5.
Actionable comments posted: 14
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (2)
src/models/responses.py (1)

417-451: Update `model_config` example to match the new `ToolCallSummary` and `ToolResultSummary` structure.

The example includes `rag_chunks` (which is no longer a field) and uses the old `tool_calls` format (`tool_name`, `arguments`, `result`) instead of the new `ToolCallSummary` structure (`id`, `name`, `args`, `type`). The example should also include `tool_results`.

```diff
 model_config = {
     "json_schema_extra": {
         "examples": [
             {
                 "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
                 "response": "Operator Lifecycle Manager (OLM) helps users install...",
-                "rag_chunks": [
-                    {
-                        "content": "OLM is a component of the Operator Framework toolkit...",
-                        "source": "kubernetes-docs/operators.md",
-                        "score": 0.95,
-                    }
-                ],
                 "tool_calls": [
                     {
-                        "tool_name": "knowledge_search",
-                        "arguments": {"query": "operator lifecycle manager"},
-                        "result": {"chunks_found": 5},
+                        "id": "call_123",
+                        "name": "knowledge_search",
+                        "args": {"query": "operator lifecycle manager"},
+                        "type": "tool_call",
+                    }
+                ],
+                "tool_results": [
+                    {
+                        "id": "call_123",
+                        "status": "success",
+                        "content": {"chunks_found": 5},
+                        "type": "tool_result",
+                        "round": 1,
                     }
                 ],
                 "referenced_documents": [
```

src/utils/suid.py (1)
19-44: Align `check_suid` type handling with its docstring (bytes vs str)

The extended `check_suid` logic for `conv_...` and plain hex conversation IDs looks good, and the new `normalize_conversation_id` / `to_llama_stack_conversation_id` helpers are straightforward and correct for the `conv_<hex>` vs normalized formats.

However, the docstring now advertises:

- `suid (str | bytes)` and
- acceptance of a “byte representation”,

while the function is annotated as `check_suid(suid: str) -> bool` and, in practice, treats non-`str` inputs as invalid (a `bytes` value falls through to `uuid.UUID(suid)` and is rejected with `TypeError`, returning `False`).

To avoid confusion for callers, consider either:

- Updating the signature and implementation to genuinely support bytes, e.g.:

```diff
-def check_suid(suid: str) -> bool:
+def check_suid(suid: str | bytes) -> bool:
@@
-    try:
+    try:
+        if isinstance(suid, bytes):
+            try:
+                suid = suid.decode("ascii")
+            except UnicodeDecodeError:
+                return False
+
         # Accept llama-stack conversation IDs (conv_<hex> format)
-        if isinstance(suid, str) and suid.startswith("conv_"):
+        if suid.startswith("conv_"):
@@
-        if isinstance(suid, str) and len(suid) >= 32:
+        if len(suid) >= 32:
@@
-        uuid.UUID(suid)
+        uuid.UUID(str(suid))
```

- Or, if bytes are not needed, simplifying the docstring to mention only `str` inputs.

Either way, keeping type hints and documentation consistent will make this utility safer to reuse.
Also applies to: 45-98, 101-145
🧹 Nitpick comments (29)

tests/e2e/features/info.feature (1)

19-19: Exact llama-stack version assertion may be too brittle

The step now hard-codes `0.3.0rc3+rhai0`. If the underlying stack moves to another rc/build of 0.3.0, this e2e will fail even though the service is healthy and still on the intended major/minor version.

Consider relaxing this to:

- Match just the semantic version prefix (e.g., `0.3.0`), or
- Delegate the exact expected value to a shared constant/env so it’s updated in one place when bumping the stack.

Also please confirm that the `/info` endpoint actually returns `0.3.0rc3+rhai0` in your target deployment before merging.

pyproject.toml (1)
64-70: Track technical debt for type checking exclusions.Adding 4 more files to pyright exclusions increases the untyped surface area. The comment indicates these are deprecated v1 endpoints - consider adding a TODO with a tracking issue to either fix the type errors or remove these endpoints when v1 is fully deprecated.
src/utils/llama_stack_version.py (1)
43-60: Update docstring to reflect new behavior.

The docstring states `version_info` "must be parseable by semver.Version.parse", but the implementation now silently skips the check on parse failure rather than raising an exception. Update the docstring to document this behavior accurately.

src/utils/types.py (2)
113-124: Consider using Literal types for status field.

The `status` field accepts any string, but based on usage it appears to only be "success" or "failure". Consider using `Literal["success", "failure"]` for better type safety.

```diff
+from typing import Literal
+
 class ToolResultSummary(BaseModel):
     """Model representing a result from a tool call (for tool_results list)."""

     id: str = Field(
         description="ID of the tool call/result, matches the corresponding tool call 'id'"
     )
-    status: str = Field(
+    status: Literal["success", "failure"] = Field(
         ..., description="Status of the tool execution (e.g., 'success')"
     )
```
163-171: Address the TODO comment about the `round` attribute.

The comment `# clarify meaning of this attribute` on line 169 indicates uncertainty about the `round` field's semantics. Either document its intended meaning or track this as a follow-up issue.

Do you want me to open an issue to track clarifying the `round` attribute's meaning?

src/models/responses.py (1)
28-41: Remove commented-out code instead of leaving it in the codebase.

Dead code should be removed rather than commented out. Version control preserves history if these definitions are needed later.

```diff
-# class ToolCall(BaseModel):
-#     """Model representing a tool call made during response generation."""
-
-#     tool_name: str = Field(description="Name of the tool called")
-#     arguments: dict[str, Any] = Field(description="Arguments passed to the tool")
-#     result: dict[str, Any] | None = Field(None, description="Result from the tool")
-
-
-# class ToolResult(BaseModel):
-#     """Model representing a tool result."""
-
-#     tool_name: str = Field(description="Name of the tool")
-#     result: dict[str, Any] = Field(description="Result from the tool")
-
```

src/app/endpoints/streaming_query_v2.py (3)
76-76: Enable or remove the commented-out `PromptTooLongResponse` entry.

The 413 response is commented out but `PromptTooLongResponse` was added in this PR. If prompt-too-long errors can occur in this endpoint, enable the response; otherwise, remove the comment.

```diff
-    # 413: PromptTooLongResponse.openapi_response(),
+    413: PromptTooLongResponse.openapi_response(),
```

If enabling, also add the import:

```python
from models.responses import (
    ...
    PromptTooLongResponse,
    ...
)
```

296-296: Fix typo: double closing parenthesis in comment.

```diff
-    # Perform cleanup tasks (database and cache operations))
+    # Perform cleanup tasks (database and cache operations)
```

457-461: Remove commented-out debug code.

Debug logging statements should be removed rather than left commented in production code.

```diff
 response_stream = cast(AsyncIterator[OpenAIResponseObjectStream], response)
-    # async for chunk in response_stream:
-    #     logger.error("Chunk: %s", chunk.model_dump_json())
-    # Return the normalized conversation_id (already normalized above)
-    # The response_generator will emit it in the start event
+    # Return the normalized conversation_id (already normalized above).
+    # The response_generator will emit it in the start event.
     return response_stream, conversation_id
```

src/utils/token_counter.py (1)
72-83: Input-token comment vs. implementation mismatch (system vs. user role)

The comment says this block uses the same logic as `metrics.utils.update_llm_token_count_from_turn`, but here the `system_prompt` is wrapped as `RawMessage(role="system", ...)` while the metrics helper uses `role="user"`. This may slightly change tokenization and contradicts the “same logic” claim. Consider either aligning the role for true parity or updating the comment to reflect the intentional difference.

tests/unit/utils/test_endpoints.py (1)
261-803: Deprecated agent/temp-agent tests are now skipped

Marking these legacy `get_agent` / `get_temp_agent` async tests with `@pytest.mark.skip(reason="Deprecated API test")` is a pragmatic way to avoid failures against removed/changed APIs while preserving the scenarios for reference. You may eventually want to replace these with coverage for the new v2/v3 flows or group them under a custom marker (e.g. `@pytest.mark.deprecated_api`) to make later cleanup easier, but the current change is fine.

tests/e2e/features/conversations.feature (1)
173-181: Non-existent DELETE now returns 200 with explanatory payload

The DELETE `/conversations/{conversation_id}` scenario for a non-existent ID now expects a 200 with `{"success": true, "response": "Conversation cannot be deleted"}` (ignoring `conversation_id`). This matches the new idempotent delete behavior but is a semantic shift from the earlier 404. Please ensure client logic and API docs are updated to rely on the response body/message for “not found” vs. “deleted” distinctions rather than the status code alone.

docs/conversations_api.md (1)
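Because the status code alone no longer distinguishes the two cases, clients need a body-based check along these lines. This is a hypothetical helper; the exact message string is taken from the scenario above:

```python
def conversation_was_deleted(status_code: int, body: dict) -> bool:
    """Return True only when the DELETE actually removed a conversation."""
    if status_code != 200 or not body.get("success"):
        return False
    # The service signals "not found" via the message, not the status code.
    return body.get("response") != "Conversation cannot be deleted"
```

Matching on a human-readable message is fragile, which is another argument for documenting the payload contract explicitly.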
14-31: Fix markdownlint issues: list indentation and fenced code languages

The doc reads well, but the markdownlint findings are valid and may break CI:

- MD007: nested list items in the TOC (Lines ~16–27) are indented with three spaces instead of two.
- MD040: several fenced code blocks lack an explicit language (e.g., the `conv_...` example, SSE snippets, error snippets).

You can address both with a small style-only patch, for example:

````diff
-   * [Llama Stack Format](#llama-stack-format)
-   * [Normalized Format](#normalized-format)
-   * [ID Conversion Utilities](#id-conversion-utilities)
+  * [Llama Stack Format](#llama-stack-format)
+  * [Normalized Format](#normalized-format)
+  * [ID Conversion Utilities](#id-conversion-utilities)
@@
-```
+```text
 conv_<48-character-hex-string>
@@
-```
+```text
 conv_0d21ba731f21f798dc9680125d5d6f493e4a7ab79f25670e
@@
-```
+```text
 data: {"event": "start", "data": {"conversation_id": "0d21ba7..."}}
@@
-```
+```text
 Error: Conversation not found (HTTP 404)
````

Apply the same `text` (or another appropriate language like `bash`, `json`, or `sql`) to the remaining unlabeled fenced blocks.

Also applies to: 58-80, 234-245, 369-379, 469-471

tests/unit/models/responses/test_successful_responses.py (1)

41-45: Strengthening QueryResponse and StreamingQueryResponse tests

The switch to `ToolCallSummary` / `ToolResultSummary` and the new `TestStreamingQueryResponse` look consistent with the updated response models:

- `test_constructor_minimal` correctly expects `tool_calls` and `tool_results` to default to `None`.
- `TestStreamingQueryResponse` validates the SSE `text/event-stream` OpenAPI shape and example presence exactly as implemented.

One small improvement: in `test_constructor_full`, you already construct `tool_results` and pass them into `QueryResponse`, but never assert they are preserved. Adding a single check would tighten coverage on the new field:

```diff
@@
-    response = QueryResponse(  # type: ignore[call-arg]
+    response = QueryResponse(  # type: ignore[call-arg]
@@
-    assert response.conversation_id == "conv-123"
-    assert response.tool_calls == tool_calls
+    assert response.conversation_id == "conv-123"
+    assert response.tool_calls == tool_calls
+    assert response.tool_results == tool_results
     assert response.referenced_documents == referenced_docs
```

Also applies to: 262-317, 968-996
tests/integration/endpoints/test_query_v2_integration.py (1)
84-90: Integration tests correctly exercise ID normalization, tool calls, and transcript handling
- Mocking `conversations.create` to return `conv_` + 48 hex chars and asserting `response.conversation_id == "a" * 48` gives good end‑to‑end coverage of the new normalization flow.
- The updated tool‑call tests now operate on structured `results` and assert `tool_calls[0].name` / aggregated `tool_names`, matching the new ToolCallSummary shape.
- In the transcript behavior test, introducing `mocker` and patching `store_transcript` avoids real file I/O while still verifying that conversations are persisted regardless of transcript configuration.

You might optionally also assert `len(response.conversation_id) == 48` in the normalization checks to make the invariant explicit, but the current tests are already solid.

Also applies to: 166-170, 346-355, 375-378, 508-510, 1198-1227
tests/unit/app/endpoints/test_query_v2.py (1)
108-146: Unit tests comprehensively cover query v2 behavior; tighten one referenced-docs test

Overall, these unit tests give strong coverage of the new query v2 behavior:

- `retrieve_response_*` tests now mock `conversations.create` and assert that conversation IDs are normalized (e.g., `"conv_abc123def456"` → `"abc123def456"`), exercise RAG/MCP tool preparation, shields/guardrails wiring, and token‑usage extraction.
- Output parsing tests correctly handle content parts with `annotations` lists and ensure tool calls are surfaced as `ToolCallSummary` instances.
- The new `test_retrieve_response_parses_referenced_documents` nicely validates URL citations, file citations, and `file_search_call` results feeding into `referenced_docs`.

One small consistency improvement: in `test_retrieve_response_parses_referenced_documents` you don’t mock `conversations.create`, so the returned `conv_id` depends on AsyncMock’s default behavior and is unused. For clarity and future robustness (if the helper ever starts validating or persisting that ID), you could align it with the other tests:

```diff
@@
-    mock_client = mocker.AsyncMock()
+    mock_client = mocker.AsyncMock()
@@
-    mock_client.responses.create = mocker.AsyncMock(return_value=response_obj)
+    mock_client.responses.create = mocker.AsyncMock(return_value=response_obj)
+    # Ensure deterministic conversation_id normalization
+    mock_conversation = mocker.Mock()
+    mock_conversation.id = "conv_abc123def456"
+    mock_client.conversations.create = mocker.AsyncMock(return_value=mock_conversation)
@@
-    _summary, _conv_id, referenced_docs, _token_usage = await retrieve_response(
+    _summary, conv_id, referenced_docs, _token_usage = await retrieve_response(
         mock_client, "model-docs", qr, token="tkn", provider_id="test-provider"
     )
+
+    assert conv_id == "abc123def456"
```

This keeps the conversation‑ID behavior under test consistent with the rest of the suite without changing the core assertions around referenced documents.

Also applies to: 151-199, 209-265, 268-399, 400-444, 468-475, 535-579, 582-629, 676-775, 779-860
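The deterministic-mock pattern suggested above works the same with stdlib `unittest.mock` (shown here without the pytest-mock `mocker` fixture; names are illustrative):

```python
import asyncio
from unittest.mock import AsyncMock, Mock

# Pin conversations.create so the normalized ID under test is deterministic.
mock_client = AsyncMock()
mock_conversation = Mock()
mock_conversation.id = "conv_abc123def456"
mock_client.conversations.create = AsyncMock(return_value=mock_conversation)


async def normalized_id() -> str:
    """Simulate the endpoint's create-then-normalize step against the mock."""
    conversation = await mock_client.conversations.create()
    return conversation.id.removeprefix("conv_")


assert asyncio.run(normalized_id()) == "abc123def456"
```

Pinning the return value means the assertion tests the normalization logic rather than AsyncMock's auto-generated child-mock behavior.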
src/app/endpoints/query.py (2)
197-201: Consider removing commented-out code.

The `# toolgroups=None,` comment on line 201 and commented `# documents=documents` on line 754 appear to be temporary placeholders. If these parameters are intentionally removed from the API call, the comments should be cleaned up. If they represent pending work, consider adding a TODO with a ticket reference.
742-757: Clarify status of commented-out code blocks.

Multiple code sections are commented out with no clear explanation:

- Lines 742-749: `documents` construction
- Line 754: `# documents=documents`
- Line 756: `# toolgroups=toolgroups`

If this represents intentional API changes in llama-stack 0.3.0, consider adding brief comments explaining why. If temporary, add TODOs with ticket references.
tests/unit/app/endpoints/test_streaming_query.py (1)
46-144: Mock classes provide backward compatibility for deprecated Agent API tests.

These mock classes (`TextDelta`, `ToolCallDelta`, `TurnResponseEvent`, etc.) emulate the Agent API types that no longer exist in llama-stack-client 0.3.x. The docstrings clearly indicate their purpose for backward compatibility.

Consider consolidating these mocks into a shared test utilities module if they're needed across multiple test files, to reduce duplication.
src/utils/endpoints.py (2)
313-316: Clarify intent of commented-out agent retrieval code.

Multiple sections of agent retrieval and session management code are commented out with `...` ellipsis placeholders:

- Lines 313-315: Agent retrieval
- Lines 335-342: Orphan agent deletion and session retrieval

If this represents a temporary state during the llama-stack 0.3.0 migration, add a TODO comment with a ticket reference. If these features are intentionally removed, the dead code should be cleaned up.

```diff
 if conversation_id:
     with suppress(ValueError):
-        # agent_response = await client.agents.retrieve(agent_id=conversation_id)
-        # existing_agent_id = agent_response.agent_id
-        ...
+        # TODO(LCORE-XXX): Agent retrieval disabled during llama-stack 0.3.0 migration
+        pass
```

Also applies to: 335-342

323-329: Shield parameters commented out with type:ignore.

The `input_shields` and `output_shields` parameters are commented out with `# type: ignore[call-arg]` annotations. If shields are intentionally disabled in llama-stack 0.3.0, add a clear comment explaining why. The `enable_session_persistence` parameter has a similar pattern.

src/app/endpoints/conversations_v3.py (1)
47-48: Logger name should use `__name__` pattern; router tag inconsistent with filename.

Per coding guidelines, use `logger = logging.getLogger(__name__)` for module logging. Additionally, the router tag `"conversations_v1"` is inconsistent with the filename `conversations_v3.py`, which may cause confusion in API documentation.

```diff
-logger = logging.getLogger("app.endpoints.handlers")
-router = APIRouter(tags=["conversations_v1"])
+logger = logging.getLogger(__name__)
+router = APIRouter(tags=["conversations_v3"])
```

tests/unit/app/endpoints/test_query.py (1)
609-609: Consider tracking or removing deprecated tests.

Multiple tests are marked as skipped with "Deprecated API test". While this is acceptable during migration, consider:
- Creating an issue to track removal of these deprecated tests
- Converting them to test the new Responses API equivalents
- Removing them if the functionality is covered elsewhere
Leaving many skipped tests long-term can obscure coverage metrics and create maintenance burden.
Would you like me to open an issue to track the cleanup of these deprecated tests?
src/app/endpoints/streaming_query.py (2)
363-378: Improve error logging message in stream_http_error.
`logger.exception(error)` logs the error object as the message. While this works, it's more conventional to provide a descriptive message string:

```diff
 async def stream_http_error(error: AbstractErrorResponse) -> AsyncGenerator[str, None]:
     ...
     logger.error("Error while obtaining answer for user question")
-    logger.exception(error)
+    logger.exception("Streaming error: %s", type(error).__name__)
     yield format_stream_data({"event": "error", "data": {**error.detail.model_dump()}})
```
1110-1116: Commented-out parameters need clarification or cleanup.

The `documents` parameter has a TODO reference (LCORE-881), but the `toolgroups` parameter is also commented out without explanation. Either add a tracking reference or remove if intentionally disabled.

```diff
 response = await agent.create_turn(
     messages=[UserMessage(role="user", content=query_request.query).model_dump()],
     session_id=session_id,
-    # documents=documents,
+    # TODO: LCORE-881 - documents temporarily disabled
+    # documents=documents,
     stream=True,
-    # toolgroups=toolgroups,
+    # TODO: Add ticket reference for toolgroups disabled status
+    # toolgroups=toolgroups,
 )
```

src/app/endpoints/query_v2.py (1)
390-390: Consider reducing log verbosity for response object.

Logging the entire `response` object at INFO level could produce large log entries and impact performance. Consider using DEBUG level or logging only summary information:

```diff
-    logger.info("Response: %s", response)
+    logger.debug("Response object: %s", response)
```

docs/openapi.json (3)
3431-3442: Minor docs polish: consistent error examples and typos
- Ensure example labels/messages are consistent (e.g., “cannot be deleted” vs “can not be deleted” elsewhere).
- Typo “attatchment” appears in validation examples; prefer “attachment”.
No schema change needed; search/replace in example texts is sufficient.
Also applies to: 3451-3470, 3479-3492, 3497-3511, 3526-3539, 3542-3551
6990-7014: ToolCallSummary: consider making args optional-but-documented and align example usage

`args` is optional in the schema but not shown in examples anywhere else. Add a minimal example or clarify the default (empty object) to improve client codegen hints. Also ensure all endpoint examples use name/args/id consistently.
1517-1525: SSE schema representation — optional enhancement

OpenAPI can’t fully model SSE; consider linking to an explicit event schema (start/token/turn_complete/end) in the description for stronger client expectations.
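For concreteness, the SSE wire format those events use is just `data: <json>` frames terminated by a blank line. A minimal sketch of the serializer — the codebase's `format_stream_data` plays this role, though its exact signature is an assumption here:

```python
import json


def format_stream_data(payload: dict) -> str:
    """Serialize one SSE message in the 'data: <json>' + blank-line wire framing."""
    return f"data: {json.dumps(payload)}\n\n"


# An end-event enriched with quotas and referenced documents, per this PR:
end_event = format_stream_data(
    {
        "event": "end",
        "data": {"available_quotas": {}, "referenced_documents": []},
    }
)
```

Documenting the `start` / `token` / `turn_complete` / `end` payload shapes alongside this framing would give clients the contract OpenAPI alone cannot express.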
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (40)
- docs/conversations_api.md (1 hunks)
- docs/openapi.json (11 hunks)
- pyproject.toml (2 hunks)
- run.yaml (1 hunks)
- src/app/endpoints/conversations_v2.py (0 hunks)
- src/app/endpoints/conversations_v3.py (1 hunks)
- src/app/endpoints/query.py (9 hunks)
- src/app/endpoints/query_v2.py (16 hunks)
- src/app/endpoints/streaming_query.py (19 hunks)
- src/app/endpoints/streaming_query_v2.py (14 hunks)
- src/app/routers.py (2 hunks)
- src/constants.py (1 hunks)
- src/metrics/utils.py (1 hunks)
- src/models/requests.py (1 hunks)
- src/models/responses.py (7 hunks)
- src/utils/endpoints.py (5 hunks)
- src/utils/llama_stack_version.py (2 hunks)
- src/utils/suid.py (1 hunks)
- src/utils/token_counter.py (1 hunks)
- src/utils/transcripts.py (1 hunks)
- src/utils/types.py (3 hunks)
- test.containerfile (1 hunks)
- tests/configuration/minimal-stack.yaml (1 hunks)
- tests/e2e/configs/run-ci.yaml (1 hunks)
- tests/e2e/features/conversations.feature (1 hunks)
- tests/e2e/features/info.feature (1 hunks)
- tests/integration/endpoints/test_query_v2_integration.py (8 hunks)
- tests/integration/test_openapi_json.py (4 hunks)
- tests/unit/app/endpoints/test_conversations_v2.py (1 hunks)
- tests/unit/app/endpoints/test_query.py (30 hunks)
- tests/unit/app/endpoints/test_query_v2.py (27 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (28 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (13 hunks)
- tests/unit/app/test_routers.py (3 hunks)
- tests/unit/models/responses/test_error_responses.py (3 hunks)
- tests/unit/models/responses/test_query_response.py (3 hunks)
- tests/unit/models/responses/test_rag_chunk.py (1 hunks)
- tests/unit/models/responses/test_successful_responses.py (6 hunks)
- tests/unit/utils/test_endpoints.py (11 hunks)
- tests/unit/utils/test_transcripts.py (3 hunks)
💤 Files with no reviewable changes (1)
- src/app/endpoints/conversations_v2.py
🧰 Additional context used
📓 Path-based instructions (9)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
- tests/e2e/features/conversations.feature
- tests/e2e/features/info.feature
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
- src/utils/suid.py
- src/utils/llama_stack_version.py
- src/metrics/utils.py
- src/utils/transcripts.py
- src/utils/token_counter.py
- src/app/endpoints/conversations_v3.py
- src/constants.py
- src/utils/types.py
- src/utils/endpoints.py
- src/app/routers.py
- src/models/responses.py
- src/models/requests.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/streaming_query_v2.py
- src/app/endpoints/query.py
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/utils/test_endpoints.py
- tests/unit/models/responses/test_successful_responses.py
- tests/unit/models/responses/test_error_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/app/endpoints/test_query_v2.py
- tests/unit/app/endpoints/test_streaming_query_v2.py
- tests/unit/app/test_routers.py
- tests/integration/test_openapi_json.py
- tests/unit/utils/test_transcripts.py
- tests/integration/endpoints/test_query_v2_integration.py
- tests/unit/models/responses/test_query_response.py
- tests/unit/app/endpoints/test_query.py
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
- tests/unit/app/endpoints/test_conversations_v2.py
- tests/unit/utils/test_endpoints.py
- tests/unit/models/responses/test_successful_responses.py
- tests/unit/models/responses/test_error_responses.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/models/responses/test_rag_chunk.py
- tests/unit/app/endpoints/test_query_v2.py
- tests/unit/app/endpoints/test_streaming_query_v2.py
- tests/unit/app/test_routers.py
- tests/integration/test_openapi_json.py
- tests/unit/utils/test_transcripts.py
- tests/integration/endpoints/test_query_v2_integration.py
- tests/unit/models/responses/test_query_response.py
- tests/unit/app/endpoints/test_query.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Configure pylint with `source-roots = "src"`
Exclude `src/auth/k8s.py` from pyright type checking
Files:
pyproject.toml
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
- src/app/endpoints/conversations_v3.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/streaming_query_v2.py
- src/app/endpoints/query.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
- src/app/endpoints/conversations_v3.py
- src/app/endpoints/query_v2.py
- src/app/endpoints/streaming_query.py
- src/app/endpoints/streaming_query_v2.py
- src/app/endpoints/query.py
src/**/constants.py
📄 CodeRabbit inference engine (CLAUDE.md)
Define shared constants in central `constants.py` file with descriptive comments
Files:
src/constants.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
- src/models/responses.py
- src/models/requests.py
🧠 Learnings (7)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to pyproject.toml : Exclude `src/auth/k8s.py` from pyright type checking
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
Applied to files:
- tests/unit/utils/test_endpoints.py
- tests/unit/app/endpoints/test_streaming_query.py
- tests/unit/app/endpoints/test_query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Use pytest for all unit and integration tests; do not use unittest framework
Applied to files:
tests/unit/utils/test_endpoints.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.
Applied to files:
- tests/unit/utils/test_endpoints.py
- src/utils/endpoints.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
- src/metrics/utils.py
- src/app/endpoints/streaming_query.py
- tests/unit/app/endpoints/test_query.py
- src/app/endpoints/query.py
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.
Applied to files:
test.containerfile
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
tests/unit/app/endpoints/test_query.py
🧬 Code graph analysis (16)
tests/unit/app/endpoints/test_conversations_v2.py (4)
  tests/unit/app/endpoints/test_tools.py (1)
    mock_configuration (28-47)
  src/cache/postgres_cache.py (1)
    delete (312-347)
  src/cache/sqlite_cache.py (1)
    delete (310-343)
  src/app/endpoints/conversations_v2.py (1)
    delete_conversation_endpoint_handler (145-168)
tests/unit/models/responses/test_successful_responses.py (2)
  src/models/responses.py (5)
    StreamingQueryResponse (454-519)
    openapi_response (47-60)
    openapi_response (458-480)
    openapi_response (858-882)
    openapi_response (1187-1210)
  src/utils/types.py (2)
    ToolCallSummary (102-110)
    ToolResultSummary (113-124)
src/utils/endpoints.py (3)
  src/models/responses.py (2)
    model (1624-1628)
    NotFoundResponse (1431-1486)
  src/utils/types.py (2)
    GraniteToolParser (60-99)
    get_parser (82-99)
  src/utils/suid.py (1)
    get_suid (6-16)
tests/unit/models/responses/test_error_responses.py (1)
  src/models/responses.py (7)
    PromptTooLongResponse (1489-1518)
    AbstractErrorResponse (1157-1210)
    DetailModel (1150-1154)
    openapi_response (47-60)
    openapi_response (458-480)
    openapi_response (858-882)
    openapi_response (1187-1210)
tests/unit/app/endpoints/test_streaming_query.py (3)
  src/models/responses.py (2)
    ReferencedDocument (328-338)
    model (1624-1628)
  src/models/requests.py (1)
    QueryRequest (73-233)
  src/app/endpoints/streaming_query.py (2)
    streaming_query_endpoint_handler (961-995)
    stream_end_event (148-197)
tests/unit/models/responses/test_rag_chunk.py (1)
  src/utils/types.py (1)
    RAGChunk (127-132)
tests/unit/app/endpoints/test_query_v2.py (2)
  src/app/endpoints/streaming_query.py (1)
    retrieve_response (998-1119)
  src/app/endpoints/streaming_query_v2.py (1)
    retrieve_response (364-461)
tests/unit/app/endpoints/test_streaming_query_v2.py (4)
  src/app/endpoints/streaming_query_v2.py (1)
    streaming_query_endpoint_handler_v2 (327-361)
  tests/unit/app/endpoints/test_query.py (1)
    dummy_request (59-68)
  tests/unit/app/endpoints/test_query_v2.py (1)
    dummy_request (31-34)
  src/models/responses.py (1)
    model (1624-1628)
src/app/routers.py (1)
  tests/unit/app/test_routers.py (1)
    include_router (36-51)
tests/unit/utils/test_transcripts.py (1)
  src/utils/types.py (3)
    ToolCallSummary (102-110)
    ToolResultSummary (113-124)
    TurnSummary (135-220)
src/models/responses.py (1)
  src/utils/types.py (2)
    ToolCallSummary (102-110)
    ToolResultSummary (113-124)
src/app/endpoints/query_v2.py (4)
  src/utils/suid.py (2)
    normalize_conversation_id (101-122)
    to_llama_stack_conversation_id (125-145)
  src/utils/responses.py (1)
    extract_text_from_response_output_item (6-56)
  src/utils/types.py (2)
    ToolCallSummary (102-110)
    ToolResultSummary (113-124)
  src/models/responses.py (2)
    conversation (1383-1392)
    ReferencedDocument (328-338)
src/app/endpoints/streaming_query.py (2)
  src/app/endpoints/query.py (1)
    parse_referenced_documents (605-632)
  src/utils/types.py (1)
    content_to_str (20-39)
tests/unit/models/responses/test_query_response.py (2)
  src/models/responses.py (2)
    QueryResponse (341-451)
    ReferencedDocument (328-338)
  src/utils/types.py (2)
    ToolCallSummary (102-110)
    ToolResultSummary (113-124)
tests/unit/app/endpoints/test_query.py (2)
  tests/unit/app/endpoints/test_streaming_query.py (2)
    ToolExecutionStep (128-133)
    ToolResponse (136-141)
  src/utils/types.py (1)
    TurnSummary (135-220)
src/app/endpoints/query.py (2)
  src/models/responses.py (7)
    PromptTooLongResponse (1489-1518)
    openapi_response (47-60)
    openapi_response (458-480)
    openapi_response (858-882)
    openapi_response (1187-1210)
    QuotaExceededResponse (1563-1643)
    model (1624-1628)
  src/utils/types.py (2)
    TurnSummary (135-220)
    content_to_str (20-39)
🪛 markdownlint-cli2 (0.18.1)
docs/conversations_api.md
16-16: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
77-77: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
369-369: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
469-469: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: e2e_tests (ci)
11d5075 to d4bda49 (Compare)
bfa516e to 0a41992 (Compare)
Implements the function `parse_referenced_documents_from_responses_api`, which inspects the Responses API output for:
- `file_search_call` objects (filename and attributes)
- annotations within message content (type, url, title)
- two types of annotations: `url_citation` and `file_citation`
0a41992 to 5e39461 (Compare)
Actionable comments posted: 4
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (5)
src/app/endpoints/query_v2.py (1)
72-221: Align `ToolCallSummary.type` with the underlying `output_item.type` (especially for `mcp_approval_request`)

The overall `_build_tool_call_summary` implementation looks solid, and the new `ToolResultSummary` wiring for `file_search_call` and `mcp_call` is clear. One inconsistency stands out:

- For `mcp_approval_request` you set `type="tool_call"` even though `item_type` is `"mcp_approval_request"`, while other branches (`file_search_call`, `web_search_call`, `mcp_call`, `mcp_list_tools`) propagate their own specific types. This can easily confuse downstream consumers that branch on `type`.

Consider aligning this with the other branches:

```diff
-        return (
-            ToolCallSummary(
-                id=str(getattr(output_item, "id")),
-                name=getattr(output_item, "name", "mcp_approval_request"),
-                args=args,
-                type="tool_call",
-            ),
-            None,
-        )
+        return (
+            ToolCallSummary(
+                id=str(getattr(output_item, "id")),
+                name=getattr(output_item, "name", "mcp_approval_request"),
+                args=args,
+                type="mcp_approval_request",
+            ),
+            None,
+        )
```

tests/e2e/features/steps/info.py (1)
15-19: Fix invalid f-string quoting (syntax error in this module)

Several assertions use double-quoted f-strings with `response_json["..."]` inside them, e.g.:

```python
assert response_json["name"] == service_name, f"name is {response_json["name"]}"
```

This is invalid Python syntax (unbalanced quotes inside the f-string) and will prevent the test module from importing.

Use either single quotes for the f-string or for the JSON keys, for example:

```diff
- assert response_json["name"] == service_name, f"name is {response_json["name"]}"
+ assert response_json["name"] == service_name, f'name is {response_json["name"]}'
```

Apply the same pattern to the other f-strings that interpolate JSON fields (service_version, model fields, shield fields, etc.) to restore syntactic correctness.

Also applies to: 70-87, 116-121, 165-180
tests/unit/app/endpoints/test_query.py (1)

2222-2369: Async topic-summary default test is missing `@pytest.mark.asyncio`

`test_query_endpoint_generate_topic_summary_default_true` is defined as `async def` but lacks the `@pytest.mark.asyncio` decorator that all other async tests in this file use. Depending on your pytest-asyncio configuration, this test may not run or may error as a bare coroutine.

Add the marker to ensure it executes:

```diff
-async def test_query_endpoint_generate_topic_summary_default_true(
+@pytest.mark.asyncio
+async def test_query_endpoint_generate_topic_summary_default_true(
     mocker: MockerFixture, dummy_request: Request
 ) -> None:
```

src/models/responses.py (1)
341-451: QueryResponse docstring and examples are out of sync with new fields

`QueryResponse` now exposes `tool_calls: list[ToolCallSummary] | None` and `tool_results: list[ToolResultSummary] | None`, without a `rag_chunks` field. However:

- The docstring still documents `rag_chunks` and describes `tool_calls` in terms of the old ToolCall model.
- `model_config["json_schema_extra"]["examples"]` still includes `rag_chunks` and a `tool_calls` example using `{"tool_name": ..., "arguments": ..., "result": ...}` instead of the new `ToolCallSummary`/`ToolResultSummary` shapes.

This will produce misleading OpenAPI examples and confuse clients.

Recommend:

- Updating the docstring to describe `tool_calls`/`tool_results` and removing `rag_chunks` from the documented attributes.
- Refreshing the example payload to:
  - Drop `rag_chunks`, or, if you still want to expose RAG chunks, model them via `ToolResultSummary`/`RAGChunk` and add corresponding fields.
  - Use the new `ToolCallSummary`/`ToolResultSummary` structures in the example.

src/app/endpoints/streaming_query.py (1)
148-197: Update docstring to match current function signature.

The docstring at line 164 references a `summary (TurnSummary)` parameter that no longer exists in the function signature. The function now accepts `available_quotas` and `referenced_documents` parameters instead.

Apply this diff to update the docstring:

```diff
 def stream_end_event(
     metadata_map: dict,
     token_usage: TokenCounter,
     available_quotas: dict[str, int],
     referenced_documents: list[ReferencedDocument],
     media_type: str = MEDIA_TYPE_JSON,
 ) -> str:
     """
     Yield the end of the data stream.

     Format and return the end event for a streaming response,
     including referenced document metadata and token usage information.

     Parameters:
         metadata_map (dict): A mapping containing metadata about referenced documents.
-        summary (TurnSummary): Summary of the conversation turn.
         token_usage (TokenCounter): Token usage information.
+        available_quotas (dict[str, int]): Available quota information by limiter name.
+        referenced_documents (list[ReferencedDocument]): List of referenced documents.
         media_type (str): The media type for the response format.

     Returns:
         str: A Server-Sent Events (SSE) formatted string representing the end of the data stream.
     """
```
♻️ Duplicate comments (8)
src/utils/endpoints.py (1)

340-356: Critical: session_id remains undefined in existing agent path.

The previous review comment identified that `session_id` is never assigned when `existing_agent_id and conversation_id` is True (lines 340-342 contain only `...`). This will cause `UnboundLocalError` at line 356.

This issue must be fixed before merging:

```diff
     try:
-        # session_id = str(sessions_response.data[0]["session_id"])
-        ...
+        # TODO(LCORE-XXX): Session retrieval disabled during migration
+        # Create a new session for existing conversations as workaround
+        session_id = await agent.create_session(get_suid())
     except IndexError as e:
```

pyproject.toml (1)
31-32: Reconfirm `llama-stack-client==0.3.0` pin vs latest stable client

The bump to `llama-stack==0.3.0` and `llama-stack-client==0.3.0` is consistent, but a previous review already suggested moving the client to `0.3.3` (the latest stable at that time) for bugfixes and a better security posture.

If you intentionally stay on `0.3.0` (e.g., due to compatibility constraints), consider documenting that in the PR or a comment. Otherwise, updating the client pin and regenerating your lockfile would be advisable.

src/app/endpoints/conversations_v3.py (2)
260-377: `APIStatusError` handling should not always map to 404 Not Found

Both in the Conversations API call (lines 332-339) and the `except APIStatusError` block you currently assume any `APIStatusError` implies "conversation not found" and convert it to a 404. This will incorrectly hide other backend failures (401/403/5xx) as 404s, making operational debugging harder and returning misleading responses to clients.

Consider inspecting the status code on the exception (e.g., `e.status_code` or similar) and:

- For 404: return the current `NotFoundResponse`.
- For other codes: log as a generic Llama Stack error and return an `InternalServerErrorResponse.generic()` or a more appropriate mapping (e.g., mirror 401/403).
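A status-aware mapping along those lines could look like this sketch. The exception and return shapes here are simplified stand-ins for the project's real `APIStatusError` handling and response models, and `status_code` is an assumed attribute name:

```python
# Hypothetical sketch: map upstream API errors to HTTP responses by status
# code instead of collapsing everything to 404.
class APIStatusError(Exception):
    """Stand-in for the llama-stack-client exception carrying a status code."""

    def __init__(self, status_code: int, message: str = "") -> None:
        super().__init__(message)
        self.status_code = status_code


def map_api_status_error(e: APIStatusError, conversation_id: str) -> tuple[int, str]:
    """Return an (http_status, message) pair for the client-facing response."""
    if e.status_code == 404:
        # The conversation genuinely does not exist upstream.
        return 404, f"Conversation {conversation_id} not found"
    if e.status_code in (401, 403):
        # Mirror the backend's auth failure rather than hiding it as 404.
        return e.status_code, "Llama Stack rejected the request"
    # Anything else is an upstream failure, not a missing conversation.
    return 500, "Unexpected error from Llama Stack"
```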
523-633: Update handler also maps all `APIStatusError` to 404

`update_conversation_endpoint_handler` currently turns any `APIStatusError` into a `NotFoundResponse`. As with the GET handler, this risks masking authentication/authorization or server errors as 404s.

Refine this to:

- Map 404 to `NotFoundResponse` (conversation truly missing).
- Map other statuses to an internal-error response, or propagate the HTTP status if you want parity with Llama Stack.
docs/openapi.json (4)
6325-6363: QueryResponse: remove rag_chunks from docs and update example/description to match the `tool_*` schema.

- The schema now exposes `tool_calls`/`tool_results`; the description and example still reference `rag_chunks` and the legacy `tool_calls` shape.
- Align the description and example to prevent SDK drift.

```diff
   "title": "QueryResponse",
-  "description": "Model representing LLM response to a query.\n\nAttributes:\n    conversation_id: The optional conversation ID (UUID).\n    response: The response.\n    rag_chunks: List of RAG chunks used to generate the response.\n    referenced_documents: The URLs and titles for the documents used to generate the response.\n    tool_calls: List of tool calls made during response generation.\n    truncated: Whether conversation history was truncated.\n    input_tokens: Number of tokens sent to LLM.\n    output_tokens: Number of tokens received from LLM.\n    available_quotas: Quota available as measured by all configured quota limiters.",
+  "description": "Model representing LLM response to a query.\n\nAttributes:\n    conversation_id: The optional conversation ID (UUID).\n    response: The response.\n    referenced_documents: Documents referenced in generating the response.\n    tool_calls: List of tool calls made during response generation.\n    tool_results: List of tool call results.\n    truncated: Whether conversation history was truncated.\n    input_tokens: Number of tokens sent to LLM.\n    output_tokens: Number of tokens received from LLM.\n    available_quotas: Quota available as measured by all configured quota limiters.",
@@
-  "rag_chunks": [
-    { "content": "...", "score": 0.95, "source": "kubernetes-docs/operators.md" }
-  ],
   "referenced_documents": [
     {
       "doc_title": "Operator Lifecycle Manager (OLM)",
       "doc_url": "https://docs.openshift.com/container-platform/4.15/operators/olm/index.html"
     }
   ],
   "response": "Operator Lifecycle Manager (OLM) helps users install...",
-  "tool_calls": [
-    {
-      "arguments": { "query": "operator lifecycle manager" },
-      "result": { "chunks_found": 5 },
-      "tool_name": "knowledge_search"
-    }
-  ],
+  "tool_calls": [
+    { "id": "call_1", "name": "knowledge_search", "args": { "query": "operator lifecycle manager" }, "type": "tool_call" }
+  ],
+  "tool_results": [
+    { "id": "call_1", "status": "success", "content": { "chunks_found": 5 }, "type": "tool_result", "round": 0 }
+  ],
   "truncated": false
```
1189-1194: Stabilize v1 operationId and align the 200 example with the new schema (no rag_chunks; new `tool_*` shapes).

- Keep the public operationId v1-only; drop the internal "v2_" leakage.
- The 200 example still shows `rag_chunks` and the legacy `tool_calls` shape; replace it with `ToolCallSummary`/`ToolResultSummary` and remove `rag_chunks`.

```diff
- "operationId": "query_endpoint_handler_v2_v1_query_post",
+ "operationId": "query_endpoint_handler_v1_query_post",
@@
  "example": {
    "available_quotas": { "daily": 1000, "monthly": 50000 },
    "conversation_id": "123e4567-e89b-12d3-a456-426614174000",
    "input_tokens": 150,
    "output_tokens": 75,
-   "rag_chunks": [
-     {
-       "content": "OLM is a component of the Operator Framework toolkit...",
-       "score": 0.95,
-       "source": "kubernetes-docs/operators.md"
-     }
-   ],
    "referenced_documents": [
      {
        "doc_title": "Operator Lifecycle Manager (OLM)",
        "doc_url": "https://docs.openshift.com/container-platform/4.15/operators/olm/index.html"
      }
    ],
    "response": "Operator Lifecycle Manager (OLM) helps users install...",
-   "tool_calls": [
-     {
-       "arguments": { "query": "operator lifecycle manager" },
-       "result": { "chunks_found": 5 },
-       "tool_name": "knowledge_search"
-     }
-   ],
+   "tool_calls": [
+     { "id": "call_1", "name": "knowledge_search", "args": { "query": "operator lifecycle manager" }, "type": "tool_call" }
+   ],
+   "tool_results": [
+     { "id": "call_1", "status": "success", "content": { "chunks_found": 5 }, "type": "tool_result", "round": 0 }
+   ],
    "truncated": false
  }
```

Also applies to: 1212-1246
1500-1505: Streaming: fix v1 operationId, update the SSE example, and declare SSE headers.

- The v1 operationId should not leak the internal "v2" implementation name.
- SSE example: remove `rag_chunks`, use a boolean `truncated`, include `tool_calls`/`tool_results`, and keep the example minimal.
- Add standard SSE headers to guide clients.

```diff
- "operationId": "streaming_query_endpoint_handler_v2_v1_streaming_query_post",
+ "operationId": "streaming_query_endpoint_handler_v1_streaming_query_post",
@@
  "description": "Successful response",
  "content": {
    "text/event-stream": {
      "schema": { "type": "string", "format": "text/event-stream" },
+     "headers": {
+       "Cache-Control": { "schema": { "type": "string" }, "description": "no-cache" },
+       "Connection": { "schema": { "type": "string" }, "description": "keep-alive" }
+     },
-     "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"rag_chunks\": [], \"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19, \"available_quotas\": {}}}\n\n"
+     "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 11, \"output_tokens\": 19, \"available_quotas\": {}, \"tool_calls\": [], \"tool_results\": []}}\n\n"
    }
  }
```

Also applies to: 1517-1525
7074-7107: ToolResultSummary.content needs an explicit JSON type union.

`content` is required but typeless; make it "any JSON value" for validators/codegen.

```diff
 "ToolResultSummary": {
   "properties": {
@@
-    "content": {
-      "title": "Content",
-      "description": "Content/result returned from the tool"
-    },
+    "content": {
+      "anyOf": [
+        { "type": "string" },
+        { "type": "number" },
+        { "type": "boolean" },
+        { "type": "object", "additionalProperties": true },
+        { "type": "array", "items": {} },
+        { "type": "null" }
+      ],
+      "title": "Content",
+      "description": "Content/result returned from the tool"
+    },
```
🧹 Nitpick comments (20)
tests/e2e/features/environment.py (1)

123-123: Reasonable timeout increase for container stabilization.

The increased wait time from 5s to 20s before health checks is a pragmatic adjustment for llama-stack 0.3.0. However, consider making this configurable via environment variable for flexibility across different CI environments:

```diff
+    # Allow environment override for slower CI environments
+    initial_wait = int(os.getenv("E2E_LLAMA_STACK_RESTORE_WAIT", "20"))
-    time.sleep(20)
+    time.sleep(initial_wait)
```

src/utils/llama_stack_version.py (1)
61-80: Good version extraction with regex; minor logging refinement suggested.

The regex-based version extraction properly handles versions like "0.3.0-dev" by extracting the semver core. Raising an exception on failure (addressing the previous review) is appropriate.

One minor note: logging a `warning` before raising an exception is slightly redundant, since the exception will propagate the same message. Consider using `logger.debug` for the pre-exception log or removing it:

```diff
 if not match:
-    logger.warning(
+    logger.debug(
         "Failed to extract version pattern from '%s'. Skipping version check.",
         version_info,
     )
```

src/app/endpoints/query_v2.py (4)
320-324: Update `retrieve_response` docstring return type to match the actual 4-tuple

The implementation returns `(TurnSummary, str, list[ReferencedDocument], TokenCounter)`, but the docstring still documents `tuple[TurnSummary, str]`, which is misleading.

Consider tightening the docstring:

```diff
-    Returns:
-        tuple[TurnSummary, str]: A tuple containing a summary of the LLM or agent's response content
-            and the conversation ID, the list of parsed referenced documents,
-            and token usage information.
+    Returns:
+        tuple[TurnSummary, str, list[ReferencedDocument], TokenCounter]: A tuple
+            containing a summary of the LLM or agent's response content, the
+            normalized conversation ID, the list of parsed referenced documents,
+            and token usage information.
```
388-390: Consider downgrading full response logging to debug level

`logger.info("Response: %s", response)` will emit the entire Responses API payload, which can be large and may leak sensitive content into production logs.

Consider logging only IDs/metadata at `info` level and reserving full payload logging for `debug`:

```diff
-    logger.info("Response: %s", response)
-    logger.debug(
-        "Received response with ID: %s, conversation ID: %s, output items: %d",
-        response.id,
-        conversation_id,
-        len(response.output),
-    )
+    logger.debug("Full Responses API payload: %s", response)
+    logger.info(
+        "Received Responses API response id=%s, conversation_id=%s, output_items=%d",
+        response.id,
+        conversation_id,
+        len(response.output),
+    )
```
452-549: Defensive handling of `attributes` in file search results

The de-duplication logic and dual handling of dict vs object representations are good. One fragile bit:

- In the `file_search_call` branch you treat `attributes` as a mapping and call `.get(...)` unconditionally. If `result.attributes` is a non-mapping object, this will raise `AttributeError`.

You can safely support both dict and object cases:

```diff
-            if isinstance(result, dict):
-                filename = result.get("filename")
-                attributes = result.get("attributes", {})
-            else:
-                filename = getattr(result, "filename", None)
-                attributes = getattr(result, "attributes", {})
-
-            # Try to get URL from attributes
-            # Look for common URL fields in attributes
-            doc_url = (
-                attributes.get("link")
-                or attributes.get("url")
-                or attributes.get("doc_url")
-            )
+            if isinstance(result, dict):
+                filename = result.get("filename")
+                attributes = result.get("attributes", {})
+            else:
+                filename = getattr(result, "filename", None)
+                attributes = getattr(result, "attributes", {})
+
+            # Try to get URL from attributes (handle both dict and object)
+            if isinstance(attributes, dict):
+                doc_url = (
+                    attributes.get("link")
+                    or attributes.get("url")
+                    or attributes.get("doc_url")
+                )
+            else:
+                doc_url = (
+                    getattr(attributes, "link", None)
+                    or getattr(attributes, "url", None)
+                    or getattr(attributes, "doc_url", None)
+                )
```

This keeps behavior the same for dicts while avoiding surprises if llama-stack starts returning a typed object for `attributes`.
50-51: Consider using `__name__` for the module logger

Coding guidelines recommend the `logger = logging.getLogger(__name__)` pattern. This module currently uses a hard-coded name:

```python
logger = logging.getLogger("app.endpoints.handlers")
```

Unless you specifically want to aggregate several modules under a single logger, consider:

```diff
-logger = logging.getLogger("app.endpoints.handlers")
+logger = logging.getLogger(__name__)
```

to keep logger naming consistent across modules.
tests/unit/app/test_routers.py (1)

62-87: Router expectations correctly updated; consider relaxing the hard-coded count

The tests correctly:

- Assert the `query_v2`, `streaming_query_v2`, `conversations_v2`, and `conversations_v3` routers are included.
- Expect `len(app.routers) == 16`, reflecting the new router set.

This keeps router wiring tightly validated, but the strict `len(app.routers) == 16` assertion in both tests is somewhat brittle and will require updates whenever a router is added or removed. If you want these tests to be less noisy on unrelated router changes, you could:

- Drop the length assertion, or
- Assert `>=` and then explicitly check presence/prefixes of the routers you care about.

Not strictly necessary, but worth considering.
docs/conversations_api.md (2)

196-221: Cross-check example response schemas against the current Query/Streaming models

The non-streaming and streaming examples still document fields like `rag_chunks` and a minimal `end` event payload. Given this PR's broader changes (ToolCallSummary/ToolResultSummary, updated QueryResponse, enriched streaming end events, etc.), there's a good chance these examples will drift from the actual Pydantic models and `openapi.json`.

I recommend regenerating or at least validating these JSON examples directly against:

- `QueryResponse`/`StreamingQueryResponse` (or their current names), and
- The OpenAPI schema for `/v2/query` and `/v2/streaming_query`.

That way, fields such as `tool_calls`, any `tool_results`, `referenced_documents`, token/quota fields, and their shapes stay in lockstep with the real API.

Also applies to: 321-333, 234-244
14-27: Tidy Markdown list indentation and code-fence languages for linting and readability

markdownlint is flagging a couple of purely formatting issues here:

- Nested ToC bullets (lines under "Conversation ID Formats", "How Conversations Work", "API Endpoints") are indented more than the usual 2 spaces for subitems.
- Several fenced blocks (ID format example, normalized ID, SSE stream samples, and the error snippet) don't declare a language, so they trigger MD040 and miss out on syntax highlighting.

Consider:

- Normalizing nested list indentation to two spaces under their parents.
- Adding language hints to the fences: `text` for ID format snippets, `json` for JSON, `bash` for curl, and `sql` for DDL.

As per static analysis hints, this will satisfy markdownlint and make the guide easier to read.

Also applies to: 58-66, 77-79, 234-244, 369-379, 469-471
tests/unit/app/endpoints/test_conversations_v2.py (1)

563-582: Revisit `success` flag semantics for failed deletions

Both `test_conversation_not_found` and `test_unsuccessful_deletion` now assert:

- `response.success is True`
- `response.response == "Conversation cannot be deleted"`

even when `conversation_cache.delete()` returns `False` (no delete performed). While this matches the new implementation, the combination of a failure-style message with `success=True` is likely to be confusing for clients that treat the flag as the primary indicator of outcome.

Consider either:

- Setting `success` to `False` when the conversation cannot be deleted, or
- Renaming/reframing the field if it is meant to signal "the request was handled successfully" rather than "the resource was deleted."

If the current contract is intentional and already relied upon, it would help to document this behavior clearly in the API docs.

Also applies to: 608-629
tests/integration/endpoints/test_query_v2_integration.py (1)

1199-1227: Nice avoidance of filesystem side effects; consider asserting transcript calls by flag

Patching `app.endpoints.query.store_transcript` here is a good way to keep the integration test from touching the filesystem while still exercising the transcript flow. Since you already toggle `transcripts_enabled` between `True` and `False`, you might slightly strengthen this test by asserting the patched function's call count, e.g.:

```diff
-    mocker.patch("app.endpoints.query.store_transcript")
+    store_transcript_mock = mocker.patch("app.endpoints.query.store_transcript")
 ...
+    # With transcripts enabled, store_transcript should be called at least once
+    assert store_transcript_mock.call_count >= 1
+
+    # Capture the count before the disabled-transcripts call
+    count_before = store_transcript_mock.call_count
     test_config.user_data_collection_configuration.transcripts_enabled = False
 ...
     response_disabled = await query_endpoint_handler_v2(...)
 ...
+    # With transcripts disabled, no additional transcript writes should be triggered
+    assert store_transcript_mock.call_count == count_before
```

Not mandatory, but it would make the transcript gating behavior explicit.

tests/unit/app/endpoints/test_query_v2.py (1)
tests/unit/app/endpoints/test_query_v2.py (1)
795-795: Consider usingmocker.Mock()instead ofmocker.AsyncMock()for consistency.The
mock_clientis created withmocker.AsyncMock()here, but in other tests in this file it's created withmocker.Mock(). While both work (since individual methods are mocked asAsyncMock), usingmocker.Mock()consistently would improve readability.- mock_client = mocker.AsyncMock() + mock_client = mocker.Mock()tests/unit/app/endpoints/test_streaming_query.py (3)
47-144: Local Agent API mock types are reasonable but could be centralized

Defining `TextDelta`, `ToolCallDelta`, `TurnResponseEvent`, the `AgentTurnResponse*` payloads, `ToolExecutionStep`, and `ToolResponse` as lightweight attribute bags is a pragmatic way to keep Agent-API tests working against llama-stack-client 0.3.x. If more tests need these, consider moving them into a shared test helper/module to avoid duplication and keep future API updates localized.

276-288: Connection-error streaming test could assert streamed error payload

The `_raise_connection_error` side effect and `APIConnectionError` patching correctly exercise the 503 path and ensure a `StreamingResponse` is returned, but the test currently only checks type/media type. To tighten coverage, you could also read the SSE body (similar to the quota test below) and assert the presence of the standardized "Unable to connect to Llama Stack" error message.
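A sketch of how such a body assertion might look, using a fake async generator in place of the real handler's `StreamingResponse.body_iterator` (the error message text and SSE payload shape are taken from the review comment, not verified against the source):

```python
import asyncio


async def collect_sse(body_iterator) -> str:
    """Drain an async iterator of SSE chunks into one string."""
    chunks = []
    async for chunk in body_iterator:
        chunks.append(chunk if isinstance(chunk, str) else chunk.decode())
    return "".join(chunks)


async def fake_error_stream():
    # Stand-in for the stream a 503 path would produce
    yield 'data: {"event": "error", "data": {"response": "Unable to connect to Llama Stack"}}\n\n'


body = asyncio.run(collect_sse(fake_error_stream()))
assert "Unable to connect to Llama Stack" in body
```

In a real test you would pass `response.body_iterator` from the returned `StreamingResponse` instead of `fake_error_stream()`.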
476-585: Marking legacy Agent API streaming tests as skipped is a good transitional step

Using `@pytest.mark.skip(reason="Deprecated API test")` on the legacy Agent-API streaming and `retrieve_response` tests keeps the historical behavior documented without breaking against llama-stack-client 0.3.x. Once the old APIs are fully removed from production, consider deleting these tests (and related mocks) to reduce maintenance overhead.

src/app/endpoints/conversations_v3.py (3)
47-48: Use `logging.getLogger(__name__)` for the module logger

The current logger initialization uses the hard-coded name `"app.endpoints.handlers"`. Per the project guidelines, prefer `logging.getLogger(__name__)` so the logger name tracks the actual module path and stays correct if the file moves.
106-165: Tighten typing and edge-case handling in `simplify_conversation_items`
- The parameter is typed as `list[dict]`; for clarity and static checking it should be `list[dict[str, Any]]`.
- The pairing logic assumes items arrive as alternating user/assistant messages; if the backend ever returns an odd number of message items or interleaves other message types, you'll emit turns with empty assistant text or mispaired messages.
If you expect more diverse item streams, consider:
- Deriving roles from the underlying item structure (e.g., a `role` field) instead of positional pairing.
- Guarding against a trailing unmatched user item (e.g., only appending a turn when an assistant message exists).
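The role-based pairing suggested above could look roughly like this. The function name `pair_turns` and the item shape (`{"role": ..., "content": ...}`) are assumptions for illustration; the real conversation items from llama-stack may carry a different structure.

```python
from typing import Any


def pair_turns(items: list[dict[str, Any]]) -> list[dict[str, str]]:
    """Pair user/assistant messages by role, dropping an unmatched trailing user item."""
    turns: list[dict[str, str]] = []
    pending_user: str | None = None
    for item in items:
        role = item.get("role")
        if role == "user":
            pending_user = item.get("content", "")
        elif role == "assistant" and pending_user is not None:
            # Only emit a turn once both sides of the pair exist.
            turns.append({"user": pending_user, "assistant": item.get("content", "")})
            pending_user = None
    return turns
```

Non-message item types fall through both branches and are simply skipped, which also covers the interleaving concern.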
380-489: Delete handler treats all `APIStatusError` cases as "already deleted"
In `delete_conversation_endpoint_handler`, any `APIStatusError` from `client.conversations.delete` is treated as "conversation not found in LlamaStack" and silently ignored. That makes sense for true 404s (idempotent delete), but it also swallows 401/403/5xx from the backend. If you want better observability, consider checking the status code and:
- Treating 404 as "already deleted" (current behavior).
- For other codes, logging an error and surfacing a 5xx via `InternalServerErrorResponse.generic()` or a more specific mapping.
src/models/responses.py (1)
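The status-code discrimination suggested for the delete handler could be sketched as below. `APIStatusError` here is a minimal stand-in for the llama-stack-client exception (the real one carries more fields), and `handle_delete_error` is a hypothetical helper, not existing project code.

```python
import logging

logger = logging.getLogger(__name__)


class APIStatusError(Exception):
    """Minimal stand-in for llama_stack_client.APIStatusError."""

    def __init__(self, status_code: int) -> None:
        super().__init__(f"status {status_code}")
        self.status_code = status_code


def handle_delete_error(exc: APIStatusError) -> bool:
    """Return True when the error means 'already deleted' (idempotent 404)."""
    if exc.status_code == 404:
        logger.info("Conversation already deleted in LlamaStack")
        return True
    # 401/403/5xx should not be swallowed silently.
    logger.error("Unexpected backend error on delete: %s", exc.status_code)
    return False
```

In the handler, a `False` return would then map to an `InternalServerErrorResponse.generic()` (or a more specific response) instead of a silent success.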
28-41: Consider removing the commented-out ToolCall/ToolResult models
The old `ToolCall` and `ToolResult` classes are fully commented out. If they are no longer used anywhere, it would be cleaner to delete them entirely (or move them into a dedicated legacy/back-compat module) rather than keeping commented code in a core models file.
src/utils/types.py (1)
63-75: GraniteToolParser docstring still mentions Optional while the signature does not
`get_tool_calls` is typed as `(self, output_message: AgentCompletionMessage)`, but the docstring parameter description says `AgentCompletionMessage | None`. Since the implementation handles a falsy `output_message`, either:
- Update the signature to `output_message: AgentCompletionMessage | None`, or
- Adjust the docstring to drop `| None` and state that callers should not pass `None`.
Keeping these in sync will help static analysis and readers.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
`uv.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (52)
- .github/workflows/e2e_tests.yaml (1 hunk)
- docker-compose.yaml (0 hunks)
- docs/conversations_api.md (1 hunk)
- docs/openapi.json (11 hunks)
- pyproject.toml (2 hunks)
- run.yaml (1 hunk)
- run_library.yaml (1 hunk)
- src/app/endpoints/conversations_v2.py (0 hunks)
- src/app/endpoints/conversations_v3.py (1 hunk)
- src/app/endpoints/query.py (9 hunks)
- src/app/endpoints/query_v2.py (16 hunks)
- src/app/endpoints/streaming_query.py (19 hunks)
- src/app/endpoints/streaming_query_v2.py (14 hunks)
- src/app/routers.py (2 hunks)
- src/constants.py (1 hunk)
- src/metrics/utils.py (1 hunk)
- src/models/requests.py (1 hunk)
- src/models/responses.py (7 hunks)
- src/utils/endpoints.py (5 hunks)
- src/utils/llama_stack_version.py (2 hunks)
- src/utils/suid.py (1 hunk)
- src/utils/token_counter.py (1 hunk)
- src/utils/transcripts.py (1 hunk)
- src/utils/types.py (3 hunks)
- test.containerfile (1 hunk)
- tests/configuration/minimal-stack.yaml (1 hunk)
- tests/e2e/configs/run-azure.yaml (1 hunk)
- tests/e2e/configs/run-ci.yaml (2 hunks)
- tests/e2e/configs/run-library.yaml (1 hunk)
- tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunk)
- tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml (1 hunk)
- tests/e2e/features/conversation_cache_v2.feature (2 hunks)
- tests/e2e/features/conversations.feature (1 hunk)
- tests/e2e/features/environment.py (1 hunk)
- tests/e2e/features/info.feature (1 hunk)
- tests/e2e/features/steps/conversation.py (2 hunks)
- tests/e2e/features/steps/info.py (2 hunks)
- tests/e2e/test_list.txt (1 hunk)
- tests/integration/endpoints/test_query_v2_integration.py (8 hunks)
- tests/integration/test_openapi_json.py (4 hunks)
- tests/unit/app/endpoints/test_conversations_v2.py (1 hunk)
- tests/unit/app/endpoints/test_query.py (30 hunks)
- tests/unit/app/endpoints/test_query_v2.py (27 hunks)
- tests/unit/app/endpoints/test_streaming_query.py (28 hunks)
- tests/unit/app/endpoints/test_streaming_query_v2.py (13 hunks)
- tests/unit/app/test_routers.py (3 hunks)
- tests/unit/models/responses/test_error_responses.py (3 hunks)
- tests/unit/models/responses/test_query_response.py (3 hunks)
- tests/unit/models/responses/test_rag_chunk.py (1 hunk)
- tests/unit/models/responses/test_successful_responses.py (6 hunks)
- tests/unit/utils/test_endpoints.py (11 hunks)
- tests/unit/utils/test_transcripts.py (3 hunks)
💤 Files with no reviewable changes (2)
- docker-compose.yaml
- src/app/endpoints/conversations_v2.py
✅ Files skipped from review due to trivial changes (1)
- tests/e2e/test_list.txt
🚧 Files skipped from review as they are similar to previous changes (10)
- tests/configuration/minimal-stack.yaml
- src/constants.py
- src/models/requests.py
- src/utils/token_counter.py
- tests/unit/utils/test_transcripts.py
- tests/e2e/features/info.feature
- src/utils/transcripts.py
- test.containerfile
- src/metrics/utils.py
- tests/unit/models/responses/test_rag_chunk.py
🧰 Additional context used
📓 Path-based instructions (8)
tests/e2e/features/**/*.feature
📄 CodeRabbit inference engine (CLAUDE.md)
Use behave (BDD) framework with Gherkin feature files for end-to-end tests
Files:
tests/e2e/features/conversation_cache_v2.feature, tests/e2e/features/conversations.feature
tests/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use `pytest-mock` with AsyncMock objects for mocking in tests
Files:
tests/e2e/features/environment.py, tests/integration/test_openapi_json.py, tests/unit/app/endpoints/test_conversations_v2.py, tests/unit/app/endpoints/test_query_v2.py, tests/unit/models/responses/test_error_responses.py, tests/unit/utils/test_endpoints.py, tests/unit/app/endpoints/test_query.py, tests/unit/models/responses/test_query_response.py, tests/unit/app/endpoints/test_streaming_query_v2.py, tests/e2e/features/steps/info.py, tests/unit/models/responses/test_successful_responses.py, tests/e2e/features/steps/conversation.py, tests/integration/endpoints/test_query_v2_integration.py, tests/unit/app/test_routers.py, tests/unit/app/endpoints/test_streaming_query.py
pyproject.toml
📄 CodeRabbit inference engine (CLAUDE.md)
pyproject.toml: Configure pylint with `source-roots = "src"`
Exclude `src/auth/k8s.py` from pyright type checking
Files:
pyproject.toml
tests/{unit,integration}/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
tests/{unit,integration}/**/*.py: Use pytest for all unit and integration tests; do not use unittest framework
Unit tests must achieve 60% code coverage; integration tests must achieve 10% coverage
Files:
tests/integration/test_openapi_json.py, tests/unit/app/endpoints/test_conversations_v2.py, tests/unit/app/endpoints/test_query_v2.py, tests/unit/models/responses/test_error_responses.py, tests/unit/utils/test_endpoints.py, tests/unit/app/endpoints/test_query.py, tests/unit/models/responses/test_query_response.py, tests/unit/app/endpoints/test_streaming_query_v2.py, tests/unit/models/responses/test_successful_responses.py, tests/integration/endpoints/test_query_v2_integration.py, tests/unit/app/test_routers.py, tests/unit/app/endpoints/test_streaming_query.py
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in the LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and the module logger pattern with standard log levels: debug, info, warning, error
Files:
src/app/endpoints/streaming_query_v2.py, src/app/endpoints/conversations_v3.py, src/app/endpoints/streaming_query.py, src/utils/endpoints.py, src/app/endpoints/query.py, src/utils/suid.py, src/utils/types.py, src/app/routers.py, src/app/endpoints/query_v2.py, src/utils/llama_stack_version.py, src/models/responses.py
src/app/endpoints/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Files:
src/app/endpoints/streaming_query_v2.py, src/app/endpoints/conversations_v3.py, src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, src/app/endpoints/query_v2.py
src/**/{client,app/endpoints/**}.py
📄 CodeRabbit inference engine (CLAUDE.md)
Handle `APIConnectionError` from Llama Stack in integration code
Files:
src/app/endpoints/streaming_query_v2.py, src/app/endpoints/conversations_v3.py, src/app/endpoints/streaming_query.py, src/app/endpoints/query.py, src/app/endpoints/query_v2.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/responses.py
🧠 Learnings (11)
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-08-25T09:11:38.701Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 445
File: tests/e2e/features/environment.py:0-0
Timestamp: 2025-08-25T09:11:38.701Z
Learning: In the Behave Python testing framework, the Context object is created once for the entire test run and reused across all features and scenarios. Custom attributes added to the context persist until explicitly cleared, so per-scenario state should be reset in hooks to maintain test isolation.
Applied to files:
tests/e2e/features/environment.py, tests/e2e/features/steps/info.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code
Applied to files:
tests/e2e/features/environment.py, tests/unit/app/endpoints/test_query.py, src/app/endpoints/conversations_v3.py, src/app/endpoints/query.py
📚 Learning: 2025-08-18T10:58:14.951Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:47-47
Timestamp: 2025-08-18T10:58:14.951Z
Learning: psycopg2-binary is required by some llama-stack providers in the lightspeed-stack project, so it cannot be replaced with psycopg v3 or moved to optional dependencies without breaking llama-stack functionality.
Applied to files:
pyproject.toml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to pyproject.toml : Exclude `src/auth/k8s.py` from pyright type checking
Applied to files:
pyproject.toml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.
Applied to files:
tests/e2e/configuration/library-mode/lightspeed-stack-no-cache.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/**/*.py : Use `pytest-mock` with AsyncMock objects for mocking in tests
Applied to files:
tests/unit/utils/test_endpoints.py, tests/unit/app/endpoints/test_query.py, tests/integration/endpoints/test_query_v2_integration.py, tests/unit/app/endpoints/test_streaming_query.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/{unit,integration}/**/*.py : Use pytest for all unit and integration tests; do not use unittest framework
Applied to files:
tests/unit/utils/test_endpoints.py
📚 Learning: 2025-08-06T06:02:21.060Z
Learnt from: eranco74
Repo: lightspeed-core/lightspeed-stack PR: 348
File: src/utils/endpoints.py:91-94
Timestamp: 2025-08-06T06:02:21.060Z
Learning: The direct assignment to `agent._agent_id` in `src/utils/endpoints.py` is a necessary workaround for the missing agent rehydration feature in the LLS client SDK. This allows preserving conversation IDs when handling existing agents.
Applied to files:
tests/unit/utils/test_endpoints.py, src/utils/endpoints.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/app/endpoints/**/*.py : Use FastAPI `HTTPException` with appropriate status codes for API endpoint error handling
Applied to files:
tests/unit/app/endpoints/test_query.py, src/app/endpoints/conversations_v3.py
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests
Applied to files:
.github/workflows/e2e_tests.yaml
🧬 Code graph analysis (14)
tests/unit/app/endpoints/test_query_v2.py (1)
src/app/endpoints/streaming_query_v2.py (1)
- retrieve_response (364-461)
tests/unit/models/responses/test_error_responses.py (1)
src/models/responses.py (7)
- PromptTooLongResponse (1590-1619), AbstractErrorResponse (1176-1258), DetailModel (1169-1173), openapi_response (47-60, 458-480, 862-901, 1213-1258)
tests/unit/app/endpoints/test_query.py (2)
tests/unit/app/endpoints/test_streaming_query.py (2)
- ToolExecutionStep (128-133), ToolResponse (136-141)
src/utils/types.py (1)
- TurnSummary (135-220)
tests/unit/models/responses/test_query_response.py (2)
src/models/responses.py (1)
- ReferencedDocument (328-338)
src/utils/types.py (2)
- ToolCallSummary (102-110), ToolResultSummary (113-124)
tests/unit/app/endpoints/test_streaming_query_v2.py (3)
src/app/endpoints/streaming_query_v2.py (1)
- streaming_query_endpoint_handler_v2 (327-361)
src/models/requests.py (1)
- QueryRequest (73-267)
src/models/responses.py (1)
- model (1731-1744)
src/app/endpoints/streaming_query_v2.py (5)
src/app/endpoints/query_v2.py (2)
- parse_referenced_documents_from_responses_api (452-549), retrieve_response (289-449)
src/utils/quota.py (1)
- get_available_quotas (69-89)
src/utils/suid.py (2)
- normalize_conversation_id (101-122), to_llama_stack_conversation_id (125-145)
src/utils/token_counter.py (1)
- TokenCounter (18-41)
src/app/endpoints/streaming_query.py (3)
- stream_start_event (124-145), stream_end_event (148-197), retrieve_response (998-1119)
src/app/endpoints/streaming_query.py (4)
src/app/endpoints/query.py (1)
- parse_referenced_documents (605-632)
src/models/responses.py (16)
- AbstractErrorResponse (1176-1258), ForbiddenResponse (1381-1528), InternalServerErrorResponse (1782-1940), NotFoundResponse (1531-1587), QuotaExceededResponse (1670-1779), ServiceUnavailableResponse (1943-1974), StreamingQueryResponse (454-519), UnauthorizedResponse (1297-1378), ReferencedDocument (328-338), openapi_response (47-60, 458-480, 862-901, 1213-1258), conversation (1441-1462), model (1731-1744), generic (1837-1848)
src/utils/quota.py (1)
- get_available_quotas (69-89)
src/utils/types.py (1)
- content_to_str (20-39)
tests/unit/models/responses/test_successful_responses.py (2)
src/models/responses.py (4)
- openapi_response (47-60, 458-480, 862-901, 1213-1258)
src/utils/types.py (2)
- ToolCallSummary (102-110), ToolResultSummary (113-124)
src/utils/endpoints.py (3)
src/models/responses.py (2)
- model (1731-1744), NotFoundResponse (1531-1587)
src/utils/types.py (2)
- GraniteToolParser (60-99), get_parser (82-99)
src/utils/suid.py (1)
- get_suid (6-16)
src/app/endpoints/query.py (2)
src/models/responses.py (5)
- PromptTooLongResponse (1590-1619), InternalServerErrorResponse (1782-1940), QuotaExceededResponse (1670-1779), model (1731-1744), generic (1837-1848)
src/utils/types.py (2)
- TurnSummary (135-220), content_to_str (20-39)
tests/e2e/features/steps/conversation.py (1)
tests/e2e/utils/utils.py (3)
- replace_placeholders (147-160), restart_container (130-144), switch_config (98-106)
src/app/routers.py (2)
tests/unit/app/test_routers.py (1)
- include_router (36-51)
src/models/config.py (1)
- config (324-341)
tests/unit/app/endpoints/test_streaming_query.py (3)
src/models/responses.py (2)
- ReferencedDocument (328-338), model (1731-1744)
src/models/requests.py (1)
- QueryRequest (73-267)
src/app/endpoints/streaming_query.py (2)
- streaming_query_endpoint_handler (961-995), stream_end_event (148-197)
src/models/responses.py (1)
src/utils/types.py (2)
- ToolCallSummary (102-110), ToolResultSummary (113-124)
🪛 markdownlint-cli2 (0.18.1)
docs/conversations_api.md
16-16: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
17-17: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
18-18: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
20-20: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
21-21: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
22-22: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
24-24: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
25-25: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
26-26: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
27-27: Unordered list indentation
Expected: 2; Actual: 3
(MD007, ul-indent)
58-58: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
63-63: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
77-77: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
234-234: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
369-369: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
469-469: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / ci
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
docs/openapi.json (3)
2596-2611: Delete conversation 200 example: "not found" should not be success=true
Set `success` to `false` and align the message ("cannot").
"not found": { "value": { "conversation_id": "123e4567-e89b-12d3-a456-426614174000", - "response": "Conversation can not be deleted", - "success": true + "response": "Conversation cannot be deleted", + "success": false } }
4568-4574: ConversationDeleteResponse examples: fix the "not found" success flag
Mirror the fix in the schema examples.
- "value": { - "conversation_id": "123e4567-e89b-12d3-a456-426614174000", - "response": "Conversation can not be deleted", - "success": true - } + "value": { + "conversation_id": "123e4567-e89b-12d3-a456-426614174000", + "response": "Conversation cannot be deleted", + "success": false + }
1373-1381: Typo in 422 example: "attatchment" → "attachment"
Minor docs polish.
- "cause": "Invalid attatchment type: must be one of ['text/plain', 'application/json', 'application/yaml', 'application/xml']", + "cause": "Invalid attachment type: must be one of ['text/plain', 'application/json', 'application/yaml', 'application/xml']",
♻️ Duplicate comments (5)
src/models/responses.py (1)
513-516: Verify the SSE end-event structure matches runtime behavior.
The past review comment indicated that the SSE example should match the actual runtime behavior. While `rag_chunks` has been removed, please confirm that this end-event structure accurately reflects what `stream_end_event` actually emits:
- `available_quotas` at the top level (alongside `event` and `data`)
- `referenced_documents`, `truncated`, `input_tokens`, `output_tokens` inside `data`
6363-6406: QueryResponse doc drift: remove rag_chunks; keep tool_calls/tool_results
The description still references `rag_chunks`; update it to match the schema and examples.
- "description": "Model representing LLM response to a query.\n\nAttributes:\n conversation_id: The optional conversation ID (UUID).\n response: The response.\n rag_chunks: List of RAG chunks used to generate the response.\n referenced_documents: The URLs and titles for the documents used to generate the response.\n tool_calls: List of tool calls made during response generation.\n truncated: Whether conversation history was truncated.\n input_tokens: Number of tokens sent to LLM.\n output_tokens: Number of tokens received from LLM.\n available_quotas: Quota available as measured by all configured quota limiters.", + "description": "Model representing LLM response to a query.\n\nAttributes:\n conversation_id: The optional conversation ID (UUID).\n response: The response.\n referenced_documents: Documents referenced in generating the response.\n tool_calls: List of tool calls made during response generation.\n tool_results: List of tool call results.\n truncated: Whether conversation history was truncated.\n input_tokens: Number of tokens sent to LLM.\n output_tokens: Number of tokens received from LLM.\n available_quotas: Quota available as measured by all configured quota limiters.",Also applies to: 6332-6362
1189-1194: operationId: remove leaked impl version ("v2")Keep public v1 operationId stable for SDKs.
1189-1194: operationId: remove the leaked impl version ("v2")
Keep the public v1 operationId stable for SDKs.
1516-1524: Streaming opId + SSE example/headers: fix drift and correctness
- Remove "v2" from v1 operationId.
- Example: use boolean for truncated; keep available_quotas inside data; include empty tool_calls/tool_results; simplify token stream.
- Declare standard SSE headers to guide clients.
- "operationId": "streaming_query_endpoint_handler_v2_v1_streaming_query_post", + "operationId": "streaming_query_endpoint_handler_v1_streaming_query_post", @@ - "text/event-stream": { - "schema": { - "type": "string", - "format": "text/event-stream" - }, - "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! 
How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n" - } + "text/event-stream": { + "schema": { "type": "string", "format": "text/event-stream" }, + "headers": { + "Cache-Control": { "schema": { "type": "string" }, "description": "no-cache" }, + "Connection": { "schema": { "type": "string" }, "description": "keep-alive" } + }, + "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 11, \"output_tokens\": 19, \"available_quotas\": {}, \"tool_calls\": [], \"tool_results\": []}}\n\n" + }Also applies to: 1499-1504
7080-7116: ToolResultSummary.content: add an explicit "any JSON" type
`content` is required but typeless; generators/validators will choke. Define a proper union.
- "content": { - "title": "Content", - "description": "Content/result returned from the tool" - }, + "content": { + "anyOf": [ + { "type": "string" }, + { "type": "number" }, + { "type": "boolean" }, + { "type": "object", "additionalProperties": true }, + { "type": "array", "items": {} }, + { "type": "null" } + ], + "title": "Content", + "description": "Content/result returned from the tool" + },
🧹 Nitpick comments (2)
src/models/responses.py (2)
28-41: Remove commented-out code.
The commented-out `ToolCall` and `ToolResult` models should be deleted rather than left as comments; if needed for reference, they're preserved in git history. Apply this diff to remove the commented code:
-# class ToolCall(BaseModel): -# """Model representing a tool call made during response generation.""" - -# tool_name: str = Field(description="Name of the tool called") -# arguments: dict[str, Any] = Field(description="Arguments passed to the tool") -# result: dict[str, Any] | None = Field(None, description="Result from the tool") - - -# class ToolResult(BaseModel): -# """Model representing a tool result.""" - -# tool_name: str = Field(description="Name of the tool") -# result: dict[str, Any] = Field(description="Result from the tool") -
1610-1615: Consider using a consistent docstring format.
The docstring uses `Args:` while other methods in this file use `Parameters:` (e.g., lines 1193-1196, 1223-1226). For consistency, consider using `Parameters:` throughout:
 def __init__(self, *, response: str = "Prompt is too long", cause: str):
-    """Initialize a PromptTooLongResponse.
+    """
+    Initialize a PromptTooLongResponse with error details.
-    Args:
+    Parameters:
         response: Short summary of the error. Defaults to "Prompt is too long".
         cause: Detailed explanation of what caused the error.
     """
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- docs/openapi.json (17 hunks)
- src/models/responses.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in the LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use the `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and the module logger pattern with standard log levels: debug, info, warning, error
Files:
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/responses.py
🧬 Code graph analysis (1)
src/models/responses.py (1)
src/utils/types.py (2)
- ToolCallSummary (102-110), ToolResultSummary (113-124)
🪛 GitHub Actions: Python linter
src/models/responses.py
[warning] 425-425: Pylint: Line too long (102/100) (line-too-long).
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: server mode / ci
🔇 Additional comments (6)
src/models/responses.py (4)
13-13: LGTM! The import follows the absolute import pattern required by the coding guidelines and correctly brings in the new summary-based tool call/result types.
15-15: LGTM! Centralizing description constants improves maintainability and ensures consistency across OpenAPI responses.
Also applies to: 24-24
407-415: LGTM!
The migration to `ToolCallSummary` and `ToolResultSummary` aligns with the PR objectives and provides a cleaner API surface. The optional typing is appropriate for these fields.
1591-1621: LGTM!
The `PromptTooLongResponse` class is well-structured and follows the established pattern for error responses. The HTTP 413 status code is appropriate for payload-too-large errors, and the implementation includes proper examples and type annotations.
docs/openapi.json (2)
3922-3936: APIKeyTokenConfiguration: nice improvements
`writeOnly`, `format=password`, examples, and `required` all look good.
5709-5710: Docs enrichment: 👍 helpful background links/descriptions
Good additions; no issues.
Also applies to: 5872-5873, 6900-6901
Force-pushed 69c53b9 to ff31c6a (Compare)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
docs/openapi.json (1)
6364-6370: QueryResponse description drift (mentions rag_chunks; omits tool_results)
The public surface removed `rag_chunks` and added `tool_results`. Update the description for accuracy.
- "description": "Model representing LLM response to a query.\n\nAttributes:\n conversation_id: The optional conversation ID (UUID).\n response: The response.\n rag_chunks: List of RAG chunks used to generate the response.\n referenced_documents: The URLs and titles for the documents used to generate the response.\n tool_calls: List of tool calls made during response generation.\n truncated: Whether conversation history was truncated.\n input_tokens: Number of tokens sent to LLM.\n output_tokens: Number of tokens received from LLM.\n available_quotas: Quota available as measured by all configured quota limiters.", + "description": "Model representing LLM response to a query.\n\nAttributes:\n conversation_id: The optional conversation ID (UUID).\n response: The response.\n referenced_documents: Documents referenced in generating the response.\n tool_calls: List of tool calls made during response generation.\n tool_results: List of tool call results.\n truncated: Whether conversation history was truncated.\n input_tokens: Number of tokens sent to LLM.\n output_tokens: Number of tokens received from LLM.\n available_quotas: Quota available as measured by all configured quota limiters.",
♻️ Duplicate comments (4)
src/models/responses.py (1)
425-426: Fix line length violation.
The URL string exceeds the 100-character line limit (102 characters). This was flagged in a previous review but remains unresolved.
Apply this diff:
```diff
         {
-            "doc_url": "https://docs.openshift.com/container-platform/4.15/"
-            "operators/understanding/olm/olm-understanding-olm.html",
+            "doc_url": (
+                "https://docs.openshift.com/container-platform/4.15/"
+                "operators/understanding/olm/olm-understanding-olm.html"
+            ),
             "doc_title": "Operator Lifecycle Manager concepts and resources",
         },
```
docs/openapi.json (3)
1499-1524: Streaming: fix operationId and SSE example (null boolean, field placement)
- Keep v1-only operationId.
- SSE example: truncated must be boolean; available_quotas should be inside data; add empty tool arrays for parity.
```diff
-        "operationId": "streaming_query_endpoint_handler_v2_v1_streaming_query_post",
+        "operationId": "streaming_query_endpoint_handler_v1_streaming_query_post",
@@
-        "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"No Violation\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 1, \"token\": \"\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 2, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 3, \"token\": \"!\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 4, \"token\": \" How\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 5, \"token\": \" can\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 6, \"token\": \" I\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 7, \"token\": \" assist\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 8, \"token\": \" you\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 9, \"token\": \" today\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 10, \"token\": \"?\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": null, \"input_tokens\": 11, \"output_tokens\": 19}, \"available_quotas\": {}}\n\n"
+        "example": "data: {\"event\": \"start\", \"data\": {\"conversation_id\": \"123e4567-e89b-12d3-a456-426614174000\"}}\n\ndata: {\"event\": \"token\", \"data\": {\"id\": 0, \"token\": \"Hello\"}}\n\ndata: {\"event\": \"turn_complete\", \"data\": {\"token\": \"Hello! How can I assist you today?\"}}\n\ndata: {\"event\": \"end\", \"data\": {\"referenced_documents\": [], \"truncated\": false, \"input_tokens\": 11, \"output_tokens\": 19, \"available_quotas\": {}, \"tool_calls\": [], \"tool_results\": []}}\n\n"
```
Optional: declare standard SSE headers for client guidance.
```diff
           "text/event-stream": {
             "schema": { "type": "string", "format": "text/event-stream" },
+            "headers": {
+              "Cache-Control": { "schema": { "type": "string" }, "description": "no-cache" },
+              "Connection": { "schema": { "type": "string" }, "description": "keep-alive" }
+            },
```
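To sanity-check the corrected end-event shape, a minimal client-side parser can be sketched. The function and variable names below are illustrative, assuming the `data: <json>` framing with blank-line delimiters used in the example.

```python
import json


def parse_sse_events(stream: str) -> list[dict]:
    """Split a text/event-stream payload on blank lines and decode
    each `data: <json>` line into an event dict."""
    events = []
    for block in stream.strip().split("\n\n"):
        for line in block.splitlines():
            if line.startswith("data: "):
                events.append(json.loads(line[len("data: "):]))
    return events


sse = (
    'data: {"event": "start", "data": {"conversation_id": "123e4567"}}\n\n'
    'data: {"event": "token", "data": {"id": 0, "token": "Hello"}}\n\n'
    'data: {"event": "end", "data": {"truncated": false, "input_tokens": 11, '
    '"output_tokens": 19, "available_quotas": {}, "tool_calls": [], '
    '"tool_results": []}}\n\n'
)
events = parse_sse_events(sse)
```

With the corrected payload, `events[-1]["data"]["available_quotas"]` is reachable directly; the old example placed `available_quotas` outside `data`, so a client following the documented shape would never find it.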
1189-1194: operationId leaks impl version; keep v1-only name
Use a stable v1 identifier to avoid SDK/codegen churn.
```diff
-        "operationId": "query_endpoint_handler_v2_v1_query_post",
+        "operationId": "query_endpoint_handler_v1_query_post",
```
7091-7115: ToolResultSummary.content lacks type; make it an explicit any‑JSON union
Without a type, validation/codegen is ambiguous. Define a permissive union.
```diff
-        "content": {
-          "title": "Content",
-          "description": "Content/result returned from the tool"
-        },
+        "content": {
+          "anyOf": [
+            { "type": "string" },
+            { "type": "number" },
+            { "type": "boolean" },
+            { "type": "object", "additionalProperties": true },
+            { "type": "array", "items": {} },
+            { "type": "null" }
+          ],
+          "title": "Content",
+          "description": "Content/result returned from the tool"
+        },
```
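The same permissive union can be expressed as a runtime check on the Python side. This is a hypothetical stdlib-only helper for illustration; the real model would more likely just annotate the field as `Any` and let the schema generator emit the union.

```python
def is_any_json(value: object) -> bool:
    """True if value fits the proposed anyOf union:
    string, number, boolean, object, array, or null."""
    if value is None or isinstance(value, (str, bool, int, float)):
        return True
    if isinstance(value, dict):
        # JSON objects require string keys and JSON-typed values.
        return all(isinstance(k, str) and is_any_json(v) for k, v in value.items())
    if isinstance(value, list):
        return all(is_any_json(v) for v in value)
    # Anything else (bytes, sets, arbitrary objects) is not valid JSON.
    return False
```

The explicit union keeps codegen unambiguous while still accepting any tool output that round-trips through JSON.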
🧹 Nitpick comments (1)
docs/openapi.json (1)
1516-1524: SSE content-type OK; add headers for correctness and interop
Advertise standard SSE headers to prevent intermediate caching/connection close surprises.
```diff
           "text/event-stream": {
             "schema": { "type": "string", "format": "text/event-stream" },
+            "headers": {
+              "Cache-Control": { "schema": { "type": "string" }, "description": "no-cache" },
+              "Connection": { "schema": { "type": "string" }, "description": "keep-alive" }
+            },
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/openapi.json (17 hunks)
src/models/responses.py (6 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
src/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/**/*.py: Use absolute imports for internal modules in LCS project (e.g., `from auth import get_auth_dependency`)
All modules must start with descriptive docstrings explaining their purpose
Use `logger = logging.getLogger(__name__)` pattern for module logging
All functions must include complete type annotations for parameters and return types, using modern syntax (`str | int`) and `Optional[Type]` or `Type | None`
All functions must have docstrings with brief descriptions following Google Python docstring conventions
Function names must use snake_case with descriptive, action-oriented names (get_, validate_, check_)
Avoid in-place parameter modification anti-patterns; return new data structures instead of modifying input parameters
Use `async def` for I/O operations and external API calls
All classes must include descriptive docstrings explaining their purpose following Google Python docstring conventions
Class names must use PascalCase with descriptive names and standard suffixes: `Configuration` for config classes, `Error`/`Exception` for exceptions, `Resolver` for strategy patterns, `Interface` for abstract base classes
Abstract classes must use ABC with `@abstractmethod` decorators
Include complete type annotations for all class attributes in Python classes
Use `import logging` and module logger pattern with standard log levels: debug, info, warning, error
Files:
src/models/responses.py
src/models/**/*.py
📄 CodeRabbit inference engine (CLAUDE.md)
src/models/**/*.py: Use `@field_validator` and `@model_validator` for custom validation in Pydantic models
Pydantic configuration classes must extend `ConfigurationBase`; data models must extend `BaseModel`
Files:
src/models/responses.py
🧬 Code graph analysis (1)
src/models/responses.py (1)
src/utils/types.py (2)
ToolCallSummary (102-110)
ToolResultSummary (113-124)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: build-pr
- GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
- GitHub Check: E2E: server mode / azure
- GitHub Check: E2E: server mode / ci
tisnik
left a comment
LGTM, a lot of work done there
```diff
  from llama_stack.models.llama.llama3.tokenizer import Tokenizer
  from llama_stack_client import APIConnectionError, APIStatusError
- from llama_stack_client.types.agents.turn import Turn
+ from llama_stack_client.types.alpha.agents.turn import Turn
```
"nice" package name, sounds like it is super stable ;)
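Since `Turn` moved under `types.alpha` in this bump, code that has to run against both pre- and post-0.3.0 clients could resolve the import dynamically. A hedged sketch: the generic helper below is illustrative (not part of this PR), and is demonstrated with stdlib modules so it runs anywhere; the commented-out block shows how it would apply to the paths in the diff above.

```python
import importlib


def resolve_attr(candidates: list[tuple[str, str]]) -> object:
    """Return the first attribute found among (module_path, name) candidates."""
    for mod_path, name in candidates:
        try:
            return getattr(importlib.import_module(mod_path), name)
        except (ImportError, AttributeError):
            continue
    raise ImportError(f"none of the candidate paths resolved: {candidates}")


# For llama-stack-client this would look like:
# Turn = resolve_attr([
#     ("llama_stack_client.types.alpha.agents.turn", "Turn"),  # >= 0.3.0
#     ("llama_stack_client.types.agents.turn", "Turn"),        # older releases
# ])

# Demonstrated with stdlib modules:
dumps = resolve_attr([("nonexistent.module", "dumps"), ("json", "dumps")])
```

A plain `try/except ImportError` around the two `import` statements achieves the same effect with less machinery; the helper only pays off when several symbols move at once.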
are-ces
left a comment
LGTM
Description
Bumps llama-stack to version 0.3.0 and adds query and conversation endpoints, built on the Responses API, that are compatible with the OLS endpoints.
Type of change
Tools used to create PR
Identify any AI code assistants used in this PR (for transparency and review context)
Related Tickets & Documents
Checklist before requesting a review
Testing
Summary by CodeRabbit
New Features
Bug Fixes
Documentation
Chores