Skip to content

Conversation

@radofuchs
Copy link
Contributor

@radofuchs radofuchs commented Dec 3, 2025

Description

Type of change

  • Refactor
  • New feature
  • Bug fix
  • CVE fix
  • Optimization
  • Documentation Update
  • Configuration Update
  • Bump-up service version
  • Bump-up dependent library
  • Bump-up library or tool used for development (does not change the final image)
  • CI configuration change
  • Konflux configuration change
  • Unit tests improvement
  • Integration tests improvement
  • End to end tests improvement

Tools used to create PR

Identify any AI code assistants used in this PR (for transparency and review context)

  • Assisted-by: Cursor

Related Tickets & Documents

  • Related Issue #LCORE-1016
  • Closes #LCORE-1016

Checklist before requesting a review

  • I have performed a self-review of my code.
  • PR has passed all pre-merge test jobs.
  • If it is a core feature, I have added thorough tests.

Testing

  • Please provide detailed steps to perform tests related to this code change.
  • How were the fix/results from this change verified? Please provide relevant screenshots or results.

Summary by CodeRabbit

  • New Features

    • Lightspeed Stack now supports a library-mode deployment alongside server mode.
  • Tests

    • Expanded end-to-end matrix to cover server/library modes and CI/Azure environments.
    • Added mode-specific test configurations and conditional scenario skipping for library mode.
  • Chores

    • Updated container and service startup and configuration handling for multi-mode runs.
    • Added a local end-to-end test target and improved run-time logging and error messages.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 3, 2025

Walkthrough

Adds library-mode support to E2E CI and tests: introduces a mode×environment matrix, mode-aware config selection and startup paths (server vs library), new library-mode configs, mode-aware test hooks and tags, and related container/compose and run config updates.

Changes

Cohort / File(s) Summary
E2E Workflow Orchestration
​.github/workflows/e2e_tests.yaml
Expanded CI matrix to mode × environment, added E2E_DEPLOYMENT_MODE propagation, mode-specific lightspeed-stack loader, dynamic run.yaml selection, mode-aware startup (server vs library), improved error/logging and mode-aware failure log collection.
Container Images
Containerfile, test.containerfile
Create /opt/app-root/src/.llama/distributions/ollama and /opt/app-root/src/.llama/providers.d and set ownership (1001:1001 or 1001:0 as adjusted) during image build.
Docker Compose (Library)
docker-compose-library.yaml
Compose file for library mode: single lightspeed-stack container, mounts configs, exposes 8080, healthcheck and env wiring. (No functional change observed in this PR content.)
Run Configs (env-specific)
tests/e2e/configs/run-ci.yaml, tests/e2e/configs/run-azure.yaml, tests/e2e/configs/run-rhelai.yaml
Set external_providers_dir from null to /opt/app-root/src/.llama/providers.d.
Base Run Config
run.yaml
Set external_providers_dir to /opt/app-root/src/.llama/providers.d.
Lightspeed Stack Configs (server/library)
tests/e2e/configuration/lightspeed-stack-server-mode.yaml, tests/e2e/configuration/lightspeed-stack-library-mode.yaml
New mode-specific lightspeed-stack configs: server-mode points at external llama-stack; library-mode configures embedded/library-client usage with library_client_config_path: run.yaml.
Library-mode Test Configs
tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml, tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
New test-config variants for library-mode: noop-with-token auth; invalid feedback storage path for negative testing.
E2E Test Features (tags)
tests/e2e/features/conversations.feature, tests/e2e/features/health.feature, tests/e2e/features/info.feature, tests/e2e/features/query.feature, tests/e2e/features/conversation_cache_v2.feature
Added @skip-in-library-mode to scenarios that require external llama-stack availability so they are skipped in library-mode runs.
E2E Test Environment Logic
tests/e2e/features/environment.py
Added E2E_DEPLOYMENT_MODE handling, computed mode_dir, mode-aware scenario config selection and Authorized tag handling, library-mode scenario skipping, and limited llama-stack restoration to server mode.
Makefile
Makefile
Added test-e2e-local target to run E2E tests locally via direct uv run invocation.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant CI as CI Workflow (.github)
    participant FS as Repo (configs)
    participant Runner as Job Runner
    participant Orchestrator as startup script
    participant Compose as docker-compose / docker-compose-library
    participant Service as lightspeed-stack container(s)
    participant Tests as E2E test runner

    CI->>Runner: start matrix job (mode, environment)
    Runner->>Orchestrator: set E2E_DEPLOYMENT_MODE (mode) and ENV
    Orchestrator->>FS: load lightspeed-stack-<mode>-mode.yaml and run-<env>.yaml
    Orchestrator->>Compose: choose compose file based on mode
    Compose->>Service: start containers (server: multiple, library: single)
    Service-->>Orchestrator: healthcheck / logs
    Orchestrator->>Tests: run e2e suite with mode-aware hooks (skip tags, configs)
    Tests-->>Service: interact (HTTP, library client)
    alt failure
        Orchestrator->>Runner: collect mode-specific logs
    end
    Tests->>Runner: report results
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

  • Areas needing close attention:
    • .github/workflows/e2e_tests.yaml: matrix propagation, env wiring, file-existence checks and run.yaml selection logic.
    • tests/e2e/features/environment.py: correct mode branching, skip-rule implementation, and restoration logic limited to server mode.
    • Containerfile / test.containerfile: directory creation and ownership consistency for /opt/app-root/src/.llama/....
    • New mode-specific config files and docker-compose-library.yaml: validate mounts, env vars, and healthcheck behavior.

Possibly related PRs

Suggested reviewers

  • tisnik

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The PR title accurately reflects the main change: adding files and infrastructure required to support library mode end-to-end tests, including new config files, workflow updates, and test annotations.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (10)
tests/e2e/configs/run-rhelai.yaml (1)

19-19: external_providers_dir now hard‑codes container path

Setting external_providers_dir to /opt/app-root/src/.llama/providers.d matches the new container layout and should unblock RHEL AI runs. Please confirm this config is only used in that containerized context; if anyone reuses run-rhelai.yaml outside the image, this absolute path may not exist and could cause provider loading issues.

run.yaml (1)

19-19: Double‑check impact of absolute providers path on local runs

For containerized runs (including library‑mode in the new test container), /opt/app-root/src/.llama/providers.d is correct and consistent. If run.yaml is ever used directly on a developer machine (e.g., local llama‑stack runs outside Docker), that absolute path likely won’t exist; depending on llama‑stack behavior this could be a no‑op or an error. Worth confirming local usage expectations and, if needed, documenting that run.yaml is container‑only or making the path configurable.

Makefile (1)

27-30: Good local e2e shortcut; clarify usage and keep flags in sync

test-e2e-local mirrors test-e2e without the script wrapper, which is nice for local debugging. Two small suggestions:

  • Consider clarifying in the description that test-e2e is the CI/recorded variant while test-e2e-local is for local interactive runs.
  • Whenever you change behave flags or tests/e2e/test_list.txt, keep both targets in sync so local and CI behave runs exercise the same scenarios. Based on learnings, tests/e2e/test_list.txt is the canonical e2e test manifest.
test.containerfile (1)

4-11: Container layout matches configs; consider minor hardening

The added mkdir/chown steps correctly provision:

  • /opt/app-root/src/.llama/distributions/ollama
  • /opt/app-root/src/.llama/providers.d

which matches the updated run*.yaml configurations. A couple of optional tweaks:

  • Use python -m pip install faiss-cpu==1.11.0 instead of bare pip to avoid surprises if pip points at a different interpreter.
  • If you expect group‑writable access under .llama, you might mirror /app-root and add an explicit chmod -R 775 /opt/app-root/src/.llama; currently 755 + correct ownership is probably fine for UID 1001, though.

Nothing blocking here.

tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1)

1-21: Server‑mode config looks coherent; verify data paths and wiring

This server‑mode LCS config is internally consistent:

  • use_as_library_client: false and url: http://llama-stack:8321 match the “external llama‑stack service” comment.
  • noop auth and test‑only api_key are appropriate for e2e.

Two things to verify:

  • That the docker‑compose/test harness for server‑mode mounts or creates /tmp/data/feedback and /tmp/data/transcripts with suitable permissions, so tests don’t fail on I/O.
  • That environment/tag wiring (e.g., in tests/e2e/features/environment.py) actually routes the relevant server‑mode scenarios to this config, similar to how noop vs noop‑with‑token configs are swapped. Based on learnings, config selection is tag‑driven.
tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1)

1-20: Library‑mode invalid feedback storage config matches intent

This config correctly sets up a library‑mode LCS using run.yaml while deliberately pointing feedback_storage at an invalid path, which should drive negative test coverage for feedback persistence. A couple of follow‑ups:

  • Confirm that the new library‑mode tests (or tags) actually select this config via the existing environment/tag‑driven mechanism (similar to the noop-with-token flow mentioned in prior learnings).
  • Ensure the application surfaces a clear, testable failure (HTTP status / error message) when /invalid is used, so the e2e scenario can assert on behavior rather than just a generic crash.

Based on learnings, config swapping is tag‑based, so making sure this file is part of that mapping is key.

tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1)

1-20: Base library-mode configuration looks consistent with intended usage

Using use_as_library_client: true with library_client_config_path: run.yaml and a noop auth module (with auth_enabled: false) is appropriate for the default, non-Authorized library-mode runs. The structure also mirrors the auth-noop-token variant, which keeps configs predictable.

tests/e2e/features/environment.py (1)

54-59: Deployment mode detection is straightforward and robust enough

Using E2E_DEPLOYMENT_MODE with a lower-cased value and defaulting to "server" cleanly exposes context.is_library_mode for downstream hooks and logging, without impacting existing behavior when the env var is unset.

.github/workflows/e2e_tests.yaml (2)

128-129: Path transformation via sed is fragile; recommend using a dedicated config update function or YAML parser.

The sed commands on lines 128-129 assume exact string matches and may fail silently if the run.yaml structure differs (e.g., different indentation, comments, or key ordering):

sed -i 's|db_path: \.llama/distributions|db_path: /app-root/.llama/distributions|g' run.yaml
sed -i 's|db_path: tmp/|db_path: /app-root/.llama/distributions/|g' run.yaml

If these patterns don't match the actual content, the paths remain unchanged and tests may fail with cryptic container path errors.

Consider one of the following alternatives:

  1. Pre-generate config files with correct paths for container environments instead of transforming them at runtime.
  2. Use a YAML-aware tool (e.g., yq) to update paths safely:
    yq eval '.db_path = "/app-root/.llama/distributions"' -i run.yaml
  3. Add validation to confirm the sed replacements succeeded:
    if grep -q 'db_path: \./\|db_path: tmp/' run.yaml; then
      echo "Error: Path transformation failed"
      exit 1
    fi

191-204: Hardcoded 20-second wait may be insufficient or excessive; consider using polling with a timeout.

Line 191 uses a fixed sleep 20s before connectivity tests. This is:

  • Too short if library initialization or external API calls are slow (especially on CI infrastructure).
  • Too long if services start quickly, delaying feedback on failures.

Replace the fixed sleep with a polling loop that checks service health and fails fast on errors:

- name: Wait for services
  run: |
    echo "Waiting for services to be ready..."
    max_attempts=30
    attempt=0
    while [ $attempt -lt $max_attempts ]; do
      if curl -sf http://localhost:8080/liveness > /dev/null 2>&1; then
        echo "✓ Service is ready"
        exit 0
      fi
      attempt=$((attempt + 1))
      echo "Attempt $attempt/$max_attempts - service not ready yet, retrying in 2s..."
      sleep 2
    done
    echo "✗ Service failed to become ready after ${max_attempts} attempts ($(($max_attempts * 2))s)"
    exit 1

This provides faster feedback on success and clearer diagnostics on timeout.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between a29b069 and 0b175b7.

📒 Files selected for processing (18)
  • .github/workflows/e2e_tests.yaml (7 hunks)
  • Containerfile (1 hunks)
  • Makefile (1 hunks)
  • docker-compose-library.yaml (1 hunks)
  • run.yaml (1 hunks)
  • test.containerfile (1 hunks)
  • tests/e2e/configs/run-azure.yaml (1 hunks)
  • tests/e2e/configs/run-ci.yaml (1 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1 hunks)
  • tests/e2e/features/conversations.feature (2 hunks)
  • tests/e2e/features/environment.py (4 hunks)
  • tests/e2e/features/health.feature (2 hunks)
  • tests/e2e/features/info.feature (4 hunks)
  • tests/e2e/features/query.feature (1 hunks)
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/query.feature
  • tests/e2e/features/health.feature
  • tests/e2e/features/info.feature
  • tests/e2e/features/conversations.feature
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/e2e/features/environment.py
🧠 Learnings (7)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests

Applied to files:

  • tests/e2e/features/health.feature
  • Makefile
  • .github/workflows/e2e_tests.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Run `uv run make verify` to run all linters (black, pylint, pyright, ruff, docstyle, check-types) before completion

Applied to files:

  • Makefile
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Applied to files:

  • Makefile
  • tests/e2e/features/environment.py
  • .github/workflows/e2e_tests.yaml
📚 Learning: 2025-09-02T11:09:40.404Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/features/environment.py:87-95
Timestamp: 2025-09-02T11:09:40.404Z
Learning: In the lightspeed-stack e2e tests, noop authentication tests use the default lightspeed-stack.yaml configuration, while noop-with-token tests use the Authorized tag to trigger a config swap to the specialized noop-with-token configuration file.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
📚 Learning: 2025-09-02T11:15:02.411Z
Learnt from: radofuchs
Repo: lightspeed-core/lightspeed-stack PR: 485
File: tests/e2e/test_list.txt:2-3
Timestamp: 2025-09-02T11:15:02.411Z
Learning: In the lightspeed-stack e2e tests, the Authorized tag is intentionally omitted from noop authentication tests because they are designed to test against the default lightspeed-stack.yaml configuration rather than the specialized noop-with-token configuration.

Applied to files:

  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to src/**/{client,app/endpoints/**}.py : Handle `APIConnectionError` from Llama Stack in integration code

Applied to files:

  • tests/e2e/features/info.feature
  • tests/e2e/features/conversations.feature
📚 Learning: 2025-08-18T10:56:55.349Z
Learnt from: matysek
Repo: lightspeed-core/lightspeed-stack PR: 292
File: pyproject.toml:0-0
Timestamp: 2025-08-18T10:56:55.349Z
Learning: The lightspeed-stack project intentionally uses a "generic image" approach, bundling many dependencies directly in the base runtime image to work for everyone, rather than using lean base images with optional dependency groups.

Applied to files:

  • docker-compose-library.yaml
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (azure)
  • GitHub Check: e2e_tests (ci)
🔇 Additional comments (17)
tests/e2e/configs/run-azure.yaml (1)

19-19: Consistent provider directory for Azure config

Using /opt/app-root/src/.llama/providers.d keeps this Azure run config aligned with the container filesystem. Just make sure this file is not used for any non‑container Azure/dev flow where that path wouldn’t exist, otherwise provider discovery could fail silently.

tests/e2e/configs/run-ci.yaml (1)

19-19: CI config correctly aligned with container layout

Pointing external_providers_dir at /opt/app-root/src/.llama/providers.d matches the directories created in the test container and keeps CI config in sync with Azure/RHEL AI configs. Just ensure any future container layout changes are reflected in this path as well.

tests/e2e/features/conversations.feature (2)

132-146: Library-mode skip tag for llama-stack GET error scenario looks correct

Tagging this llama-stack-unavailable GET scenario with @skip-in-library-mode aligns with the new before_scenario gating and avoids invalid behavior when no separate llama-stack container exists.


181-194: Consistent skip behavior for DELETE + llama-stack-unavailable scenario

Same here: marking the DELETE-on-unavailable-llama-stack scenario as @skip-in-library-mode is consistent with the library/server mode split and prevents misuse of the external container in library mode.

Containerfile (1)

76-79: Directory setup for llama-stack library mode is consistent

Provisioning /opt/app-root/src/.llama/distributions/ollama and /opt/app-root/src/.llama/providers.d with ownership 1001:1001 matches the non-root runtime user and the external_providers_dir used in the e2e configs.

tests/e2e/features/health.feature (1)

43-52: Health endpoint llama-stack-down scenarios correctly skipped in library mode

Adding @skip-in-library-mode to the readiness and liveness scenarios that disrupt llama-stack is aligned with the new library/server mode split and avoids invalid container operations in library mode.

Also applies to: 55-63

tests/e2e/features/info.feature (1)

21-30: Mode-aware skipping for llama-stack failure scenarios is appropriate

All added @skip-in-library-mode tags on the info/models/shields/tools “llama-stack unreachable” scenarios correctly restrict these to server mode and align with the new before_scenario skip rule.

Also applies to: 39-48, 57-66, 114-123

tests/e2e/features/query.feature (1)

110-120: Query 503-on-llama-stack-down scenario correctly gated to server mode

Adding @skip-in-library-mode here is consistent with the environment hook and other features: scenarios that disrupt llama-stack now only run when there is a separate llama-stack container (server mode).

tests/e2e/features/environment.py (3)

85-88: Library-mode skipping and mode-specific invalid-feedback config paths make sense

  • The skip-in-library-mode tag gate in before_scenario correctly skips scenarios that rely on a separate llama-stack container when context.is_library_mode is true.
  • The InvalidFeedbackStorageConfig scenario configuration path being derived from library-mode vs server-mode ensures each mode can use its own invalid-storage config without overloading a single file.

It’d be worth double-checking that:

  • Every scenario that disrupts the llama-stack container is tagged with @skip-in-library-mode.
  • Both tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml and tests/e2e/configuration/server-mode/lightspeed-stack-invalid-feedback-storage.yaml exist and mirror the previous single-file behavior.

Also applies to: 91-93


101-106: Conditioning llama-stack restoration on non-library mode is appropriate

Guarding the llama-stack restart logic with not context.is_library_mode avoids attempting to manage a container that doesn’t exist in library mode, while preserving the previous restoration behavior for server mode.


156-159: Mode-specific auth-noop-token config wiring is good; confirm server/library configs and auth flags

Using mode_dir = "library-mode" if context.is_library_mode else "server-mode" for the @Authorized feature config keeps server and library modes clearly separated and lines up with the new config files.

Given this, please verify that:

  • tests/e2e/configuration/server-mode/lightspeed-stack-auth-noop-token.yaml exists and retains the previous noop-with-token semantics for server mode.
  • The new library-mode variant (tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml) has service.auth_enabled set as intended for @Authorized tests (likely true so that noop-with-token actually enforces token presence), to avoid divergence between server and library mode auth behavior.

This will ensure Authorized e2e coverage behaves consistently across both deployment modes.

tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1)

1-20: Review comment cannot be fully verified without test execution results

The assumption that auth_enabled: false breaks @Authorized semantics appears questionable: web documentation examples show noop-with-token intentionally paired with auth_enabled: false. The learnings indicate deliberate config separation between noop (default) and noop-with-token (@Authorized) tests. Without confirming whether @Authorized tests currently pass or fail with the library-mode config, and without visibility into test-specific override logic in environment.py, the recommendation to change auth_enabled: true cannot be substantiated.

docker-compose-library.yaml (1)

1-37: Verify Containerfile exists and mount paths align with workflow expectations.

The docker-compose-library.yaml configuration looks structurally sound, but a few details need verification:

  1. Mount contexts: Lines 13-14 mount ./lightspeed-stack.yaml and ./run.yaml from the workflow working directory (should be generated by lines 65 and 119 of the workflow). Confirm these paths are predictable when the workflow runs in the repository root.

  2. SELinux mount option (:Z): Line 13-14 use :Z for relabeling. This is SELinux-specific and idempotent across containers but may not apply to all systems. If cross-platform compatibility is important, consider documenting this requirement or using conditional logic.

  3. Containerfile build context: Line 5-6 build from context . using Containerfile. Verify the Containerfile exists at the repository root and builds correctly in the CI environment.

  4. Healthcheck timing: The start_period: 15s on line 36 is appropriate for library initialization overhead, but if initialization can take longer (especially under load), this may need adjustment.

Can you confirm:

  • Containerfile exists and builds successfully?
  • Are the mount paths (./lightspeed-stack.yaml, ./run.yaml) correctly staged by the workflow before docker-compose is invoked?
  • Is the :Z mount option intentional for SELinux, or should it be :z for broader compatibility?
.github/workflows/e2e_tests.yaml (4)

171-186: Verify docker-compose-library.yaml exists in the repository root.

The workflow references docker-compose-library.yaml in the library mode startup conditional. Ensure this file is present in the repository root; otherwise, the workflow will fail when attempting to reference it.


12-15: Verify all 4 matrix job combinations have required config files.

The 2D matrix (mode × environment) creates 4 job combinations: server/ci, server/azure, library/ci, library/azure. Before merging, ensure these configuration files exist:

  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • tests/e2e/configs/run-ci.yaml
  • tests/e2e/configs/run-azure.yaml

Also verify Containerfile is present and the make test-e2e target respects the E2E_DEPLOYMENT_MODE environment variable.


52-66: Mode-specific config loading includes defensive checks; verify that referenced config files exist and have the expected YAML structure.

The config loading logic includes good error handling and file existence validation. However, manual verification is needed to confirm that the config files (lightspeed-stack-server-mode.yaml and lightspeed-stack-library-mode.yaml) contain the expected YAML structure and keys expected by subsequent processing steps.


206-217: Verify that make test-e2e target respects the E2E_DEPLOYMENT_MODE environment variable.

The workflow propagates E2E_DEPLOYMENT_MODE to the test step (line 210), but you should confirm the make test-e2e target (or its underlying test runner) actually reads and uses this variable for conditional test logic. Check the Makefile for the test-e2e target definition and verify the test infrastructure (environment.py, conftest.py, or similar) reads E2E_DEPLOYMENT_MODE at runtime to configure test behavior based on deployment mode.

@radofuchs radofuchs force-pushed the E2E_tests_library_mode branch from 0b175b7 to af4b4e5 Compare December 3, 2025 09:19
@radofuchs radofuchs force-pushed the E2E_tests_library_mode branch from 2133c5d to 2a4bdf4 Compare December 3, 2025 09:22
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
Containerfile (1)

76-78: Library-mode directories and ownership look correct; consider centralizing the base path

Pre-creating /opt/app-root/src/.llama/... and chowning to 1001:1001 before switching to USER 1001 is the right pattern for allowing the non-root user to write provider/distribution data. This should work well with the library-mode tests and configs that reference this path.

As a minor maintainability improvement, you could centralize the .llama base path in an env var so it’s not hard-coded in multiple places and is easier to change later. For example:

-# Create llama-stack directories for library mode
-RUN mkdir -p /opt/app-root/src/.llama/distributions/ollama /opt/app-root/src/.llama/providers.d && \
-    chown -R 1001:1001 /opt/app-root/src/.llama
+# Create llama-stack directories for library mode
+ENV LLAMA_HOME=/opt/app-root/src/.llama
+RUN mkdir -p "$LLAMA_HOME/distributions/ollama" "$LLAMA_HOME/providers.d" && \
+    chown -R 1001:1001 "$LLAMA_HOME"

This is optional, the current behavior is functionally fine.

tests/e2e/configs/run-ci.yaml (1)

19-19: Explicit external_providers_dir path looks reasonable; please verify container path

Setting external_providers_dir to /opt/app-root/src/.llama/providers.d avoids relying on defaults and matches the new provider-discovery model, but other paths (DBs) are rewritten to /app-root/.llama/distributions in the workflow. Please double‑check that this directory actually exists in the CI image and that the /opt/app-root/src vs /app-root distinction is intentional; if so, consider a brief comment or doc note so future changes don’t “normalize” it accidentally.

tests/e2e/features/environment.py (1)

54-59: Mode detection + mode-specific config selection look consistent; consider centralizing mode_dir

Reading E2E_DEPLOYMENT_MODE into context.deployment_mode/context.is_library_mode and then using mode_dir = "library-mode" if context.is_library_mode else "server-mode" for both scenario and feature configs ties cleanly into the GitHub Actions matrix and the new directory layout. The @skip-in-library-mode gating is also scoped correctly to only skip scenarios that depend on a separate llama-stack container.

Two small follow‑ups:

  • It may be worth storing mode_dir once on context in before_all (e.g., context.mode_dir) and reusing it in before_scenario/before_feature to avoid recomputing and to keep paths consistent if the logic ever changes.
  • Please verify that all of the referenced mode-specific files actually exist:
    • tests/e2e/configuration/server-mode/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
    • tests/e2e/configuration/server-mode/library-mode/lightspeed-stack-no-cache.yaml
    • tests/e2e/configuration/server-mode/library-mode/lightspeed-stack-auth-noop-token.yaml
      so CI doesn’t fail at runtime due to a missing config.

Also applies to: 85-91, 92-97, 166-172

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0b175b7 and 2a4bdf4.

📒 Files selected for processing (19)
  • .github/workflows/e2e_tests.yaml (7 hunks)
  • Containerfile (1 hunks)
  • Makefile (1 hunks)
  • docker-compose-library.yaml (1 hunks)
  • run.yaml (1 hunks)
  • test.containerfile (1 hunks)
  • tests/e2e/configs/run-azure.yaml (1 hunks)
  • tests/e2e/configs/run-ci.yaml (1 hunks)
  • tests/e2e/configs/run-rhelai.yaml (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml (1 hunks)
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml (1 hunks)
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml (1 hunks)
  • tests/e2e/features/conversation_cache_v2.feature (2 hunks)
  • tests/e2e/features/conversations.feature (2 hunks)
  • tests/e2e/features/environment.py (4 hunks)
  • tests/e2e/features/health.feature (2 hunks)
  • tests/e2e/features/info.feature (4 hunks)
  • tests/e2e/features/query.feature (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (14)
  • tests/e2e/configs/run-rhelai.yaml
  • tests/e2e/features/health.feature
  • tests/e2e/features/conversations.feature
  • tests/e2e/configuration/library-mode/lightspeed-stack-invalid-feedback-storage.yaml
  • run.yaml
  • Makefile
  • tests/e2e/features/info.feature
  • test.containerfile
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • docker-compose-library.yaml
  • tests/e2e/configuration/library-mode/lightspeed-stack-auth-noop-token.yaml
  • tests/e2e/configs/run-azure.yaml
  • tests/e2e/features/query.feature
  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml
🧰 Additional context used
📓 Path-based instructions (2)
tests/e2e/features/**/*.feature

📄 CodeRabbit inference engine (CLAUDE.md)

Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Files:

  • tests/e2e/features/conversation_cache_v2.feature
tests/**/*.py

📄 CodeRabbit inference engine (CLAUDE.md)

Use pytest-mock with AsyncMock objects for mocking in tests

Files:

  • tests/e2e/features/environment.py
🧠 Learnings (2)
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/test_list.txt : Maintain test list in `tests/e2e/test_list.txt` for end-to-end tests

Applied to files:

  • .github/workflows/e2e_tests.yaml
📚 Learning: 2025-11-24T16:58:04.410Z
Learnt from: CR
Repo: lightspeed-core/lightspeed-stack PR: 0
File: CLAUDE.md:0-0
Timestamp: 2025-11-24T16:58:04.410Z
Learning: Applies to tests/e2e/features/**/*.feature : Use behave (BDD) framework with Gherkin feature files for end-to-end tests

Applied to files:

  • .github/workflows/e2e_tests.yaml
  • tests/e2e/features/environment.py
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: build-pr
  • GitHub Check: Konflux kflux-prd-rh02 / lightspeed-stack-on-pull-request
  • GitHub Check: e2e_tests (ci)
  • GitHub Check: e2e_tests (azure)
🔇 Additional comments (4)
tests/e2e/features/conversation_cache_v2.feature (1)

191-210: Library-mode skip tags correctly gate llama-stack–dependent scenarios

Using @skip-in-library-mode on the “llama-stack is down” GET/DELETE scenarios aligns with the new context.is_library_mode + skip-in-library-mode handling in environment.py, so these tests still run in server mode but are cleanly skipped when there’s no separate llama-stack container. Taging/indentation are valid for Behave/Gherkin.

Also applies to: 286-307

tests/e2e/features/environment.py (1)

114-120: Guarding llama-stack restoration by deployment mode avoids invalid docker operations

Adding not context.is_library_mode to the llama_stack_was_running check ensures the restore logic only runs when there is a separate llama-stack container, which is exactly what you want in server mode and avoids unnecessary/invalid docker start/exec calls under library mode. The error handling around CalledProcessError and the health-check loop remain intact.

.github/workflows/e2e_tests.yaml (2)

11-16: Matrix + deployment-mode wiring + config selection are coherent; verify required files are present

The mode/environment matrix, E2E_DEPLOYMENT_MODE propagation, and the new lightspeed-stack-${MODE}-mode.yaml loading step line up nicely with the new test hooks:

  • matrix.mode drives both the job name and E2E_DEPLOYMENT_MODE, which environment.py consumes via context.deployment_mode.
  • Mode-specific lightspeed-stack-${MODE}-mode.yaml is copied into lightspeed-stack.yaml with an explicit existence check and a clear failure message.
  • run.yaml remains environment-specific (run-ci.yaml/run-azure.yaml), and the workflow prints which config was used plus a small preview, which should make debugging much easier.

Please just verify that all of the following exist and are kept in sync with the test expectations:

  • tests/e2e/configuration/lightspeed-stack-server-mode.yaml
  • tests/e2e/configuration/lightspeed-stack-library-mode.yaml
  • tests/e2e/configs/run-ci.yaml, tests/e2e/configs/run-azure.yaml

so that new matrix combinations don’t fail purely due to missing config.

Also applies to: 23-23, 52-67, 94-97, 137-147


149-187: Server vs library docker-compose flows and failure logging look solid

Splitting startup into:

  • Server mode: docker compose up -d with basic health checks and full logs on failure.
  • Library mode: docker compose -f docker-compose-library.yaml up -d with mode-specific ps/logs handling,

and then using the same quick connectivity probe plus mode-aware log dumping on both initial connectivity failure and test failure is a clean way to keep both modes covered without duplicating too much logic. The env handling (OpenAI / Azure keys via GITHUB_ENV) is scoped to the job and you only print presence/length, not the secret values.

Main thing to keep an eye on is that docker-compose-library.yaml defines a lightspeed-stack service as expected; otherwise the library-mode log commands will fail. Once that’s confirmed, this looks ready to exercise both modes in CI.

Also applies to: 193-205, 219-233

@radofuchs radofuchs merged commit f76956f into lightspeed-core:main Dec 4, 2025
21 of 23 checks passed
@coderabbitai coderabbitai bot mentioned this pull request Dec 16, 2025
18 tasks
@coderabbitai coderabbitai bot mentioned this pull request Jan 8, 2026
15 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant