
Conversation

@thestumonkey (Contributor) commented on Jan 11, 2026

Fixed segments not working and improved the config system so that built-in defaults are applied automatically.

Summary by CodeRabbit

Release Notes

  • New Features

    • Configuration now supports environment variable interpolation and hierarchical defaults merging (user config overrides built-in defaults).
    • Automatic backups created when updating settings via API.
    • Configuration reload capability allows changes to take effect without server restart.
  • Bug Fixes

    • Improved error handling with sensible fallback values when configuration is unavailable.
    • Enhanced transcription data extraction robustness.
  • Tests

    • Added comprehensive validation for conversation transcript segments.


coderabbitai bot (Contributor) commented on Jan 11, 2026

📝 Walkthrough

Walkthrough

This pull request introduces a centralized hierarchical configuration system that merges defaults.yml, config.yml, and environment variables with caching and interpolation support. Docker Compose volumes are updated to mount the entire config directory. Multiple backend modules are refactored to use the centralized config instead of direct file access. Tests are enhanced to validate configuration override behavior and segment structure validation.
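For orientation, here is a minimal sketch of what such a merge-and-interpolate loader can look like. The function names mirror those listed under Changes below (get_config, _deep_merge, _resolve_env), but the bodies, file paths, and the ${VAR:-default} syntax are illustrative assumptions, not the PR's actual implementation:

```python
import os
import re
from copy import deepcopy
from pathlib import Path

import yaml

_CACHE: dict | None = None
_ENV_PATTERN = re.compile(r"\$\{(\w+)(?::-([^}]*))?\}")  # matches ${VAR} and ${VAR:-default}


def _deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` onto `base`, returning a new dict."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = _deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def _resolve_env(value):
    """Interpolate ${VAR} / ${VAR:-default} placeholders anywhere in the structure."""
    if isinstance(value, str):
        return _ENV_PATTERN.sub(lambda m: os.getenv(m.group(1), m.group(2) or ""), value)
    if isinstance(value, dict):
        return {k: _resolve_env(v) for k, v in value.items()}
    if isinstance(value, list):
        return [_resolve_env(v) for v in value]
    return value


def get_config(force_reload: bool = False) -> dict:
    """Return defaults.yml overlaid with config.yml, env-interpolated and cached."""
    global _CACHE
    if _CACHE is not None and not force_reload:
        return _CACHE
    defaults = yaml.safe_load(Path("config/defaults.yml").read_text()) or {}
    user = {}
    user_path = Path("config/config.yml")
    if user_path.exists():
        user = yaml.safe_load(user_path.read_text()) or {}
    _CACHE = _resolve_env(_deep_merge(defaults, user))
    return _CACHE
```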

Changes

  • Docker Configuration Updates (backends/advanced/docker-compose-test.yml, backends/advanced/docker-compose.yml): Updated volume mounts to expose the entire config directory at /app/config instead of the single config.yml file; added a read-only mount for defaults.yml in the workers service.
  • Centralized Config System (backends/advanced/src/advanced_omi_backend/config.py): Introduced hierarchical configuration loading with defaults merging, environment variable interpolation (_resolve_env, _deep_resolve_env), deep-merge capability (_deep_merge), a cached config mechanism (_CONFIG_CACHE), and a public API (get_config, reload_config, get_config_section, get_config_path). Added DEFAULT_SPEECH_DETECTION_SETTINGS, DEFAULT_CONVERSATION_STOP_SETTINGS, and DEFAULT_AUDIO_STORAGE_SETTINGS constants.
  • Config File Defaults (config/defaults.yml): New file defining fallback configuration for the model registry (OpenAI LLM, OpenAI embeddings, Deepgram STT, Qdrant vector store), memory settings (Chronicle provider), and speaker recognition (disabled by default). Establishes priority: config.yml > environment variables > defaults.yml.
  • Config Template (config/config.yml.template): Replaced active model definitions with commented examples and placeholders; added inline guidance for memory providers and speaker recognition; disabled speaker recognition by default.
  • System Controller Refactoring (backends/advanced/src/advanced_omi_backend/controllers/system_controller.py): Refactored configuration access to use the merged in-memory config instead of direct filesystem reads. Updated get_memory_config_raw, update_memory_config_raw, get_chat_config_yaml, and save_chat_config_yaml to leverage the centralized config with backup creation and cache reloading. Added fallback paths and resilience improvements.
  • Model Registry Refactoring (backends/advanced/src/advanced_omi_backend/model_registry.py): Removed the local environment resolution pipeline and replaced direct YAML parsing with the centralized get_config(). Simplified load_models_config and get_models_registry to return non-Optional AppModels; added a force_reload parameter. Removed file-system lookup logic.
  • Memory Config Refactoring (backends/advanced/src/advanced_omi_backend/services/memory/config.py): Replaced local YAML loading with a centralized get_config() call. Updated load_config_yml to return the merged configuration from the central config system.
  • Transcription Enhancements (backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py): Added safer data extraction with a fallback for missing channels; introduced utterance validation and logging; added segment normalization to map "transcript" to "text" for format consistency (a sketch follows this list).
  • Import Cleanup (backends/advanced/src/advanced_omi_backend/llm_client.py): Removed unused imports: load_config_yml and resolve_value.
  • Test: Configuration Override (tests/endpoints/system_admin_tests.robot): Added Config Override Defaults Test to verify that a custom memory config (timeout_seconds: 999) persists and overrides the default (1200). Updated the expected prompt text in Get Chat Configuration Test.
  • Test: Conversation Segments (tests/endpoints/conversation_tests.robot): Enhanced Get Conversation By ID Test with validation of conversation title, summary, detailed_summary, and transcript presence; added segment extraction and per-segment validation (start, end, text, speaker keys; non-empty text; end > start).
  • Test: Integration Segments (tests/integration/integration_test.robot): Replaced heuristic segment counting (based on transcript type) with explicit segment structure validation. Added comprehensive per-segment checks, including timing consistency (end > start) and mandatory field validation.
  • Test: WebSocket Streaming (tests/integration/websocket_streaming_tests.robot): Commented out the Verify Segments Match Expected Timestamps verification step in the Segment Timestamps Match Cropped Audio test.
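To make the "transcript" to "text" normalization mentioned in the Transcription Enhancements entry concrete, here is a rough sketch under assumed field names (start, end, speaker, and a text/transcript alternative per utterance); it is illustrative, not the module's actual code:

```python
import logging

logger = logging.getLogger(__name__)


def normalize_segments(data: dict) -> list[dict]:
    """Turn a Deepgram-style response into uniform segments (illustrative only)."""
    results = data.get("results", {}) or {}
    utterances = results.get("utterances") or []
    if not utterances:
        logger.warning("No utterances in results; keys: %s", list(results.keys()))

    segments: list[dict] = []
    for utt in utterances:
        # Accept either key so downstream code always sees "text".
        text = utt.get("text") or utt.get("transcript") or ""
        if not text.strip():
            continue  # skip empty utterances rather than emit blank segments
        segments.append(
            {
                "start": utt.get("start", 0.0),
                "end": utt.get("end", 0.0),
                "text": text,
                "speaker": utt.get("speaker", 0),
            }
        )
    return segments
```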

Sequence Diagram(s)

sequenceDiagram
    participant Service as Service/Module
    participant CentralConfig as config.get_config()
    participant Cache as _CONFIG_CACHE
    participant Defaults as defaults.yml
    participant UserConfig as config.yml
    participant EnvVars as Environment
    
    Service->>CentralConfig: get_config(force_reload=False)
    alt Cache exists and force_reload=False
        CentralConfig->>Cache: retrieve cached config
        Cache-->>CentralConfig: return cached dict
    else Cache miss or force_reload=True
        CentralConfig->>Defaults: load YAML
        Defaults-->>CentralConfig: base config dict
        CentralConfig->>UserConfig: load YAML
        UserConfig-->>CentralConfig: user overrides dict
        CentralConfig->>CentralConfig: _deep_merge(defaults, user_config)
        CentralConfig->>EnvVars: resolve env vars in merged config
        EnvVars-->>CentralConfig: interpolated values
        CentralConfig->>Cache: store merged config
    end
    CentralConfig-->>Service: return config dict

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
  • Title check: ❓ Inconclusive. The title 'Fix segments' is too vague and generic. While segments are indeed addressed in the changeset, the title does not convey the full scope of changes, which includes a major configuration system refactor with defaults merging, environment variable resolution, and Docker compose updates alongside segment fixes. Resolution: Consider a more descriptive title that captures both the segment fix and the configuration system improvements, such as 'Refactor config system with defaults merging and fix segment processing' or 'Implement centralized config with defaults and fix segment handling'.

✅ Passed checks (2 passed)
  • Description Check: ✅ Passed. Check skipped - CodeRabbit's high-level summary is enabled.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 90.91% which is sufficient. The required threshold is 80.00%.



@AnkushMalaker (Collaborator) commented:

Do tests pass locally / did you try? @thestumonkey

@AnkushMalaker (Collaborator) commented:

@coderabbitai review

coderabbitai bot (Contributor) commented on Jan 11, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

coderabbitai bot (Contributor) left a comment


Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (7)
tests/integration/websocket_streaming_tests.robot (1)

210-216: Don’t silently disable segment timestamp verification—gate it or document why it’s flaky.

Commenting out the assertion reduces coverage and can let regressions slip (especially since this PR is “Fix segments”). Prefer a conditional check (e.g., only for a deterministic provider) or re-enable with a wider tolerance + an explicit TODO linking to the flakiness root cause.

backends/advanced/docker-compose.yml (1)

10-16: Align config layout across services (backend uses /app/config, workers uses /app/config.yml + /app/defaults.yml).

Given the PR’s centralized config system, I’d strongly prefer both services expose the same in-container structure (e.g., /app/config/config.yml, /app/config/defaults.yml, templates). Otherwise you risk “works in backend, breaks in workers” issues and confusing operator UX.

Proposed compose alignment (directory mount for workers too)
 services:
   chronicle-backend:
@@
     volumes:
@@
       - ./data:/app/data
       - ../../config:/app/config  # Mount entire config directory (contains config.yml, defaults.yml, templates)
@@
   workers:
@@
     volumes:
       - ./src:/app/src
       - ./start-workers.sh:/app/start-workers.sh
       - ./data/audio_chunks:/app/audio_chunks
       - ./data:/app/data
-      - ../../config/config.yml:/app/config.yml  # Removed :ro for consistency
-      - ../../config/defaults.yml:/app/defaults.yml:ro  # Built-in defaults
+      - ../../config:/app/config  # Match backend config layout

Also applies to: 58-65

tests/endpoints/conversation_tests.robot (1)

47-67: Bug: Should Not Be Empty is checking the whole conversation dict, not the title/summary/... fields.

These assertions currently won’t fail if title (etc.) is empty, because ${conversation} itself is non-empty.

Fix field-level assertions + make segments-key failure explicit
     # Verify conversation structure
     Dictionary Should Contain Key    ${conversation}    conversation_id
     Dictionary Should Contain Key    ${conversation}    audio_uuid
     Dictionary Should Contain Key    ${conversation}    created_at
-    Should Not Be Empty    ${conversation}    title
-    Should Not Be Empty    ${conversation}    summary
-    Should Not Be Empty    ${conversation}    detailed_summary
-    Should Not Be Empty    ${conversation}    transcript
+    Dictionary Should Contain Key    ${conversation}    title
+    Dictionary Should Contain Key    ${conversation}    summary
+    Dictionary Should Contain Key    ${conversation}    detailed_summary
+    Dictionary Should Contain Key    ${conversation}    transcript
+    Should Not Be Empty    ${conversation}[title]
+    Should Not Be Empty    ${conversation}[summary]
+    Should Not Be Empty    ${conversation}[detailed_summary]
+    Should Not Be Empty    ${conversation}[transcript]
     
-    ${segments}=    Set Variable    ${conversation}[segments]
+    Dictionary Should Contain Key    ${conversation}    segments
+    ${segments}=    Set Variable    ${conversation}[segments]
backends/advanced/docker-compose-test.yml (1)

153-164: Config mount paths diverge between backend-test and workers-test, causing inconsistent config loading behavior.

chronicle-backend-test mounts a single file at /app/config.yml, while workers-test mounts the entire config directory at /app/config. The config loader expects /app/config/config.yml and also loads defaults.yml from the same directory as the resolved config. This results in:

  • workers-test: correctly finds /app/config/config.yml and /app/config/defaults.yml
  • chronicle-backend-test: cannot find /app/config/config.yml (mounted as /app/config.yml), falls back to cwd, and cannot locate defaults.yml at the expected /app/config/defaults.yml

Align both services to mount the same config directory structure, or update the mounts to ensure both use consistent paths.

backends/advanced/src/advanced_omi_backend/model_registry.py (2)

179-233: Handle non-dict model entries; current code can crash on TypeError.

ModelDef(**m) will raise TypeError if m isn’t a mapping (e.g., a string/number from a malformed YAML list). You currently only catch ValidationError, so a single bad entry can still take down the whole registry load.

Proposed fix
     models: Dict[str, ModelDef] = {}
     for m in model_list:
         try:
             # Pydantic will handle validation automatically
-            model_def = ModelDef(**m)
+            if not isinstance(m, dict):
+                raise TypeError(f"Model entry must be a mapping, got {type(m).__name__}")
+            model_def = ModelDef(**m)
             models[model_def.name] = model_def
-        except ValidationError as e:
+        except (ValidationError, TypeError) as e:
             # Log but don't fail the entire registry load
-            logging.warning(f"Failed to load model '{m.get('name', 'unknown')}': {e}")
+            name = m.get("name", "unknown") if isinstance(m, dict) else repr(m)
+            logging.warning(f"Failed to load model '{name}': {e}")
             continue

179-196: Docstring no longer matches behavior (no YAML parsing here; no longer “raises ValidationError” in practice).

This function now delegates YAML loading/env expansion to get_config() and intentionally doesn’t fail the whole load on ValidationError (it logs and skips). Updating the docstring will prevent future callers/tests from relying on stale semantics.

backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (1)

287-334: Make config.yml writes atomic (prevent partial/corrupt YAML on crash).

Both backup + direct write can still leave a truncated/invalid config.yml if the process is killed mid-write. Use a temp file + os.replace() (atomic on POSIX) and consider Path APIs for consistency.

Proposed fix
-        # Write to config.yml
-        with open(cfg_path, 'w') as f:
-            yaml.safe_dump(data, f, sort_keys=False)
+        # Atomic write to config.yml
+        cfg_path = get_config_path()
+        tmp_path = str(cfg_path) + ".tmp"
+        with open(tmp_path, "w", encoding="utf-8") as f:
+            yaml.safe_dump(data, f, sort_keys=False)
+        os.replace(tmp_path, cfg_path)
🤖 Fix all issues with AI agents
In @backends/advanced/src/advanced_omi_backend/config.py:
- Around line 269-304: _find_config_path currently ignores the CONFIG_FILE env
var when the file doesn't yet exist, causing writes to fall back to
/app/config/config.yml; change the initial ENV override in _find_config_path to
return Path(cfg_env) whenever CONFIG_FILE is set (i.e., if cfg_env: return
Path(cfg_env)) so callers like get_config_path will honor a user-provided path
even on first-run, leaving the rest of the lookup logic unchanged.

In @config/defaults.yml:
- Line 51: The Authorization header in config/defaults.yml currently uses a
fallback empty value ("Authorization: Token ${DEEPGRAM_API_KEY:-}") which yields
a malformed header when DEEPGRAM_API_KEY is unset; update the config to require
the variable (e.g., "Authorization: Token ${DEEPGRAM_API_KEY}") and/or add
runtime validation where Deepgram is used (search for references to
DEEPGRAM_API_KEY and the Deepgram client initialization) to detect a missing key
and fail early with a clear error message before making API calls.
- Around line 87-89: The prompt value under the prompt key in defaults.yml is
written as a quoted multi-line string with the closing quote on its own line,
which introduces unintended trailing newlines; fix it by collapsing the prompt
into a single-line quoted string or convert it to a YAML block scalar (e.g., |
or >) so the closing delimiter is correct and no extra newlines or whitespace
are included—update the prompt entry so its content ends exactly where intended
without the stray closing-quote-only line.

In @tests/endpoints/system_admin_tests.robot:
- Line 194: Test assertion expects "specialized AI assistant" but the actual
default prompt text differs; update the Robot test to match the real default
prompt. Edit the assertion in tests/endpoints/system_admin_tests.robot (the
Should Contain check) to look for the actual default prompt beginning "You are a
helpful AI assistant with access to the user's personal memories and
conversation history." (matching the default prompt defined in SystemController
/ system_controller.py) so the test checks the real prompt text instead of
"specialized AI assistant".
🧹 Nitpick comments (6)
config/config.yml.template (1)

13-21: Consider adding inline comments to defaults section for consistency.

While the concrete default values work well for a template, adding inline comments showing that these reference models from defaults.yml helps users understand the configuration hierarchy at a glance. The current comments at lines 14-16 already do this well - consider extending the pattern to all entries.

💡 Optional enhancement
 defaults:
   llm: openai-llm              # LLM for memory extraction (from defaults.yml)
   embedding: openai-embed      # Embedding model (from defaults.yml)
   stt: stt-deepgram            # Speech-to-text service (from defaults.yml)
   # Transcription provider options:
   # - stt-deepgram: Cloud-based Deepgram (requires DEEPGRAM_API_KEY)
   # - stt-parakeet-batch: Local Parakeet ASR (requires Parakeet service running)
-  tts: tts-http                # Text-to-speech service
+  tts: tts-http                # Text-to-speech service (define custom in models section)
   vector_store: vs-qdrant      # Vector database (from defaults.yml)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (2)

254-285: Avoid broad except Exception here (or ensure fallback can’t throw).

If get_config() fails, the fallback path still calls get_config_path() and yaml.safe_dump(...). If either of those ever throws, you lose the intended “always return something” behavior. Consider narrowing exceptions and/or wrapping the fallback body too.
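One way to keep the "always return something" guarantee while narrowing the caught exceptions is to guard the fallback serialization separately. A rough sketch; the helper name and the injected loader callable are hypothetical, not the controller's code:

```python
import logging
from typing import Callable

import yaml

logger = logging.getLogger(__name__)

FALLBACK_YAML = "memory: {}\n"  # always-valid last resort


def render_memory_yaml(load_section: Callable[[], dict]) -> str:
    """Return YAML for the memory section without ever raising.

    `load_section` is any zero-arg callable, e.g. a lambda wrapping the central
    get_config() lookup; only expected failure modes are caught, and the
    fallback serialization is guarded as well.
    """
    try:
        section = load_section() or {}
    except (OSError, KeyError, yaml.YAMLError):
        logger.exception("Config unavailable, using empty memory section")
        section = {}
    try:
        return yaml.safe_dump({"memory": section}, sort_keys=False)
    except yaml.YAMLError:
        logger.exception("Could not serialize memory section")
        return FALLBACK_YAML
```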


506-548: Use yaml.safe_dump and atomic write for chat config; align backup naming.

yaml.dump(...) can emit unsafe tags; safe_dump is the safer default. This path also directly writes to config.yml (same atomicity concern as memory). Also, backups use .backup here but .bak in memory—worth standardizing for ops/scripts.

backends/advanced/src/advanced_omi_backend/config.py (2)

246-267: Tighten _deep_merge typing; current ValueError handling won’t catch the common failure modes.

If override isn’t a dict, you’ll get AttributeError on override.items(), not ValueError. Consider explicit isinstance(..., dict) checks and fail fast with a clear message (or coerce to {} if you want permissive behavior).


306-370: Return a copy (or freeze) to prevent accidental mutation of the cached config.

get_config() returns the cached dict by reference. Any caller that mutates it will mutate the global config for everyone (and potentially across requests). If you want the cache to be authoritative, return a deep copy (or provide a read-only wrapper) and keep the cached object private.
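A small illustration of the copy-on-read idea; the class and names are hypothetical, and the module's get_config() could equivalently apply the same deepcopy to _CONFIG_CACHE before returning:

```python
from copy import deepcopy
from typing import Callable, Optional


class ConfigCache:
    """Sketch of a cache that never hands out its internal dict by reference."""

    def __init__(self, loader: Callable[[], dict]):
        self._loader = loader
        self._cache: Optional[dict] = None

    def get(self, force_reload: bool = False) -> dict:
        if self._cache is None or force_reload:
            self._cache = self._loader()
        # Callers may mutate the returned dict freely; the cached copy stays intact.
        return deepcopy(self._cache)
```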

backends/advanced/src/advanced_omi_backend/model_registry.py (1)

70-85: Guard embedding-dimension auto-defaulting by model provider to avoid silent dimension mismatches.

The validate_model validator applies hardcoded embedding dimensions based only on model_name (e.g., text-embedding-3-small → 1536, nomic-embed-text-v1.5 → 768) without checking self.model_provider or self.api_family. If a non-OpenAI provider or custom embedding server uses the same model name, this could silently set incorrect dimensions. Consider adding a provider check before applying defaults:

if self.model_name in defaults and self.model_provider in ('openai', 'unknown'):
    self.embedding_dimensions = defaults[self.model_name]

Or maintain a provider-specific defaults mapping instead. The current config template explicitly sets embedding_dimensions in all examples, but this validator serves as a safety net and should be provider-aware to prevent silent misconfigurations.
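The provider-specific defaults mapping suggested above could look roughly like this; the table keys, helper name, and listed dimensions are illustrative and should be checked against the actual deployment:

```python
from typing import Optional

# Hypothetical provider-aware defaults table; the dimensions shown are the
# commonly published values for these models.
EMBEDDING_DIMENSION_DEFAULTS: dict[tuple[str, str], int] = {
    ("openai", "text-embedding-3-small"): 1536,
    ("openai", "text-embedding-3-large"): 3072,
    ("nomic", "nomic-embed-text-v1.5"): 768,
}


def default_embedding_dimensions(provider: str, model_name: str) -> Optional[int]:
    """Return a dimension default only when both provider and model match."""
    return EMBEDDING_DIMENSION_DEFAULTS.get((provider, model_name))
```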

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Knowledge base: Disabled due to Reviews -> Disable Knowledge Base setting

📥 Commits

Reviewing files that changed from the base of the PR and between 91dcd89 and 88c836e.

📒 Files selected for processing (14)
  • backends/advanced/docker-compose-test.yml
  • backends/advanced/docker-compose.yml
  • backends/advanced/src/advanced_omi_backend/config.py
  • backends/advanced/src/advanced_omi_backend/controllers/system_controller.py
  • backends/advanced/src/advanced_omi_backend/llm_client.py
  • backends/advanced/src/advanced_omi_backend/model_registry.py
  • backends/advanced/src/advanced_omi_backend/services/memory/config.py
  • backends/advanced/src/advanced_omi_backend/services/transcription/__init__.py
  • config/config.yml.template
  • config/defaults.yml
  • tests/endpoints/conversation_tests.robot
  • tests/endpoints/system_admin_tests.robot
  • tests/integration/integration_test.robot
  • tests/integration/websocket_streaming_tests.robot
💤 Files with no reviewable changes (1)
  • backends/advanced/src/advanced_omi_backend/llm_client.py
🧰 Additional context used
🧬 Code graph analysis (2)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)
backends/advanced/src/advanced_omi_backend/config.py (1)
  • get_config (306-370)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py (4)
backends/advanced/src/advanced_omi_backend/config.py (2)
  • get_config (306-370)
  • get_config_path (402-408)
backends/advanced/src/advanced_omi_backend/model_registry.py (1)
  • load_models_config (179-232)
backends/advanced/src/advanced_omi_backend/services/memory/service_factory.py (1)
  • get_memory_service (73-110)
backends/advanced/src/advanced_omi_backend/speaker_recognition_client.py (1)
  • SpeakerRecognitionClient (25-649)
🪛 Ruff (0.14.10)
backends/advanced/src/advanced_omi_backend/controllers/system_controller.py

275-275: Do not catch blind exception: Exception

(BLE001)


499-499: Consider moving this statement to an else block

(TRY300)


501-501: Do not catch blind exception: Exception

(BLE001)

backends/advanced/src/advanced_omi_backend/config.py

264-264: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


295-296: try-except-pass detected, consider logging the exception

(S110)


295-295: Do not catch blind exception: Exception

(BLE001)


343-343: Do not catch blind exception: Exception

(BLE001)


344-344: Use logging.exception instead of logging.error

Replace with exception

(TRY400)


359-359: Do not catch blind exception: Exception

(BLE001)

🔇 Additional comments (4)
backends/advanced/src/advanced_omi_backend/services/memory/config.py (1)

66-72: LGTM! Clean migration to centralized configuration.

The refactoring to use get_config() aligns well with the centralized configuration system introduced in this PR. The function now returns the full merged configuration (defaults.yml + config.yml + env vars) as documented.

tests/endpoints/system_admin_tests.robot (1)

152-181: Excellent test coverage for configuration override behavior.

This test effectively validates that config.yml values correctly override defaults.yml values, which is a critical aspect of the hierarchical configuration system introduced in this PR.

tests/integration/integration_test.robot (1)

198-212: Excellent segment structure validation!

The explicit validation of segment structure (required keys, non-empty text, timing consistency) provides robust test coverage for the segment functionality fixes introduced in this PR. This prevents regressions and ensures segments are properly formatted.

config/config.yml.template (1)

1-195: Well-structured template with clear guidance.

The template effectively balances being a working example (with concrete defaults section) and providing comprehensive commented examples for custom configurations. The header comments clearly explain the relationship with defaults.yml and how to customize.

Comment on lines +269 to +304
def _find_config_path() -> Path:
    """Find config.yml in expected locations.

    Search order:
    1. CONFIG_FILE environment variable
    2. /app/config/config.yml (Docker container with config directory mount)
    3. Current working directory
    4. Walk up from module directory

    Returns:
        Path to config.yml (may not exist)
    """
    # ENV override
    cfg_env = os.getenv("CONFIG_FILE")
    if cfg_env and Path(cfg_env).exists():
        return Path(cfg_env)

    # Common locations (container with config dir mount vs repo root)
    candidates = [Path("/app/config/config.yml"), Path("config.yml")]

    # Also walk up from current file's parents defensively
    try:
        for parent in Path(__file__).resolve().parents:
            c = parent / "config.yml"
            if c.exists():
                return c
    except Exception:
        pass

    for c in candidates:
        if c.exists():
            return c

    # Last resort: return /app/config/config.yml path (may not exist yet)
    return Path("/app/config/config.yml")


⚠️ Potential issue | 🟠 Major

CONFIG_FILE override is ignored if the file doesn’t exist (can write/read the wrong config).

_find_config_path() currently does:

  • if cfg_env and Path(cfg_env).exists(): return Path(cfg_env)

If CONFIG_FILE is set to a new path (common for “first-run” or UI-created config), your write paths (get_config_path() callers) will not use it and may create/modify /app/config/config.yml instead.

Proposed fix
-    cfg_env = os.getenv("CONFIG_FILE")
-    if cfg_env and Path(cfg_env).exists():
-        return Path(cfg_env)
+    cfg_env = os.getenv("CONFIG_FILE")
+    if cfg_env:
+        return Path(cfg_env)
🧰 Tools
🪛 Ruff (0.14.10)

295-296: try-except-pass detected, consider logging the exception

(S110)


295-295: Do not catch blind exception: Exception

(BLE001)

🤖 Prompt for AI Agents
In @backends/advanced/src/advanced_omi_backend/config.py around lines 269 - 304,
_find_config_path currently ignores the CONFIG_FILE env var when the file
doesn't yet exist, causing writes to fall back to /app/config/config.yml; change
the initial ENV override in _find_config_path to return Path(cfg_env) whenever
CONFIG_FILE is set (i.e., if cfg_env: return Path(cfg_env)) so callers like
get_config_path will honor a user-provided path even on first-run, leaving the
rest of the lookup logic unchanged.

Comment on lines 129 to +146
            # DEBUG: Log Deepgram response structure
            if "results" in data and "channels" in data.get("results", {}):
                channels = data["results"]["channels"]
                results = data["results"]
                logger.info(f"DEBUG Registry: Deepgram results keys: {list(results.keys())}")

                channels = results.get("channels", [])
                if channels and "alternatives" in channels[0]:
                    alt = channels[0]["alternatives"][0]
                    logger.info(f"DEBUG Registry: Deepgram alternative keys: {list(alt.keys())}")

                # Check if utterances exist at results level
                if "utterances" in results:
                    logger.info(f"DEBUG Registry: Found utterances at results level: {len(results['utterances'])} utterances")
                    if results['utterances']:
                        logger.info(f"DEBUG Registry: First utterance: {results['utterances'][0]}")
                else:
                    logger.warning(f"DEBUG Registry: NO utterances found in results! Available keys: {list(results.keys())}")


⚠️ Potential issue | 🟠 Major

Avoid info-level logging of utterance payloads (PII/log volume); keep normalization.

The transcript -> text normalization looks good, but logging the first utterance object at INFO is risky (may include transcript text) and noisy. Switch to logger.debug and log only counts/keys (or truncate).

Safer logging change
-                logger.info(f"DEBUG Registry: Deepgram results keys: {list(results.keys())}")
+                logger.debug("Registry: Deepgram results keys: %s", list(results.keys()))
@@
-                    logger.info(f"DEBUG Registry: Deepgram alternative keys: {list(alt.keys())}")
+                    logger.debug("Registry: Deepgram alternative keys: %s", list(alt.keys()))
@@
-                    logger.info(f"DEBUG Registry: Found utterances at results level: {len(results['utterances'])} utterances")
-                    if results['utterances']:
-                        logger.info(f"DEBUG Registry: First utterance: {results['utterances'][0]}")
+                    logger.debug(
+                        "Registry: Found utterances at results level: %s utterances",
+                        len(results["utterances"]),
+                    )
@@
-                    logger.warning(f"DEBUG Registry: NO utterances found in results! Available keys: {list(results.keys())}")
+                    logger.warning("Registry: No utterances in results. Available keys: %s", list(results.keys()))

Also applies to: 155-168

    method: POST
    path: /listen
    headers:
      Authorization: Token ${DEEPGRAM_API_KEY:-}

⚠️ Potential issue | 🟠 Major

Authorization header will be malformed if DEEPGRAM_API_KEY is unset.

When DEEPGRAM_API_KEY is not set, the header becomes Authorization: Token (with a trailing space but no token), which will cause Deepgram API calls to fail silently with authentication errors.

Consider either:

  1. Removing the default empty value to force explicit configuration: Token ${DEEPGRAM_API_KEY}
  2. Adding validation that checks for the presence of required API keys before attempting to use the service (see the sketch below)
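A minimal sketch of option 2, validating the key before any request is built; the require_env helper and the commented call site are hypothetical, not the project's actual client code:

```python
import os


def require_env(name: str) -> str:
    """Fail fast with a clear message instead of sending a malformed header."""
    value = os.getenv(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; configure it before enabling the Deepgram STT model."
        )
    return value


# Illustrative call site:
# headers = {"Authorization": f"Token {require_env('DEEPGRAM_API_KEY')}"}
```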
🤖 Prompt for AI Agents
In @config/defaults.yml at line 51, The Authorization header in
config/defaults.yml currently uses a fallback empty value ("Authorization: Token
${DEEPGRAM_API_KEY:-}") which yields a malformed header when DEEPGRAM_API_KEY is
unset; update the config to require the variable (e.g., "Authorization: Token
${DEEPGRAM_API_KEY}") and/or add runtime validation where Deepgram is used
(search for references to DEEPGRAM_API_KEY and the Deepgram client
initialization) to detect a missing key and fail early with a clear error
message before making API calls.

    Should Not Be Empty    ${prompt}
    Should Not Contain    ${prompt}    system_prompt:    msg=Should not contain YAML key
    Should Contain    ${prompt}    helpful AI assistant    msg=Should contain default prompt content
    Should Contain    ${prompt}    specialized AI assistant    msg=Should contain default prompt content

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Search for the default chat prompt configuration
echo "=== Searching for chat configuration in defaults.yml ==="
if [ -f "config/defaults.yml" ]; then
  cat config/defaults.yml | grep -A 10 "^chat:" || echo "No chat section found in defaults.yml"
fi

echo ""
echo "=== Searching for chat system prompt in codebase ==="
rg -n "specialized AI assistant" -C 2

echo ""
echo "=== Searching for default chat prompts ==="
rg -n "system.*prompt|default.*prompt" --type yaml -C 2

Repository: SimpleOpenSoftware/chronicle

Length of output: 1242


🏁 Script executed:

#!/bin/bash
# Search for default prompt in Python files
echo "=== Searching for default prompt in Python code ==="
rg -n "prompt|default" --type py -i | grep -i "ai assistant\|prompt.*default\|default.*prompt" | head -20

echo ""
echo "=== Looking for the chat config endpoint implementation ==="
rg -n "admin/chat/config" --type py

echo ""
echo "=== Searching all files for 'specialized AI assistant' or similar patterns ==="
rg -i "helpful.*assistant|default.*prompt" -C 2 | head -40

echo ""
echo "=== Checking for environment variables or settings ==="
rg -n "SYSTEM_PROMPT|CHAT_PROMPT" --type py

echo ""
echo "=== Finding where chat configuration is initialized ==="
rg -n "chat.*config|ChatConfig" --type py -C 2 | head -30

Repository: SimpleOpenSoftware/chronicle

Length of output: 10917


Fix test assertion to match the actual default prompt text.

The test expects "specialized AI assistant" but the actual default chat prompt in backends/advanced/src/advanced_omi_backend/controllers/system_controller.py line 486 starts with "You are a helpful AI assistant with access to the user's personal memories and conversation history." Update the assertion to match the actual default.

🤖 Prompt for AI Agents
In @tests/endpoints/system_admin_tests.robot at line 194, Test assertion expects
"specialized AI assistant" but the actual default prompt text differs; update
the Robot test to match the real default prompt. Edit the assertion in
tests/endpoints/system_admin_tests.robot (the Should Contain check) to look for
the actual default prompt beginning "You are a helpful AI assistant with access
to the user's personal memories and conversation history." (matching the default
prompt defined in SystemController / system_controller.py) so the test checks
the real prompt text instead of "specialized AI assistant".
