Feature/tests and docs #6
base: main
Conversation
- test_hunter.py: 30 tests for Zombie Hunter (EBS cleanup)
- test_watchman.py: 30 tests for Night Watchman (staging shutdown)
- test_guard.py: 28 tests for Access Guard (IAM revocation)
- Makefile.opsmate: Sequential container testing rig (no docker-compose)
- Add normalize_email() for +suffix handling in Guard
- Add create_initial_*_state() functions for each agent
- Fix should_shutdown quiet_hours edge case
- Fix execute_shutdown error handling

All 380 tests pass (3 skipped - Neo4j unavailable)

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Neo4j: neo4j:echoteam123 (port 7474/7687)
Redis: echoteam-redis (port 6380)
PostgreSQL: postgres-pgvector (port 5432)

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Add test_e2e_api.py: 23 E2E API tests with FastAPI TestClient
- Add test_llm_eval.py: 12 LLM evaluation tests using DeepEval
- Add test_llm_eval_quick.py: Quick smoke test for minimal hardware load
- Update README with agent docs, test instructions, and new architecture

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Add OpenRouter LLM client (httpx-based, OpenAI-compatible)
- Add ExecOpsAgent with context awareness, proactivity, and learning
- Add test_agent_api.py for testing agent decisions

Features:
- Context-aware decision making
- Proactive suggestions
- Learning from human feedback
- Event processing for PR, security, and resource events

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Add SentinelAgent for PR compliance analysis
- Add HunterAgent for zombie resource detection
- Add GuardAgent for access control and security
- Add SupervisorAgent for parallel/sequential/hierarchical workflow orchestration
- Implement agent-to-agent messaging with Task, Result, Alert, Query types
- Add memory system for collective learning
- Create comprehensive test suite with 11 tests

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Redis memory store with TTL and pattern recall
- LLM fallback chain (OpenRouter -> Ollama -> Rules)
- Circuit breaker for resilience
- Decision evaluation framework with accuracy tracking
- Real LLM evaluation tests with Ollama

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Create docs/ folder with ARCHITECTURE.md, GUIDES.md, IMPLEMENTATION_PLAN.md
- Rename evals/ to deepeval_metrics/ for clarity (DeepEval-based metrics)
- Remove agent/archive/ duplicate tech_debt.py
- Update README with current architecture and project structure
- Remove all __pycache__ folders
- Update test_golden_cases.py import path

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Infrastructure (Redis, Neo4j, LocalStack, Ollama)
- Unit tests (40+ test classes, 454 tests)
- Integration tests (212 tests)
- LLM evaluations (5 scenarios, real Ollama tests)
- AWS integration (28 tests with moto)
- Code coverage targets by module
- Performance testing checklist
- Security testing checklist
- Docker environment verification
- CI/CD pipeline quality gates
- Bug fixes required (334 deprecation warnings)
- Recommended improvements roadmap

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- test_evaluation.py: 39 tests (73% coverage, up from 45%)
  - DecisionRecord serialization
  - EvaluationMetrics accuracy calculation
  - DecisionStore Redis operations
  - AccuracyCalculator edge cases
  - MetricsCollector trends
  - EvaluationReporter reports
  - Health check function
- test_vertical_agents.py: 29 tests (88% coverage)
  - ActionProposalState TypedDict
  - Vertical routing (all event types)
  - Graph factory functions
  - Human approval workflow
  - Checkpointer selection
- test_sop.py: 15 tests (100% coverage)
  - SOPLoader initialization
  - Policy loading (found/not found)
  - Policy listing
  - Vector search fallback
  - Error handling

Total: 83 new tests | 525 tests passing

🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
📝 Walkthrough

This PR introduces a complete architectural redesign of the AI service from a single Sentinel agent to a multi-agent orchestration system (ExecOps/EchoTeam), adding core agents (ExecOpsAgent, Sentinel, Hunter, Guard, Watchman), LLM resilience patterns (circuit breaker, fallback chains), Redis-based memory storage, evaluation metrics framework, and comprehensive test coverage with documentation.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    actor Event
    participant Supervisor as SupervisorAgent
    participant Router as Vertical Router
    participant Agent as Specialized Agent<br/>(Sentinel/Hunter/Guard)
    participant LLM as ResilientLLMClient
    participant Circuit as CircuitBreaker
    participant Fallback as Fallback Chain
    participant Store as DecisionStore<br/>(Redis)

    Event->>Supervisor: coordinate_agents(workflow)
    Supervisor->>Router: route_event(event_type)
    Router->>Agent: delegate_task(event)
    Agent->>LLM: chat(messages)
    LLM->>Circuit: allow_request()

    alt Circuit CLOSED
        Circuit-->>LLM: OK
        LLM->>LLM: Call Primary LLM
        alt Primary Success
            LLM-->>Agent: response
        else Primary Fails
            LLM->>Fallback: execute_fallback()
            Fallback->>Fallback: Try Secondary (Ollama)
            Fallback-->>LLM: fallback_response
        end
    else Circuit OPEN
        Circuit-->>LLM: CircuitOpenError
        LLM->>Fallback: rule_based_fallback()
        Fallback-->>LLM: heuristic_response
    end

    LLM-->>Agent: result
    Agent->>Agent: process_result()
    Agent->>Store: record_decision(decision)
    Store-->>Agent: decision_id
    Agent-->>Supervisor: TaskResult
    Supervisor-->>Event: workflow_complete(results)
```
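The resilience path in this diagram (circuit breaker in front of the primary provider, an Ollama fallback, and a rule-based last resort) can be condensed into a short sketch. This is a minimal illustration only: `call_openrouter`, `call_ollama`, and `rule_based_default` are hypothetical stand-ins, not the exact client API added in this PR.

```python
# Minimal sketch of the fallback chain above; call_openrouter, call_ollama,
# and rule_based_default are hypothetical stand-ins for the real clients.
async def resilient_chat(messages, breaker, call_openrouter, call_ollama, rule_based_default):
    try:
        await breaker.allow_request()            # raises when the circuit is OPEN
    except Exception:
        return rule_based_default({})            # heuristic response, no LLM call at all
    try:
        result = await call_openrouter(messages)   # primary provider
        await breaker.record_success()
        return result
    except Exception:
        await breaker.record_failure()
        try:
            return await call_ollama(messages)     # secondary, local provider
        except Exception:
            return rule_based_default({})          # rule-based last resort
```

The breaker is consulted before any network call, so a tripped circuit skips straight to the heuristic response instead of waiting on provider timeouts.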
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~75 minutes

Rationale: The PR introduces multiple interconnected systems (multi-agent orchestration, circuit breaker resilience, Redis memory, evaluation metrics) with 738+ lines in orchestration logic alone, 595+ lines in fallback patterns, 567+ lines in evaluation metrics, and significant async/await patterns across distributed components. While individual test files are relatively homogeneous, the diversity of production logic across agents, LLM infrastructure, memory stores, and orchestration demands separate reasoning for correctness of async flows, state management, and inter-component interactions.

Possibly related PRs
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing touches
Summary of Changes

Hello @Aparnap2, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the ExecOps AI Service by introducing a sophisticated multi-agent architecture, improving its resilience through advanced LLM fallback mechanisms, and providing agents with persistent memory capabilities. These foundational changes are supported by a comprehensive testing suite and detailed documentation, ensuring the system is robust, maintainable, and scalable for future development.

Highlights
Actionable comments posted: 13
Note
Due to the large number of review comments, Critical, Major severity comments were prioritized as inline comments.
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
ai-service/src/ai_service/deepeval_metrics/test_golden_cases.py (1)
11-11: Docstring references outdated file path.

The docstring still references src/ai_service/evals/test_golden_cases.py, but based on the import path change and the file location under deepeval_metrics/, this should be updated.

📝 Suggested fix

```diff
-Run with: pytest src/ai_service/evals/test_golden_cases.py -v
+Run with: pytest src/ai_service/deepeval_metrics/test_golden_cases.py -v
```
🤖 Fix all issues with AI agents
In `@ai-service/docs/GUIDES.md`:
- Around line 143-165: Update the documentation examples to match the actual
API: include the required agent parameter when calling
DecisionStore.record_feedback (i.e., call record_feedback with decision_id,
agent, feedback, reasoning) and replace the non-existent
AccuracyCalculator.calculate(...) example with the real API—either show using
AccuracyCalculator.calculate_from_records(records) which accepts a list of
DecisionRecord objects, or use DecisionStore.get_accuracy(agent, days) for the
documented agent/time-window use case; reference DecisionStore.record_feedback,
AccuracyCalculator.calculate_from_records, and DecisionStore.get_accuracy when
making the changes.
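For reference, a usage sketch matching the corrected API described in this prompt might look as follows; the parameter names (`agent`, `days`) are taken from the review findings above and should be treated as illustrative rather than authoritative.

```python
from ai_service.evaluation.metrics import DecisionStore

store = DecisionStore()

# record_feedback needs the agent name in addition to the decision id
await store.record_feedback(
    decision_id="dec_123",
    agent="sentinel",
    feedback="correct",
    reasoning="Agent made the right call",
)

# per-agent accuracy over a time window comes from DecisionStore.get_accuracy
accuracy = await store.get_accuracy(agent="sentinel", days=30)
print(f"Accuracy over 30 days: {accuracy:.1%}")
```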
In `@ai-service/Makefile.opsmate`:
- Around line 67-74: The redis Makefile target has a malformed docker run
command: the port variable is mistyped as $(REDIS_PORT9 and the
image/tokenization is corrupted ("-p $(REDIS_PORT9 \n\t):637redis:alpine"). Fix
the redis target (function/target name: redis) to use the correct port variable
$(REDIS_PORT), ensure the -p mapping is "-p $(REDIS_PORT):6379", and restore the
image reference to "redis:alpine"; keep the existing references to
$(REDIS_CONTAINER) and $(NETWORK_NAME) and preserve the 2>/dev/null || echo
fallback behavior.
In `@ai-service/src/ai_service/agents/execops_agent.py`:
- Around line 124-139: The _parse_decision function can raise ValueError when
ActionType(parsed.get("action", "approve").lower()) gets an unexpected value;
update _parse_decision to validate or catch errors when converting to
ActionType: extract the raw_action = parsed.get("action", "approve"), normalize
it (e.g., raw_action.lower()), then attempt to map/convert to ActionType inside
a try/except (catch ValueError or KeyError) and on failure set action =
ActionType.APPROVE (safe fallback); finally construct and return the
AgentDecision with that safe action while keeping the existing reasoning,
confidence, context_summary, and suggestions logic.
In `@ai-service/src/ai_service/agents/multi_agent.py`:
- Around line 592-600: The test calls a non-existent learn_from_feedback API and
feedback_patterns on HunterAgent, causing AttributeError; add these to the
Agent/HunterAgent implementation by introducing a feedback_patterns attribute
(e.g., self.feedback_patterns: Dict[str, List]) on Agent and implement an async
learn_from_feedback(self, decision_id: str, feedback: str, suggestion: str |
None) method that updates feedback_patterns (keyed by decision_id) and persists
or processes feedback as HunterAgent expects; alternatively, adjust
test_collective_memory to stop calling learn_from_feedback and not inspect
feedback_patterns if you prefer not to change Agent.
In `@ai-service/src/ai_service/evaluation/metrics.py`:
- Around line 263-266: In get_feedback_rate, the call to _get_client() is a
NameError because the helper is an instance method; change the call to use
self._get_client() inside async def get_feedback_rate(self, agent: str) -> float
to obtain the client. Locate get_feedback_rate in metrics.py and replace the
bare _get_client() invocation with self._get_client(), ensuring any awaited
usage remains (await self._get_client()) and no other references to the unscoped
function remain in this method.
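A minimal sketch of the corrected call site, assuming the class keeps an async `_get_client()` helper as described above; the key prefix and stored feedback field are illustrative.

```python
# Sketch only: the key layout and JSON fields are assumptions, not the real schema.
async def get_feedback_rate(self, agent: str) -> float:
    client = await self._get_client()   # instance helper, not a bare _get_client()
    keys = await client.keys(f"{self._prefix}{agent}:*")
    if not keys:
        return 0.0
    with_feedback = 0
    for key in keys:
        raw = await client.get(key)
        if raw and '"feedback"' in raw:  # crude check that feedback was recorded
            with_feedback += 1
    return with_feedback / len(keys)
```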
- Around line 211-218: get_accuracy currently expects a concrete agent name and
builds a key pattern using f"{self._prefix}{agent}:*", so when generate_report
passes the string "all" no real agent keys match and accuracy is 0; change
get_accuracy signature to accept agent: Optional[str] = None and when agent is
None build the pattern as f"{self._prefix}*"(scan all agents) otherwise
f"{self._prefix}{agent}:*"; also update generate_report to pass None instead of
the literal "all" when it intends to aggregate across all agents so the new
pattern will match all decision keys; keep references to get_accuracy and
generate_report when making these changes.
In `@ai-service/src/ai_service/llm/openrouter.py`:
- Around line 66-92: In chat_stream, handle the stream sentinel and fix
attribute access: first check if the line after "data: " equals "[DONE]" and
break/return to end the generator before attempting JSON parsing; otherwise
parse the JSON and access the delta as a dict (e.g., use
data["choices"][0].get("delta", {}) and then .get("content", "")) instead of
data["choices"][0].delta.get(...); keep the existing checks for "choices" and
only yield non-empty content. Ensure these changes are applied inside the
chat_stream async context where the stream lines are iterated.
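A minimal sketch of a corrected streaming loop, assuming an httpx streaming response whose body follows the OpenAI-compatible SSE format (`data: ` prefix, `[DONE]` sentinel, dict-shaped `delta`); everything else here is illustrative rather than the exact code in openrouter.py.

```python
import json

async def iter_stream_content(response):
    # response: an httpx streaming response in OpenAI-compatible SSE format
    async for line in response.aiter_lines():
        if not line.startswith("data: "):
            continue
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":                  # end-of-stream sentinel, not JSON
            break
        data = json.loads(payload)
        choices = data.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta", {})      # delta is a plain dict, not an object
        content = delta.get("content", "")
        if content:
            yield content
```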
In `@ai-service/src/ai_service/memory/redis_store.py`:
- Around line 195-202: In retrieve(), preserve the key TTL when incrementing
MemoryItem.access_count: after fetching the item with client.get(full_key) and
before overwriting it, read the remaining TTL (e.g., await client.pttl(full_key)
or await client.ttl(full_key)), update item.access_count, then write the updated
value and restore the TTL—either use client.setex(full_key, seconds, ...) /
client.psetex with the remaining TTL or call client.set(...) followed by
client.pexpire(full_key, remaining_ms) if pttl returned milliseconds; handle
special pttl return values (-1 = no expiry, -2 = key missing) so you only
restore expiry when appropriate. This keeps create()'s setex-based TTL behavior
intact.
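A sketch of the TTL-preserving update described above, assuming a redis.asyncio client and a JSON-serialized MemoryItem; names are illustrative.

```python
import json

async def _bump_access_count(client, full_key: str) -> None:
    # Increment MemoryItem.access_count while preserving the TTL set by create()/setex.
    raw = await client.get(full_key)
    if raw is None:
        return
    remaining_ms = await client.pttl(full_key)   # -1 = no expiry, -2 = key missing
    item = json.loads(raw)
    item["access_count"] = item.get("access_count", 0) + 1
    payload = json.dumps(item)
    if remaining_ms > 0:
        await client.psetex(full_key, remaining_ms, payload)  # restore the remaining TTL
    elif remaining_ms == -1:
        await client.set(full_key, payload)                   # key had no expiry to preserve
    # remaining_ms == -2: key expired between get() and pttl(); nothing to write back
```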
- Around line 349-365: The sync wrapper remember() uses asyncio.create_task()
which raises RuntimeError when no running loop; update remember() to follow the
same safe pattern used in recall(): attempt to get the current event loop and
call loop.run_until_complete(self._save_memory(key, value, ttl_seconds)) when a
loop is available, otherwise call asyncio.run(self._save_memory(...)); apply the
same change to forget() to call self._delete_memory(...) via run_until_complete
or asyncio.run as appropriate; reference functions: remember(), forget(),
_save_memory(), _delete_memory(), and recall() to locate and mirror the safe
event-loop handling.
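A sketch of the safe sync-wrapper pattern described above; the extra `is_running()` branch is an assumption for callers that invoke `remember()` from inside an already-running loop.

```python
import asyncio

def remember(self, key, value, ttl_seconds=None):
    # Mirror the safe pattern from recall(): reuse an existing loop when possible,
    # otherwise fall back to asyncio.run().
    coro = self._save_memory(key, value, ttl_seconds)
    try:
        loop = asyncio.get_event_loop()
    except RuntimeError:
        asyncio.run(coro)
        return
    if loop.is_running():
        loop.create_task(coro)        # assumption: called from inside an async context
    else:
        loop.run_until_complete(coro)
```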
In `@ai-service/tests/test_agent_api.py`:
- Around line 92-100: The curl command construction incorrectly slices the JSON
string (json_body[1:]) which removes the opening '{' and yields malformed JSON;
update the print that builds the curl -d payload to use the full json_body
variable (or otherwise preserve the complete JSON) in the block where json_body
is defined and used in the print statements so the -d argument contains valid
JSON.
- Around line 93-103: The json module is imported inside the if __name__ ==
"__main__" block but used earlier in main() when building json_body, causing a
NameError; move the import json to the top-level imports (so main() and any
top-level code can use it), and remove the trailing import json from the bottom
of the file (keep the existing main(), json_body construction, and the if
__name__ block unchanged aside from removing that bottom import).
In `@ai-service/tests/test_e2e_api.py`:
- Around line 172-208: The first test (test_github_webhook_validates_signature)
uses a too-broad assertion; change it to assert response.status_code == 200 to
reflect test env behavior when GITHUB_WEBHOOK_SECRET is not set and
verify_signature returns True. For the webhook handler, add explicit validation
before calling ainvoke_sentinel: ensure payload contains "pull_request" and that
pull_request.get("number") is present; if missing, log a clear warning and
return a 400 response instead of proceeding with pr_number=None so
ainvoke_sentinel is never called with incomplete data. Apply these changes by
updating the test function and the GitHub webhook handler (the code that reads
pull_request/pr_number and calls ainvoke_sentinel and verify_signature).
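A sketch of the validating webhook handler described above; `ainvoke_sentinel` is the existing agent entrypoint referenced in the review and is assumed to be imported elsewhere, and the response shape is illustrative.

```python
from fastapi import Request
from fastapi.responses import JSONResponse

async def github_webhook(request: Request):
    payload = await request.json()
    pull_request = payload.get("pull_request")
    if not pull_request or pull_request.get("number") is None:
        # Reject incomplete payloads instead of calling the agent with pr_number=None
        return JSONResponse({"error": "missing pull_request.number"}, status_code=400)
    # ainvoke_sentinel is the existing agent entrypoint referenced in the review
    result = await ainvoke_sentinel(pr_number=pull_request["number"], payload=payload)
    return {"status": "ok", "result": result}
```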
In `@ai-service/tests/test_vertical_agents.py`:
- Around line 298-312: The test calls create_vertical_agent_graph with
use_postgres=True which triggers the real Postgres checkpointer; update the test
to mock ai_service.infrastructure.checkpointer.get_sync_checkpointer (the
function used by create_vertical_agent_graph) so it returns a MagicMock
checkpointer, or alternatively set the environment flag
USE_POSTGRES_CHECKPOINTER to "false" before invoking
create_vertical_agent_graph; ensure you still patch get_vertical_graph and/or
langgraph.checkpoint.memory.MemorySaver as needed and return mock objects so the
test remains hermetic.
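A sketch of the hermetic-test option described above, assuming `create_vertical_agent_graph` is importable in the test module; the patch target is the module path named in this comment.

```python
from unittest.mock import MagicMock, patch

def test_graph_with_postgres_checkpointer_is_hermetic():
    # Patch the checkpointer factory named in the review so no real Postgres
    # connection is attempted; create_vertical_agent_graph is assumed importable.
    with patch(
        "ai_service.infrastructure.checkpointer.get_sync_checkpointer",
        return_value=MagicMock(),
    ):
        graph = create_vertical_agent_graph(use_postgres=True)
        assert graph is not None
```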
🟡 Minor comments (20)
ai-service/tests/test_sop.py-178-186 (1)
178-186: Avoid mocking `Path` here; it can mask default regressions.

This patch makes the test pass even if the default path changes, so it won't catch a regression.

✅ Suggested fix

```diff
-        with patch('ai_service.sop.loader.Path') as mock_path:
-            mock_path.return_value = Path("data/sops")
-            loader = get_sop_loader()
-
-            assert loader.sop_dir == Path("data/sops")
+        loader = get_sop_loader()
+        assert loader.sop_dir == Path("data/sops")
```

ai-service/docs/TESTING_CHECKLIST.md-52-52 (1)
52-52: Typo: `describe-voles` should be `describe-volumes`.

Proposed fix

```diff
-| EC2 | ✅ Available | `aws ec2 describe-voles --endpoint-url http://localhost:4566` |
+| EC2 | ✅ Available | `aws ec2 describe-volumes --endpoint-url http://localhost:4566` |
```

ai-service/tests/test_llm_eval_quick.py-6-6 (1)
6-6: Module-level print executes during pytest collection. This `print` runs at import time; consider using `logging` with an appropriate level.

Proposed fix

```diff
 import os

 OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "granite4:1b-h")

-print(f"Testing LLM: {OLLAMA_MODEL}")
-

 def test_geval_basic():
     """Basic GEval test with lenient criteria."""
+    print(f"Testing LLM: {OLLAMA_MODEL}")
     from deepeval.test_case import LLMTestCase, LLMTestCaseParams
```

ai-service/README.md-24-31 (1)
24-31: Surround tables with blank lines
Adds clarity and satisfies MD058 table linting.

Also applies to: 194-200, 229-235
ai-service/README.md-7-20 (1)
7-20: Add language identifiers to fenced code blocks
Markdownlint flags these fences without a language; consider `text` for diagrams and `bash`/`text` for structure blocks.

Also applies to: 70-74, 87-134
ai-service/tests/test_vertical_agents.py-132-139 (1)
132-139: Rename `test_route_unknown_returns_none` to match actual behavior

This test asserts a `ValueError`, so the name/docstring imply the wrong outcome.

💡 Suggested rename

```diff
-    def test_route_unknown_returns_none(self):
-        """Test routing returns None for unknown events."""
+    def test_route_unknown_raises(self):
+        """Test routing raises ValueError for unknown events."""
```

ai-service/README.md-162-165 (1)
162-165: Avoid user-specific absolute paths in run instructions
Using a repo-relative placeholder is more portable for readers.

✏️ Suggested edit

```diff
-cd /home/aparna/Desktop/founder_os/ai-service
+cd /path/to/ai-service
```

ai-service/tests/test_fallback.py-78-98 (1)
78-98: Using `time.sleep` in async test blocks the event loop.

`time.sleep(0.2)` at Line 89 is a blocking call inside an async function. Use `await asyncio.sleep(0.2)` instead.

Use async sleep

```diff
     async def run():
         await cb.record_failure()  # Opens circuit
         assert cb.state == CircuitState.OPEN

         # Wait for timeout
-        time.sleep(0.2)
+        await asyncio.sleep(0.2)

         # Try to allow request - should go to half_open
         result = await cb.allow_request()
         assert cb.state == CircuitState.HALF_OPEN
         return True
```

ai-service/docs/ARCHITECTURE.md-225-246 (1)
225-246: Security concern: Example shows hardcoded credentials in documentation.

The example environment variables include `NEO4J_PASSWORD=echoteam123`, which might be an actual password. Documentation should use placeholder values like `your-password-here`.

Use placeholder credentials

```diff
 # Database
 NEO4J_URI=bolt://localhost:7687
 NEO4J_USER=neo4j
-NEO4J_PASSWORD=echoteam123
+NEO4J_PASSWORD=your-secure-password
 REDIS_URL=redis://localhost:6380
-DATABASE_URL=postgresql://postgres:postgres@localhost:5432/postgres
+DATABASE_URL=postgresql://user:password@localhost:5432/dbname

 # LLM
-OPENROUTER_API_KEY=sk-or-v1-xxx
+OPENROUTER_API_KEY=your-openrouter-api-key
```

ai-service/src/ai_service/llm/fallback.py-339-346 (1)
339-346: Bare `except` silently swallows all exceptions including `KeyboardInterrupt`.

The bare `except: pass` in `_extract_context` hides parsing errors, making debugging difficult. At minimum, catch `(json.JSONDecodeError, ValueError)` and consider logging.

Improve exception handling

```diff
         try:
             import json
             start = content.find("{")
             if start >= 0:
                 data = json.loads(content[start:])
                 context.update(data)
-        except:
-            pass
+        except (json.JSONDecodeError, ValueError):
+            # Content doesn't contain valid JSON - that's OK for fallback
+            pass
         break
```

ai-service/docs/ARCHITECTURE.md-266-269 (1)
266-269: Docker commands use specific passwords that should be placeholder values.

Use placeholder in example commands

```diff
 # Start infrastructure
 docker run -d --name echoteam-neo4j -p 7687:7687 -p 7474:7474 \
-  -e NEO4J_AUTH=neo4j/echoteam123 neo4j:5.14
+  -e NEO4J_AUTH=neo4j/your-secure-password neo4j:5.14
```

ai-service/docs/GUIDES.md-119-141 (1)
119-141: Guide references non-existent API: `breaker.call()` and `@with_circuit_breaker`.

The `CircuitBreaker` class in `fallback.py` doesn't have a `call()` method, and the decorator is named `with_resilience`, not `with_circuit_breaker`. This will confuse developers.

Fix to match actual API

## Circuit Breaker Usage

### Manual Control

```diff
-from ai_service.llm.fallback import CircuitBreaker, CircuitState
+from ai_service.llm.fallback import CircuitBreaker, CircuitOpenError

 breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=30.0)

-try:
-    await breaker.call(my_function)
-    await breaker.record_success()
-except Exception:
-    await breaker.record_failure()
+# Check if request is allowed
+try:
+    await breaker.allow_request()
+    result = await my_function()
+    await breaker.record_success()
+except CircuitOpenError:
+    # Circuit is open, use fallback
+    pass
+except Exception:
+    await breaker.record_failure()
+    raise
```

### Using Decorator

```diff
-from ai_service.llm.fallback import with_circuit_breaker
+from ai_service.llm.fallback import with_resilience

-@with_circuit_breaker(breaker)
+@with_resilience(fallbacks=[(fallback_func, 2)])
 async def risky_operation():
     return await external_service_call()
```

ai-service/src/ai_service/llm/fallback.py-308-308 (1)

308-308: Type annotation mismatch: parameter typed as `str` but receives `List[Dict]`.

The `_rule_based_fallback` method declares `messages: str` but it's called with `List[Dict]` from `chat()` at Line 306. This type mismatch could cause runtime issues or confuse type checkers.

Fix type annotation

```diff
-    async def _rule_based_fallback(self, messages: str) -> Dict[str, Any]:
+    async def _rule_based_fallback(self, messages: List[Dict]) -> Dict[str, Any]:
```
489-518: `with_resilience` decorator assumes all decorated functions are async.

Line 508 unconditionally awaits the decorated function, but the decorator doesn't check if `func` is a coroutine function. This will fail for sync functions.

Handle both sync and async functions

```diff
     def decorator(func):
         circuit = CircuitBreaker()
         strategy = FallbackStrategy(fallbacks or [])

         async def wrapper(*args, **kwargs):
             await circuit.allow_request()

             try:
-                result = await func(*args, **kwargs)
+                if asyncio.iscoroutinefunction(func):
+                    result = await func(*args, **kwargs)
+                else:
+                    result = func(*args, **kwargs)
                 await circuit.record_success()
                 return result
             except Exception as e:
                 await circuit.record_failure()

                 if strategy.fallbacks:
                     return await strategy.execute()

                 raise

         return wrapper

     return decorator
```

ai-service/src/ai_service/agents/multi_agent.py-148-152 (1)
148-152: Bare `except` silently swallows JSON parsing errors with default fallback.

Multiple agent methods (Lines 151, 203, 255) use bare `except` when parsing LLM responses. This hides the actual error. Catch specific exceptions and consider logging for debugging.

Improve exception handling

```diff
         try:
             parsed = json.loads(result["content"][result["content"].find("{"):result["content"].rfind("}")+1])
             return parsed
-        except:
+        except (json.JSONDecodeError, KeyError, TypeError) as e:
+            logger.warning(f"Failed to parse LLM response: {e}")
             return {"decision": "approve", "reasoning": "Default approval", "risk_level": "low"}
```

ai-service/tests/test_agent_api.py-77-81 (1)
77-81: Potential `KeyError` if decision not found.

Per the `learn_from_feedback` API, if the decision_id is not found, it returns `{"error": "Decision not found"}` without a `'learned'` key. This could cause a `KeyError`.

💡 Suggested defensive access

```diff
     feedback = await agent.learn_from_feedback(
         decision_id=decision.decision_id,
         feedback="approved",
     )
-    print(f"Feedback learned: {feedback['learned']}")
+    print(f"Feedback learned: {feedback.get('learned', False)}")
```

ai-service/tests/test_multi_agent.py-54-68 (1)
54-68: Unused variable `guard`.

The `guard` agent is created but never used. Either remove it or add assertions that verify the guard receives/processes the message:

Proposed fix
def test_agent_messaging(): """Test agent-to-agent messaging.""" sentinel = SentinelAgent() - guard = GuardAgent() message = asyncio.run(sentinel.send_message( to_agent=AgentRole.GUARD,Or, if you want to test message receipt:
assert message.from_agent == "sentinel" assert message.to_agent == "guard" + # Verify guard can receive the message + response = asyncio.run(guard.receive_message(message)) + assert response is not None print("Messaging: Message sent correctly")ai-service/tests/test_hunter.py-126-143 (1)
126-143: Timeout mock may not work as intended.The
timeout_mockis defined asasync defbutaws_client.ec2.describe_volumesis a synchronous boto3 method. The async function assigned to it won't be awaited, so it will return a coroutine object instead of raisingTimeoutError.Looking at the
scan_zombiesimplementation innodes.py, it callsawait aws_client.get_zombie_volumes()which internally usesasyncio.to_thread(self.ec2.describe_volumes, ...). The mock should target the correct level.Proposed fix using proper patching
`@pytest.mark.asyncio` async def test_scan_zombies_edge_case_timeout(self, aws_client): """Test handling API timeout (Edge Case per checklist).""" - # Mock a timeout scenario by patching the ec2 client - original_describe_volumes = aws_client.ec2.describe_volumes - - async def timeout_mock(*args, **kwargs): - raise TimeoutError("API timeout") - - aws_client.ec2.describe_volumes = timeout_mock - - result = await scan_zombies(aws_client) - - # Should handle error gracefully - assert "error_message" in result - assert "timeout" in result["error_message"].lower() or result["error_message"] is not None - - # Restore for cleanup - aws_client.ec2.describe_volumes = original_describe_volumes + with patch.object(aws_client, 'get_zombie_volumes', side_effect=TimeoutError("API timeout")): + result = await scan_zombies(aws_client) + + # Should handle error gracefully + assert "error_message" in result + assert "timeout" in result["error_message"].lower()ai-service/src/ai_service/agents/execops_agent.py-124-124 (1)
124-124: Remove or mark the unusedevent_dataparameter.Line 124’s
event_dataisn’t used; Ruff flags this. Consider renaming to_event_dataor removing it from the signature and callsite.ai-service/tests/test_guard.py-610-610 (1)
610-610: Remove or intentionally ignore the unused IAM scan result.Line 610 assigns
iam_usersbut it’s never used, which will trigger Ruff F841. Use_ = ...or assert on it.♻️ Proposed fix
- iam_users = await scan_iam_users(aws_client) + _ = await scan_iam_users(aws_client)
🧹 Nitpick comments (33)
ai-service/.deepeval/.temp_test_run_data.json (1)
1-1: Consider excluding auto-generated deepeval artifacts from version control.

This appears to be a temporary file generated by the deepeval framework during test runs. Committing such files can lead to noisy diffs and merge conflicts. Consider adding `.deepeval/` to `.gitignore`.

ai-service/.deepeval/.latest_test_run.json (1)

1-1: Auto-generated test artifact with failing test result - consider excluding from VCS.

This file captures a failed test run (score 0.5 < threshold 0.7). Similar to other `.deepeval/` files, this appears to be a runtime-generated artifact. Committing test run results can clutter history and may inadvertently expose test environment details.

Consider adding the entire `.deepeval/` directory to `.gitignore`.

ai-service/docs/TESTING_CHECKLIST.md (2)
484-496: Consider using placeholder credentials in documentation examples.

While these are test credentials, documenting actual passwords (even for local testing) can lead to accidental reuse or copy-paste into production configurations. Consider using generic placeholders like `<password>` with a note to refer to `.env.test`.

Proposed change

```diff
 # Neo4j
-curl http://localhost:7474 -u neo4j:echoteam123
+curl http://localhost:7474 -u neo4j:<NEO4J_PASSWORD>
```

199-205: Add language identifier to fenced code block for consistency.

The fallback chain diagram could use a language identifier (e.g., `text` or `plaintext`) for consistency with other code blocks in the document.

Proposed fix

````diff
-```
+```text
 Primary: OpenRouter API
     ↓ (rate limit / error)
````

ai-service/tests/test_llm_eval.py (2)
23-47: Surface HTTP errors explicitly in `query()`

Calling `raise_for_status()` improves failure visibility before JSON parsing.

🔧 Suggested tweak

```diff
         response = await client.post(
             "http://localhost:11434/api/generate",
             json={
                 "model": self.model,
@@
             },
             timeout=120.0,
         )
+        response.raise_for_status()
         data = response.json()
```

76-91: Replace bare `except` in JSON parsing blocks

Catching `json.JSONDecodeError` keeps parsing fallbacks while letting unexpected errors surface (or be logged).

🔧 Example pattern

```diff
-        except:
+        except json.JSONDecodeError:
             # Fallback: extract keywords from text
             content_lower = content.lower()
```

Also applies to: 132-146, 185-192, 228-235, 272-279
ai-service/src/ai_service/llm/openrouter.py (1)
16-20: Align env var names with docs / OpenRouter conventions
README references `OPENROUTER_*` vars while the client reads `OPENAI_*`. A fallback prevents misconfiguration.

🔧 Suggested fallback mapping

```diff
-OPENAI_API_KEY = _get_env("OPENAI_API_KEY")
-OPENAI_BASE_URL = _get_env("OPENAI_BASE_URL", "https://openrouter.ai/api/v1/chat/completions")
-OPENAI_MODEL = _get_env("OPENAI_MODEL_NAME", "liquid/lfm-2.5-1.2b-thinking:free")
+OPENAI_API_KEY = _get_env("OPENAI_API_KEY") or _get_env("OPENROUTER_API_KEY")
+OPENAI_BASE_URL = _get_env("OPENAI_BASE_URL") or _get_env(
+    "OPENROUTER_BASE_URL", "https://openrouter.ai/api/v1/chat/completions"
+)
+OPENAI_MODEL = _get_env("OPENAI_MODEL_NAME") or _get_env(
+    "OPENROUTER_MODEL", "liquid/lfm-2.5-1.2b-thinking:free"
+)
```

ai-service/src/ai_service/memory/redis_store.py (1)
210-233: Consider `SCAN` over `KEYS` for pattern recall/clear

`KEYS` is O(N) and can block Redis under load; `scan_iter` is safer for production.

🔧 Example pattern for recall

```diff
-        keys = await client.keys(full_pattern)
         results = []
-
-        for key in keys:
+        async for key in client.scan_iter(full_pattern):
             data = await client.get(key)
             if data:
                 item = MemoryItem.from_dict(json.loads(data))
                 results.append(item.value)
```

Also applies to: 271-275
ai-service/src/ai_service/llm/fallback.py (6)
105-129: `allow_request` may throw `CircuitOpenError` despite docstring promising `bool` return.

The docstring says "Returns: True if request allowed, raises CircuitOpenError if not" which is accurate, but the return type annotation `-> bool` is misleading. The method can raise an exception, so callers might not expect it. Additionally, Line 125 has an inconsistent error message using `self.state.value`, which will always be "open" at that point.

Suggested improvements
- async def allow_request(self) -> bool: + async def allow_request(self) -> bool: """Check if request should be allowed. Returns: - True if request allowed, raises CircuitOpenError if not + True if request is allowed + + Raises: + CircuitOpenError: If circuit is open and recovery timeout hasn't passed """ async with self._lock: if self.state == CircuitState.CLOSED: return True if self.state == CircuitState.OPEN: # Check if recovery timeout has passed if self.last_failure_time: elapsed = time.time() - self.last_failure_time if elapsed >= self.recovery_timeout: self.state = CircuitState.HALF_OPEN self.success_count = 0 logger.info("Circuit breaker HALF_OPEN (testing recovery)") return True raise CircuitOpenError( - f"Circuit breaker OPEN for {self.state.value}" + f"Circuit breaker is OPEN (recovery timeout: {self.recovery_timeout}s)" ) # HALF_OPEN - allow single request return True
186-193: Backoff formula is linear, not exponential.

The sleep time `0.1 * (attempt + 1)` results in 0.1s, 0.2s, 0.3s... which is linear progression. For exponential backoff, use `0.1 * (2 ** attempt)`, which gives 0.1s, 0.2s, 0.4s...

Fix exponential backoff
- await asyncio.sleep(0.1 * (attempt + 1)) # Exponential backoff + await asyncio.sleep(0.1 * (2 ** attempt)) # Exponential backoff
240-246: Use explicit `Optional` type annotation per PEP 484.

The `model: str = None` parameter should use `Optional[str]` or `str | None` for clarity.

Fix type annotation
async def chat( self, messages: List[Dict[str, str]], - model: str = None, + model: Optional[str] = None, temperature: float = 0.7, max_tokens: int = 1000, use_fallback: bool = True, ) -> Dict[str, Any]:
341-343: Naive JSON extraction may fail on nested structures or multiple JSON objects.

Using `content.find("{")` without finding the matching closing brace can fail to parse valid JSON that doesn't extend to the end of the string, or may include trailing garbage. Consider finding the matching `}` or using a more robust extraction.
475-482: Unused `context` parameter in `rule_based_default`.

The `context` parameter is never used. Either use it or remove it for clarity.

Use context or mark as unused
-def rule_based_default(context: Dict) -> Dict[str, Any]: +def rule_based_default(context: Dict) -> Dict[str, Any]: # noqa: ARG001 """Default rule-based response.""" return { "action": "escalate", - "reason": "Unable to process automatically", + "reason": f"Unable to process automatically: {context.get('event_type', 'unknown')}", "confidence": 0.3, "source": "rule_based", }
529-532: Bare `except` hides the cause of Redis health check failures. Consider catching specific exceptions and logging for observability.
Improve exception handling
try: redis_ok = await RedisClientSingleton.health_check() - except: + except Exception as e: + logger.warning(f"Redis health check failed: {e}") redis_ok = Falseai-service/tests/test_fallback.py (2)
6-7:sys.path.insertis fragile; prefer proper package installation.Manipulating
sys.pathcan cause import issues and is generally discouraged. Consider usingpip install -e .(editable install) or configuringpyproject.tomlwith proper test paths.
150-154: Usepytest.raisesorAssertionErrorinstead ofassert False.
assert Falseis removed when Python runs with-Ooptimization flag. Useraise AssertionError()or preferablypytest.raises.Use proper assertion pattern
+ import pytest + def test_no_fallback_raises_error(self): """Test when all fallbacks fail.""" from ai_service.llm.fallback import FallbackStrategy, FallbackError async def fail(): raise Exception("Always fails") strategy = FallbackStrategy([(fail, 1)]) - async def run_test(): - await strategy.execute() - - try: - asyncio.run(run_test()) - assert False, "Should have raised FallbackError" - except FallbackError: - print("No fallback raises error: PASS") + with pytest.raises(FallbackError): + asyncio.run(strategy.execute()) + print("No fallback raises error: PASS")ai-service/src/ai_service/agents/multi_agent.py (3)
51-51: Use explicitOptionaltype for nullable fields.
correlation_id: str = Noneshould becorrelation_id: Optional[str] = Noneper PEP 484.Fix type annotation
`@dataclass` class AgentMessage: """Agent-to-agent message.""" id: str from_agent: str to_agent: str message_type: MessageType content: Dict[str, Any] timestamp: str = field(default_factory=lambda: datetime.now().isoformat()) - correlation_id: str = None + correlation_id: Optional[str] = None
336-341: Loop variableiis unused; use_to indicate intentional discard.Rename unused variable
parallel_results = await asyncio.gather(*coroutines, return_exceptions=True) - for i, r in enumerate(parallel_results): + for _i, r in enumerate(parallel_results): if isinstance(r, Exception): results.append({"error": str(r)}) else: results.append(r)Or simply:
- for i, r in enumerate(parallel_results): + for r in parallel_results:
525-526: f-strings without placeholders; removefprefix.Remove extraneous f prefix
- print(f"Primary agent: sentinel") - print(f"Secondary agents: hunter, guard") + print("Primary agent: sentinel") + print("Secondary agents: hunter, guard")ai-service/docs/IMPLEMENTATION_PLAN.md (3)
5-27: Add language specifier to ASCII diagram code block.The fenced code block should specify a language (e.g.,
textorplaintext) for consistent rendering and to satisfy linters.Add language specifier
-``` +```text ┌─────────────────────────────────────────────────────────────────┐ │ FastAPI Application │
48-74: Add language specifier to file structure code block.Add language specifier
-``` +```text src/ai_service/ ├── agents/
84-87: Add language specifier to environment variables code block.Add language specifier
## Environment Variables (already in .env) +```bash - `USE_REDIS_CHECKPOINTER=true` - `AGENT_LEARNING_ENABLED=true` - `AGENT_PROACTIVE_SCAN_INTERVAL=300` +```Or keep as bullet list without code block formatting.
ai-service/tests/test_redis_memory.py (3)
270-304: Consider using pytest for better test infrastructure.The custom test runner works but lacks features like test isolation, better error reporting, and fixture support. Consider migrating to pytest which would also provide better CI integration.
99-115: Misleading mock setup forsetextest.The test sets up
mock_client.setbut asserts onmock_client.setex. WhileAsyncMockauto-creates the method, explicitly mockingsetexwould be clearer.💡 Suggested fix
# Mock Redis client mock_client = AsyncMock() - mock_client.set = AsyncMock(return_value=True) + mock_client.setex = AsyncMock(return_value=True) store._client = mock_client
132-147: Test could verify access_count increment.Per the
RedisMemoryStore.retrieveimplementation, the access count should be incremented and written back. Consider adding assertions for this behavior for more thorough coverage.ai-service/tests/test_hunter.py (1)
13-15: Consider using pytest'spythonpathconfiguration instead ofsys.pathmanipulation.Direct
sys.pathmanipulation in test files can be fragile and may cause import issues in CI environments. Consider configuringpythonpathinpyproject.tomlorpytest.ini:[tool.pytest.ini_options] pythonpath = ["src"]This applies to all test files in this PR that use the same pattern.
ai-service/tests/test_multi_agent.py (2)
6-7: Hardcoded path may be fragile.Using
sys.path.insert(0, 'src')with a hardcoded relative path assumes the test is always run from the repository root. Consider using dynamic path resolution like other test files in this PR:+from pathlib import Path -sys.path.insert(0, 'src') +_src_dir = Path(__file__).resolve().parents[1] / "src" +sys.path.insert(0, str(_src_dir))
197-219: Broad exception handling in test runner.While catching bare
Exceptionis flagged by static analysis, in a manual test runner context it's acceptable to catch broadly for reporting purposes. However, consider logging the full traceback for better debugging:except Exception as e: - print(f"\n{name}: ERROR - {e}") + import traceback + print(f"\n{name}: ERROR - {e}") + traceback.print_exc() result = Falseai-service/tests/test_watchman.py (2)
202-214: Unused unpacked variablereason.The
reasonvariable is unpacked but never used. Use underscore prefix to indicate intentional discard:- is_active, reason = await check_activity( + is_active, _ = await check_activity(
361-389: Unused fixturesample_context.The
sample_contextfixture is declared but not used since the test creates its own context. Remove it from the function signature:`@pytest.mark.asyncio` - async def test_execute_shutdown_live_mode(self, sample_context, aws_client): + async def test_execute_shutdown_live_mode(self, aws_client):ai-service/tests/test_evaluation.py (1)
6-7: Hardcoded path may be fragile.Same as noted for
test_multi_agent.py- consider using dynamic path resolution:+from pathlib import Path -sys.path.insert(0, 'src') +_src_dir = Path(__file__).resolve().parents[1] / "src" +sys.path.insert(0, str(_src_dir))ai-service/tests/test_e2e_api.py (1)
215-308: Minor duplication with test_hunter.py.These tests overlap with the dedicated
test_hunter.pymodule. While some redundancy in E2E tests is acceptable for sanity checks, consider whether this duplication is intentional or if the tests should be consolidated.ai-service/src/ai_service/agents/guard/state.py (1)
66-116: Replace deprecateddatetime.utcnow()with timezone-aware alternative.
datetime.utcnow()is deprecated since Python 3.12 because it returns a naive datetime. Usedatetime.now(timezone.utc)for timezone-aware handling:Diff
+from datetime import datetime, timezone - "scan_timestamp": datetime.utcnow().isoformat(), + "scan_timestamp": datetime.now(timezone.utc).isoformat(),This pattern appears across multiple state initializers (Hunter, Watchman, Guard) and integrations throughout the codebase and should be updated consistently.
ai-service/docs/GUIDES.md (excerpt under review, lines 143–165):

## Recording Feedback

### Submit Decision Feedback
```python
from ai_service.evaluation.metrics import DecisionStore

store = DecisionStore()
await store.record_feedback(
    decision_id="dec_123",
    feedback="correct",  # or "incorrect"
    reasoning="Agent made the right call"
)
```

### Calculate Accuracy
```python
from ai_service.evaluation.metrics import AccuracyCalculator

accuracy = await AccuracyCalculator.calculate(
    agent="sentinel",
    time_window_days=30
)
print(f"Accuracy: {accuracy.percentage:.1%}")
```
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check if DecisionStore and AccuracyCalculator exist in the codebase
echo "Searching for DecisionStore..."
rg -n "class DecisionStore" --type=py
echo ""
echo "Searching for AccuracyCalculator..."
rg -n "class AccuracyCalculator" --type=py
echo ""
echo "Searching for record_feedback method..."
rg -n "def record_feedback" --type=py
echo ""
echo "Checking if evaluation/metrics.py exists..."
fd -t f "metrics.py" | grep evaluationRepository: Aparnap2/ExecOps
Length of output: 702
🏁 Script executed:
#!/bin/bash
# Get the DecisionStore class and its record_feedback method
echo "=== DecisionStore class and record_feedback method ==="
sed -n '122,220p' ai-service/src/ai_service/evaluation/metrics.py
echo ""
echo "=== AccuracyCalculator class and calculate method ==="
sed -n '292,350p' ai-service/src/ai_service/evaluation/metrics.pyRepository: Aparnap2/ExecOps
Length of output: 5537
🏁 Script executed:
#!/bin/bash
# Search for calculate method in AccuracyCalculator
echo "=== Searching for calculate method (not calculate_from_records) ==="
sed -n '292,400p' ai-service/src/ai_service/evaluation/metrics.py | head -100Repository: Aparnap2/ExecOps
Length of output: 3745
🏁 Script executed:
#!/bin/bash
# Search for all methods in AccuracyCalculator
echo "=== All AccuracyCalculator methods ==="
rg -A 5 "class AccuracyCalculator" --type=py
echo ""
echo "=== Search for async calculate method ==="
rg "async.*def calculate" --type=py
echo ""
echo "=== Search for def calculate in AccuracyCalculator context ==="
awk '/class AccuracyCalculator/,/^class [A-Z]/ {print NR": "$0}' ai-service/src/ai_service/evaluation/metrics.py | head -150Repository: Aparnap2/ExecOps
Length of output: 723
🏁 Script executed:
#!/bin/bash
# Get the full AccuracyCalculator class
echo "=== Full AccuracyCalculator class ==="
awk '/^class AccuracyCalculator:/,/^class / {print NR": "$0}' ai-service/src/ai_service/evaluation/metrics.py | grep -v "^$" | head -80Repository: Aparnap2/ExecOps
Length of output: 128
🏁 Script executed:
#!/bin/bash
# Get line count of metrics.py
wc -l ai-service/src/ai_service/evaluation/metrics.py
# Get AccuracyCalculator class starting at line 292
echo ""
echo "=== AccuracyCalculator class (lines 292-370) ==="
sed -n '292,370p' ai-service/src/ai_service/evaluation/metrics.pyRepository: Aparnap2/ExecOps
Length of output: 3024
🏁 Script executed:
#!/bin/bash
# Search for all calculate methods
echo "=== All 'calculate' methods in metrics.py ==="
rg "def calculate|async def calculate" ai-service/src/ai_service/evaluation/metrics.py -n
echo ""
echo "=== DecisionStore.get_accuracy method ==="
sed -n '200,230p' ai-service/src/ai_service/evaluation/metrics.pyRepository: Aparnap2/ExecOps
Length of output: 1435
Fix API examples in documentation - method signatures and parameters do not match implementation.
The DecisionStore.record_feedback() call is missing the required agent parameter:
# Current (incorrect) example:
await store.record_feedback(
decision_id="dec_123",
feedback="correct",
reasoning="Agent made the right call"
)
# Should include agent parameter:
await store.record_feedback(
decision_id="dec_123",
agent="sentinel", # required parameter
feedback="correct",
reasoning="Agent made the right call"
)

Additionally, AccuracyCalculator.calculate() does not exist. The actual API is AccuracyCalculator.calculate_from_records(records), which expects a list of DecisionRecord objects, not agent and time_window parameters. Consider using DecisionStore.get_accuracy(agent, days) instead for the documented use case.
🤖 Prompt for AI Agents
In `@ai-service/docs/GUIDES.md` around lines 143 - 165, Update the documentation
examples to match the actual API: include the required agent parameter when
calling DecisionStore.record_feedback (i.e., call record_feedback with
decision_id, agent, feedback, reasoning) and replace the non-existent
AccuracyCalculator.calculate(...) example with the real API—either show using
AccuracyCalculator.calculate_from_records(records) which accepts a list of
DecisionRecord objects, or use DecisionStore.get_accuracy(agent, days) for the
documented agent/time-window use case; reference DecisionStore.record_feedback,
AccuracyCalculator.calculate_from_records, and DecisionStore.get_accuracy when
making the changes.
ai-service/Makefile.opsmate (excerpt under review, lines 67–74):

```make
redis:
	@echo "Starting Redis container..."
	@docker run -d \
		--name $(REDIS_CONTAINER) \
		--net $(NETWORK_NAME) \
		-p $(REDIS_PORT9 \
		):637redis:alpine \
		2>/dev/null || echo "Redis may already be running"
```
🧩 Analysis chain
🏁 Script executed:
find . -name "Makefile.opsmate" -type fRepository: Aparnap2/ExecOps
Length of output: 89
🏁 Script executed:
cat -n ./ai-service/Makefile.opsmate | sed -n '60,80p'Repository: Aparnap2/ExecOps
Length of output: 935
🏁 Script executed:
grep -n "REDIS_PORT\|REDIS_CONTAINER\|NETWORK_NAME" ./ai-service/Makefile.opsmate | head -20Repository: Aparnap2/ExecOps
Length of output: 1200
Fix malformed Redis docker run command
The port mapping and image reference are corrupted (lines 72–73). The $(REDIS_PORT9 variable reference is invalid, and the image name is incorrectly concatenated with part of a port number. This will break make redis.
🐛 Proposed fix
`@docker` run -d \
--name $(REDIS_CONTAINER) \
--net $(NETWORK_NAME) \
- -p $(REDIS_PORT9 \
- ):637redis:alpine \
+ -p $(REDIS_PORT):6379 \
+ redis:alpine \
2>/dev/null || echo "Redis may already be running"📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
Original:

```make
redis:
	@echo "Starting Redis container..."
	@docker run -d \
		--name $(REDIS_CONTAINER) \
		--net $(NETWORK_NAME) \
		-p $(REDIS_PORT9 \
		):637redis:alpine \
		2>/dev/null || echo "Redis may already be running"
```

Suggested:

```make
redis:
	@echo "Starting Redis container..."
	@docker run -d \
		--name $(REDIS_CONTAINER) \
		--net $(NETWORK_NAME) \
		-p $(REDIS_PORT):6379 \
		redis:alpine \
		2>/dev/null || echo "Redis may already be running"
```
🤖 Prompt for AI Agents
In `@ai-service/Makefile.opsmate` around lines 67 - 74, The redis Makefile target
has a malformed docker run command: the port variable is mistyped as
$(REDIS_PORT9 and the image/tokenization is corrupted ("-p $(REDIS_PORT9
\n\t):637redis:alpine"). Fix the redis target (function/target name: redis) to
use the correct port variable $(REDIS_PORT), ensure the -p mapping is "-p
$(REDIS_PORT):6379", and restore the image reference to "redis:alpine"; keep the
existing references to $(REDIS_CONTAINER) and $(NETWORK_NAME) and preserve the
2>/dev/null || echo fallback behavior.
ai-service/src/ai_service/agents/execops_agent.py (excerpt under review, lines 124–139):

```python
    def _parse_decision(self, result: Dict, decision_id: str, event_type: str, event_data: Dict) -> AgentDecision:
        content = result.get("content", "")

        try:
            start = content.find("{")
            end = content.rfind("}") + 1
            if start >= 0 and end > start:
                parsed = json.loads(content[start:end])
                return AgentDecision(
                    decision_id=decision_id,
                    action=ActionType(parsed.get("action", "approve").lower()),
                    reasoning=parsed.get("reasoning", content[:200]),
                    confidence=parsed.get("confidence", 0.7),
                    context_summary=f"Event: {event_type}",
                    suggestions=parsed.get("suggestions", []),
                )
```
Guard against invalid LLM action values to avoid ValueError.
If the model returns an unexpected action, ActionType(...) raises ValueError and the whole event processing fails. Default to APPROVE (or a safe fallback) when parsing fails.
🐛 Proposed fix
- return AgentDecision(
- decision_id=decision_id,
- action=ActionType(parsed.get("action", "approve").lower()),
+ action_value = str(parsed.get("action", "approve")).lower()
+ try:
+ action = ActionType(action_value)
+ except ValueError:
+ action = ActionType.APPROVE
+ return AgentDecision(
+ decision_id=decision_id,
+ action=action,
reasoning=parsed.get("reasoning", content[:200]),
confidence=parsed.get("confidence", 0.7),
context_summary=f"Event: {event_type}",
suggestions=parsed.get("suggestions", []),
)🧰 Tools
🪛 Ruff (0.14.14)
124-124: Unused method argument: event_data
(ARG002)
🤖 Prompt for AI Agents
In `@ai-service/src/ai_service/agents/execops_agent.py` around lines 124 - 139,
The _parse_decision function can raise ValueError when
ActionType(parsed.get("action", "approve").lower()) gets an unexpected value;
update _parse_decision to validate or catch errors when converting to
ActionType: extract the raw_action = parsed.get("action", "approve"), normalize
it (e.g., raw_action.lower()), then attempt to map/convert to ActionType inside
a try/except (catch ValueError or KeyError) and on failure set action =
ActionType.APPROVE (safe fallback); finally construct and return the
AgentDecision with that safe action while keeping the existing reasoning,
confidence, context_summary, and suggestions logic.
ai-service/src/ai_service/agents/multi_agent.py (excerpt under review, lines 592–600):

```python
    await hunter.learn_from_feedback(
        decision_id="test-123",
        feedback="approved",
        suggestion="Good find, proceed with cleanup",
    )

    patterns = hunter.feedback_patterns
    print(f"Feedback patterns stored: {list(patterns.keys())}")
```
Test calls non-existent learn_from_feedback method on HunterAgent.
test_collective_memory() calls hunter.learn_from_feedback() and accesses hunter.feedback_patterns, but the Agent base class and HunterAgent don't define these. This test will fail with AttributeError.
Options to fix
Either add the missing method to the Agent base class:
class Agent:
def __init__(self, role: AgentRole, llm_client: OpenRouterClient = None):
# ... existing init ...
self.feedback_patterns: Dict[str, List] = {}
async def learn_from_feedback(self, decision_id: str, feedback: str, suggestion: str = None):
"""Learn from human feedback."""
# Implementation here
passOr remove/modify the test to not rely on this method:
- # Test feedback learning
- await hunter.learn_from_feedback(
- decision_id="test-123",
- feedback="approved",
- suggestion="Good find, proceed with cleanup",
- )
-
- patterns = hunter.feedback_patterns
- print(f"Feedback patterns stored: {list(patterns.keys())}")
+ # Note: feedback learning is handled by ExecOpsAgent, not base Agent🤖 Prompt for AI Agents
In `@ai-service/src/ai_service/agents/multi_agent.py` around lines 592 - 600, The
test calls a non-existent learn_from_feedback API and feedback_patterns on
HunterAgent, causing AttributeError; add these to the Agent/HunterAgent
implementation by introducing a feedback_patterns attribute (e.g.,
self.feedback_patterns: Dict[str, List]) on Agent and implement an async
learn_from_feedback(self, decision_id: str, feedback: str, suggestion: str |
None) method that updates feedback_patterns (keyed by decision_id) and persists
or processes feedback as HunterAgent expects; alternatively, adjust
test_collective_memory to stop calling learn_from_feedback and not inspect
feedback_patterns if you prefer not to change Agent.
ai-service/src/ai_service/evaluation/metrics.py (excerpt under review, lines 211–218):

```python
    async def get_accuracy(self, agent: str, days: int = 30) -> float:
        """Calculate accuracy for an agent over time."""
        client = await self._get_client()

        try:
            pattern = f"{self._prefix}{agent}:*"
            keys = await client.keys(pattern)
```
Aggregate accuracy for “all agents” currently returns 0.
generate_report() passes "all" when agent is None, but decision keys are stored by actual agent names; "all" won’t match any keys. Allow agent=None in get_accuracy to scan all keys, and pass None from generate_report.
🐛 Proposed fix (get_accuracy)
- async def get_accuracy(self, agent: str, days: int = 30) -> float:
+ async def get_accuracy(self, agent: str = None, days: int = 30) -> float:
"""Calculate accuracy for an agent over time."""
client = await self._get_client()
try:
- pattern = f"{self._prefix}{agent}:*"
+ pattern = f"{self._prefix}{agent}:*" if agent else f"{self._prefix}*"
keys = await client.keys(pattern)🐛 Proposed fix (generate_report)
- accuracy = await self._store.get_accuracy(agent or "all")
+ accuracy = await self._store.get_accuracy(agent)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

```python
async def get_accuracy(self, agent: str = None, days: int = 30) -> float:
    """Calculate accuracy for an agent over time."""
    client = await self._get_client()
    try:
        pattern = f"{self._prefix}{agent}:*" if agent else f"{self._prefix}*"
        keys = await client.keys(pattern)
```
🤖 Prompt for AI Agents
In `@ai-service/src/ai_service/evaluation/metrics.py` around lines 211 - 218,
get_accuracy currently expects a concrete agent name and builds a key pattern
using f"{self._prefix}{agent}:*", so when generate_report passes the string
"all" no real agent keys match and accuracy is 0; change get_accuracy signature
to accept agent: Optional[str] = None and when agent is None build the pattern
as f"{self._prefix}*"(scan all agents) otherwise f"{self._prefix}{agent}:*";
also update generate_report to pass None instead of the literal "all" when it
intends to aggregate across all agents so the new pattern will match all
decision keys; keep references to get_accuracy and generate_report when making
these changes.
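The heart of the change is the key-pattern selection; a minimal sketch of that logic in isolation (the "decisions:" prefix is an assumption for illustration, not the store's real prefix):

```python
from typing import Optional


class DecisionStoreSketch:
    """Key-pattern logic only; the Redis wiring is omitted."""

    _prefix = "decisions:"  # assumed value for illustration

    def pattern_for(self, agent: Optional[str]) -> str:
        # None means "aggregate across every agent's decision keys"
        return f"{self._prefix}{agent}:*" if agent else f"{self._prefix}*"


store = DecisionStoreSketch()
assert store.pattern_for("hunter") == "decisions:hunter:*"
assert store.pattern_for(None) == "decisions:*"       # matches every agent's keys
assert store.pattern_for("all") == "decisions:all:*"  # why the literal "all" matches nothing
```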
```python
def remember(
    self,
    key: str,
    value: Any,
    ttl_seconds: Optional[int] = None,
) -> None:
    """Remember something (async wrapper).

    Args:
        key: Memory key
        value: Value to remember
        ttl_seconds: Optional TTL
    """
    import asyncio

    asyncio.create_task(self._save_memory(key, value, ttl_seconds))
```
🧩 Analysis chain

🏁 Scripts executed (outputs collapsed):

```shell
cd ai-service && find . -name "redis_store.py" -type f

cat -n src/ai_service/memory/redis_store.py | head -400

cat -n src/ai_service/memory/redis_store.py | sed -n '340,400p'

cat -n ai-service/src/ai_service/memory/redis_store.py | sed -n '340,400p'

cat -n ai-service/src/ai_service/memory/redis_store.py | sed -n '1,50p'

cat -n ai-service/src/ai_service/memory/redis_store.py | sed -n '395,420p'

rg -n "\.remember\(" ai-service/src --type py -B 2 -A 2

rg -n "\.forget\(" ai-service/src --type py -B 2 -A 2

rg -n "def forget_async" ai-service/src --type py -A 5

rg -n "_save_memory\|_delete_memory" ai-service/src --type py -B 1 -A 3

python3 << 'EOF'
import asyncio

# Test 1: What happens with create_task() without running loop
try:
    asyncio.create_task(asyncio.sleep(0))
except RuntimeError as e:
    print(f"Test 1 - create_task() without loop: {e}")

# Test 2: What asyncio.run() does
async def dummy_coro():
    pass

# asyncio.run() will work (it creates a new loop)
try:
    asyncio.run(dummy_coro())
    print("Test 2 - asyncio.run() works fine")
except Exception as e:
    print(f"Test 2 - asyncio.run() error: {e}")

# Test 3: What get_running_loop() does without a loop
try:
    asyncio.get_running_loop()
except RuntimeError as e:
    print(f"Test 3 - get_running_loop() without loop: {e}")
EOF

head -30 ai-service/setup.py ai-service/pyproject.toml 2>/dev/null | grep -i python

rg -n "get_event_loop\|get_running_loop\|create_task" ai-service/src --type py

cat -n ai-service/src/ai_service/memory/redis_store.py | sed -n '349,365p'

cat -n ai-service/src/ai_service/memory/redis_store.py | sed -n '366,394p'

python3 << 'EOF'
import asyncio

# Demonstrate the issue: create_task without running loop
async def sample():
    await asyncio.sleep(0)

try:
    # This is what happens in remember() and forget()
    task = asyncio.create_task(sample())
    print("create_task succeeded")
except RuntimeError as e:
    print(f"create_task error: {type(e).__name__}: {e}")

# Now show the recall() pattern works
try:
    loop = asyncio.get_event_loop()
    loop.run_until_complete(sample())
    print("get_event_loop + run_until_complete works")
except Exception as e:
    print(f"Error: {e}")

# Show the suggested pattern works
try:
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        asyncio.run(sample())
    print("get_running_loop + asyncio.run pattern works")
except Exception as e:
    print(f"Error: {e}")
EOF
```
Fix event loop handling in sync wrappers
asyncio.create_task() raises RuntimeError when called from sync contexts without a running event loop, breaking the remember() and forget() sync wrappers when invoked from non-async code.
The recall() method already handles this correctly with get_event_loop() + run_until_complete(). Apply the same pattern here, or use the suggested approach with get_running_loop() + asyncio.run().
🔧 Suggested safe wrapper

```diff
     def remember(
         self,
         key: str,
         value: Any,
         ttl_seconds: Optional[int] = None,
     ) -> None:
         import asyncio
-
-        asyncio.create_task(self._save_memory(key, value, ttl_seconds))
+        try:
+            loop = asyncio.get_running_loop()
+        except RuntimeError:
+            asyncio.run(self._save_memory(key, value, ttl_seconds))
+        else:
+            loop.create_task(self._save_memory(key, value, ttl_seconds))
```

Apply the same fix to forget() at lines 385-394.
🧰 Tools
🪛 Ruff (0.14.14)
364-364: Store a reference to the return value of asyncio.create_task (RUF006)
🤖 Prompt for AI Agents
In `@ai-service/src/ai_service/memory/redis_store.py` around lines 349 - 365, the
sync wrapper remember() uses asyncio.create_task(), which raises RuntimeError
when there is no running loop; update remember() to follow the same safe pattern used in
recall(): attempt to get the current event loop and call
loop.run_until_complete(self._save_memory(key, value, ttl_seconds)) when a loop
is available, otherwise call asyncio.run(self._save_memory(...)); apply the same
change to forget() to call self._delete_memory(...) via run_until_complete or
asyncio.run as appropriate; reference functions: remember(), forget(),
_save_memory(), _delete_memory(), and recall() to locate and mirror the safe
event-loop handling.
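For reference, a self-contained sketch of the pattern applied to both wrappers through a shared helper (the _run_in_background helper and the stubbed coroutines are assumptions for illustration; the real methods live on the project's Redis store class):

```python
import asyncio
from typing import Any, Coroutine, Optional


class MemoryStoreSketch:
    """Sync wrappers over async Redis calls; the Redis I/O itself is stubbed."""

    async def _save_memory(self, key: str, value: Any, ttl: Optional[int] = None) -> None:
        ...  # real implementation writes to Redis

    async def _delete_memory(self, key: str) -> None:
        ...  # real implementation deletes from Redis

    def _run_in_background(self, coro: Coroutine) -> None:
        # Schedule on the running loop if one exists; otherwise run to completion.
        try:
            loop = asyncio.get_running_loop()
        except RuntimeError:
            asyncio.run(coro)
        else:
            # Keep a reference to the task so it is not garbage-collected (RUF006).
            self._pending = loop.create_task(coro)

    def remember(self, key: str, value: Any, ttl_seconds: Optional[int] = None) -> None:
        self._run_in_background(self._save_memory(key, value, ttl_seconds))

    def forget(self, key: str) -> None:
        self._run_in_background(self._delete_memory(key))


# From plain sync code (no running loop) this now executes instead of raising:
MemoryStoreSketch().remember("greeting", "hello", ttl_seconds=60)
```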
```python
for test in test_events:
    json_body = json.dumps({
        "event_type": test["event_type"],
        "event_data": test["data"],
    }, indent=6)
    print(f"\n# {test['name']}")
    print(f"""curl -X POST http://localhost:8000/api/v1/agent/process \\
  -H "Content-Type: application/json" \\
  -d '{json_body[1:]}'""")
```
Incorrect JSON slicing produces malformed curl command.
json_body[1:] removes the first character (the opening {) from the JSON string, resulting in invalid JSON in the curl command output.
🐛 Proposed fix

```diff
     for test in test_events:
         json_body = json.dumps({
             "event_type": test["event_type"],
             "event_data": test["data"],
-        }, indent=6)
+        }, indent=2)
         print(f"\n# {test['name']}")
         print(f"""curl -X POST http://localhost:8000/api/v1/agent/process \\
   -H "Content-Type: application/json" \\
-  -d '{json_body[1:]}'""")
+  -d '{json_body}'""")
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```python
for test in test_events:
    json_body = json.dumps({
        "event_type": test["event_type"],
        "event_data": test["data"],
    }, indent=2)
    print(f"\n# {test['name']}")
    print(f"""curl -X POST http://localhost:8000/api/v1/agent/process \\
  -H "Content-Type: application/json" \\
  -d '{json_body}'""")
```
🤖 Prompt for AI Agents
In `@ai-service/tests/test_agent_api.py` around lines 92 - 100, The curl command
construction incorrectly slices the JSON string (json_body[1:]) which removes
the opening '{' and yields malformed JSON; update the print that builds the curl
-d payload to use the full json_body variable (or otherwise preserve the
complete JSON) in the block where json_body is defined and used in the print
statements so the -d argument contains valid JSON.
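A quick, standalone way to see the breakage (plain Python; the payload here is made up):

```python
import json

payload = json.dumps({"event_type": "pr", "event_data": {"id": 1}}, indent=2)

json.loads(payload)        # parses fine
# json.loads(payload[1:])  # would raise json.JSONDecodeError: the opening "{" is gone
print(payload[1:])         # what the generated curl command currently embeds
```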
```python
    json_body = json.dumps({
        "event_type": test["event_type"],
        "event_data": test["data"],
    }, indent=6)
    print(f"\n# {test['name']}")
    print(f"""curl -X POST http://localhost:8000/api/v1/agent/process \\
  -H "Content-Type: application/json" \\
  -d '{json_body[1:]}'""")


if __name__ == "__main__":
    import json
```
json module used before it's imported.
The json module is used at line 93 inside main(), but imported at line 103 inside the if __name__ == "__main__": block. This will cause a NameError when main() executes.
🐛 Proposed fix: Move import to top of file

```diff
 import sys
 sys.path.insert(0, 'src')
 import asyncio
+import json
 from ai_service.agents.execops_agent import ExecOpsAgent
```

And remove the import from the bottom:

```diff
 if __name__ == "__main__":
-    import json
     asyncio.run(main())
```

🤖 Prompt for AI Agents
In `@ai-service/tests/test_agent_api.py` around lines 93 - 103, the json module is
imported inside the if __name__ == "__main__" block but used earlier in main()
when building json_body, causing a NameError; move the json import to the
top-level imports (so main() and any top-level code can use it), and remove the
trailing import json from the bottom of the file (keep the existing main(),
json_body construction, and the if __name__ block unchanged aside from removing
that bottom import).
```python
def test_github_webhook_validates_signature(self, client, mock_github_payload):
    """Test that GitHub webhook signature is validated if configured."""
    # Without signature header, should still work (signature optional)
    response = client.post(
        "/api/v1/webhook/github",
        json=mock_github_payload,
        headers={
            "X-GitHub-Event": "pull_request",
            "X-GitHub-Delivery": "test-delivery-125",
        },
    )

    # Should accept request even without signature (depends on config)
    assert response.status_code in [200, 400, 401]

def test_github_webhook_missing_pr_data(self, client):
    """Test handling of malformed webhook payload."""
    invalid_payload = {
        "action": "opened",
        # Missing pull_request field
    }

    response = client.post(
        "/api/v1/webhook/github",
        json=invalid_payload,
        headers={
            "Content-Type": "application/json",
            "X-GitHub-Event": "pull_request",
            "X-GitHub-Delivery": "test-delivery-126",
        },
    )

    # Webhook logs warning but still returns 200 with status
    # The webhook processes the event but may have partial data
    assert response.status_code == 200
    data = response.json()
    assert data["status"] == "processed"
```
🧩 Analysis chain

🏁 Scripts executed (outputs collapsed):

```shell
find ai-service -type f -name "*.py" | xargs grep -l "webhook" | head -20

cd ai-service && find . -type f -name "*.py" -path "*/webhooks/*"

rg -nA 20 "def.*github.*webhook|async def.*github.*webhook" ai-service/src/

cat -n ai-service/src/ai_service/webhooks/github.py | head -150

cat -n ai-service/src/ai_service/integrations/webhook.py | head -150

cat -n ai-service/src/ai_service/webhooks/github.py | sed -n '120,180p'

cat -n ai-service/src/ai_service/webhooks/github.py | tail -30

cat -n ai-service/tests/test_e2e_api.py | sed -n '172,210p'

cat -n ai-service/tests/test_e2e_api.py | sed -n '1,60p'

rg -n "def.*client|@pytest.fixture.*client" ai-service/tests/test_e2e_api.py | head -10

rg -n "ainvoke_sentinel|def ainvoke_sentinel" ai-service/src/ai_service/agents/sentinel/

grep -n "GITHUB_WEBHOOK_SECRET" ai-service/src/ai_service/webhooks/github.py

cat -n ai-service/src/ai_service/agents/sentinel/graph.py | sed -n '108,130p'

grep -A 30 "def create_initial_state" ai-service/src/ai_service/agents/sentinel/ -r

cat -n ai-service/src/ai_service/main.py | grep -A 5 "GITHUB_WEBHOOK_SECRET\|app ="

rg -n "test.*signature|mock.*secret" ai-service/tests/test_e2e_api.py
```
Line 185: The assertion is too broad; in the test environment it should expect 200 only.
When no GITHUB_WEBHOOK_SECRET is configured (typical for tests), verify_signature() returns True (lines 38-43), so signature validation succeeds and the status is 200. The assertion accepting [200, 400, 401] masks the actual behavior. Change it to assert response.status_code == 200.
Lines 206-208: The assertion is correct, but the handler lacks validation for required fields.
The handler does return 200 for a payload with missing pull_request field. However, the handler extracts PR data with .get() defaults (lines 121-127), allowing pr_number to be None, which gets passed to ainvoke_sentinel() (line 173). The test comment claims "webhook logs warning" but the code logs with pr_number=None (line 129) without explicit validation or warning. Consider adding explicit validation for required fields (pull_request, number) to return 400 instead of silently processing incomplete data.
🤖 Prompt for AI Agents
In `@ai-service/tests/test_e2e_api.py` around lines 172 - 208, The first test
(test_github_webhook_validates_signature) uses a too-broad assertion; change it
to assert response.status_code == 200 to reflect test env behavior when
GITHUB_WEBHOOK_SECRET is not set and verify_signature returns True. For the
webhook handler, add explicit validation before calling ainvoke_sentinel: ensure
payload contains "pull_request" and that pull_request.get("number") is present;
if missing, log a clear warning and return a 400 response instead of proceeding
with pr_number=None so ainvoke_sentinel is never called with incomplete data.
Apply these changes by updating the test function and the GitHub webhook handler
(the code that reads pull_request/pr_number and calls ainvoke_sentinel and
verify_signature).
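A sketch of what explicit validation in the handler might look like (FastAPI-style; the handler signature, logger call, and response shape are assumptions here; only the route path, verify_signature, and ainvoke_sentinel come from the code discussed above):

```python
from fastapi import APIRouter, Request
from fastapi.responses import JSONResponse

router = APIRouter()


@router.post("/api/v1/webhook/github")
async def github_webhook(request: Request):
    payload = await request.json()

    # Reject payloads that cannot be processed instead of passing pr_number=None downstream.
    pull_request = payload.get("pull_request")
    if not pull_request or pull_request.get("number") is None:
        # logger.warning("GitHub webhook missing pull_request data")
        return JSONResponse(
            status_code=400,
            content={"status": "rejected", "reason": "missing pull_request data"},
        )

    pr_number = pull_request["number"]
    # ... existing flow: verify_signature(...), build initial state, await ainvoke_sentinel(...)
    return {"status": "processed", "pr_number": pr_number}
```

With that guard in place, the malformed-payload test would assert a 400 instead of a 200 with status "processed".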
```python
def test_create_with_postgres_checkpointer(self):
    """Test creating graph with postgres checkpointer (mocked)."""
    from ai_service.graphs.vertical_agents import create_vertical_agent_graph

    with patch('ai_service.graphs.vertical_agents.get_vertical_graph') as mock_get:
        mock_graph = MagicMock()
        mock_get.return_value = mock_graph

        with patch('langgraph.checkpoint.memory.MemorySaver') as mock_memory:
            mock_checkpointer = MagicMock()
            mock_memory.return_value = mock_checkpointer
            mock_graph.compile = MagicMock(return_value=MagicMock())

            graph = create_vertical_agent_graph("team_pulse", use_postgres=True)
```
🧩 Analysis chain

🏁 Scripts executed (outputs collapsed):

```shell
# First, find the relevant files in the codebase
find . -type f -name "*.py" | grep -E "(vertical_agents|checkpointer)" | head -20

# Look for the create_vertical_agent_graph function
rg "def create_vertical_agent_graph" -A 5

# Check where USE_POSTGRES_CHECKPOINTER is defined and its default value
rg "USE_POSTGRES_CHECKPOINTER" -B 2 -A 2

# Find the get_sync_checkpointer function
rg "def get_sync_checkpointer" -A 10

# Check the test file to understand the current state
sed -n '298,312p' ai-service/tests/test_vertical_agents.py

# Get the full implementation of create_vertical_agent_graph
sed -n '1,100p' ai-service/src/ai_service/graphs/vertical_agents.py | cat -n

# See more of the implementation
rg "def create_vertical_agent_graph" -A 30 ai-service/src/ai_service/graphs/vertical_agents.py

# Check the checkpointer implementation to see what actually happens
cat -n ai-service/src/ai_service/infrastructure/checkpointer.py | head -50

# Get the rest of the create_vertical_agent_graph implementation
rg "def create_vertical_agent_graph" -A 50 ai-service/src/ai_service/graphs/vertical_agents.py

# Check the full get_sync_checkpointer implementation
rg "def get_sync_checkpointer" -A 20 ai-service/src/ai_service/infrastructure/checkpointer.py

# Check if there are other tests in the file that handle the environment variable
rg "USE_POSTGRES_CHECKPOINTER" ai-service/tests/test_vertical_agents.py -B 3 -A 3
```
Mock the Postgres checkpointer to keep this unit test hermetic
With use_postgres=True and USE_POSTGRES_CHECKPOINTER defaulting to "true", this test will attempt to create a real Postgres checkpointer. The current mock of langgraph.checkpoint.memory.MemorySaver is not invoked on this code path. Mock ai_service.infrastructure.checkpointer.get_sync_checkpointer or set the environment variable to false to avoid the external database dependency.
🤖 Prompt for AI Agents
In `@ai-service/tests/test_vertical_agents.py` around lines 298 - 312, the test
calls create_vertical_agent_graph with use_postgres=True, which triggers the real
Postgres checkpointer; update the test to mock
ai_service.infrastructure.checkpointer.get_sync_checkpointer (the function used
by create_vertical_agent_graph) so it returns a MagicMock checkpointer, or
alternatively set the environment flag USE_POSTGRES_CHECKPOINTER to "false"
before invoking create_vertical_agent_graph; ensure you still patch
get_vertical_graph and/or langgraph.checkpoint.memory.MemorySaver as needed and
return mock objects so the test remains hermetic.
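A sketch of the hermetic variant, assuming get_sync_checkpointer is looked up at call time from ai_service.infrastructure.checkpointer; if vertical_agents imports it directly, the patch target would need to be that module instead:

```python
from unittest.mock import MagicMock, patch


def test_create_with_postgres_checkpointer_hermetic():
    """Create the graph with use_postgres=True while mocking the Postgres checkpointer."""
    from ai_service.graphs.vertical_agents import create_vertical_agent_graph

    with patch("ai_service.graphs.vertical_agents.get_vertical_graph") as mock_get, \
         patch("ai_service.infrastructure.checkpointer.get_sync_checkpointer") as mock_ckpt:
        mock_graph = MagicMock()
        mock_graph.compile = MagicMock(return_value=MagicMock())
        mock_get.return_value = mock_graph
        mock_ckpt.return_value = MagicMock()  # stand-in for the real Postgres checkpointer

        graph = create_vertical_agent_graph("team_pulse", use_postgres=True)

        assert graph is not None
        mock_ckpt.assert_called_once()  # no database connection was attempted
```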
Code Review
This pull request introduces several new files and modifies an existing one to enhance the AI service with new agents, improved infrastructure, and comprehensive testing. The changes include adding a Makefile for deployment testing, environment configuration, documentation, and evaluation metrics. The most significant change is the introduction of the ExecOpsAgent and multi-agent orchestration, along with fallback mechanisms for LLM calls and Redis-backed memory. The addition of new tests and documentation aims to ensure code quality and maintainability.
```makefile
-p $(REDIS_PORT9 \
):637redis:alpine \
```
```diff
     # Check if within quiet hours
-    if quiet_hours_start <= 24:
+    if quiet_hours_start < 24:
```
```python
def normalize_email(email: str) -> str:
    """Normalize email address for comparison.

    Removes +suffixes commonly used for email filtering:
    - "bob+work@gmail.com" -> "bob@gmail.com"
    - "ALICE@EXAMPLE.COM" -> "alice@example.com"

    Args:
        email: Email address to normalize

    Returns:
        Normalized email address
    """
    if not email or "@" not in email:
        return email.lower() if email else email

    local, domain = email.split("@", 1)

    # Remove +suffix from local part
    if "+" in local:
        local = local.split("+")[0]

    return f"{local.lower()}@{domain.lower()}"
```
This function normalizes email addresses, which is good practice when comparing emails from different sources. However, it would help to add a comment explaining why this normalization is necessary, especially the removal of the +suffix, so future developers understand the purpose of this function.
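For illustration, a few doctest-style checks of the behavior described in the docstring (the identity-matching motivation in the comment is an assumption):

```python
# "+suffix" addresses deliver to the same inbox, so treat them as one identity.
assert normalize_email("Bob+ci-alerts@Example.com") == "bob@example.com"
assert normalize_email("bob@example.com") == "bob@example.com"

# Degenerate inputs pass through (lowercased when non-empty).
assert normalize_email("not-an-email") == "not-an-email"
assert normalize_email("") == ""
```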
```diff
         return False, "Team offline but outside quiet hours"

-    # If no quiet hours configured, always shutdown when offline
+    # If no quiet hours configured (quiet_hours_start >= 24), always shutdown when offline
```
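A minimal sketch of the edge case behind the changed comparison, assuming quiet_hours_start holds an hour 0-23 when configured and 24 (or larger) as the "not configured" sentinel:

```python
def quiet_hours_configured(quiet_hours_start: int) -> bool:
    # 0-23 are real start hours; 24 is the "no quiet hours" sentinel,
    # so the check must be strictly less-than.
    return quiet_hours_start < 24


assert quiet_hours_configured(22) is True
assert quiet_hours_configured(24) is False  # "<= 24" would wrongly treat this as configured
```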
Summary by CodeRabbit
- New Features
- Documentation
- Chores