test(suite): add smoke test for docs/templates/suite_template.py #40
base: main
Conversation
Mock a simple agent and verify the FAQAdapter template pattern records a response and passes a metric
Walkthrough: Adds a new pytest smoke test that mocks an agent and an adapter, runs one `DatasetCase` through a `Scenario` via `Runner`, and asserts a single successful run for case id `faq_1` with `answer_correctness == 1.0`.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ❌ 1 failed (warning) · ✅ 4 passed
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_suite_template_smoke.py (1)
34-63: Smoke test covers the core contract; consider a couple of small, optional assertions

The test nicely:

- Builds a single `DatasetCase`, runs it through `Scenario`/`Runner`, and asserts exactly one run was recorded for `faq_1`.
- Verifies the run is successful and that `answer_correctness` is `1.0`, which directly guards the FAQAdapter + metric wiring.
- Also implicitly smoke-tests the import path for `AnswerCorrectnessMetric`.

Optional hardening (no need to block on this):

- If the run object exposes the adapter output (e.g., `recorded_run.output`), you could assert it equals the canned answer for a slightly stronger check.
- If trace data is exposed on the run, adding a quick assertion that the expected trace events are present (agent response + knowledge base/tool event) would lock in the trace shape the template advertises.

As-is, the test already meets the stated smoke-test objective.
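For concreteness, a minimal sketch of what those optional assertions could look like, assuming the run object exposes `output` and an iterable `trace` of event dicts (both attribute names are assumptions, not confirmed API):

```python
# Hypothetical hardening assertions; adapt the attribute names to whatever
# the Runner's run object actually exposes.
assert recorded_run.output == "This is a canned FAQ answer."
event_types = {event["type"] for event in recorded_run.trace}
assert "agent_response" in event_types
assert "knowledge_base" in event_types
```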
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_suite_template_smoke.py (1 hunks)
🔇 Additional comments (2)
tests/test_suite_template_smoke.py (2)
5-11: MockAgent stub is clean and sufficient for the smoke test

Minimal `connect()` and `answer()` behavior keeps the fixture easy to reason about and matches the FAQ-style use case. No changes needed here.
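A minimal sketch of the stub being described, assuming the `.connect()`/`.answer()` interface named in the review (the canned answer string comes from the PR description):

```python
class MockAgent:
    """Stands in for a real FAQ agent; no network or model calls."""

    def connect(self) -> None:
        # Connecting is a no-op for the smoke test.
        pass

    def answer(self, query: str) -> str:
        # Always return the same canned response, regardless of the query.
        return "This is a canned FAQ answer."
```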
16-33: MockAdapter mirrors the adapter pattern well and captures useful traces

The adapter:

- Uses a flexible `case.query`/`case.input` fallback, which is handy for keeping the test aligned if the template changes.
- Records both the agent response and a tool/knowledge-base event into `trace`.
- Computes `success` based on the presence and exact match of `expected_output`, which is appropriate for this smoke test.

No functional issues; this is a solid minimal adapter implementation for exercising the template.
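A sketch of the adapter pattern this comment describes; `BaseAdapter`, `AdapterOutcome`, and the trace-recording calls are taken from names in this PR, but their exact signatures are assumptions:

```python
class MockAdapter(BaseAdapter):
    """Test-only adapter wiring MockAgent into the scenario run."""

    def __init__(self, agent: MockAgent) -> None:
        self.agent = agent

    def execute(self, case, trace):
        # Flexible fallback: tolerate both `case.query` and `case.input`.
        query = case.query if hasattr(case, "query") else case.input
        self.agent.connect()
        resp_text = self.agent.answer(query)
        # Record the agent response and a knowledge-base/tool event.
        trace.record("agent_response", content=resp_text)
        trace.record("knowledge_base", query=query)
        # Success requires expected_output to exist and match exactly
        # (modulo surrounding whitespace).
        expected = getattr(case, "expected_output", None)
        success = expected is not None and resp_text.strip() == expected.strip()
        return AdapterOutcome(success=success, output=resp_text)
```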
Codecov Report: ✅ All modified and coverable lines are covered by tests.
aviralgarg05 left a comment
Fix the failing workflow, then I will merge
…test Address ruff lint failures: remove unused imports and place module imports at top of file
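Illustratively, the kind of change such ruff failures call for (hypothetical names; F401 flags unused imports, E402 flags module-level imports placed after module code):

```python
# Before: triggers F401 (unused import) and E402 (import below module code).
#
#     import os          # imported but never used -> F401
#     ANSWER = "This is a canned FAQ answer."
#     import textwrap    # module-level import after code -> E402
#
# After: drop the unused import and move the used one to the top.
import textwrap

ANSWER = "This is a canned FAQ answer."
print(textwrap.shorten(ANSWER, width=20))
```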
aviralgarg05 left a comment
Still failing
Actionable comments posted: 0
🧹 Nitpick comments (1)
tests/test_suite_template_smoke.py (1)
17-18: Optional: Explicit `return None` is redundant.

Python functions return `None` implicitly, so the explicit `return None` statements are unnecessary. However, they do make the intent clear and don't harm functionality.

Apply this diff if you prefer more concise code:

```diff
-    def prepare(self) -> None:  # pragma: no cover - trivial
-        return None
+    def prepare(self) -> None:  # pragma: no cover - trivial
+        pass

-    def cleanup(self) -> None:  # pragma: no cover - trivial
-        return None
+    def cleanup(self) -> None:  # pragma: no cover - trivial
+        pass
```

Also applies to: 28-29
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
tests/test_suite_template_smoke.py (1 hunks)
🔇 Additional comments (6)
tests/test_suite_template_smoke.py (6)
1-3: LGTM! Imports are appropriate for the smoke test.

The imports cover all necessary components for building and running the test scenario.
5-10: LGTM! MockAgent is appropriately simple for a smoke test.

The mock agent correctly implements the expected interface pattern with `.connect()` and `.answer()` methods.
20-26: LGTM! The execute logic is well-implemented.

The method correctly:

- Connects to the agent and retrieves the answer
- Records trace events for observability
- Handles edge cases (None expected_output, whitespace differences)
- Returns a properly structured AdapterOutcome

The defensive `hasattr` check on line 22 is sensible if the template supports both `query` and `input` attributes across different dataset schemas.
31-44: LGTM! Test setup is clean and well-structured.

The test case construction is appropriate with matching expected output, and the scenario components are properly initialized following the agentunit pattern.
46-58: LGTM! Test execution and assertions are thorough.

The test properly:

- Runs the scenario through the Runner
- Validates the result structure
- Locates the specific test case by ID
- Asserts the run succeeded
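A sketch of that flow; the names (`RunnerConfig`, `result.runs`, `run.case_id`, `run.success`) are inferred from this review, not verified against the library:

```python
# No explicit metrics -> the Runner's defaults apply (see the next comment).
config = RunnerConfig()
result = Runner(config).run(scenario)

# Locate the single run recorded for our case and check it succeeded.
faq_runs = [run for run in result.runs if run.case_id == "faq_1"]
assert len(faq_runs) == 1, "expected exactly one recorded run for faq_1"
recorded_run = faq_runs[0]
assert recorded_run.success
```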
59-60: The test assertion is correct and will pass.

The `answer_correctness` metric is automatically computed by the `Runner` when no explicit metrics configuration is provided. When `RunnerConfig` is created without a `metrics` parameter (as in this test), `resolve_metrics(None)` returns all metrics from `DEFAULT_METRICS`, which includes `AnswerCorrectnessMetric()`.

In `AnswerCorrectnessMetric.evaluate()`, the metric compares the expected output with the actual output. Since both `case.expected_output` and the `resp_text` from `MockAdapter` are identical strings ("This is a canned FAQ answer."), the exact match condition triggers and returns `1.0`. The metric is populated in `recorded_run.metrics` by the `Runner._run_scenario()` method during execution, so the assertion will pass.

Likely an incorrect or invalid review comment.
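For intuition, an illustrative exact-match metric in the spirit of what this comment describes; this is a sketch of the behavior, not the library's actual `AnswerCorrectnessMetric` implementation:

```python
class ExactMatchCorrectness:
    """Toy stand-in: full credit only for an exact expected/actual match."""

    name = "answer_correctness"

    def evaluate(self, case, outcome) -> float:
        expected = getattr(case, "expected_output", None)
        if expected is None:
            # No ground truth to compare against.
            return 0.0
        # Exact string match (modulo whitespace) yields 1.0, otherwise 0.0.
        return 1.0 if outcome.output.strip() == expected.strip() else 0.0
```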
aviralgarg05 left a comment
Still failing

Description
Adds a minimal smoke test for the sample FAQAdapter pattern in suite_template.py. The test mocks an agent with .connect() and .answer(), inlines a small dataset via DatasetSource.from_list, implements a MockAdapter (test-only) that conforms to BaseAdapter and returns an AdapterOutcome, and asserts that the scenario run is recorded and the answer_correctness metric is 1.0 for an exact match.
Fixes #7
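A condensed sketch of the wiring this description lays out, reusing the mock classes sketched in the review above; the import path, keyword arguments, and sample query are assumptions about the agentunit API, not verbatim from the test:

```python
from agentunit import DatasetSource, Runner, Scenario  # assumed import path

# Inline a one-case dataset, as the description says the test does.
dataset = DatasetSource.from_list([
    {
        "id": "faq_1",
        "query": "What is your refund policy?",  # hypothetical query
        "expected_output": "This is a canned FAQ answer.",
    }
])

scenario = Scenario(
    name="faq-template-smoke",
    adapter=MockAdapter(MockAgent()),
    dataset=dataset,
)

# Run once and assert the recorded run passed with a perfect metric score.
result = Runner().run(scenario)
run = next(r for r in result.runs if r.case_id == "faq_1")
assert run.success
assert run.metrics["answer_correctness"] == 1.0
```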
Type of Change
Please delete options that are not relevant:
Changes Made
Testing
Please describe the tests you ran to verify your changes:
Test Configuration
Test Results
Code Quality
- Ran `pre-commit run --all-files` and addressed any issues

Documentation
Breaking Changes
If this PR introduces breaking changes, please describe:
Dependencies
New dependencies:
- `package-name`: Reason for adding

Performance Impact
Describe any performance implications:
Additional Context
Add any other context about the PR here:
Checklist
Reviewer Notes
Please pay special attention to:
Post-Merge Tasks
Tasks to complete after merging (if any):