Open
Labels
enhancement, help wanted
Description
Summary
Implement baseline comparison that fails CI if metrics regress beyond a threshold.
Tasks
- Add `agentunit baseline save` to store current results as baseline.
- Add `agentunit eval --fail-on-regression` to compare against baseline.
- Support configurable thresholds per metric.
- Output diff report showing improvements/regressions.
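To make the comparison and diff-report tasks concrete, here is a minimal sketch of what the core logic might look like. Everything in it is an assumption for discussion, not existing agentunit code: the function name `compare_metrics`, the flat `name -> float` metric shape, the "higher is better" convention, and thresholds expressed as an absolute allowed drop.

```python
from typing import Dict, List, Tuple

# Hypothetical default; the issue does not specify threshold values or semantics.
DEFAULT_THRESHOLD = 0.05


def compare_metrics(
    baseline: Dict[str, float],
    current: Dict[str, float],
    thresholds: Dict[str, float],
    default_threshold: float = DEFAULT_THRESHOLD,
) -> Tuple[List[str], List[str]]:
    """Return (report_lines, regressed_metric_names).

    Assumes higher is better and that each threshold is the absolute
    drop allowed for that metric. Metrics present on only one side
    are skipped in this sketch.
    """
    report: List[str] = []
    regressed: List[str] = []
    for name in sorted(baseline.keys() & current.keys()):
        delta = current[name] - baseline[name]
        allowed = thresholds.get(name, default_threshold)
        if delta < -allowed:
            status = "REGRESSION"
            regressed.append(name)
        elif delta > 0:
            status = "improved"
        else:
            status = "ok"
        report.append(
            f"{name}: {baseline[name]:.3f} -> {current[name]:.3f} "
            f"({delta:+.3f}) {status}"
        )
    return report, regressed
```

A per-metric override such as `thresholds={"accuracy": 0.02}` would tighten one metric while the rest fall back to the default. Whether thresholds should be absolute or relative, and how lower-is-better metrics (latency, cost) are handled, is still open.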
Acceptance criteria
- Baseline file stored in `.agentunit/baseline.json`.
- CI fails if any metric drops more than threshold.
- Clear diff output showing metric changes.
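As a rough illustration of the first two criteria, the sketch below stores a baseline at the path named above and derives a process exit code from it. The helper names (`save_baseline`, `load_baseline`, `regression_exit_code`) and the flat JSON schema are hypothetical; the real file format and CLI wiring are up to whoever picks this up.

```python
import json
from pathlib import Path
from typing import Dict

# Path taken from the acceptance criteria; the JSON schema is an assumption.
BASELINE_PATH = Path(".agentunit/baseline.json")


def save_baseline(metrics: Dict[str, float], path: Path = BASELINE_PATH) -> None:
    """Write metrics as a flat JSON object, creating the directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))


def load_baseline(path: Path = BASELINE_PATH) -> Dict[str, float]:
    """Read back the metrics written by save_baseline."""
    return json.loads(path.read_text())


def regression_exit_code(
    baseline: Dict[str, float],
    current: Dict[str, float],
    threshold: float,
) -> int:
    """Return 1 if any shared metric dropped by more than threshold, else 0.

    Assumes higher is better and a single absolute threshold.
    """
    for name, old in baseline.items():
        if name in current and old - current[name] > threshold:
            return 1
    return 0
```

A command entry point could then end with `sys.exit(regression_exit_code(...))`, since most CI systems treat a non-zero exit status as a failed step.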