Add baseline comparison with regression detection #20

@aviralgarg05

Description

Summary

Implement a baseline comparison that fails CI when any metric regresses beyond a configurable threshold.

Tasks

  • Add an agentunit baseline save command that stores the current results as the baseline.
  • Add a --fail-on-regression flag to agentunit eval that compares the current run against the baseline.
  • Support a configurable regression threshold per metric.
  • Output a diff report showing improvements and regressions.
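A minimal sketch of the per-metric comparison and diff report described in the tasks above. It assumes every metric is a score where higher is better, that a "regression" means an absolute drop larger than the metric's threshold, and that metrics without an explicit threshold fall back to a default of 0.02. The names MetricDiff, compare, format_report and DEFAULT_THRESHOLD are hypothetical and are not part of agentunit's actual API.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical default: tolerate an absolute drop of up to 0.02 for any metric
# without its own threshold. Assumes higher scores are better.
DEFAULT_THRESHOLD = 0.02


@dataclass
class MetricDiff:
    name: str
    baseline: float
    current: float
    threshold: float

    @property
    def delta(self) -> float:
        # Positive delta = improvement, negative delta = drop.
        return self.current - self.baseline

    @property
    def regressed(self) -> bool:
        # A regression is a drop strictly larger than the allowed threshold.
        return -self.delta > self.threshold


def compare(baseline: dict[str, float], current: dict[str, float],
            thresholds: dict[str, float] | None = None,
            default_threshold: float = DEFAULT_THRESHOLD) -> list[MetricDiff]:
    """Compare metrics present in both runs; others are skipped in this sketch."""
    thresholds = thresholds or {}
    return [
        MetricDiff(name, baseline[name], current[name],
                   thresholds.get(name, default_threshold))
        for name in sorted(baseline.keys() & current.keys())
    ]


def format_report(diffs: list[MetricDiff]) -> str:
    """Render one line per metric: old -> new, signed change, and status."""
    lines = []
    for d in diffs:
        if d.regressed:
            status = "REGRESSED"
        elif d.delta > 0:
            status = "improved"
        else:
            status = "ok"
        lines.append(f"{d.name:<20} {d.baseline:.3f} -> {d.current:.3f} "
                     f"({d.delta:+.3f}) {status}")
    return "\n".join(lines)


if __name__ == "__main__":
    baseline = {"accuracy": 0.90, "faithfulness": 0.70, "relevance": 0.80}
    current = {"accuracy": 0.85, "faithfulness": 0.75, "relevance": 0.79}
    print(format_report(compare(baseline, current, thresholds={"accuracy": 0.03})))
```

Keeping compare pure (dicts in, diffs out) makes the threshold logic easy to unit-test separately from file I/O and CLI handling. Lower-is-better metrics such as latency would need a per-metric direction flag, which this sketch omits.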

Acceptance criteria

  • The baseline file is stored in .agentunit/baseline.json.
  • CI fails if any metric drops by more than its threshold.
  • The diff output clearly shows how each metric changed.
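One way the first two acceptance criteria could be met is sketched below: the baseline is written to .agentunit/baseline.json as plain JSON, and the regression check returns a process exit code, since CI runners treat a nonzero exit status as a failed step. This assumes metrics are a flat name-to-score mapping where higher is better; the function names, the 0.02 default threshold, and the choice to ignore metrics missing from the current run are all hypothetical, not agentunit's documented behavior.

```python
from __future__ import annotations

import json
from pathlib import Path

# Location taken from the acceptance criteria, relative to the project root.
BASELINE_PATH = Path(".agentunit") / "baseline.json"


def save_baseline(metrics: dict[str, float], root: Path = Path(".")) -> Path:
    """Store metrics as the new baseline (roughly what 'agentunit baseline save' would do)."""
    path = root / BASELINE_PATH
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(metrics, indent=2, sort_keys=True))
    return path


def load_baseline(root: Path = Path(".")) -> dict[str, float]:
    """Read the stored baseline back into a dict."""
    return json.loads((root / BASELINE_PATH).read_text())


def regression_exit_code(current: dict[str, float], thresholds: dict[str, float],
                         root: Path = Path("."),
                         default_threshold: float = 0.02) -> int:
    """Return 1 (fail CI) if any metric dropped by more than its threshold, else 0."""
    baseline = load_baseline(root)
    for name, old in baseline.items():
        if name not in current:
            continue  # hypothetical policy: metrics absent from this run are ignored
        if old - current[name] > thresholds.get(name, default_threshold):
            return 1
    return 0
```

In a CLI entry point, raise SystemExit(regression_exit_code(current, thresholds)) would turn a detected regression into a failing CI step.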

Metadata

Assignees

No one assigned

Labels

enhancement (New feature or request), help wanted (Extra attention is needed)

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests