feat: add AI Config judge support #345
base: v7
Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>
Cursor Bugbot has reviewed your changes and found 2 potential issues.
    extendedVariables["response_to_evaluate"] = "{{response_to_evaluate}}"

    return c.Config(key, context, defaultValue, extendedVariables)
}
JudgeConfig double-tracks config and judge metric events
Medium Severity
JudgeConfig emits $ld:ai:judge:function:single at line 196, then delegates to c.Config which independently emits $ld:ai:config:function:single at line 73. Every judge evaluation is therefore double-counted — once as a judge function call and once as a regular config function call. This inflates the config function metric on the monitoring dashboard, making it appear there are more regular config evaluations than actually occurred.
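One possible way to avoid the double count is to move the shared evaluation logic into an unexported helper that emits no usage event, so each public entry point records exactly one event. The sketch below is not the SDK's code: `client`, `eventSink`, `evaluate`, and `newClient` are hypothetical stand-ins, and only the two event names and the placeholder variables come from this PR.

```go
package main

import "fmt"

// eventSink is a hypothetical stand-in for the SDK's usage-event tracking.
type eventSink struct {
	counts map[string]int
}

func (s *eventSink) track(event string) { s.counts[event]++ }

// client is a hypothetical, heavily simplified stand-in for the AI client.
type client struct {
	sink *eventSink
}

func newClient() *client {
	return &client{sink: &eventSink{counts: map[string]int{}}}
}

// evaluate holds the shared evaluation logic and emits no usage event.
func (c *client) evaluate(key string, variables map[string]any) string {
	return "evaluated:" + key
}

// Config records one config usage event, then evaluates.
func (c *client) Config(key string, variables map[string]any) string {
	c.sink.track("$ld:ai:config:function:single")
	return c.evaluate(key, variables)
}

// JudgeConfig records one judge usage event and calls evaluate directly,
// so the config usage event is not emitted a second time.
func (c *client) JudgeConfig(key string, variables map[string]any) string {
	c.sink.track("$ld:ai:judge:function:single")
	extended := make(map[string]any, len(variables)+2)
	for k, v := range variables {
		extended[k] = v
	}
	// Keep the judge placeholders literal for the later interpolation pass.
	extended["message_history"] = "{{message_history}}"
	extended["response_to_evaluate"] = "{{response_to_evaluate}}"
	return c.evaluate(key, extended)
}

func main() {
	c := newClient()
	c.JudgeConfig("my-judge", nil)
	fmt.Println(c.sink.counts)
}
```

Whether judge evaluations should also count toward the config function metric is a product decision; the sketch assumes they should not.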
        }
    }

    return "", fmt.Errorf("missing evaluationMetricKey")
Unused logger and configKey parameters in getMetricKey
Low Severity
getMetricKey accepts logger and configKey parameters but uses neither in the function body. These were likely intended for deprecation warnings (when falling back to the deprecated evaluationMetricKeys array) and for providing context in error messages, but appear to be remnants from a previous version where the logging loop was removed based on PR review feedback.
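A minimal sketch of what a trimmed-down `getMetricKey` could look like once the unused parameters are dropped. The `judgeSettings` struct and the "first non-empty entry" fallback rule are assumptions made for illustration; only the field names, the deprecated-array fallback, and the error message come from this PR.

```go
package main

import (
	"errors"
	"fmt"
)

// judgeSettings is a hypothetical struct carrying the two fields named in the PR.
type judgeSettings struct {
	EvaluationMetricKey  string
	EvaluationMetricKeys []string // deprecated in favor of EvaluationMetricKey
}

var errMissingMetricKey = errors.New("missing evaluationMetricKey")

// getMetricKey prefers the single key and falls back to the first non-empty
// entry of the deprecated array (assumed rule). It takes only what it uses.
func getMetricKey(s judgeSettings) (string, error) {
	if s.EvaluationMetricKey != "" {
		return s.EvaluationMetricKey, nil
	}
	for _, k := range s.EvaluationMetricKeys {
		if k != "" {
			return k, nil
		}
	}
	return "", errMissingMetricKey
}

func main() {
	key, err := getMetricKey(judgeSettings{EvaluationMetricKeys: []string{"", "relevance"}})
	fmt.Println(key, err)
}
```

If the deprecation warning or the config key in the error message is still wanted, the alternative is to keep the parameters and actually use them rather than remove them.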


Requirements
Related issues
See https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.5d8l30brvyuw for context
For other SDK implementations, see:
Describe the solution you've provided
Extending the Go SDK to support AI Config evaluations. This includes custom evaluator support as well.
The implementation aims to be consistent with the Python and Node implementations. Changes were verified with a local test app; the resulting data can be observed in the evaluator metrics for this AI Config.
Describe alternatives you've considered
Provide a clear and concise description of any alternative solutions or features you've considered.
Additional context
Add any other context about the pull request here.
Note
Medium Risk
Adds new evaluation and metric-tracking paths (including dynamic metric keys and new event payload fields), which could affect analytics correctness and runtime behavior if misconfigured. Changes are well-covered by tests but touch core SDK tracking surfaces.
Overview
Adds judge-mode support to AI Configs by extending the config datamodel and builder with mode, evaluationMetricKey/evaluationMetricKeys, and judgeConfiguration (with defensive copying to keep configs immutable).
Introduces Client.JudgeConfig to fetch judge configs while preserving {{message_history}}/{{response_to_evaluate}} placeholders for a second Mustache interpolation pass during evaluation, and adds a new ldai/judge package that samples, interpolates, invokes a structured provider, and parses judge responses.
Extends Tracker with TrackJudgeResponse to emit evaluation scores as metrics (including optional judgeConfigKey in event data), and adds comprehensive tests covering parsing, placeholder preservation, schema generation, sampling, and response validation.
Written by Cursor Bugbot for commit 41141b9. This will update automatically on new commits.
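The placeholder preservation described above can be illustrated with a small two-pass sketch. This is not the SDK's implementation: `render` is a hypothetical, regexp-based stand-in for a Mustache renderer (it only handles plain `{{name}}` tags and, like Mustache, renders unknown names as empty strings), and the template text and the `criteria` variable are made up for the example.

```go
package main

import (
	"fmt"
	"regexp"
)

var tagPattern = regexp.MustCompile(`\{\{(\w+)\}\}`)

// render is a naive stand-in for a Mustache renderer: each {{name}} tag is
// replaced by vars[name], and unknown names become the empty string.
func render(template string, vars map[string]string) string {
	return tagPattern.ReplaceAllStringFunc(template, func(tag string) string {
		return vars[tag[2:len(tag)-2]]
	})
}

// firstPass fills in caller variables but maps the judge placeholders to
// themselves, so they survive until evaluation time.
func firstPass(template string, vars map[string]string) string {
	extended := make(map[string]string, len(vars)+2)
	for k, v := range vars {
		extended[k] = v
	}
	extended["message_history"] = "{{message_history}}"
	extended["response_to_evaluate"] = "{{response_to_evaluate}}"
	return render(template, extended)
}

// secondPass fills in the values that are only known during evaluation.
func secondPass(template, history, response string) string {
	return render(template, map[string]string{
		"message_history":      history,
		"response_to_evaluate": response,
	})
}

func main() {
	tmpl := "Rate the {{criteria}} of this reply.\nHistory: {{message_history}}\nReply: {{response_to_evaluate}}"
	stored := firstPass(tmpl, map[string]string{"criteria": "helpfulness"})
	fmt.Println(secondPass(stored, "user: hi", "Hello!"))
}
```

Without the self-mapping in the first pass, the two placeholders would be rendered as empty strings and the second pass would have nothing left to fill in.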