feat: add AI Config judge support #345

knfreemLD · 2026-02-05T18:04:44Z

Requirements

I have added test coverage for new or changed functionality
I have followed the repository's pull request submission guidelines
I have validated my changes against all supported platform versions

Related issues

See https://docs.google.com/document/d/1lzYwQqCcTzN_2zkxJZDfJtgUcEJ4jbpx0KSsJ2bRENw/edit?tab=t.0#heading=h.5d8l30brvyuw for context

For other SDK implementations, see:

feat: Added custom judge support for ai configs js-core#1073
feat: add support for custom judges via evaluation metric key python-server-sdk-ai#86
feat: Add Chat and Judge supporting methods python-server-sdk-ai#64

Describe the solution you've provided

Extends the Go SDK to support AI Config evaluations, including custom evaluators.

The implementation aims to stay consistent with the Python and Node implementations. Changes were verified with a local test app; the resulting data can be observed in the evaluator metrics for this AI config.

Note

Medium Risk
Adds new evaluation and metric-tracking paths (including dynamic metric keys and new event payload fields), which could affect analytics correctness and runtime behavior if misconfigured. Changes are well-covered by tests but touch core SDK tracking surfaces.

Overview
Adds judge-mode support to AI Configs by extending the config datamodel and builder with mode, evaluationMetricKey/evaluationMetricKeys, and judgeConfiguration (with defensive copying to keep configs immutable).
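The defensive copying mentioned above can be illustrated with a minimal sketch. The `Judge`, `Config`, and `Builder` types here are hypothetical stand-ins, not the SDK's actual datamodel; the point is only that slices are copied on the way in and on the way out, so callers cannot mutate a built config.

```go
package main

import "fmt"

// Judge, Config, and Builder are hypothetical stand-ins for illustration;
// they are not the types defined in ldai/datamodel.
type Judge struct {
	Key          string
	SamplingRate float64
}

type Config struct {
	mode   string
	judges []Judge
}

type Builder struct {
	mode   string
	judges []Judge
}

// WithJudges copies the caller's slice so later changes to it
// do not leak into the builder.
func (b *Builder) WithJudges(j []Judge) *Builder {
	b.judges = append([]Judge(nil), j...)
	return b
}

// Build copies again so the builder can be reused without aliasing.
func (b *Builder) Build() Config {
	return Config{mode: b.mode, judges: append([]Judge(nil), b.judges...)}
}

// Judges returns a copy, keeping the config's own slice private.
func (c Config) Judges() []Judge {
	return append([]Judge(nil), c.judges...)
}

func main() {
	in := []Judge{{Key: "relevance", SamplingRate: 1}}
	cfg := (&Builder{mode: "judge"}).WithJudges(in).Build()

	in[0].Key = "mutated" // change the caller's slice
	out := cfg.Judges()
	out[0].Key = "also mutated" // change the returned slice

	fmt.Println(cfg.Judges()[0].Key) // prints "relevance"
}
```

A shallow copy is enough here because `Judge` holds only value fields; a type with nested slices or maps would need a deeper copy.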

Introduces Client.JudgeConfig to fetch judge configs while preserving {{message_history}} / {{response_to_evaluate}} placeholders for a second Mustache interpolation pass during evaluation, and adds a new ldai/judge package that samples, interpolates, invokes a structured provider, and parses judge responses.
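The placeholder-preservation idea can be sketched as follows. This is an illustration under stated assumptions, not the SDK's code: `render` is a tiny regex-based stand-in for a Mustache renderer (it blanks unknown variables, as Mustache does for missing keys), and `preserveJudgePlaceholders` is a hypothetical helper name. Mapping each reserved variable to its own literal token lets it survive the first pass untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// token matches simple {{name}} tags. This is a simplified stand-in for
// Mustache, covering only plain variable tags.
var token = regexp.MustCompile(`\{\{\s*(\w+)\s*\}\}`)

// render replaces each tag with its value; unknown names become "".
func render(tmpl string, vars map[string]string) string {
	return token.ReplaceAllStringFunc(tmpl, func(m string) string {
		name := token.FindStringSubmatch(m)[1]
		return vars[name]
	})
}

// preserveJudgePlaceholders (hypothetical name) copies vars and maps the two
// reserved names to their own literal tags so the first pass keeps them.
func preserveJudgePlaceholders(vars map[string]string) map[string]string {
	ext := make(map[string]string, len(vars)+2)
	for k, v := range vars {
		ext[k] = v
	}
	ext["message_history"] = "{{message_history}}"
	ext["response_to_evaluate"] = "{{response_to_evaluate}}"
	return ext
}

func main() {
	tmpl := "Topic: {{topic}}\nHistory: {{message_history}}\nResponse: {{response_to_evaluate}}"

	// First pass: user variables are filled in, reserved tags survive.
	first := render(tmpl, preserveJudgePlaceholders(map[string]string{"topic": "billing"}))

	// Second pass, at evaluation time: reserved tags get real values.
	second := render(first, map[string]string{
		"message_history":      "user: hi",
		"response_to_evaluate": "Hello!",
	})
	fmt.Println(second)
}
```

Without the self-mapping step, the first pass would render both reserved tags as empty strings and the second pass would have nothing left to fill.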

Extends Tracker with TrackJudgeResponse to emit evaluation scores as metrics (including optional judgeConfigKey in event data), and adds comprehensive tests covering parsing, placeholder preservation, schema generation, sampling, and response validation.
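As a rough illustration of emitting evaluation scores as metrics with an optional `judgeConfigKey`, the sketch below builds one event per score. All type names, the `judgeEvents` function, and the `configKey` data field are assumptions made for the example; the real `TrackJudgeResponse` signature and payload are defined in `ldai/tracker.go` and may differ.

```go
package main

import (
	"fmt"
	"sort"
)

// EvalScore and JudgeResponse are hypothetical shapes for a parsed judge result.
type EvalScore struct {
	Score     float64
	Reasoning string
}

type JudgeResponse struct {
	JudgeConfigKey string               // optional; empty means "not set"
	Evals          map[string]EvalScore // metric key -> score
}

// metricEvent is a stand-in for whatever the tracker sends upstream.
type metricEvent struct {
	MetricKey string
	Value     float64
	Data      map[string]string
}

// judgeEvents turns each evaluation score into one metric event and attaches
// judgeConfigKey to the event data only when it is present.
func judgeEvents(configKey string, r JudgeResponse) []metricEvent {
	keys := make([]string, 0, len(r.Evals))
	for k := range r.Evals {
		keys = append(keys, k)
	}
	sort.Strings(keys) // deterministic order for the example

	events := make([]metricEvent, 0, len(keys))
	for _, k := range keys {
		data := map[string]string{"configKey": configKey}
		if r.JudgeConfigKey != "" {
			data["judgeConfigKey"] = r.JudgeConfigKey
		}
		events = append(events, metricEvent{MetricKey: k, Value: r.Evals[k].Score, Data: data})
	}
	return events
}

func main() {
	resp := JudgeResponse{
		JudgeConfigKey: "relevance-judge",
		Evals:          map[string]EvalScore{"relevance": {Score: 0.9, Reasoning: "on topic"}},
	}
	for _, e := range judgeEvents("support-bot", resp) {
		fmt.Println(e.MetricKey, e.Value, e.Data["judgeConfigKey"])
	}
}
```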

Written by Cursor Bugbot for commit 41141b9. This will update automatically on new commits.

ldai/tracker.go

ldai/judge/judge.go

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

ldai/config.go

ldai/datamodel/datamodel.go

ldai/judge/judge.go

ldai/client.go

ldai/tracker.go

cursor

Cursor Bugbot has reviewed your changes and found 2 potential issues.

cursor · 2026-02-10T02:23:31Z

ldai/client.go

+	extendedVariables["response_to_evaluate"] = "{{response_to_evaluate}}"
+
+	return c.Config(key, context, defaultValue, extendedVariables)
+}


JudgeConfig double-tracks config and judge metric events

Medium Severity

JudgeConfig emits $ld:ai:judge:function:single at line 196, then delegates to c.Config which independently emits $ld:ai:config:function:single at line 73. Every judge evaluation is therefore double-counted — once as a judge function call and once as a regular config function call. This inflates the config function metric on the monitoring dashboard, making it appear there are more regular config evaluations than actually occurred.

Additional Locations (1)

ldai/client.go#L72-L73
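One possible way to avoid the double count, sketched under assumptions: move the shared evaluation logic into an unexported helper that emits nothing, and let each public method track exactly one usage event. The `client` type, its `events` slice, and the `evaluate` helper are hypothetical; only the two event names come from the comment above.

```go
package main

import "fmt"

// client is a hypothetical stand-in; events records what would be tracked.
type client struct {
	events []string
}

func (c *client) track(name string) { c.events = append(c.events, name) }

// evaluate holds the shared lookup and interpolation logic and tracks nothing.
// The return value is a placeholder for the evaluated config.
func (c *client) evaluate(key string, vars map[string]string) string {
	return "config:" + key
}

// Config tracks the regular config usage event, then evaluates.
func (c *client) Config(key string, vars map[string]string) string {
	c.track("$ld:ai:config:function:single")
	return c.evaluate(key, vars)
}

// JudgeConfig tracks only the judge usage event and calls the helper
// directly instead of delegating to Config.
func (c *client) JudgeConfig(key string, vars map[string]string) string {
	c.track("$ld:ai:judge:function:single")
	return c.evaluate(key, vars)
}

func main() {
	c := &client{}
	c.JudgeConfig("my-judge", nil)
	fmt.Println(c.events) // prints [$ld:ai:judge:function:single]
}
```

Whether the config event should also fire for judge lookups is a product decision; the sketch only shows how to make the choice explicit in one place.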

cursor · 2026-02-10T02:23:31Z

ldai/judge/judge.go

+		}
+	}
+
+	return "", fmt.Errorf("missing evaluationMetricKey")


Unused logger and configKey parameters in getMetricKey

Low Severity

getMetricKey accepts logger and configKey parameters but uses neither in the function body. These were likely intended for deprecation warnings (when falling back to the deprecated evaluationMetricKeys array) and for providing context in error messages, but appear to be remnants from a previous version where the logging loop was removed based on PR review feedback.
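A trimmed signature might look like the sketch below. It assumes the single `evaluationMetricKey` takes precedence and the deprecated `evaluationMetricKeys` array is only a fallback; that ordering, the parameter names, and the skipping of empty entries are guesses for illustration, not the function in `ldai/judge/judge.go`.

```go
package main

import (
	"errors"
	"fmt"
)

// getMetricKey returns the single metric key when set, otherwise the first
// non-empty entry of the deprecated keys array. It takes only what it uses.
func getMetricKey(metricKey string, deprecatedKeys []string) (string, error) {
	if metricKey != "" {
		return metricKey, nil
	}
	for _, k := range deprecatedKeys {
		if k != "" {
			return k, nil
		}
	}
	return "", errors.New("missing evaluationMetricKey")
}

func main() {
	key, err := getMetricKey("", []string{"", "relevance"})
	fmt.Println(key, err) // prints "relevance <nil>"
}
```

If a deprecation warning or a config key in the error message is wanted, the caller can add that context around the returned error rather than threading a logger through the helper.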

knfreemLD added 2 commits February 5, 2026 12:17

Added judge support for AI Configs

a5dcb5e

Removed debug logs

93397a7

knfreemLD requested a review from jsonbailey February 5, 2026 18:04

knfreemLD requested a review from a team as a code owner February 5, 2026 18:04

cursor bot reviewed Feb 5, 2026

ldai/tracker.go (resolved)

ldai/judge/judge.go (resolved)

knfreemLD and others added 3 commits February 5, 2026 13:20

Address cursor comments

6481937

gofmt

ab416ba

Fixed build

bd3fe09

Co-Authored-By: Claude Sonnet 4.5 <noreply@anthropic.com>

cursor bot reviewed Feb 5, 2026

ldai/config.go (outdated, resolved)

Cursor comment around null safety

e9195f0

jsonbailey requested changes Feb 6, 2026

ldai/datamodel/datamodel.go (resolved)

ldai/datamodel/datamodel.go (outdated, resolved)

ldai/datamodel/datamodel.go (resolved)

ldai/judge/judge.go (outdated, resolved)

Removed unneeded debugging and refactoring

f8e8ed2

cursor bot reviewed Feb 10, 2026

ldai/client.go (resolved)

ldai/tracker.go (resolved)

cursor comments

41141b9

cursor bot reviewed Feb 10, 2026
