
Eval harness: YAML config system + steps_per_batch#15

Open
kfallah wants to merge 4 commits into main from eval-config-systems

Conversation

@kfallah
Owner

@kfallah kfallah commented Feb 19, 2026

Summary

  • YAML config system: Add --config CLI flag to load eval run configuration from a YAML file, enabling declarative experiment definitions that can be version-controlled. CLI args override YAML values. Includes type coercions for metrics (string/list), collapse_steps (string/list → set), and Tinker proxy defaults.
  • steps_per_batch parameter: New field controlling multiple gradient updates per feedback batch (analogous to PPO multi-epoch). IS ratios are automatically recomputed each sub-step via fresh forward passes. LoRA is reloaded between sub-steps in local mode.
  • Reference config: claas/eval/configs/example.yaml with all fields documented (a minimal loading sketch follows this list).
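
For orientation, here is a minimal sketch of the declarative workflow described above: a YAML file is loaded, bare strings are coerced into lists where lists are expected, and explicitly passed CLI args win over YAML values. The YAML keys mirror the PR description (model, preferences, metrics, num_steps, steps_per_batch) but are illustrative rather than copied from example.yaml, and merge_config is a stand-in, not the PR's build_config_from_yaml.

```python
import yaml  # provided by the pyyaml>=6.0 dependency added in this PR

YAML_TEXT = """
model: Qwen/Qwen3-30B-A3B
preferences: [no_emoji]
metrics: [style_adherence]
num_steps: 20
steps_per_batch: 2
"""

def merge_config(yaml_text: str, cli_overrides: dict) -> dict:
    """Hypothetical helper: load YAML, normalize list fields, apply CLI overrides."""
    data = yaml.safe_load(yaml_text)
    # Coerce bare strings into single-element lists for list-typed fields.
    for key in ("preferences", "metrics"):
        if isinstance(data.get(key), str):
            data[key] = [data[key]]
    # Explicitly passed CLI args take precedence over YAML values.
    data.update(cli_overrides)
    return data

print(merge_config(YAML_TEXT, {"num_steps": 2}))  # e.g. --num-steps 2 on the CLI
```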

Test plan

  • uv run ruff check claas/ tests/ --fix — clean
  • uv run ty check — only pre-existing unresolved-import errors for GPU/tinker deps
  • uv run pytest tests/ -v -m "not integration" — 114 passed
  • New tests in tests/test_eval_config.py (10 tests) and tests/test_eval_runner.py (4 new tests)
  • Integration smoke: python -m claas.eval --config claas/eval/configs/example.yaml --num-steps 2 --steps-per-batch 2

🤖 Generated with Claude Code

Summary by CodeRabbit

Release Notes

  • New Features

    • Evaluation harness now uses YAML configuration files for simplified setup
    • Dashboard includes pagination for easier browsing of feedback and eval results
    • Chat-based scoring endpoint added to inference proxy
    • New parameters for step-level batch control and feedback iteration
  • Documentation

    • Added comprehensive no-Docker setup guide for Tinker stack
    • Expanded evaluation harness documentation with configuration examples and environment variable guidance
  • Chores

    • Added hydra-core dependency

Add declarative YAML config files for reproducible eval runs and a
steps_per_batch parameter for multiple gradient updates per feedback batch
with automatic IS ratio recomputation.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@coderabbitai

coderabbitai bot commented Feb 19, 2026

Important

Review skipped

Auto incremental reviews are disabled on this repository.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

📝 Walkthrough


The pull request adds YAML configuration file support to the evaluation framework with CLI override capability, introduces multi-step feedback submission per batch with LoRA reloading between steps, extends type definitions with steps-per-batch and sub-step tracking fields, and provides comprehensive test coverage for the new functionality.

Changes

  • Configuration System Enhancement (claas/eval/__main__.py, claas/eval/config.py): Adds YAML configuration loader (load_yaml_config, build_config_from_yaml), introduces --config and --steps-per-batch CLI options, implements explicit CLI override detection via _get_explicit_cli_overrides, applies type coercions for metrics and collapse_steps, generates timestamped output directories, and integrates environment variable support for openclaw_api_key.
  • Step Feedback Processing (claas/eval/runner.py): Implements multi-sub-step feedback submission within a single step, iterating sub_steps_per_batch times with LoRA reloading between sub-steps in local mode, tracks sub_steps_completed, and includes sub_step_count in the StepResult payload.
  • Type Definitions (claas/eval/types.py): Adds steps_per_batch: int = 1 field to HarnessConfig and sub_step_count: int = 1 field to StepResult.
  • Example Configuration & Dependencies (claas/eval/configs/example.yaml, pyproject.toml): Provides a concrete YAML configuration example with model, preferences, metrics, step controls, and batching parameters; adds pyyaml>=6.0 dependency.
  • Test Coverage (tests/test_eval_config.py, tests/test_eval_runner.py): Introduces comprehensive tests for YAML config loading, field validation, type coercions, CLI override precedence, and timestamped output directories; adds tests for HarnessConfig and StepResult field defaults and custom values.

Sequence Diagrams

sequenceDiagram
    participant User
    participant CLI as CLI Parser
    participant ConfigBuilder as build_config()
    participant YAMLLoader as build_config_from_yaml()
    participant HarnessConfig

    User->>CLI: Provide --config and optional CLI overrides
    CLI->>ConfigBuilder: args with config path
    ConfigBuilder->>ConfigBuilder: Detect explicit CLI overrides
    ConfigBuilder->>YAMLLoader: Load YAML file
    YAMLLoader->>YAMLLoader: Validate & coerce types<br/>(metrics, collapse_steps, preferences)
    YAMLLoader->>YAMLLoader: Apply mode-based defaults<br/>(proxy_url from vllm_url)
    YAMLLoader->>YAMLLoader: Generate timestamped output_dir
    YAMLLoader->>YAMLLoader: Load openclaw_api_key from env
    YAMLLoader->>HarnessConfig: Merge YAML config with CLI overrides
    ConfigBuilder->>HarnessConfig: Return final HarnessConfig
    HarnessConfig->>User: Configuration ready
sequenceDiagram
    participant Evaluator
    participant Runner as run_preference_experiment()
    participant Feedback as Feedback Submission
    participant vLLM
    participant Metrics as Metrics Accumulation

    Evaluator->>Runner: Start step with steps_per_batch=N
    loop For each sub-step (1 to N)
        Runner->>Feedback: Submit feedback for sub-step
        Feedback->>Metrics: Accumulate sdpo_metrics
        Metrics->>Runner: Metrics recorded
        alt sub-step not last
            Runner->>vLLM: Reload LoRA (local mode only)
            vLLM->>Runner: LoRA reloaded
        end
        alt sub-step fails
            Runner->>Runner: Log per-sub-step warning
            Runner->>Runner: Continue to next step
        end
    end
    Runner->>Runner: Construct StepResult<br/>with sub_step_count
    Runner->>Evaluator: Return StepResult

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A config file in YAML format appears,
With steps bundled per batch, sub-steps cheer!
LoRA reloads between feedback bursts,
CLI overrides quench the thirst,
A hopping good time with timestamped flair! 🌟

🚥 Pre-merge checks: ✅ 3 passed

  • Description Check: ✅ Passed. Check skipped - CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately summarizes the main changes: YAML config system and steps_per_batch feature. It's concise, specific, and reflects the primary objectives of the PR.
  • Docstring Coverage: ✅ Passed. Docstring coverage is 91.67%, which is sufficient. The required threshold is 80.00%.



@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9584e7ee0f


Comment on lines 207 to 209
for dest, field_name in dest_to_field.items():
    if dest in args_dict and args_dict[dest] != defaults.get(dest):
        overrides[field_name] = args_dict[dest]


P1: Honor explicit CLI flags even when equal to defaults

When --config is used, overrides are only applied if args_dict[dest] != defaults.get(dest), which drops CLI flags explicitly set to default values. That means YAML values cannot be overridden back to defaults (for example mode: tinker in YAML with --mode local, or plots: false with --plots), despite the CLI help promising that CLI args override YAML values; this can run experiments against the wrong backend/settings.
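
One way to honor explicitly passed flags regardless of their value is to scan the raw argv for each option string instead of comparing parsed values against defaults. A sketch of that approach follows; it is not the PR's implementation, it reuses the dest_to_field name from the snippet above, and it relies on argparse's private but widely used _actions list.

```python
import argparse

def get_explicit_cli_overrides(parser: argparse.ArgumentParser,
                               args: argparse.Namespace,
                               argv: list[str],
                               dest_to_field: dict[str, str]) -> dict:
    """Collect an override for every option that literally appears on the CLI."""
    overrides: dict = {}
    for action in parser._actions:  # private API, but stable in practice
        dest = action.dest
        if dest not in dest_to_field:
            continue
        explicitly_passed = any(
            tok == opt or tok.startswith(opt + "=")
            for tok in argv
            for opt in action.option_strings
        )
        if explicitly_passed:
            # Flags explicitly set to their default values are now honored too.
            overrides[dest_to_field[dest]] = getattr(args, dest)
    return overrides
```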


Comment on lines +612 to +617
except (httpx.HTTPError, KeyError) as e:
    logger.warning(
        "[%s] Step %d sub-step %d feedback failed: %s",
        pref.name, step, sub_step, e,
    )
    break


P2: Retry failed feedback submissions before aborting sub-steps

A transient HTTPError/KeyError during _submit_feedback now immediately breaks the sub-step loop without any retry, so a single flaky response can skip the entire training update for that step (and all remaining sub-steps when steps_per_batch > 1). In unstable environments this will silently under-train runs and distort eval trajectories.
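
A minimal sketch of the suggested behavior: wrap the feedback call in a bounded retry loop with exponential backoff before surrendering the sub-step. The attempt count and delays are illustrative, and submit stands in for the harness's _submit_feedback.

```python
import asyncio
import httpx

async def submit_feedback_with_retry(config, lora_id, samples, submit,
                                     max_attempts: int = 3):
    """Retry transient failures before giving up on a sub-step."""
    delay = 1.0
    for attempt in range(1, max_attempts + 1):
        try:
            return await submit(config, lora_id, samples)
        except (httpx.HTTPError, KeyError):
            if attempt == max_attempts:
                raise  # caller logs the warning and breaks the sub-step loop
            await asyncio.sleep(delay)
            delay *= 2  # simple exponential backoff between attempts
```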


@coderabbitai bot left a comment

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
claas/eval/types.py (1)

269-279: ⚠️ Potential issue | 🟡 Minor

step_result_from_dict does not deserialize sub_step_count.

The new sub_step_count field is serialized via dataclasses.asdict in _append_step_jsonl (runner.py line 366), but step_result_from_dict never reads it back. Deserialized StepResult objects will always have sub_step_count=1 regardless of the persisted value, which silently drops sub-step information on reload.

Proposed fix
     return StepResult(
         preference=data["preference"],  # type: ignore[arg-type]
         step=data["step"],  # type: ignore[arg-type]
         timestamp=data["timestamp"],  # type: ignore[arg-type]
         feedback_given=data["feedback_given"],  # type: ignore[arg-type]
         sdpo_metrics=sdpo,
         eval=eval_metrics,
         prompt_used=data["prompt_used"],  # type: ignore[arg-type]
         response_text=data.get("response_text"),  # type: ignore[arg-type]
         timing_s=data.get("timing_s", 0.0),  # type: ignore[arg-type]
+        sub_step_count=data.get("sub_step_count", 1),  # type: ignore[arg-type]
     )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/types.py` around lines 269 - 279, The deserializer
step_result_from_dict is not restoring the serialized sub_step_count, so
StepResult instances lose their persisted sub-step value; update
step_result_from_dict to read data["sub_step_count"] (or
data.get("sub_step_count", 1)) and pass it into the StepResult constructor so
the StepResult.sub_step_count reflects the stored value (this complements the
serializer in _append_step_jsonl in runner.py and ensures StepResult objects
created by step_result_from_dict retain the persisted sub_step_count).
🧹 Nitpick comments (6)
tests/test_eval_config.py (1)

14-19: Consider using pytest's tmp_path fixture instead of manual temp file management.

Every test manually creates a temp file and cleans it up in a finally block. Using pytest's built-in tmp_path fixture would eliminate the boilerplate and guarantee cleanup even if you forget the finally (as happened in test_steps_per_batch_in_yaml).
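
For illustration, the pattern the comment suggests looks roughly like this; the real test would feed the file to load_yaml_config / build_config_from_yaml rather than raw yaml.safe_load, which is used here only to keep the sketch self-contained.

```python
import yaml

def test_steps_per_batch_in_yaml(tmp_path):
    # tmp_path is a per-test pathlib.Path; pytest deletes it automatically,
    # so no tempfile.mkstemp / finally cleanup is needed.
    cfg_file = tmp_path / "config.yaml"
    cfg_file.write_text("steps_per_batch: 3\n")
    data = yaml.safe_load(cfg_file.read_text())
    assert data["steps_per_batch"] == 3
```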

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_eval_config.py` around lines 14 - 19, The tests create temp YAML
files via the helper _write_yaml (which uses tempfile.mkstemp) and manual
finally cleanup, leading to boilerplate and a missed cleanup in
test_steps_per_batch_in_yaml; refactor tests to use pytest's tmp_path fixture
instead: remove or adapt _write_yaml to accept a pathlib.Path from tmp_path (or
replace calls with tmp_path / "file.yaml" write_text), update tests (e.g.,
test_steps_per_batch_in_yaml) to accept tmp_path and write the YAML there, and
delete any manual tempfile cleanup/finally blocks so pytest manages file
lifecycle automatically.
claas/eval/runner.py (1)

664-664: sub_step_count defaults to 1 when no samples were collected.

When all sample generation fails (samples is empty), the sub-step loop is skipped entirely, sub_steps_completed stays 0, and sub_step_count is set to 1. This slightly misrepresents the step — no gradient update happened, but the result claims 1 sub-step. Consider using 0 to accurately reflect this case, or document the current behavior.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/runner.py` at line 664, The code sets
sub_step_count=sub_steps_completed if sub_steps_completed > 0 else 1 which
reports 1 sub-step even when no samples were collected; change this to use the
actual sub_steps_completed (allowing 0) so the metric reflects no sub-steps
(i.e., set sub_step_count to sub_steps_completed without forcing a minimum of
1), and update any docstring or comment around sub_step_count in the runner to
note that 0 means no sub-steps/samples were processed.
claas/eval/config.py (1)

36-89: No validation for steps_per_batch >= 1.

The comment on HarnessConfig.steps_per_batch states (>=1) but neither build_config_from_yaml nor the CLI path validates this. A value of 0 would silently skip all feedback submissions (the range(0) loop in runner.py line 606 would not execute), and negative values would do the same. Consider adding a validation check.

Proposed fix (after line 87, before the return)
     data["output_dir"] = os.path.join(base_output, run_id)
 
+    if data.get("steps_per_batch", 1) < 1:
+        raise ValueError(f"steps_per_batch must be >= 1, got {data['steps_per_batch']}")
+
     return HarnessConfig(**data)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/config.py` around lines 36 - 89, build_config_from_yaml currently
doesn't validate HarnessConfig.steps_per_batch, so add a check before returning:
coerce data.get("steps_per_batch") to int if it's a string, ensure it's present
(or left None) and if set must be >= 1, and if the value is < 1 raise a
ValueError with a clear message referencing "steps_per_batch"; perform this
validation after the existing type coercions and defaults (before
HarnessConfig(**data)) so invalid configs are rejected early (refer to
build_config_from_yaml and the HarnessConfig.steps_per_batch field when making
the change).
tests/test_eval_runner.py (1)

127-127: Move EvalMetrics import to the top of the file.

EvalMetrics is imported inline in two test functions (lines 127 and 143) but could be imported once at the module level alongside the other types already imported from claas.eval.types.

Proposed fix
 from claas.eval.types import (
+    EvalMetrics,
     HarnessConfig,
     LocalDistillMetrics,
     StepResult,
     TinkerDistillMetrics,
     step_result_from_dict,
 )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/test_eval_runner.py` at line 127, Move the inline imports of
EvalMetrics out of the individual test functions and add a single module-level
import from claas.eval.types alongside the existing type imports; locate the two
inline imports of EvalMetrics (used in the tests that currently import it inside
the function bodies) and remove them, then add EvalMetrics to the top-level
import list so the tests reference EvalMetrics directly without per-test
imports.
claas/eval/__main__.py (2)

171-224: Override detection can't distinguish "user explicitly passed the default value" from "user didn't pass the arg".

The comparison args_dict[dest] != defaults.get(dest) on line 208 means if a user explicitly passes --num-steps 20 (the default), it won't be included in overrides and the YAML value will take precedence. This is a known argparse limitation and is acceptable, but worth a brief comment.

Also, consider deriving dest_to_field from HarnessConfig fields to avoid the mapping going stale when new fields are added. Currently adding a new HarnessConfig field requires updating both this map and add_eval_arguments.
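
A sketch of the second suggestion, deriving the mapping from the dataclass so it cannot drift. It assumes the argparse dests already match the HarnessConfig field names (argparse turns --steps-per-batch into steps_per_batch) and again touches the private _actions list.

```python
import dataclasses
from claas.eval.types import HarnessConfig

def build_dest_to_field(parser) -> dict[str, str]:
    """Map argparse dests to HarnessConfig fields without a hand-written table."""
    parser_dests = {action.dest for action in parser._actions}
    return {
        field.name: field.name
        for field in dataclasses.fields(HarnessConfig)
        if field.name in parser_dests
    }
```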

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/__main__.py` around lines 171 - 224, The function
_get_explicit_cli_overrides currently cannot tell when a user explicitly passed
a value equal to the argparse default (it compares args_dict[dest] !=
defaults.get(dest)), so add a short comment above that comparison explaining
this limitation and why it's acceptable; additionally replace the hard-coded
dest_to_field mapping with a derived mapping from HarnessConfig (e.g., use
HarnessConfig dataclass fields or HarnessConfig.__dataclass_fields__ to build
dest→field_name automatically) so new HarnessConfig fields don't require manual
updates (also ensure add_eval_arguments remains in sync or derive flags from the
same source).

227-235: build_config silently produces empty overrides when parser is None.

Line 234: when parser is None, cli_overrides defaults to {}, meaning all CLI arguments are silently ignored. Since parser is optional (argparse.ArgumentParser | None = None), a caller could inadvertently drop overrides. Consider documenting this or requiring parser when args.config is set.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/__main__.py` around lines 227 - 235, The function build_config
currently ignores CLI overrides when args.config is set and parser is None;
change it to fail fast instead of silently using empty overrides: if args.config
and parser is None, raise a clear ValueError (or argparse-specific error)
explaining that a parser is required to extract explicit CLI overrides,
referencing build_config and _get_explicit_cli_overrides; keep the existing call
to build_config_from_yaml(cli_path, cli_overrides) when parser is provided. This
ensures callers must supply parser to preserve CLI overrides instead of silently
dropping them.

Comment on lines 69 to 71
# preferences: ensure list of strings
if "preferences" in data and isinstance(data["preferences"], list):
    data["preferences"] = [str(p) for p in data["preferences"]]

⚠️ Potential issue | 🟠 Major

preferences as a plain YAML string is not coerced to a list.

If a user writes preferences: no_emoji (a bare string) in YAML, this block is skipped because isinstance(data["preferences"], list) is False. The string would be passed directly to HarnessConfig, which expects list[str]. A dataclass won't reject the wrong type at construction, so config.preferences would silently be the string "no_emoji" instead of ["no_emoji"], which could cause downstream iteration to yield individual characters.

Proposed fix
     # preferences: ensure list of strings
-    if "preferences" in data and isinstance(data["preferences"], list):
-        data["preferences"] = [str(p) for p in data["preferences"]]
+    if "preferences" in data:
+        val = data["preferences"]
+        if isinstance(val, str):
+            data["preferences"] = [val]
+        elif isinstance(val, list):
+            data["preferences"] = [str(p) for p in val]
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/config.py` around lines 69 - 71, The preferences YAML string isn't
coerced to a list so a bare string like "no_emoji" will become
config.preferences as a str instead of list[str]; update the normalization logic
around data["preferences"] in config.py to handle both list and str cases (e.g.,
if isinstance(data["preferences"], list) convert each item to str, else if
isinstance(data["preferences"], str) wrap it in a single-element list of str) so
that HarnessConfig always receives a list[str] (ensure references to
data["preferences"] and the resulting config.preferences are consistently
lists).

kfallah and others added 3 commits February 19, 2026 06:16
Replace the flat `limit` query param on /v1/dashboard with `page` and
`per_page` params, and add the same pagination to /v1/eval. Shared
pagination logic lives in claas/pagination.py with full test coverage.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The proxy /v1/score endpoint tokenized prompts as raw text while
generation and training used the chat template, making logprob margins
unreliable. Add /v1/score/chat endpoint that applies the chat template,
update the eval client to use it, and fix the base_model in config from
the HuggingFace name to Tinker's required name.

- Add add_generation_prompt kwarg to _apply_chat_template_ids
- Add ChatScoreRequest model and /v1/score/chat endpoint
- Update fetch_response_logprob_sum_via_proxy to POST to /v1/score/chat
- Add TestChatScoreEndpoint tests (4 cases)
- Fix base.yaml: Qwen/Qwen3-Coder-30B-A3B-Instruct -> Qwen/Qwen3-30B-A3B
- Rewrite eval README with Hydra config docs, setup steps, known gotchas
- Add eval harness section to root README and CLAUDE.md

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
New skill for running the full Tinker stack (proxy + API + OpenClaw)
natively without Docker. Documents correct config paths, env vars,
and LoRA initialization learned from manual setup.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@kfallah
Owner Author

kfallah commented Feb 19, 2026

@codex review

@chatgpt-codex-connector bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: b0173cecaf


Comment on lines +621 to 625
for sub_step in range(config.steps_per_batch):
    try:
        sdpo_metrics = await _submit_feedback(
            config, actual_lora_id, samples,
        )


P1: Refresh rollout logprobs between feedback sub-steps

When steps_per_batch > 1, this loop reuses the same samples payload for every sub-step without recomputing each sample’s rollout_logprobs. Both training backends use sample.rollout_logprobs as the old-policy baseline for IS weighting, so sub-step 2+ will optimize against stale pre-update logprobs instead of the current policy, which distorts the intended multi-epoch update behavior and can corrupt eval/training results.
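
A sketch of the fix this comment asks for: re-score each sample under the freshly updated policy before every sub-step after the first, so the IS baseline tracks the current weights. submit_feedback and score_logprobs stand in for the harness's _submit_feedback and its scoring path (for example the /v1/score/chat proxy endpoint), and the sample attribute names are assumptions rather than the PR's actual schema.

```python
async def run_sub_steps(config, actual_lora_id, samples,
                        submit_feedback, score_logprobs):
    """Refresh each sample's old-policy logprobs before sub-steps 2..N."""
    sdpo_metrics = None
    for sub_step in range(config.steps_per_batch):
        if sub_step > 0:
            for sample in samples:
                # Re-score under the just-updated policy so IS weights use the
                # current baseline instead of stale pre-update rollout logprobs.
                sample.rollout_logprobs = await score_logprobs(
                    config, actual_lora_id, sample,
                )
        sdpo_metrics = await submit_feedback(config, actual_lora_id, samples)
    return sdpo_metrics
```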


Comment on lines +151 to +152
api_key = os.environ.get("OPENCLAW_GATEWAY_TOKEN", "")
params = openclaw_chat_params(config.openclaw_url, api_key, prompt)


P2: Restore default OpenClaw token fallback for eval requests

This now resolves the OpenClaw API key to an empty string when OPENCLAW_GATEWAY_TOKEN is unset, but openclaw_chat_params still sends an Authorization header, resulting in an empty Bearer token for all OpenClaw calls. In the default local stack, the gateway is configured with openclaw-local-dev-token, so eval runs that set openclaw_url but don’t export this env var will fail with unauthorized responses.
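
A one-line sketch of the fallback being suggested, keeping the documented local-dev token as the default when the environment variable is not exported; whether a hard-coded default is acceptable outside the local stack is a judgment call for the PR author.

```python
import os

# Fall back to the local-dev gateway token instead of an empty string.
api_key = os.environ.get("OPENCLAW_GATEWAY_TOKEN", "openclaw-local-dev-token")
```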


@coderabbitai bot left a comment

Actionable comments posted: 6

🧹 Nitpick comments (2)
claas/eval_dashboard.html (1)

111-132: Consider extracting shared pagination CSS.

The pagination styles (lines 111-132) are duplicated verbatim from claas/index.html. If these templates share a common rendering pipeline, consider extracting the pagination CSS into a shared stylesheet or injecting it server-side to reduce maintenance burden.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval_dashboard.html` around lines 111 - 132, The pagination CSS block
is duplicated; extract the selectors (.pagination, .pagination-link,
.pagination-link.disabled, .pagination-status) into a shared stylesheet or a
common template partial and replace the duplicated block in this template with a
single import/include of that shared CSS; update the template rendering to load
the shared CSS (or inject server-side) so both pages use the same stylesheet and
remove the duplicated rules from this template.
claas/eval/metrics.py (1)

149-151: Avoid sharing mutable collapse-step lists across instances.
Consider copying the list to prevent accidental mutation of defaults or caller-provided lists.

♻️ Suggested tweak
-        self.check_steps = check_steps if check_steps is not None else DEFAULT_COLLAPSE_STEPS
+        self.check_steps = list(check_steps) if check_steps is not None else list(DEFAULT_COLLAPSE_STEPS)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/metrics.py` around lines 149 - 151, The constructor currently
assigns check_steps directly which can share mutable lists; in __init__ of the
class use a shallow copy when assigning check_steps so neither
DEFAULT_COLLAPSE_STEPS nor a caller-provided list is mutated (e.g., set
self.check_steps = list(check_steps) if check_steps is not None else
list(DEFAULT_COLLAPSE_STEPS)), keeping the rest of initialization unchanged.
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@claas/eval/config.py`:
- Around line 39-41: The code converts cfg to a plain container and passes it
into EvalConfig, but scalar YAML/override values for list fields (preferences,
metrics, collapse_steps) can become strings and will iterate per-character;
before calling EvalConfig(**container) in the function that uses
OmegaConf.to_container (the container variable), normalize those keys by
coercing them to lists when they are non-list scalars (e.g., if
isinstance(container.get("preferences"), str): container["preferences"] =
[container["preferences"]], likewise for "metrics" and "collapse_steps"], and
ensure None becomes an empty list if appropriate), then call
EvalConfig(**container) and remove the unused "# type: ignore[arg-type]".


Comment on lines +71 to +81
tmux new-session -d -s claas-api \
  "CLAAS_DISTILL_EXECUTION_MODE=tinker \
   CLAAS_TINKER_API_KEY='<TINKER_API_KEY>' \
   CLAAS_TINKER_BASE_MODEL='<MODEL>' \
   CLAAS_TINKER_STATE_PATH='${HOME}/.claas/tinker_state.json' \
   VLLM_BASE_URL='http://localhost:8000' \
   VLLM_API_KEY=sk-local \
   FEEDBACK_LOG_DIR=/tmp/feedback-logs \
   uv run uvicorn claas.api:web_app --host 0.0.0.0 --port 8080 \
   2>&1 | tee /tmp/claas-api.log"
```

⚠️ Potential issue | 🟡 Minor

Fix shell variable expansion in tmux command.

Line 75 uses single quotes around ${HOME} which prevents variable expansion:

CLAAS_TINKER_STATE_PATH='${HOME}/.claas/tinker_state.json'

This will literally set the path to ${HOME}/.claas/tinker_state.json instead of expanding $HOME. Use double quotes or no quotes for the variable portion.

Proposed fix
 tmux new-session -d -s claas-api \
   "CLAAS_DISTILL_EXECUTION_MODE=tinker \
    CLAAS_TINKER_API_KEY='<TINKER_API_KEY>' \
    CLAAS_TINKER_BASE_MODEL='<MODEL>' \
-   CLAAS_TINKER_STATE_PATH='${HOME}/.claas/tinker_state.json' \
+   CLAAS_TINKER_STATE_PATH=\"\${HOME}/.claas/tinker_state.json\" \
    VLLM_BASE_URL='http://localhost:8000' \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/setup-tinker-no-docker/SKILL.md around lines 71 - 81, The
CLAAS_TINKER_STATE_PATH environment assignment inside the tmux command uses
single quotes which prevent shell expansion of ${HOME}; update the tmux line so
CLAAS_TINKER_STATE_PATH is assigned with expandable quotes (e.g., use double
quotes or no quotes around ${HOME}) in the tmux new-session command that sets
CLAAS_TINKER_STATE_PATH so the actual home directory is used instead of the
literal '${HOME}/.claas/tinker_state.json'.

Comment on lines +225 to +235
```bash
tmux new-session -d -s openclaw \
  "TELEGRAM_BOT_TOKEN='<TELEGRAM_BOT_TOKEN>' \
   CLAAS_VLLM_BASE_URL='http://localhost:8000' \
   CLAAS_TINKER_PROXY_URL='http://localhost:8000' \
   CLAAS_API_URL='http://localhost:8080' \
   HOME='$OPENCLAW_HOME' \
   OPENCLAW_GATEWAY_TOKEN='openclaw-local-dev-token' \
   openclaw gateway --port 18789 --bind lan --allow-unconfigured --verbose \
   2>&1 | tee /tmp/openclaw.log"
```

⚠️ Potential issue | 🟡 Minor

Fix shell variable expansion for HOME assignment.

Line 231 uses single quotes which prevents $OPENCLAW_HOME from expanding:

HOME='$OPENCLAW_HOME'

This will literally set HOME to the string $OPENCLAW_HOME instead of its value.

Proposed fix
 tmux new-session -d -s openclaw \
   "TELEGRAM_BOT_TOKEN='<TELEGRAM_BOT_TOKEN>' \
    CLAAS_VLLM_BASE_URL='http://localhost:8000' \
    CLAAS_TINKER_PROXY_URL='http://localhost:8000' \
    CLAAS_API_URL='http://localhost:8080' \
-   HOME='$OPENCLAW_HOME' \
+   HOME=\"\$OPENCLAW_HOME\" \
    OPENCLAW_GATEWAY_TOKEN='openclaw-local-dev-token' \
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In @.claude/skills/setup-tinker-no-docker/SKILL.md around lines 225 - 235, The
HOME environment var in the tmux command is using single quotes so
$OPENCLAW_HOME won’t expand; update the tmux invocation (the tmux new-session
... "TELEGRAM_BOT_TOKEN=... HOME=' $OPENCLAW_HOME' ... openclaw gateway") to use
unquoted or double-quoted expansion for OPENCLAW_HOME so HOME receives the
actual value (e.g., replace the single-quoted HOME assignment with one that
allows shell expansion) and keep the rest of the tmux new-session/openclaw
gateway invocation unchanged.

Comment on lines +29 to +30
eval_cfg = EvalConfig(**OmegaConf.to_container(cfg, resolve=True)) # type: ignore[arg-type]
config = build_harness_config(eval_cfg)

⚠️ Potential issue | 🟡 Minor

Remove the unused blanket type: ignore.

CI warns this ignore is unused; dropping it should keep the check clean.

🧹 Suggested cleanup
-    eval_cfg = EvalConfig(**OmegaConf.to_container(cfg, resolve=True))  # type: ignore[arg-type]
+    eval_cfg = EvalConfig(**OmegaConf.to_container(cfg, resolve=True))
🧰 Tools
🪛 GitHub Actions: CI

[warning] 29-29: Unused blanket type: ignore directive

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/__main__.py` around lines 29 - 30, The line constructing eval_cfg
uses a blanket type ignore that CI reports as unused; remove the trailing "#
type: ignore[arg-type]" from the EvalConfig(**OmegaConf.to_container(cfg,
resolve=True)) call so the type checker sees no unnecessary ignores and then run
the type checks to confirm no type errors; reference symbols: EvalConfig,
OmegaConf.to_container, build_harness_config in __main__.py.

Comment on lines +10 to +12
from hydra import compose, initialize_config_dir
from hydra.core.config_store import ConfigStore
from omegaconf import OmegaConf

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check for hydra-core and omegaconf in pyproject.toml
echo "=== Checking pyproject.toml for dependencies ==="
cat pyproject.toml | grep -A 50 "\[project" | head -100

echo -e "\n=== Searching specifically for hydra-core and omegaconf ==="
rg -i 'hydra-core|omegaconf' pyproject.toml -A 2 -B 2 || echo "No matches found in pyproject.toml"

echo -e "\n=== Checking for eval-specific dependencies ==="
rg -i 'eval|hydra|omega' pyproject.toml || echo "No matches found"

Repository: kfallah/CLaaS

Length of output: 1716


🏁 Script executed:

#!/bin/bash
# Check CI configuration for eval dependency installation
echo "=== Checking .github directory structure ==="
find .github -type f -name "*.yml" -o -name "*.yaml" 2>/dev/null | head -20

echo -e "\n=== Looking for pip/dependency install commands in CI ==="
grep -r "pip install\|uv install\|uv sync\|optional-dependencies\|eval" .github --include="*.yml" --include="*.yaml" -B 2 -A 2 || echo "No matches found"

echo -e "\n=== Checking full CI workflow files ==="
find .github -type f \( -name "*.yml" -o -name "*.yaml" \) -exec echo "File: {}" \; -exec head -50 {} \;

Repository: kfallah/CLaaS

Length of output: 1839


🏁 Script executed:

#!/bin/bash
# Check if eval module is imported unconditionally
echo "=== Checking claas/cli.py for eval imports ==="
cat claas/cli.py 2>/dev/null | head -100

echo -e "\n=== Checking if eval module is imported anywhere ==="
rg "from claas.eval|from claas import.*eval|import claas.eval" --type py

echo -e "\n=== Checking claas/__init__.py ==="
cat claas/__init__.py 2>/dev/null || echo "File not found"

echo -e "\n=== Checking claas/eval/__init__.py ==="
cat claas/eval/__init__.py 2>/dev/null || echo "File not found"

Repository: kfallah/CLaaS

Length of output: 3587


🏁 Script executed:

#!/bin/bash
# Check if eval tests are marked/skipped appropriately
echo "=== Checking test_eval_config.py for markers ==="
head -50 tests/test_eval_config.py

echo -e "\n=== Checking test_eval_runner.py for markers ==="
head -50 tests/test_eval_runner.py

echo -e "\n=== Check pytest.ini for marker definitions ==="
grep -A 20 "pytest.ini\|tool.pytest" pyproject.toml

echo -e "\n=== Check how many eval tests exist ==="
rg "def test_" tests/test_eval*.py | wc -l

Repository: kfallah/CLaaS

Length of output: 3257


Add --extra eval to CI dependency installation or lazy-load eval module.

The eval module imports hydra and omegaconf, which are correctly declared as eval dependencies in pyproject.toml, but CI does not install them. The workflow runs uv sync --extra dev --extra local --extra tinker without --extra eval, causing type-check failures and import errors when claas eval or the eval tests run.

Either add --extra eval to the CI install steps in .github/workflows/ci.yml, or defer eval imports until runtime within cmd_eval() only (already done correctly). Additionally, consider explicitly declaring omegaconf as a direct dependency under eval since it's imported but only implicitly required through hydra-core.

🧰 Tools
🪛 GitHub Actions: CI

[error] 10-10: Cannot resolve imported module hydra


[error] 11-11: Cannot resolve imported module hydra.core.config_store


[error] 12-12: Cannot resolve imported module omegaconf

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/config.py` around lines 10 - 12, The eval module
(claas.eval.config.py) currently top-level-imports hydra (compose,
initialize_config_dir, ConfigStore) and omegaconf (OmegaConf) which are only
declared in the "eval" extra and not installed by CI; either update CI to
install the eval extras by adding "--extra eval" to the uv sync step in the CI
workflow, or lazy-load these imports inside the runtime entrypoint (e.g., move
compose/initialize_config_dir/ConfigStore/OmegaConf imports into cmd_eval() so
they are only imported when "claas eval" runs); also ensure pyproject.toml
explicitly lists omegaconf in the eval extra so it's installed directly rather
than only via hydra-core.

Comment on lines +52 to +55
# Timestamped output subdir
run_id = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%SZ")
fields["output_dir"] = os.path.join(fields["output_dir"], run_id)


⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, check if the file exists and examine the target lines
fd -t f config.py claas/eval

Repository: kfallah/CLaaS

Length of output: 77


🏁 Script executed:

# Once we confirm the file exists, read the relevant section
head -100 claas/eval/config.py | tail -60

Repository: kfallah/CLaaS

Length of output: 1967


🏁 Script executed:

# Search for resume logic and output_dir usage patterns
rg -n "output_dir|_load_completed_steps|resume" claas/eval -g '*.py' -A 2

Repository: kfallah/CLaaS

Length of output: 6199


🏁 Script executed:

# Look for the build_harness_config function and understand its full context
ast-grep --pattern $'func build_harness_config($_) {
  $$$
}'

Repository: kfallah/CLaaS

Length of output: 39


Unconditional run_id append blocks resuming existing runs.

build_harness_config always appends a new timestamped subdirectory, preventing the resume logic from finding prior completed steps. When resuming, _load_completed_steps (runner.py:336) looks for steps.jsonl in the new timestamped directory, but prior runs are stored under the old timestamp.

Consider only appending a timestamp when the output directory doesn't already end with a run ID pattern, allowing reruns to reuse the same directory:

Guard to detect existing run directories
-import dataclasses
-import os
+import dataclasses
+import os
+import re
@@
 _DEFAULT_CONFIG_DIR = str(Path(__file__).parent / "configs")
+_RUN_ID_RE = re.compile(r"\d{8}-\d{6}Z$")
@@
-    run_id = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%SZ")
-    fields["output_dir"] = os.path.join(fields["output_dir"], run_id)
+    output_dir = fields["output_dir"]
+    if not _RUN_ID_RE.search(Path(output_dir).name):
+        run_id = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%SZ")
+        fields["output_dir"] = os.path.join(output_dir, run_id)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/eval/config.py` around lines 52 - 55, build_harness_config currently
always appends a new timestamped run_id to fields["output_dir"], which prevents
resume logic in _load_completed_steps from finding previous steps; change it to
first check whether the current output_dir already ends with a run-id pattern
(the run_id format produced by
datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%SZ"), i.e. basename matching
\d{8}-\d{6}Z) and only append a new timestamp if that pattern is not
present—modify the logic around run_id and fields["output_dir"] in
build_harness_config accordingly so previous runs with existing timestamped dirs
can be resumed and steps.jsonl will be found.

Comment on lines +32 to +41
def paginate(total_items: int, page: int = 1, per_page: int = 20) -> PaginationInfo:
    """Compute pagination state, clamping *page* to ``[1, total_pages]``."""
    total_pages = max(1, math.ceil(total_items / per_page))
    page = max(1, min(page, total_pages))
    return PaginationInfo(
        page=page,
        per_page=per_page,
        total_items=total_items,
        total_pages=total_pages,
    )

⚠️ Potential issue | 🟡 Minor

Validate per_page to prevent division by zero.

If per_page is 0 or negative (e.g., from a malformed API request), line 34 will raise ZeroDivisionError or produce incorrect pagination.

Proposed fix
 def paginate(total_items: int, page: int = 1, per_page: int = 20) -> PaginationInfo:
     """Compute pagination state, clamping *page* to ``[1, total_pages]``."""
+    per_page = max(1, per_page)
     total_pages = max(1, math.ceil(total_items / per_page))
     page = max(1, min(page, total_pages))
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@claas/pagination.py` around lines 32 - 41, The paginate function can divide
by zero when per_page <= 0; update paginate to validate per_page (e.g., set
per_page = max(1, per_page) or raise ValueError) before computing total_pages,
then use that validated per_page in the total_pages calculation and when
constructing PaginationInfo (references: function paginate, variable per_page,
total_pages, and PaginationInfo).
