Test Warnings Report - 2026-01-26
Date: 2026-01-26
Tests with warnings: 11
Generated: 2026-01-27 09:09:05
Note: For detailed error messages and stack traces, check the log files in `scripts/logs/` with the date prefix `20260126`.
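To pull up the matching logs, a sketch like the following lists every file in `scripts/logs/` whose name starts with the date prefix. It assumes the prefix sits at the very start of each file name (the actual naming scheme is not shown in this report), and `find_logs` is a hypothetical helper, not one of the repository's scripts.

```python
from pathlib import Path


def find_logs(log_dir: str, date_prefix: str) -> list[Path]:
    """List files in log_dir whose names start with date_prefix, sorted by name.

    Hypothetical helper: assumes log names look like '20260126_<run>.log'.
    """
    directory = Path(log_dir)
    if not directory.is_dir():
        return []  # missing log directory: nothing to report
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.name.startswith(date_prefix)
    )


if __name__ == "__main__":
    for path in find_logs("scripts/logs", "20260126"):
        print(path)
```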
Add GitHub issue links in the Ticket column for tracking.
| Test ID | Benchmark | Provider | Model | Severity | Warning Codes | Ticket | Action | Completed |
|---|---|---|---|---|---|---|---|---|
| T0023 | business_letters | mistral | pixtral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0109 | business_letters | openai | gpt-5 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0113 | business_letters | openai | gpt-5-mini | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0166 | library_cards | openai | gpt-5-mini | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0244 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0245 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0276 | medieval_manuscripts | openai | gpt-4o-mini | 🔴 Critical | ZERO_COST, ALL_NA | | add graceful error handling to llm client | test rerun on 2026-02-17 |
| T0278 | medieval_manuscripts | openai | gpt-4.1-nano | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0421 | medieval_manuscripts | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0555 | business_letters | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0560 | medieval_manuscripts | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | | |
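Because the Ticket, Action and Completed columns are filled in by hand, open items can also be listed mechanically. The sketch below parses a pipe table and returns the test IDs that have no Completed entry yet; `parse_rows` and `pending` are hypothetical helpers, and the sample rows are copied from this table with cells placed in the header's column order (Ticket, Action, Completed).

```python
SAMPLE = """
| Test ID | Benchmark | Provider | Model | Severity | Warning Codes | Ticket | Action | Completed |
|---|---|---|---|---|---|---|---|---|
| T0023 | business_letters | mistral | pixtral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0109 | business_letters | openai | gpt-5 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0560 | medieval_manuscripts | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA |
"""


def split_row(line: str) -> list[str]:
    """Split one '| a | b |' row into trimmed cells, dropping the outer pipes."""
    line = line.strip()
    if line.startswith("|"):
        line = line[1:]
    if line.endswith("|"):
        line = line[:-1]
    return [cell.strip() for cell in line.split("|")]


def parse_rows(table: str) -> list[dict[str, str]]:
    """Parse a pipe table into one dict per data row, keyed by header name."""
    lines = [ln for ln in table.strip().splitlines() if ln.strip()]
    header = split_row(lines[0])
    rows = []
    for line in lines[2:]:  # skip the header and the |---| separator
        cells = split_row(line)
        cells += [""] * (len(header) - len(cells))  # pad rows missing trailing cells
        rows.append(dict(zip(header, cells)))
    return rows


def pending(rows: list[dict[str, str]]) -> list[str]:
    """Test IDs whose Completed cell is still empty."""
    return [row["Test ID"] for row in rows if not row["Completed"]]


if __name__ == "__main__":
    print(pending(parse_rows(SAMPLE)))  # prints ['T0023', 'T0560']
```

Padding short rows matters here because a row like T0560 can end right after the Warning Codes cell.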
Warning Codes Explanation
| Code | Severity | Description |
|---|---|---|
| ZERO_COST | 🟠 High | Total cost is $0 (pricing issue) |
| ALL_NA | 🔴 Critical | All metrics are N/A (scoring failed) |
| ZERO_SCORE | 🟡 Medium | Score is 0 (exceptionally bad performance) |
| ZERO_ITEMS | 🔴 Critical | No items processed |
| ZERO_DURATION | 🟠 High | No timing captured |
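A minimal sketch of how these codes and the per-test Severity column could be derived. The result fields (`total_cost`, `metrics`, `score`, `items_processed`, `duration_s`) and the metric names in the example are hypothetical, not taken from the reporting scripts, and `None` stands in for N/A. The overall severity is assumed to be the highest severity among a test's codes, which is consistent with every row above (ZERO_COST is High, ALL_NA is Critical, and each row is Critical).

```python
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Severity per warning code, as listed in the table above.
CODE_SEVERITY = {
    "ZERO_COST": Severity.HIGH,
    "ALL_NA": Severity.CRITICAL,
    "ZERO_SCORE": Severity.MEDIUM,
    "ZERO_ITEMS": Severity.CRITICAL,
    "ZERO_DURATION": Severity.HIGH,
}


def warning_codes(result: dict) -> list[str]:
    """Warning codes triggered by one test result (hypothetical field names)."""
    codes = []
    if result.get("total_cost", 0) == 0:
        codes.append("ZERO_COST")
    metrics = result.get("metrics", {})
    if not metrics or all(v is None for v in metrics.values()):  # None = N/A
        codes.append("ALL_NA")
    if result.get("score") == 0:
        codes.append("ZERO_SCORE")
    if result.get("items_processed", 0) == 0:
        codes.append("ZERO_ITEMS")
    if not result.get("duration_s"):  # missing or zero timing
        codes.append("ZERO_DURATION")
    return codes


def overall_severity(codes: list[str]) -> Optional[Severity]:
    """Highest severity among the codes, or None when there are no warnings."""
    return max((CODE_SEVERITY[c] for c in codes), default=None)


if __name__ == "__main__":
    failed = {"total_cost": 0.0, "metrics": {"fuzzy": None, "cer": None},
              "score": None, "items_processed": 12, "duration_s": 48.3}
    codes = warning_codes(failed)
    print(codes, overall_severity(codes).name)
```

With these rules, a run that cost nothing and produced only N/A metrics yields `['ZERO_COST', 'ALL_NA']` with an overall severity of Critical, the same pattern as all eleven rows in this report.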
Action Required
- Review all critical issues
- Investigate high priority issues
- Check medium priority issues as time permits
- Re-run failed tests if needed
- Update pricing data if ZERO_COST warnings present
- Fix scoring issues if ALL_NA or ZERO_SCORE warnings present
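The last two steps tie specific warning codes to a fix. A small sketch of that mapping follows; `ACTIONS_BY_CODE`, `follow_up_steps` and `group_by_step` are hypothetical names, the step labels are shortened from the list above, and the codes the list does not mention (ZERO_ITEMS, ZERO_DURATION) are deliberately left unmapped rather than guessed.

```python
# Code-specific follow-up steps from the Action Required list (labels shortened).
ACTIONS_BY_CODE = {
    "ZERO_COST": "Update pricing data",
    "ALL_NA": "Fix scoring issues",
    "ZERO_SCORE": "Fix scoring issues",
}


def follow_up_steps(codes: list[str]) -> list[str]:
    """Distinct code-specific steps for one test, in first-seen order."""
    steps: list[str] = []
    for code in codes:
        step = ACTIONS_BY_CODE.get(code)
        if step is not None and step not in steps:
            steps.append(step)
    return steps


def group_by_step(report: dict[str, list[str]]) -> dict[str, list[str]]:
    """Map each follow-up step to the test IDs that need it."""
    grouped: dict[str, list[str]] = {}
    for test_id, codes in report.items():
        for step in follow_up_steps(codes):
            grouped.setdefault(step, []).append(test_id)
    return grouped


if __name__ == "__main__":
    print(follow_up_steps(["ZERO_COST", "ALL_NA"]))
    # prints ['Update pricing data', 'Fix scoring issues']
```

For this report every test maps to both steps, since all eleven carry ZERO_COST and ALL_NA.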
Labels: bug