Test Warnings Report - 2026-01-26 #87

# Test Warnings Report - 2026-01-26

**Date:** 2026-01-26
**Tests with warnings:** 11
**Generated:** 2026-01-27 09:09:05

---

**Note:** For detailed error messages and stack traces, check the log files in `scripts/logs/` with date prefix `20260126`.
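To locate the logs for this run, a minimal Python sketch such as the following could list the files whose names begin with the date prefix. It assumes the prefix is the literal start of each filename in `scripts/logs/`. The helper name `logs_for_date` is hypothetical and not part of the project.

```python
from pathlib import Path


def logs_for_date(log_dir: str, date_prefix: str) -> list[Path]:
    """Return the files in log_dir whose names start with date_prefix, sorted by name.

    Hypothetical helper: the naming scheme (prefix at the very start of the
    filename) is an assumption based on the note above.
    """
    return sorted(
        p for p in Path(log_dir).glob(f"{date_prefix}*") if p.is_file()
    )


# Example call for this report (paths taken from the note above):
# for path in logs_for_date("scripts/logs", "20260126"):
#     print(path)
```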

*Add GitHub issue links in the Ticket column for tracking.*

| Test ID | Benchmark | Provider | Model | Severity | Warning Codes | Ticket | Action | Completed |
|---------|-----------|----------|-------|----------|---------------|--------|--------|-----------|
| T0023 | business_letters | mistral | pixtral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0109 | business_letters | openai | gpt-5 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0113 | business_letters | openai | gpt-5-mini | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0166 | library_cards | openai | gpt-5-mini | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0244 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0245 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0276 | medieval_manuscripts | openai | gpt-4o-mini | 🔴 Critical | ZERO_COST, ALL_NA | | add graceful error handling to LLM client | test rerun on 2026-02-17 |
| T0278 | medieval_manuscripts | openai | gpt-4.1-nano | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete | 2026-02-17 |
| T0421 | medieval_manuscripts | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0555 | business_letters | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | | |
| T0560 | medieval_manuscripts | mistral | ministral-8b-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | | |

---

## Warning Codes Explanation

| Code | Severity | Description |
|------|----------|-------------|
| **ZERO_COST** | 🟠 High | Total cost is $0 (pricing issue) |
| **ALL_NA** | 🔴 Critical | All metrics are N/A (scoring failed) |
| **ZERO_SCORE** | 🟡 Medium | Score is 0 (exceptionally bad performance) |
| **ZERO_ITEMS** | 🔴 Critical | No items processed |
| **ZERO_DURATION** | 🟠 High | No timing captured |
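
The code definitions above can be read as simple checks on a test result. The sketch below is one possible interpretation, not the report generator's actual logic: the result fields (`total_cost`, `metrics`, `score`, `items_processed`, `duration_seconds`) and both function names are assumptions made for illustration. Taking the highest severity among a test's codes is also an assumption, though it is consistent with the rows above, where `ZERO_COST, ALL_NA` is listed as Critical.

```python
from typing import Any, Optional

# Severity per code, copied from the table above.
CODE_SEVERITY = {
    "ZERO_COST": "High",
    "ALL_NA": "Critical",
    "ZERO_SCORE": "Medium",
    "ZERO_ITEMS": "Critical",
    "ZERO_DURATION": "High",
}
SEVERITY_RANK = {"Medium": 1, "High": 2, "Critical": 3}


def warning_codes(result: dict[str, Any]) -> list[str]:
    """Derive warning codes from a result dict (field names are hypothetical)."""
    codes = []
    if result.get("total_cost") == 0:
        codes.append("ZERO_COST")
    metrics = result.get("metrics") or {}
    # An empty metrics dict is treated as "all N/A" here (an assumption).
    if all(value is None for value in metrics.values()):
        codes.append("ALL_NA")
    if result.get("score") == 0:
        codes.append("ZERO_SCORE")
    if result.get("items_processed") == 0:
        codes.append("ZERO_ITEMS")
    if result.get("duration_seconds") == 0:
        codes.append("ZERO_DURATION")
    return codes


def overall_severity(codes: list[str]) -> Optional[str]:
    """Return the highest severity among the codes, or None if there are none."""
    if not codes:
        return None
    return max((CODE_SEVERITY[c] for c in codes), key=SEVERITY_RANK.__getitem__)


# Hypothetical result with zero cost and no usable metrics.
example = {
    "total_cost": 0.0,
    "metrics": {"metric_a": None, "metric_b": None},
    "score": None,
    "items_processed": 12,
    "duration_seconds": 34.5,
}
codes = warning_codes(example)
print(codes, overall_severity(codes))
```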

---

## Action Required

- [ ] Review all critical issues
- [ ] Investigate high priority issues
- [ ] Check medium priority issues as time permits
- [ ] Re-run failed tests if needed
- [ ] Update pricing data if ZERO_COST warnings present
- [ ] Fix scoring issues if ALL_NA or ZERO_SCORE warnings present

