Test Warnings Report - 2026-01-24
Date: 2026-01-24
Tests with warnings: 43
Generated: 2026-01-25 15:49:13
Note: For detailed error messages and stack traces, check the log files in scripts/logs/ with date prefix 20260124.
Add GitHub issue links in the Ticket column for tracking.
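To locate the logs mentioned in the note above, a minimal sketch along these lines may help. It assumes only what the note states (a `scripts/logs/` directory and file names starting with `20260124`); the helper name `find_logs` and the rest of the file naming scheme are hypothetical.

```python
from pathlib import Path


def find_logs(log_dir, date_prefix):
    """Return files in log_dir whose names start with date_prefix, sorted by name.

    Hypothetical helper: the report only says that logs carry a date prefix,
    so nothing beyond the prefix is assumed about the file names.
    """
    root = Path(log_dir)
    if not root.is_dir():
        # A missing directory yields an empty result instead of an exception.
        return []
    return sorted(
        p for p in root.iterdir()
        if p.is_file() and p.name.startswith(date_prefix)
    )


if __name__ == "__main__":
    for path in find_logs("scripts/logs", "20260124"):
        print(path)
```
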
| Test ID | Benchmark | Provider | Model | Severity | Warning Codes | Ticket | Action | Completed |
|---|---|---|---|---|---|---|---|---|
| T0061 | business_letters | mistral | pixtral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun and delete old | 2026-02-17 |
| T0095 | fraktur_adverts | mistral | pixtral-large-2411 | 🟡 Medium | ZERO_SCORE | - | no action required (score simply bad) | 2026-01-25 |
| T0098 | fraktur_adverts | anthropic | claude-opus-4-20250514 | 🟡 Medium | ZERO_SCORE | #85 | rerun and delete old | 2026-01-25 |
| T0120 | fraktur_adverts | openai | gpt-5 | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0121 | fraktur_adverts | openai | gpt-5-mini | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0122 | fraktur_adverts | openai | gpt-5-nano | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0129 | bibliographic_data | openai | gpt-5 | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0130 | bibliographic_data | openai | gpt-5-mini | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0131 | bibliographic_data | openai | gpt-5-nano | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0133 | bibliographic_data | openai | o3 | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-02-17 |
| T0137 | fraktur_adverts | openai | o3 | 🟠 High | ZERO_COST, ZERO_SCORE | #85 | rerun and delete old | 2026-01-25 |
| T0177 | fraktur_adverts | mistral | mistral-medium-2508 | 🟡 Medium | ZERO_SCORE | - | no action required (score simply bad) | 2026-01-25 |
| T0178 | fraktur_adverts | mistral | mistral-medium-2505 | 🟡 Medium | ZERO_SCORE | - | no action required (score simply bad) | 2026-01-25 |
| T0188 | business_letters | mistral | mistral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun and delete old | 2026-02-17 |
| T0190 | business_letters | mistral | mistral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun and delete old | 2026-02-17 |
| T0225 | bibliographic_data | anthropic | claude-sonnet-4-5-20250929 | 🟡 Medium | ZERO_SCORE | - | no action required, simply bad performance | 2026-02-17 |
| T0229 | fraktur_adverts | anthropic | claude-sonnet-4-5-20250929 | 🟡 Medium | ZERO_SCORE | - | no action required, simply bad performance | 2026-02-17 |
| T0233 | bibliographic_data | openrouter | qwen/qwen3-vl-8b-thinking | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0243 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun and delete old | 2026-02-17 |
| T0246 | fraktur_adverts | openrouter | qwen/qwen3-vl-8b-thinking | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0248 | business_letters | openrouter | meta-llama/llama-4-maverick | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-27 |
| T0249 | business_letters | openrouter | meta-llama/llama-4-maverick | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-27 |
| T0250 | business_letters | openrouter | meta-llama/llama-4-maverick | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-27 |
| T0254 | business_letters | openrouter | qwen/qwen3-vl-30b-a3b-instruct | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0255 | business_letters | openrouter | qwen/qwen3-vl-30b-a3b-instruct | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0256 | business_letters | openrouter | qwen/qwen3-vl-30b-a3b-instruct | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0285 | medieval_manuscripts | genai | gemini-2.5-flash-lite | 🔴 Critical | ZERO_COST, ALL_NA | #87 (comment) | rerun and delete old | |
| T0286 | medieval_manuscripts | genai | gemini-2.5-flash-lite-preview-09-2025 | 🔴 Critical | ZERO_COST, ALL_NA | #87 (comment) | rerun and delete old | |
| T0416 | business_letters | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#1 | rerun and delete old | |
| T0418 | business_letters | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-17 |
| T0422 | blacklist_cards | genai | gemini-3-pro-preview | 🟠 High | "finish_reason": "error", ZERO_COST | - | rerun & delete old | 2026-02-17 |
| T0423 | company_lists | genai | gemini-3-pro-preview | 🟠 High | "finish_reason": "error", ZERO_COST | - | rerun & delete old | 2026-02-17 |
| T0424 | company_lists | genai | gemini-3-pro-preview | 🟠 High | "finish_reason": "error", ZERO_COST | - | rerun & delete old | 2026-02-17 |
| T0425 | bibliographic_data | mistral | magistral-medium-2509 | 🟡 Medium | ZERO_SCORE | - | no action required (bad score) | 2026-02-17 |
| T0426 | business_letters | mistral | magistral-medium-2509 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-17 |
| T0428 | business_letters | mistral | magistral-medium-2509 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | |
| T0431 | medieval_manuscripts | mistral | magistral-medium-2509 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-17 |
| T0445 | book_advert_xml | genai | gemini-3-pro-preview | 🟠 High | ZERO_COST, ZERO_SCORE | - | rerun & delete old | 2026-02-17 |
| T0533 | business_letters | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-17 |
| T0534 | business_letters | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-17 |
| T0535 | business_letters | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-18 |
| T0536 | fraktur_adverts | mistral | mistral-large-2512 | 🟡 Medium | ZERO_SCORE | - | no action required (bad score) | 2026-02-17 |
| T0538 | medieval_manuscripts | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | - | rerun & delete old | 2026-02-18 |
Warning Codes Explanation
| Code | Severity | Description |
|---|---|---|
| ZERO_COST | 🟠 High | Total cost is $0 (pricing issue) |
| ALL_NA | 🔴 Critical | All metrics are N/A (scoring failed) |
| ZERO_SCORE | 🟡 Medium | Score is 0 (exceptionally bad performance) |
| ZERO_ITEMS | 🔴 Critical | No items processed |
| ZERO_DURATION | 🟠 High | No timing captured |
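The row severities in the report look consistent with taking the highest severity among a row's warning codes (for example, `ZERO_COST, ALL_NA` is listed as Critical). The report does not state this rule, so the sketch below treats it as an assumption. The names `CODE_SEVERITY`, `SEVERITY_RANK`, and `row_severity` are illustrative, and codes not listed in the table (such as `"finish_reason": "error"`) are simply ignored here.

```python
# Per-code severities copied from the Warning Codes table.
CODE_SEVERITY = {
    "ZERO_COST": "High",
    "ALL_NA": "Critical",
    "ZERO_SCORE": "Medium",
    "ZERO_ITEMS": "Critical",
    "ZERO_DURATION": "High",
}

# Ordering used to compare severities (higher number = more severe).
SEVERITY_RANK = {"Medium": 1, "High": 2, "Critical": 3}


def row_severity(codes):
    """Return the highest severity among the known codes, or None if none are known.

    Assumption: a row's severity is the maximum over its codes. Codes that
    are missing from CODE_SEVERITY do not contribute.
    """
    known = [CODE_SEVERITY[code] for code in codes if code in CODE_SEVERITY]
    if not known:
        return None
    return max(known, key=SEVERITY_RANK.__getitem__)


if __name__ == "__main__":
    print(row_severity(["ZERO_COST", "ALL_NA"]))
    print(row_severity(["ZERO_COST", "ZERO_SCORE"]))
    print(row_severity(["ZERO_SCORE"]))
```
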
Action Required
- Review all critical issues
- Investigate high priority issues
- Check medium priority issues as time permits
- Re-run failed tests if needed
- Update pricing data if ZERO_COST warnings present
- Fix scoring issues if ALL_NA or ZERO_SCORE warnings present
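The checklist above can be read as a triage order (critical first, then high, then medium) plus a follow-up per warning code. A minimal sketch of that reading follows; the function name `triage`, the tuple layout, and the exact follow-up strings are assumptions made for illustration, not part of the benchmark tooling.

```python
# Follow-ups taken from the checklist: pricing for ZERO_COST, scoring for
# ALL_NA and ZERO_SCORE. Other codes have no follow-up in this sketch.
FOLLOW_UP = {
    "ZERO_COST": "update pricing data",
    "ALL_NA": "fix scoring issues",
    "ZERO_SCORE": "fix scoring issues",
}

# Lower number = handled earlier.
TRIAGE_ORDER = {"Critical": 0, "High": 1, "Medium": 2}


def triage(rows):
    """Sort (test_id, severity, codes) rows by severity, then test ID.

    Returns a list of (test_id, severity, follow_ups), where follow_ups is
    the de-duplicated list of suggested steps for that row's codes.
    """
    ordered = sorted(rows, key=lambda row: (TRIAGE_ORDER[row[1]], row[0]))
    result = []
    for test_id, severity, codes in ordered:
        steps = []
        for code in codes:
            step = FOLLOW_UP.get(code)
            if step and step not in steps:
                steps.append(step)
        result.append((test_id, severity, steps))
    return result


if __name__ == "__main__":
    sample = [
        ("T0095", "Medium", ["ZERO_SCORE"]),
        ("T0120", "High", ["ZERO_COST", "ZERO_SCORE"]),
        ("T0061", "Critical", ["ZERO_COST", "ALL_NA"]),
    ]
    for entry in triage(sample):
        print(entry)
```

As several rows in the report show, a `ZERO_SCORE` row can turn out to need no action when the score is simply bad, so the suggested follow-up is a starting point for review rather than a verdict.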
Labels: bug (Something isn't working)