Test Warnings Report - 2026-01-24 #84

@MHindermann

Date: 2026-01-24
Tests with warnings: 43
Generated: 2026-01-25 15:49:13


Note: For detailed error messages and stack traces, check the log files in scripts/logs/ with date prefix 20260124.
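
To locate the relevant logs quickly, a small helper along these lines may be useful. It is only a sketch: the function name `find_logs` is hypothetical, and it assumes the log file names start with the date prefix directly (for example `20260124...`), which the note above suggests but does not spell out.

```python
from pathlib import Path


def find_logs(log_dir: str, date_prefix: str) -> list[Path]:
    """Return the files in log_dir whose names start with date_prefix, sorted by name.

    Hypothetical helper: it assumes log names begin with the date prefix.
    """
    return sorted(p for p in Path(log_dir).glob(f"{date_prefix}*") if p.is_file())


if __name__ == "__main__":
    # Directory and prefix taken from the note above.
    for path in find_logs("scripts/logs", "20260124"):
        print(path)
```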

Add GitHub issue links in the Ticket column for tracking.

| Test ID | Benchmark | Provider | Model | Severity | Warning Codes | Ticket | Action | Completed |
|---|---|---|---|---|---|---|---|---|
| T0061 | business_letters | mistral | pixtral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun and delete old | 2026-02-17 |
| T0095 | fraktur_adverts | mistral | pixtral-large-2411 | 🟡 Medium | ZERO_SCORE | - | no action required (score simply bad) | 2026-01-25 |
| T0098 | fraktur_adverts | anthropic | claude-opus-4-20250514 | 🟡 Medium | ZERO_SCORE | #85 | rerun and delete old | 2026-01-25 |
| T0120 | fraktur_adverts | openai | gpt-5 | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0121 | fraktur_adverts | openai | gpt-5-mini | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0122 | fraktur_adverts | openai | gpt-5-nano | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0129 | bibliographic_data | openai | gpt-5 | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0130 | bibliographic_data | openai | gpt-5-mini | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0131 | bibliographic_data | openai | gpt-5-nano | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-01-25 |
| T0133 | bibliographic_data | openai | o3 | 🟠 High | ZERO_COST, ZERO_SCORE | RISE-UNIBAS/generic_llm_api_client#2 | rerun and delete old | 2026-02-17 |
| T0137 | fraktur_adverts | openai | o3 | 🟠 High | ZERO_COST, ZERO_SCORE | #85 | rerun and delete old | 2026-01-25 |
| T0177 | fraktur_adverts | mistral | mistral-medium-2508 | 🟡 Medium | ZERO_SCORE | - | no action required (score simply bad) | 2026-01-25 |
| T0178 | fraktur_adverts | mistral | mistral-medium-2505 | 🟡 Medium | ZERO_SCORE | - | no action required (score simply bad) | 2026-01-25 |
| T0188 | business_letters | mistral | mistral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun and delete old | 2026-02-17 |
| T0190 | business_letters | mistral | mistral-large-2411 | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#3 | rerun and delete old | 2026-02-17 |
| T0225 | bibliographic_data | anthropic | claude-sonnet-4-5-20250929 | 🟡 Medium | ZERO_SCORE | | no action required, simply bad performance | 2026-02-17 |
| T0229 | fraktur_adverts | anthropic | claude-sonnet-4-5-20250929 | 🟡 Medium | ZERO_SCORE | | no action required, simply bad performance | 2026-02-17 |
| T0233 | bibliographic_data | openrouter | qwen/qwen3-vl-8b-thinking | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0243 | business_letters | openrouter | qwen/qwen3-vl-8b-thinking | 🔴 Critical | ZERO_COST, ALL_NA | | rerun and delete old | 2026-02-17 |
| T0246 | fraktur_adverts | openrouter | qwen/qwen3-vl-8b-thinking | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0248 | business_letters | openrouter | meta-llama/llama-4-maverick | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-27 |
| T0249 | business_letters | openrouter | meta-llama/llama-4-maverick | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-27 |
| T0250 | business_letters | openrouter | meta-llama/llama-4-maverick | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-27 |
| T0254 | business_letters | openrouter | qwen/qwen3-vl-30b-a3b-instruct | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0255 | business_letters | openrouter | qwen/qwen3-vl-30b-a3b-instruct | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0256 | business_letters | openrouter | qwen/qwen3-vl-30b-a3b-instruct | 🟠 High | ZERO_COST | RISE-UNIBAS/generic_llm_api_client#5 | correct provider and inject cost | 2026-02-17 |
| T0285 | medieval_manuscripts | genai | gemini-2.5-flash-lite | 🔴 Critical | ZERO_COST, ALL_NA | #87 (comment) | rerun and delete old | |
| T0286 | medieval_manuscripts | genai | gemini-2.5-flash-lite-preview-09-2025 | 🔴 Critical | ZERO_COST, ALL_NA | #87 (comment) | rerun and delete old | |
| T0416 | business_letters | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | RISE-UNIBAS/generic_llm_api_client#1 | rerun and delete old | |
| T0418 | business_letters | genai | gemini-3-pro-preview | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0422 | blacklist_cards | genai | gemini-3-pro-preview | 🟠 High | "finish_reason": "error", ZERO_COST | | rerun & delete old | 2026-02-17 |
| T0423 | company_lists | genai | gemini-3-pro-preview | 🟠 High | "finish_reason": "error", ZERO_COST | | rerun & delete old | 2026-02-17 |
| T0424 | company_lists | genai | gemini-3-pro-preview | 🟠 High | "finish_reason": "error", ZERO_COST | | rerun & delete old | 2026-02-17 |
| T0425 | bibliographic_data | mistral | magistral-medium-2509 | 🟡 Medium | ZERO_SCORE | | no action required (bad score) | 2026-02-17 |
| T0426 | business_letters | mistral | magistral-medium-2509 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0428 | business_letters | mistral | magistral-medium-2509 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | |
| T0431 | medieval_manuscripts | mistral | magistral-medium-2509 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0445 | book_advert_xml | genai | gemini-3-pro-preview | 🟠 High | ZERO_COST, ZERO_SCORE | | rerun & delete old | 2026-02-17 |
| T0533 | business_letters | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0534 | business_letters | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-17 |
| T0535 | business_letters | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-18 |
| T0536 | fraktur_adverts | mistral | mistral-large-2512 | 🟡 Medium | ZERO_SCORE | | no action required (bad score) | 2026-02-17 |
| T0538 | medieval_manuscripts | mistral | mistral-large-2512 | 🔴 Critical | ZERO_COST, ALL_NA | | rerun & delete old | 2026-02-18 |

Warning Codes Explanation

| Code | Severity | Description |
|---|---|---|
| ZERO_COST | 🟠 High | Total cost is $0 (pricing issue) |
| ALL_NA | 🔴 Critical | All metrics are N/A (scoring failed) |
| ZERO_SCORE | 🟡 Medium | Score is 0 (exceptionally bad performance) |
| ZERO_ITEMS | 🔴 Critical | No items processed |
| ZERO_DURATION | 🟠 High | No timing captured |
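
As an illustration of how these codes might be derived from a single test run, here is a minimal sketch. The `RunSummary` fields and both function names are hypothetical and are not taken from the benchmark code. Treating the row severity as the highest severity among its codes is an assumption that happens to agree with the report table (for example, ZERO_COST plus ZERO_SCORE is listed as High and ZERO_COST plus ALL_NA as Critical).

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Severity(IntEnum):
    MEDIUM = 1
    HIGH = 2
    CRITICAL = 3


# Code -> severity, copied from the explanation table.
CODE_SEVERITY = {
    "ZERO_COST": Severity.HIGH,
    "ALL_NA": Severity.CRITICAL,
    "ZERO_SCORE": Severity.MEDIUM,
    "ZERO_ITEMS": Severity.CRITICAL,
    "ZERO_DURATION": Severity.HIGH,
}


@dataclass
class RunSummary:
    """Hypothetical shape of one test result; the field names are assumptions."""
    total_cost: float
    metrics: dict[str, Optional[float]]  # None stands for N/A
    score: Optional[float]
    items_processed: int
    duration_seconds: float


def warning_codes(run: RunSummary) -> list[str]:
    """Return the warning codes that apply to a run, in a fixed order."""
    codes = []
    if run.total_cost == 0:
        codes.append("ZERO_COST")
    # An empty metrics dict also counts as "all N/A" in this sketch.
    if all(value is None for value in run.metrics.values()):
        codes.append("ALL_NA")
    if run.score is not None and run.score == 0:
        codes.append("ZERO_SCORE")
    if run.items_processed == 0:
        codes.append("ZERO_ITEMS")
    if run.duration_seconds == 0:
        codes.append("ZERO_DURATION")
    return codes


def overall_severity(codes: list[str]) -> Optional[Severity]:
    """Assumed rule: a run takes the highest severity among its codes."""
    return max((CODE_SEVERITY[c] for c in codes), default=None)
```

Under these assumptions, a run with zero cost and only N/A metrics yields `["ZERO_COST", "ALL_NA"]` and is rated Critical, while a run with zero cost and a score of 0 yields `["ZERO_COST", "ZERO_SCORE"]` and is rated High.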

Action Required

  • Review all critical issues
  • Investigate high priority issues
  • Check medium priority issues as time permits
  • Re-run failed tests if needed
  • Update pricing data if ZERO_COST warnings present
  • Fix scoring issues if ALL_NA or ZERO_SCORE warnings present

Labels

bug: Something isn't working
