Conversation


@neubig commented Jan 2, 2026

Problem

When `JSON_LOGS=true` is set, asyncio exceptions (like "Task exception was never retrieved") are logged as plain text, with each line of the traceback emitted as a separate log entry. This makes it difficult to parse and analyze errors in Datadog.

Example of the problem in Datadog:

{"message": "Task exception was never retrieved", "attributes": {"environment": "production"}}
{"message": "Traceback (most recent call last):", "attributes": {"environment": "production"}}
{"message": "  File \"/usr/lib/python3.13/site-packages/litellm/proxy/hooks/key_management_event_hooks.py\", line 48,", "attributes": {"environment": "production"}}
...

Notice that:

  1. Each line of the traceback is a separate log entry
  2. There are no level or timestamp attributes (indicating the JSON formatter is not being used)
  3. There's no stacktrace field

Root Cause

The asyncio exception handler was being set on the wrong event loop at module import time. When litellm/_logging.py is imported, it calls asyncio.get_event_loop().set_exception_handler(). However, the loop that exists at import time is not necessarily the one uvicorn will use: when uvicorn creates its own event loop, the exception handler is never set on it.

Additionally, the original async_json_exception_handler was setting exc_info=None, which meant the traceback was not being included in the JSON output.
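
A minimal repro of the loop mismatch (illustrative only; the names and the exact import-time code path are simplified):

```python
import asyncio

def json_handler(loop, context):
    print("JSON handler saw:", context.get("message"))

# Simulates the import-time code path: the handler lands on whatever loop
# exists (or gets created) when litellm/_logging.py is imported.
import_time_loop = asyncio.new_event_loop()
import_time_loop.set_exception_handler(json_handler)

async def fire_and_forget():
    raise RuntimeError("boom")

async def main():
    task = asyncio.ensure_future(fire_and_forget())  # never awaited
    await asyncio.sleep(0.1)
    del task  # exception is "never retrieved"

# asyncio.run() creates a brand-new loop (as uvicorn does), which still has the
# DEFAULT exception handler, so the traceback comes out as multi-line plain text
# instead of going through json_handler above.
asyncio.run(main())
import_time_loop.close()
```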

Solution

  1. Add a _setup_asyncio_json_exception_handler() function - This function sets the asyncio exception handler on the running event loop. It must be called AFTER uvicorn has created that loop (see the sketch after this list).

  2. Call the function in proxy_startup_event - This ensures the handler is set on the correct event loop that uvicorn uses.

  3. Include full traceback in exc_info - Extract the traceback from the exception object using exception.__traceback__ and pass it to the LogRecord.

  4. Handle non-exception errors - Log them as JSON too instead of falling back to the default handler.
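
A minimal sketch of steps 1-4, with illustrative names (the real code lives in litellm/_logging.py and the proxy startup path and may differ in detail; `json_logger` and `json_logs_enabled` are placeholders):

```python
import asyncio
import logging

# Placeholder for the JSON-formatted logger configured in litellm/_logging.py.
json_logger = logging.getLogger("litellm.asyncio_exceptions")


def _async_json_exception_handler(loop: asyncio.AbstractEventLoop, context: dict) -> None:
    """Log asyncio exceptions as one JSON record instead of plain-text lines."""
    exception = context.get("exception")
    message = context.get("message", "Unhandled exception in event loop")
    if exception is not None:
        # Step 3: pass the full (type, value, traceback) tuple so the JSON
        # formatter can emit a single stacktrace field.
        json_logger.error(
            message, exc_info=(type(exception), exception, exception.__traceback__)
        )
    else:
        # Step 4: non-exception errors are also logged as JSON rather than
        # being handed back to the default handler.
        json_logger.error("%s | context=%s", message, context)


def _setup_asyncio_json_exception_handler() -> None:
    """Step 1: attach the handler to the loop that is actually running."""
    asyncio.get_running_loop().set_exception_handler(_async_json_exception_handler)


# Step 2: invoked from proxy_startup_event, i.e. after uvicorn has created and
# started its event loop, roughly:
#
#   async def proxy_startup_event(app):
#       if json_logs_enabled:  # placeholder for the JSON_LOGS check
#           _setup_asyncio_json_exception_handler()
```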

Expected Result

After this fix, asyncio exceptions will be logged as single JSON objects with the stacktrace field:

{
  "message": "Task exception was never retrieved",
  "level": "ERROR",
  "timestamp": "2026-01-02T17:57:31.144636",
  "stacktrace": "Traceback (most recent call last):\n  File \"/tmp/test.py\", line 68, in failing_task\n    raise RuntimeError(\"Async task failed!\")\nRuntimeError: Async task failed!"
}
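
That shape can be produced by a formatter roughly like the following (an illustrative sketch, not the exact formatter in litellm/_logging.py):

```python
import json
import logging
import traceback
from datetime import datetime


class JsonLogFormatter(logging.Formatter):
    """Illustrative formatter: one JSON object per record, traceback in stacktrace."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "message": record.getMessage(),
            "level": record.levelname,
            "timestamp": datetime.fromtimestamp(record.created).isoformat(),
        }
        if record.exc_info:
            # exc_info is a (type, value, traceback) tuple; join the formatted
            # lines so the whole traceback lives in a single JSON field.
            payload["stacktrace"] = "".join(
                traceback.format_exception(*record.exc_info)
            ).rstrip()
        return json.dumps(payload)
```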

Testing

Verified with a standalone test (sketched below) that:

  1. The JSON formatter correctly includes stacktrace in a single JSON object
  2. The asyncio exception handler is set on the running event loop
  3. Exceptions from fire-and-forget tasks are logged with full traceback in JSON format
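
A self-contained approximation of that test (a sketch; the actual test wiring in the repo may differ):

```python
import asyncio
import logging


class CapturingHandler(logging.Handler):
    """Collects records so the test can assert how the exception was logged."""

    def __init__(self):
        super().__init__()
        self.records = []

    def emit(self, record):
        self.records.append(record)


def test_fire_and_forget_exception_is_logged_once_with_traceback():
    logger = logging.getLogger("test.asyncio.json")
    logger.setLevel(logging.ERROR)
    capture = CapturingHandler()
    logger.addHandler(capture)

    def json_exception_handler(loop, context):
        exc = context.get("exception")
        exc_info = (type(exc), exc, exc.__traceback__) if exc else None
        logger.error(context.get("message", "Unhandled exception"), exc_info=exc_info)

    async def failing_task():
        raise RuntimeError("Async task failed!")

    async def main():
        # Handler goes on the *running* loop, mirroring the startup-event fix.
        asyncio.get_running_loop().set_exception_handler(json_exception_handler)
        task = asyncio.ensure_future(failing_task())  # fire-and-forget
        await asyncio.sleep(0.1)
        # Dropping the last reference triggers "Task exception was never
        # retrieved" (deterministic under CPython refcounting).
        del task

    asyncio.run(main())

    assert len(capture.records) == 1  # one record, not one per traceback line
    assert capture.records[0].exc_info is not None  # traceback reaches the JSON formatter
```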


yuneng-jiang and others added 30 commits December 22, 2025 18:21
[Feature] Allow Error code filtering on Spend Logs
…ner-integration

feat(databricks): Add enhanced authentication, security features, and custom user-agent support
…rror_code

[Feature] UI - Add Error Code Filtering on UI
[Fix] UI - Key Creation MCP Settings Submit Form Unintentionally
…-window-error

fix(gemini): properly catch context window exceeded errors
…issue

fix: lost tool_calls when streaming has both text and tool_calls
fix: remove deprecated Groq models and update model registry
* Add 5 AI providers using `openai_like`:

* Synthetic.new
* Apertis / Stima.tech
* NanoGPT
* Poe
* Chutes.ai

* Update additional missing locations
…8347)

Add missing pricing entries for azure/gpt-image-1.5 and azure/gpt-image-1.5-2025-12-16 to model_prices_and_context_window.json.

These models use token-based pricing (same as OpenAI):
- Text input: $5.00/1M tokens
- Image input: $8.00/1M tokens
- Image output: $32.00/1M tokens
- Cached text: $1.25/1M tokens
- Cached image: $2.00/1M tokens
…AI#18346)

Remove 'none' from gpt-5-mini's supported reasoning_effort values in the documentation table. gpt-5-mini does not support reasoning_effort="none", only minimal, low, medium, and high.
Added pricing and configuration details for the azure_ai/gpt-oss-120b model, including costs and capabilities.
… sensitive information

Co-authored-by: Copilot Autofix powered by AI <62310815+github-advanced-security[bot]@users.noreply.github.com>
uc4w6c and others added 27 commits January 2, 2026 17:20
* Allow get_nested_value dot notation to support escaping for Kubernetes JWT Support

* Add support for team and org alias fields, add docs, tests

* Fix lint issue with max statements in handle jwt logic
…odel-name

fix: unify model names to provider-defined names
…_server_to_playground

add selectable mcp servers to the playground
…taurl_on_ui

feat: add UI support for configuring meta URLs
…ic-image-urls

fix(vertex_ai): convert image URLs to base64 for Vertex AI Anthropic
…e-type-per-object

fix(vertex_ai): separate Tool objects for each tool type per API spec
…er-ui

feat: Add MiniMax provider support to UI dashboard
…ool-result-support

Fix Gemini 3 imgs in tool response
fix: correct deepseek-v3p2 pricing for Fireworks AI
…track

Add all resolution for gpt-image-1.5
…nstructions

Preserve system instructions for gemini
…_call_thought_sign

Add thought signature for non tool call requests
…haching_header

Remove prompt caching headers as the support has been removed
…budget

Add validation for negative budget
…n-based pricing (BerriAI#17906)

* fix(cost_calculator): correct gpt-image-1 cost calculation using token-based pricing (BerriAI#13847)

gpt-image-1 uses token-based pricing (like chat models), not pixel-based pricing
like DALL-E. The old code was calculating incorrect costs by treating it as DALL-E.

Changes:
- Update model pricing JSON with correct token-based costs for gpt-image-1
- Add dedicated cost calculator for OpenAI gpt-image models
- Route gpt-image-1 to token-based calculator in cost router
- Add comprehensive tests for the new calculator

* refactor: simplify gpt-image-1 cost calculator using responses API helper

Reuse _transform_response_api_usage_to_chat_usage and generic_cost_per_token
for gpt-image-1 cost calculation since ImageUsage has the same spec as
ResponseAPIUsage.
The asyncio exception handler was being set on the wrong event loop at
module import time. When uvicorn creates a new event loop, the handler
was not set on it, causing asyncio exceptions (like 'Task exception was
never retrieved') to be logged as plain text with each line as a
separate log entry.

Changes:
- Add _setup_asyncio_json_exception_handler() function that sets the
  exception handler on the running event loop
- Call this function in proxy_startup_event after uvicorn creates the
  event loop
- Include full traceback in exc_info by extracting it from the
  exception object
- Handle non-exception errors by logging them as JSON too

This ensures that asyncio exceptions are logged as single JSON objects
with the stacktrace field, making them easier to parse in Datadog.

Co-authored-by: openhands <openhands@all-hands.dev>
Adds log_format parameter supporting json_array (default), ndjson, and single formats. NDJSON format enables webhook integrations like Sumo Logic to parse individual log records at ingest time. Defaults to json_array for backward compatibility.

openhands-ai bot commented Jan 2, 2026

Looks like there are a few issues preventing this PR from being merged!

  • GitHub Actions are failing:
    • LiteLLM Mock Tests (folder - tests/litellm)
    • LiteLLM Linting

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #3 at branch `fix-json-logging-asyncio`

Feel free to include any additional details that might help me get this PR into a better state.


@neubig merged commit e432c43 into main Jan 2, 2026
2 of 4 checks passed