docs: clarify TTL, execution timeout, and result retention #525

Open
justinwlin wants to merge 7 commits into main from docs/clarify-ttl-execution-timeout

@justinwlin (Collaborator) commented Feb 4, 2026

Summary

Clarifies the distinction between TTL, execution timeout, and result retention for serverless jobs.

Key Clarifications

  • TTL (time-to-live): How long job data is retained in the system. Covers both queue time and execution time - timer starts when job is submitted, not when execution begins. Default 24 hours, max 7 days.

  • Execution timeout: Maximum time a job can run once a worker picks it up. Default 10 minutes, max 7 days.

  • Result retention: Fixed period after job completion (30 min async, 1 min sync). Cannot be extended.

  • ?wait parameter: Controls how long the HTTP request waits for completion, NOT result retention.
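
To make the interaction between TTL and execution timeout concrete, here is a minimal Python sketch. It is an illustrative model only, not code from the ai-api repo: the function name `job_outcome`, the outcome labels, and the tie-breaking when two deadlines coincide are assumptions.

```python
def job_outcome(queue_s: int, run_s: int,
                ttl_s: int = 24 * 3600, exec_timeout_s: int = 10 * 60) -> str:
    """Return which event ends a job first. All arguments are in seconds.

    Hypothetical model: defaults mirror the values stated above
    (TTL 24 hours, execution timeout 10 minutes).
    """
    # The TTL clock starts at submission, so it also covers time spent queued.
    ttl_deadline = ttl_s
    if ttl_deadline <= queue_s:
        return "expired_in_queue"

    # The execution clock starts only when a worker picks the job up.
    finish = queue_s + run_s
    exec_deadline = queue_s + exec_timeout_s

    first = min(finish, exec_deadline, ttl_deadline)
    if first == finish:  # assumption: a job finishing exactly on a deadline counts as done
        return "completed"
    if first == exec_deadline:
        return "execution_timeout"
    return "expired_during_execution"


# 10 minutes in queue with a 5-minute TTL: the job never starts.
print(job_outcome(queue_s=600, run_s=60, ttl_s=300, exec_timeout_s=600))
# 20-minute job with the default 10-minute execution timeout.
print(job_outcome(queue_s=60, run_s=1200))
# 1 hour queued + 2 hours running, but TTL is only 2 hours from submission.
print(job_outcome(queue_s=3600, run_s=7200, ttl_s=7200, exec_timeout_s=86400))
```

The three calls print `expired_in_queue`, `execution_timeout`, and `expired_during_execution` respectively. The last case is the one the docs change warns about: a generous execution timeout does not help if the TTL runs out first.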

Changes

endpoint-configurations.mdx:

  • Added explanation that TTL includes queue time
  • Added warning about jobs expiring in queue if TTL too short
  • Added result retention section

send-requests.mdx:

  • Added "Understanding TTL vs execution timeout" section with examples
  • Fixed ?wait documentation (controls wait time, not retention)
  • Updated policy options table
  • Simplified result retention documentation
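
The corrected `?wait` behavior can be illustrated with a short Python sketch. The retention periods are the ones stated in this PR (30 minutes async, 1 minute sync); the function name `result_expires_at` and the `wait_s` argument are hypothetical and exist only to show that the wait value plays no part in the calculation.

```python
from datetime import datetime, timedelta, timezone

# Fixed retention periods as described in this PR.
RESULT_RETENTION = {
    "async": timedelta(minutes=30),  # /run
    "sync": timedelta(minutes=1),    # /runsync
}


def result_expires_at(completed_at: datetime, mode: str, wait_s: int = 90) -> datetime:
    """Return when a finished job's result stops being retrievable.

    `wait_s` is accepted but deliberately unused: ?wait only controls how
    long the HTTP request blocks, not how long the result is kept.
    """
    return completed_at + RESULT_RETENTION[mode]


done = datetime(2026, 2, 4, 12, 0, tzinfo=timezone.utc)
print(result_expires_at(done, "sync", wait_s=300))
print(result_expires_at(done, "async"))
```

Changing `wait_s` from 1 to 300 leaves the returned expiry unchanged, which is the point the updated `send-requests.mdx` text makes.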

Code References

All validation logic was found in the ai-api repo:

| Behavior | File | Lines |
| --- | --- | --- |
| TTL min validation (10s) | pkg/job/job.go | 169-171 |
| Execution timeout min validation (5s) | pkg/job/job.go | 173-175 |
| TTL set at job submission | pkg/job/job.go | 222 |
| TTL NOT reset at execution start | pkg/job/job.go | 562-568 |
| Result retention (sync = 1 min) | pkg/job/job.go | 795 |
| Result retention (async = 30 min) | pkg/job/job.go | 797 |
| ?wait param parsing (default 90s, range 1-300s) | pkg/api/runsync.go | 30-38 |
| ?wait used for HTTP blocking, not retention | pkg/api/runsync.go | 122 |
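
For readers without access to the ai-api repo, the limits listed above can be restated as a Python sketch. This is not a port of `pkg/job/job.go` or `pkg/api/runsync.go`: the function names are hypothetical, and clamping an out-of-range `?wait` value is an assumption, since the references above give the range but not how violations are handled.

```python
MIN_TTL_MS = 10_000                # TTL minimum (10s)
MIN_EXECUTION_TIMEOUT_MS = 5_000   # execution timeout minimum (5s)
WAIT_DEFAULT_S, WAIT_MIN_S, WAIT_MAX_S = 90, 1, 300


def validate_policy(ttl_ms=None, execution_timeout_ms=None):
    """Return a list of error strings; an empty list means the policy passes."""
    errors = []
    if ttl_ms is not None and ttl_ms < MIN_TTL_MS:
        errors.append(f"ttl must be at least {MIN_TTL_MS} ms")
    if execution_timeout_ms is not None and execution_timeout_ms < MIN_EXECUTION_TIMEOUT_MS:
        errors.append(f"executionTimeout must be at least {MIN_EXECUTION_TIMEOUT_MS} ms")
    return errors


def parse_wait(raw):
    """Parse the ?wait query value in seconds, defaulting to 90."""
    if raw is None:
        return WAIT_DEFAULT_S
    # Assumption: out-of-range values are clamped rather than rejected.
    return max(WAIT_MIN_S, min(WAIT_MAX_S, int(raw)))


print(validate_policy(ttl_ms=9_999, execution_timeout_ms=5_000))
print(parse_wait(None), parse_wait("500"))
```

Only minimums are checked here, matching the validation lines cited in the table.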

- Clarify that TTL controls job data retention while queued/in progress
- Clarify that execution timeout controls max runtime during processing
- Add new 'Result retention' section explaining post-completion retention:
  - Async (/run): 30 minutes fixed
  - Sync (/runsync): 1 minute default, extendable to 5 minutes
- Add section on configuring long-running jobs (>24 hours)
- Update policy options table with clearer descriptions
- Add warning that system does not validate TTL >= executionTimeout
- Clarify that if TTL expires before job completes, data is deleted
- Emphasize the need to set both for long-running jobs
- ?wait controls how long the request waits for job completion (default 90s, max 5min)
- ?wait does NOT extend result retention
- Result retention is fixed: 30 min (async), 1 min (sync)
- Updated both endpoint-configurations.mdx and send-requests.mdx
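
Because the system does not check that TTL is at least as long as the execution timeout, a client-side helper can catch the mismatch before submission. The following Python sketch assumes the policy is expressed in milliseconds under the keys `executionTimeout` and `ttl`; the key `ttl`, the helper names, and the 20% safety margin are illustrative assumptions, not part of this PR.

```python
def long_running_policy(expected_queue_s: int, expected_run_s: int,
                        margin_pct: int = 20) -> dict:
    """Build a policy dict (values in ms) for a job expected to exceed the defaults."""
    # Pad the expected runtime so a slightly slow run is not cut off.
    execution_timeout_s = expected_run_s * (100 + margin_pct) // 100
    # TTL starts at submission, so it must cover queue time plus execution time.
    ttl_s = expected_queue_s + execution_timeout_s
    return {"executionTimeout": execution_timeout_s * 1000, "ttl": ttl_s * 1000}


def policy_warnings(policy: dict) -> list:
    """Flag the case the server does not validate: TTL shorter than executionTimeout."""
    warnings = []
    if policy["ttl"] < policy["executionTimeout"]:
        warnings.append("ttl is shorter than executionTimeout; "
                        "job data may be deleted before the job completes")
    return warnings


# A 48-hour job that may wait up to 1 hour in the queue.
policy = long_running_policy(expected_queue_s=3600, expected_run_s=48 * 3600)
print(policy)
print(policy_warnings({"executionTimeout": 172_800_000, "ttl": 86_400_000}))
```

The first call yields an `executionTimeout` of 207,360,000 ms and a `ttl` of 210,960,000 ms; the second shows the warning for a 48-hour execution timeout paired with the default 24-hour TTL.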

Based on code investigation, TTL and execution timeout have NO maximum enforced, only minimums:
- TTL: min 10,000 ms (ai-api/pkg/job/job.go:169-171)
- executionTimeout: min 5,000 ms (ai-api/pkg/job/job.go:173-175)

The '7 days max' was not found in any validation code across:
- ai-api (job processing)
- main-ui (frontend validation)
- runpod-backend (GraphQL schema)

- Explain that TTL timer starts at job submission, not execution start
- Add warning that jobs can expire in queue if TTL is too short
- Add clear example showing TTL vs execution timeout interaction
- Remove duplicate code example
@justinwlin justinwlin force-pushed the docs/clarify-ttl-execution-timeout branch from 6e3fc83 to 5b18329 Compare February 4, 2026 21:48