feat: Fleet client #1

dzorlu · 2025-12-12T20:47:57Z

PR: Fleet environments (OpenEnv)

This PR documents and refines the Fleet runtime integration for OpenEnv.

What this enables

- Run OpenEnv environments on Fleet (remote) with no local Docker.
- Keep a strict split between:
  - Orchestration (HTTP): reset / step / state
  - Agent actions (MCP): tools/list + tools/call

What this is not

This is not the local “Dockerized env server + env container” setup.
There is no container/provider abstraction here; Fleet hosts the runtime remotely (HTTP env server + MCP service). The client only connects.

Main abstractions

- FleetEnvClient (HTTP): orchestrator handle for reset/step/state.
- FleetMCPTools (MCP): agent handle for listing/calling tools.
  - Unions tools across Fleet’s MCP endpoints (today often api/v1/mcp and mcp)
  - Returns tools in OpenAI “tools” dict format (via convert_tool_format)
  - Routes tool calls to the owning endpoint (cached after discovery)
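The union-and-route behavior described for FleetMCPTools can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `FakeEndpoint` and `ToolRouter` are hypothetical names, the endpoints are in-memory stand-ins rather than real MCP connections, and the "first endpoint wins" collision rule is an assumption (the PR lists the collision policy as an open TODO).

```python
from typing import Any, Callable, Dict, List


class FakeEndpoint:
    """Hypothetical in-memory stand-in for one MCP endpoint."""

    def __init__(self, name: str, tools: Dict[str, Callable[..., Any]]):
        self.name = name
        self._tools = tools

    def list_tools(self) -> List[str]:
        return list(self._tools)

    def call_tool(self, tool: str, arguments: Dict[str, Any]) -> Any:
        return self._tools[tool](**arguments)


class ToolRouter:
    """Unions tools across endpoints and caches which endpoint owns each tool."""

    def __init__(self, endpoints: List[FakeEndpoint]):
        self._endpoints = endpoints
        self._owner: Dict[str, FakeEndpoint] = {}  # filled during discovery

    def list_tools(self) -> List[str]:
        for endpoint in self._endpoints:
            for tool in endpoint.list_tools():
                # Assumed policy: the first endpoint to expose a name keeps it.
                self._owner.setdefault(tool, endpoint)
        return sorted(self._owner)

    def call_tool(self, tool: str, arguments: Dict[str, Any]) -> Any:
        if tool not in self._owner:
            self.list_tools()  # discover once, then rely on the cache
        if tool not in self._owner:
            raise ValueError(f"tool not found: {tool}")
        return self._owner[tool].call_tool(tool, arguments)
```

A caller would build the router from whichever endpoints were discovered, call `list_tools()` once, and then issue `call_tool()` without caring which endpoint serves a given tool.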

Quickstart

Install: pip install "openenv-core[fleet]"
Set: export FLEET_API_KEY="..."
Run: python examples/fleet_env_example.py <env_key>

References

RFC 001: rfcs/001-abstractions.md
RFC 003: rfcs/003-mcp-support.md

TODOs / known sharp edges

- Endpoint discovery (avoid hardcoding api/v1/mcp vs mcp)
- Reset inconsistencies across some env keys (better errors + compatibility notes)
- Tool-name collision policy across endpoints
- Retries/backoff and clearer “endpoint down” failure modes

- FleetTaskEnv wraps FleetEnvClient with a task-oriented interface
- Accepts task configs from export_training_tasks.py
- Creates versioned environments on reset
- Injects task prompt into observations
- Executes verifier for reward computation on episode completion
- Supports both sync and async step methods
- Factory functions: make_fleet_task_env, from_json_file
- Tests: 20 unit tests for init, specs, verifiers, factories

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

The MCP images don't exist for all environment versions, causing FleetVersionNotFoundError when trying to create environments. Changing the default to None allows the Fleet SDK to use standard images, which are available for all versions.

FleetEnvClient.from_fleet() was not accepting data_key/data_version parameters, causing them to be passed through **kwargs to HTTPEnvClient, which doesn't accept them.

- Add data_key and data_version as explicit parameters
- Pass them to fleet.make()
- Update task_env.py to pass them separately

Fleet SDK expects data_key in "key:version" format, not as separate parameters. Updated from_fleet() to combine them before calling fleet.make().
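Combining the two values into the "key:version" form might look like the helper below. This is only a sketch: `combine_data_key` is a hypothetical name, and passing the key through unchanged when no version is given is an assumption, not something the commit message states.

```python
from typing import Optional


def combine_data_key(data_key: Optional[str], data_version: Optional[str]) -> Optional[str]:
    """Build the "key:version" string described in the commit message."""
    if data_key is None:
        return None  # nothing to pass along
    if data_version is None:
        return data_key  # assumed: a bare key is passed through as-is
    return f"{data_key}:{data_version}"
```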

HTTPEnvClient.reset() doesn't support seed parameter yet.

Increases default timeout from 15s to 60s for Fleet API calls. This prevents timeouts during environment initialization.

Previously reset() did partial work and reset_async() added tool fetching. Now reset_async() does all the work (including fetching tools) and reset() is just a sync wrapper that calls it via run_until_complete(). This ensures both methods return identical results including tools.
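The "async does the work, sync wraps it" pattern can be illustrated as below. `TaskEnvSketch` and its return payload are hypothetical placeholders; the real FleetTaskEnv talks to Fleet and is not reproduced here.

```python
import asyncio
from typing import Any, Dict


class TaskEnvSketch:
    """Hypothetical environment showing a sync reset() built on reset_async()."""

    async def reset_async(self) -> Dict[str, Any]:
        # Placeholder for the real work: reset the env and fetch tools.
        await asyncio.sleep(0)
        return {"prompt": "example task prompt", "tools": ["tool_a", "tool_b"]}

    def reset(self) -> Dict[str, Any]:
        # Sync wrapper: drive the coroutine to completion on a private loop.
        loop = asyncio.new_event_loop()
        try:
            return loop.run_until_complete(self.reset_async())
        finally:
            loop.close()
```

Because both entry points run the same coroutine, they return the same payload. A wrapper like this cannot be called from code that is already running inside an event loop; async callers should await `reset_async()` directly.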

MCP's call_tool() returns a CallToolResult Pydantic object, not plain text. This was causing ugly repr strings to be passed to agents like: "meta=None content=[TextContent(type='text', text='...')] ..."

Now properly extracts:
- Text content from result.content[].text
- Tries JSON parsing for structured results
- Falls back to structuredContent if available
- Handles isError cases
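The extraction order listed above can be sketched with plain dataclasses standing in for the MCP result types. The field names (`content`, `structuredContent`, `isError`, `text`) come from the commit message; the stand-in classes, the `extract_tool_result` name, the newline join for multiple text parts, and the exact return shapes are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass
class TextContent:  # stand-in for the MCP text content type
    text: str
    type: str = "text"


@dataclass
class CallToolResult:  # stand-in for the MCP result object
    content: List[Any] = field(default_factory=list)
    structuredContent: Optional[dict] = None
    isError: bool = False


def extract_tool_result(result: CallToolResult) -> Any:
    """Turn a tool result into plain data instead of a repr string."""
    texts = [c.text for c in result.content if getattr(c, "type", None) == "text"]
    joined = "\n".join(texts)
    if result.isError:
        return {"error": joined or "tool call failed"}
    if joined:
        try:
            return json.loads(joined)  # structured result encoded as JSON text
        except json.JSONDecodeError:
            return joined  # ordinary text
    if result.structuredContent is not None:
        return result.structuredContent
    return ""  # empty result
```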

Tests for:
- FleetMCPClient._extract_tool_result():
  - Single text content extraction
  - JSON parsing from text
  - Multiple text contents
  - Error result handling
  - Structured content fallback
  - Empty result handling
- FleetTaskEnv reset:
  - reset_async() returns tools
  - reset() calls reset_async() (sync wrapper)

- Move fleet.make() and list_tools() into FleetTaskEnv.__init__()
- Tools are now fetched at env creation, not during reset
- reset_async() calls _orch.reset() with error handling, returns cached tools
- Use asyncio.run() for Python 3.13 compatibility
- Update tests for new initialization pattern

- Log task_key and verifier code preview when verifier fails
- Catch syntax errors separately with clear message
- Show which functions were found if 'verify' is missing

Helps debug issues like "Verifier code must define a 'verify' function"
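One way to produce the diagnostics described above is to parse the verifier source before running it. The sketch below uses the standard `ast` module; `check_verifier_code` is a hypothetical helper and its error wording is only modeled on the message quoted in the commit, not copied from the PR's code.

```python
import ast
from typing import List


def check_verifier_code(code: str) -> List[str]:
    """Return top-level function names, raising clear errors for bad input."""
    try:
        tree = ast.parse(code)
    except SyntaxError as exc:
        # Report syntax problems separately from a missing 'verify'.
        raise ValueError(
            f"Verifier code has a syntax error: {exc.msg} (line {exc.lineno})"
        ) from exc
    found = [
        node.name
        for node in tree.body
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    ]
    if "verify" not in found:
        raise ValueError(
            f"Verifier code must define a 'verify' function; found: {found or 'none'}"
        )
    return found
```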

Replace custom _execute_verifier_local() with Fleet SDK's Task.verify_detailed() which properly sets up the verifier namespace with:
- Environment type annotation
- Helper functions (normalized_contains, etc.)
- Proper function discovery (not just "verify" function)

This fixes "name 'Environment' is not defined" errors during verifier execution.

Changes:
- _compute_reward: Create Fleet SDK Task and call verify_detailed()
- Support both 'verifier_code' and 'verifier_func' field names
- Add comprehensive logging for debugging
- Remove broken _execute_verifier_local method

Tests:
- Update all verifier tests to mock Fleet SDK Task.verify_detailed()
- Add tests for various edge cases (no verifier, no orch, exceptions)
- Fix fixture to avoid asyncio.run() conflicts with pytest-asyncio

…context

- Add retry with exponential backoff (3 attempts, 1s/2s/4s delays)
- Log errors instead of silently swallowing exceptions
- Log warning when some clients fail but others succeed
- Log error after all retries exhausted

This fixes silent failures when MCP connections are flaky, which caused 'no tools found' errors in SkyRL training.

call_tool now retries with exponential backoff (3 attempts, 1s/2s/4s) on connection errors, similar to list_tools. ValueError (tool not found) is not retried.
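The retry policy described in these two commits can be sketched as a small async helper. `call_with_retry` is a hypothetical name, and the sketch sleeps only between attempts (so three attempts use the 1s and 2s delays); whether the PR's code also waits after the final failure is not stated. The `sleep` parameter exists only so the delays can be replaced in tests.

```python
import asyncio
import logging
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")
logger = logging.getLogger(__name__)


async def call_with_retry(
    fn: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> T:
    """Retry an async call with exponential backoff; never retry ValueError."""
    for attempt in range(attempts):
        try:
            return await fn()
        except ValueError:
            raise  # e.g. tool not found: retrying cannot help
        except Exception as exc:
            if attempt == attempts - 1:
                logger.error("giving up after %d attempts: %s", attempts, exc)
                raise
            delay = base_delay * 2 ** attempt  # 1s, 2s, 4s, ...
            logger.warning("attempt %d failed (%s); retrying in %.0fs", attempt + 1, exc, delay)
            await sleep(delay)
    raise RuntimeError("attempts must be at least 1")
```

A caller would wrap the flaky operation in a zero-argument coroutine function, for example `await call_with_retry(lambda: client.call_tool(name, args))`, where `client` is whatever MCP client object is in use.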

Deniz added 4 commits December 12, 2025 12:14

fleet integartion step 0

38edfc2

updated README

f67bc43

readme update

164853c

another iteraton

935826f

dzorlu changed the title ~~Deniz/fleet client~~ feat: Fleet client Dec 13, 2025

Deniz and others added 15 commits December 17, 2025 17:29

readme

7c09d5b

conb

7efae22

Add __init__.py to envs package for pip install compatibility

791a071

fix: Remove seed parameter from reset() call

7852847

HTTPEnvClient.reset() doesn't support seed parameter yet.

dzorlu mentioned this pull request Jan 26, 2026

fix: Use Fleet SDK Task.verify_detailed() for verifier execution #4

Closed

8 tasks

Deniz and others added 6 commits January 26, 2026 11:01

Fix: fetch tools lazily in reset_async to avoid asyncio.run in async …

ced5eca

…context

debug: add logging for call_tool to trace success/failure paths

a2f3531

fix: unwrap ExceptionGroup to show actual error cause

a08cb6d

debug: add logging for Fleet instance creation timing

9806eb8

2 participants
