
RFC: Integrate Slime RL backend into Agent Lightning (tinker-style service vs native integration) #453

@syeehyn

Summary

Add a first-class Slime-powered Algorithm to Agent Lightning (similar to agl.VERL(...)) so users can (see the usage sketch after this list):

  • run existing agents (LangGraph / OpenAI Agents SDK / custom) unchanged via the Runner,
  • collect spans + rewards in LightningStore,
  • train/update policy weights with Slime,
  • inject the updated OpenAI-compatible endpoint to rollouts via the main_llm resource convention.
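
A minimal usage sketch of the proposed API follows. agl.SLIME, its config keys, and the Trainer wiring are illustrative assumptions modeled on how agl.VERL(...) is used today, not an existing API.

```python
# Illustrative only: agl.SLIME does not exist yet, and the Trainer/agent wiring
# is assumed to mirror the existing agl.VERL(...) flow; real names may differ.
import agentlightning as agl

algorithm = agl.SLIME(                         # proposed Algorithm, analogous to agl.VERL(...)
    {
        "model": "Qwen/Qwen2.5-7B-Instruct",   # illustrative config keys
        "rollout_batch_size": 32,
        "num_train_steps": 100,
    }
)

trainer = agl.Trainer(algorithm=algorithm)     # assumed Trainer constructor
trainer.fit(my_agent, train_dataset)           # my_agent / train_dataset: placeholders for an
                                               # unchanged LangGraph / OpenAI Agents SDK /
                                               # custom agent and its task dataset
```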

Agent Lightning architecture reference: Algorithm ↔ Runner ↔ LightningStore, with auxiliary Tracer/Adapter/LLM Proxy.
Docs: Bird’s Eye View + Store deep dive.

Slime

Slime is an RL post-training framework designed for RL scaling with Megatron + SGLang and supports flexible data generation engines.
Refs:

Motivation / problem

Agent Lightning already supports RL via the VERL wrapper, with main_llm resource injection and per-attempt endpoint formatting via ProxyLLM.

I plan to work on Slime integration next week and want to open a technical discussion early to avoid choosing the wrong abstraction boundary.

Design constraints from Agent Lightning (what we must respect)

  • LightningStore is the single control plane: rollouts, attempts, ordered spans (sequence_id), and resources.
  • ProxyLLM exists specifically to inject rollout/attempt routing into endpoints (for attribution).
  • Adapters exist (e.g., LlmProxyTraceToTriplet), and span ordering should rely on sequence_id (not timestamps) due to clock skew (sketched below).
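
A minimal sketch of that ordering constraint, assuming an illustrative span shape (the real LightningStore span schema will differ):

```python
# Sketch only: the Span fields here are assumptions for illustration, not the
# real LightningStore schema. The point is that ordering within an attempt
# keys on sequence_id, never on wall-clock timestamps (clock skew).
from dataclasses import dataclass

@dataclass
class Span:
    rollout_id: str
    attempt_id: str
    sequence_id: int      # authoritative ordering key
    timestamp: float      # informational only; do not sort on this
    payload: dict

def ordered_spans(spans: list[Span], attempt_id: str) -> list[Span]:
    """Return one attempt's spans in their authoritative order."""
    mine = [s for s in spans if s.attempt_id == attempt_id]
    return sorted(mine, key=lambda s: s.sequence_id)
```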

Two integration tracks

Track A — "tinker-style" training service backbone (recommended if we need a stable boundary)

Create a minimal training service API that Slime sits behind (service-ish boundary), then implement an Agent Lightning Algorithm that:

  1. reads spans + rewards from LightningStore,
  2. converts them into training batches,
  3. calls a stable service surface (e.g., train_step / load_weights / get_endpoint / checkpoint) implemented by Slime (interface sketched below).
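
A rough Python sketch of that surface is below. The method names come from this RFC; the signatures and the TrainBatch type are assumptions, not an existing Slime or Agent Lightning API.

```python
# Hypothetical Track A service surface. Method names are from this RFC
# (train_step / load_weights / get_endpoint / checkpoint); signatures and
# TrainBatch are assumptions, not real Slime or Agent Lightning APIs.
from typing import Protocol

class TrainBatch(Protocol):
    """Placeholder for batches built from LightningStore spans + rewards."""

class TrainingService(Protocol):
    def train_step(self, batch: TrainBatch) -> dict[str, float]:
        """Run one optimizer step and return training metrics (e.g., loss)."""
        ...

    def load_weights(self, checkpoint_path: str) -> None:
        """Load policy weights into the serving engine (e.g., SGLang)."""
        ...

    def get_endpoint(self) -> str:
        """Return the current OpenAI-compatible endpoint serving the policy."""
        ...

    def checkpoint(self, tag: str) -> str:
        """Persist the current weights and return the checkpoint path."""
        ...
```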

Inspiration: OpenTinker’s disaggregated client/scheduler/training-server model.

Related: radixark/miles#239

Pros:

  • Stable interface for Agent Lightning; Slime can evolve internally.
  • Easier to swap backends in the future.

Cons:

  • Risk of over-abstracting and fighting Slime’s native architecture.

Track B — Native Slime integration (fastest POC)

Implement agl.SLIME(config) as a thin wrapper that (see the sketch after this list):

  • launches Slime router + trainer using Slime-native config/entrypoints,
  • publishes a ProxyLLM-like OpenAI endpoint under main_llm,
  • builds a TraceAdapter that emits Slime-consumable batches.
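
A rough shape of such a wrapper, under heavy assumptions: everything except the names taken from this RFC (agl.SLIME, main_llm, TraceAdapter) is hypothetical, and the Slime launch call and the store's resource API are stand-in stubs.

```python
# Hypothetical Track B wrapper skeleton. The Slime launch call and the store's
# resource API below are stand-in stubs, not real Slime / Agent Lightning APIs.
from typing import Any, Protocol

class StoreLike(Protocol):
    """Assumed subset of LightningStore used by the wrapper (illustrative)."""
    def publish_resource(self, name: str, value: Any) -> None: ...

def launch_slime_router_and_trainer(config: dict) -> str:
    """Stub for launching Slime's router + trainer; returns the endpoint URL."""
    return "http://localhost:30000/v1"  # placeholder OpenAI-compatible endpoint

class SLIME:
    """Thin wrapper that would be exposed as agl.SLIME(config)."""

    def __init__(self, config: dict):
        self.config = config

    def run(self, store: StoreLike) -> None:
        # 1. Launch Slime router + trainer using Slime-native config/entrypoints.
        endpoint = launch_slime_router_and_trainer(self.config)

        # 2. Publish the OpenAI-compatible endpoint under `main_llm` so Runners
        #    pick it up without agent-code changes (resource API assumed).
        store.publish_resource("main_llm", endpoint)

        # 3. A TraceAdapter would convert stored spans + rewards into
        #    Slime-consumable batches and drive weight updates (elided here).
```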

Pros:

  • Quick to get working.
  • Minimal extra layers.

Cons:

  • Tighter coupling to Slime internals; upgrades may be fragile.

Open questions for maintainers

  1. Should SLIME live in Algorithm Zoo or as an external plugin?
  2. Should we require ProxyLLM routing semantics for attribution (recommended), or allow a “dumb endpoint”?
  3. What is the expected minimal training signal in spans? (token IDs, logprobs, tool-call segmentation)
  4. v1 scope: single-turn only, or multi-turn/tool-call credit assignment?

Acceptance criteria

  • agl.SLIME(...) exists and can run an end-to-end loop with Trainer.
  • Runners can consume the main_llm endpoint (as in the SQL RL recipe).
  • Training updates change the served endpoint (resource versioning) without modifying agent code.
  • Trace ordering and attribution are correct at attempt granularity.

Ownership

I plan to self-assign and drive the initial implementation + example.
