Summary
Add a first-class Slime-powered Algorithm to Agent Lightning (similar to agl.VERL(...)) so users can:
- run existing agents (LangGraph / OpenAI Agents SDK / custom) unchanged via the Runner,
- collect spans + rewards in LightningStore,
- train/update policy weights with Slime,
- inject the updated OpenAI-compatible endpoint into rollouts via the main_llm resource convention (usage sketch below).
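For concreteness, here is a usage sketch mirroring the existing agl.VERL(...) entry point. agl.SLIME, its config keys, and the Trainer wiring shown are assumptions for this discussion, not an existing API:

```python
# Hypothetical usage sketch. agl.SLIME and the config keys below are
# assumptions mirroring the agl.VERL(...) pattern, not an existing API.
import agentlightning as agl

algorithm = agl.SLIME(
    config={
        "model": "Qwen/Qwen2.5-7B-Instruct",  # placeholder model name
        "rollout_batch_size": 32,
        "train_steps": 100,
    }
)

# my_agent / train_dataset are placeholders for the user's agent and data.
# Existing agents (LangGraph / OpenAI Agents SDK / custom) run unchanged:
# the Runner resolves the main_llm resource published by the algorithm.
trainer = agl.Trainer(algorithm=algorithm)  # Trainer wiring is schematic
trainer.fit(my_agent, train_dataset)
```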
Agent Lightning architecture reference: Algorithm ↔ Runner ↔ LightningStore, with auxiliary Tracer/Adapter/LLM Proxy.
Docs: Bird’s Eye View + Store deep dive.
- https://microsoft.github.io/agent-lightning/latest/deep-dive/birds-eye-view/
- https://microsoft.github.io/agent-lightning/stable/deep-dive/store/
Slime
Slime is an RL post-training framework designed for RL scaling with Megatron + SGLang, and it supports flexible data generation engines.
Motivation / problem
Agent Lightning already supports RL via the VERL wrapper, with main_llm resource injection and per-attempt endpoint formatting via ProxyLLM.
- VERL wrapper docs: https://microsoft.github.io/agent-lightning/latest/algorithm-zoo/verl/
- SQL RL recipe: https://microsoft.github.io/agent-lightning/latest/how-to/train-sql-agent/
I plan to work on Slime integration next week and want to open a technical discussion early to avoid choosing the wrong abstraction boundary.
Design constraints from Agent Lightning (what we must respect)
- LightningStore is the single control plane: rollouts, attempts, ordered spans (sequence_id), and resources.
- ProxyLLM exists specifically to inject rollout/attempt routing into endpoints (for attribution).
- Adapters exist (e.g., LlmProxyTraceToTriplet), and ordering should rely on sequence_id (not timestamps) due to clock skew (ordering sketch below).
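To make the sequence_id constraint concrete, a minimal sketch; the Span fields and helper below are illustrative assumptions, not the actual LightningStore schema:

```python
# Order spans by sequence_id, never by wall-clock timestamps.
from dataclasses import dataclass

@dataclass
class Span:
    attempt_id: str
    sequence_id: int   # monotonically assigned by the store
    timestamp: float   # unreliable across machines due to clock skew
    payload: dict

def ordered_spans(spans: list[Span], attempt_id: str) -> list[Span]:
    """Return one attempt's spans in store order (not arrival/clock order)."""
    mine = [s for s in spans if s.attempt_id == attempt_id]
    return sorted(mine, key=lambda s: s.sequence_id)  # not s.timestamp
```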
Two integration tracks
Track A — "tinker-style" training service backbone (recommended if we need a stable boundary)
Create a minimal training service API that Slime sits behind (service-ish boundary), then implement an Agent Lightning Algorithm that:
- reads spans + rewards from LightningStore,
- converts them into training batches,
- calls a stable service surface (e.g., train_step / load_weights / get_endpoint / checkpoint) implemented by Slime (interface sketched below).
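A possible shape for that surface, assuming a typed Protocol; the method set comes from the list above, but the Protocol itself and all signatures are proposals, not an existing Agent Lightning or Slime interface:

```python
# Track A boundary sketch: the stable training-service surface.
from typing import Protocol

class TrainingService(Protocol):
    def train_step(self, batch: dict) -> dict:
        """Run one optimization step on an adapted batch; return metrics."""
        ...

    def load_weights(self, checkpoint_path: str) -> None:
        """Load policy weights into the training/serving engines."""
        ...

    def get_endpoint(self) -> str:
        """Return the current OpenAI-compatible endpoint URL for rollouts."""
        ...

    def checkpoint(self, tag: str) -> str:
        """Persist the current weights; return the checkpoint path."""
        ...
```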
Inspiration: OpenTinker’s disaggregated client/scheduler/training-server model.
Related: radixark/miles#239
Pros:
- Stable interface for Agent Lightning; Slime can evolve internally.
- Easier to swap backends in the future.
Cons:
- Risk of over-abstracting and fighting Slime’s native architecture.
Track B — Native Slime integration (fastest POC)
Implement agl.SLIME(config) as a thin wrapper that:
- launches Slime router + trainer using Slime-native config/entrypoints,
- publishes a ProxyLLM-like OpenAI endpoint under main_llm,
- builds a TraceAdapter that emits Slime-consumable batches (loop sketched below).
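The intended data flow, as a hypothetical loop; every name here (SlimeAlgorithm, the store/trainer/adapter methods) is assumed for discussion, not an existing Agent Lightning or Slime API:

```python
# Rough shape of the Track B algorithm loop. The point is the data flow:
# publish endpoint -> collect spans -> adapt to batches -> train -> republish.
class SlimeAlgorithm:
    def __init__(self, store, slime_trainer, adapter):
        self.store = store          # LightningStore handle
        self.slime = slime_trainer  # Slime-native trainer wrapper (assumed)
        self.adapter = adapter      # TraceAdapter emitting Slime batches

    def run(self, num_iterations: int) -> None:
        for _ in range(num_iterations):
            # 1. Publish the current OpenAI-compatible endpoint as main_llm,
            #    so Runners pick it up without agent-code changes.
            self.store.update_resources({"main_llm": self.slime.get_endpoint()})

            # 2. Wait for rollouts/attempts to finish and fetch ordered spans.
            spans = self.store.wait_for_completed_spans()  # assumed helper

            # 3. Convert spans + rewards into a Slime-consumable batch.
            batch = self.adapter.to_batch(spans)

            # 4. One Slime training step; the served weights then update.
            self.slime.train_step(batch)
```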
Pros:
- Quick to get working.
- Minimal extra layers.
Cons:
- Tighter coupling to Slime internals; upgrades may be fragile.
Open questions for maintainers
- Should SLIME live in the Algorithm Zoo or as an external plugin?
- Should we require ProxyLLM routing semantics for attribution (recommended), or allow a “dumb endpoint”?
- What is the expected minimal training signal in spans? (token IDs, logprobs, tool-call segmentation; strawman schema below)
- v1 scope: single-turn only, or multi-turn/tool-call credit assignment?
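As a starting point for the training-signal question, a strawman per-transition schema; every field name is a proposal, not something that exists in Agent Lightning today:

```python
# Strawman: the fields one adapted transition might need for Slime training.
from dataclasses import dataclass
from typing import Optional

@dataclass
class Transition:
    prompt_token_ids: list[int]                # input tokens for this LLM call
    response_token_ids: list[int]              # sampled completion tokens
    response_logprobs: Optional[list[float]]   # sampling logprobs, if off-policy correction is needed
    tool_call_mask: Optional[list[bool]]       # marks response tokens belonging to tool calls
    reward: float                              # credit assigned to this transition
    attempt_id: str                            # links back to the store for attribution
```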
Acceptance criteria
- agl.SLIME(...) exists and can run an end-to-end loop with Trainer.
- Runners can consume the main_llm endpoint (like the SQL RL recipe).
- Training updates change the served endpoint (resource versioning) without modifying agent code (test sketch below).
- Trace ordering and attribution are correct at attempt granularity.
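One way to pin down the resource-versioning criterion as a test; the store and algorithm accessors here are assumptions for discussion:

```python
# Hedged check: agent code is untouched, yet the endpoint it receives
# changes after a training update publishes new weights.
def test_endpoint_updates_without_agent_changes(store, algorithm):
    before = store.get_resources()["main_llm"]  # endpoint before training
    algorithm.run(num_iterations=1)             # one train/update cycle
    after = store.get_resources()["main_llm"]   # endpoint after weight update
    assert before != after, "resource versioning should publish a new endpoint"
```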
Ownership
I plan to self-assign and drive the initial implementation + example.