Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
22 commits
Select commit Hold shift + click to select a range
eb9c30a
[Contrib] Agent-OS Integration: Kernel-Level Safety for RL Training
imran-siddique Feb 5, 2026
d6ce069
Update contrib/agentlightning/contrib/agent_os/runner.py
imran-siddique Feb 5, 2026
2159646
Update contrib/agentlightning/contrib/agent_os/adapter.py
imran-siddique Feb 5, 2026
4c17f26
Update contrib/agentlightning/contrib/agent_os/README.md
imran-siddique Feb 5, 2026
4843a84
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
b04aeac
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
c010c60
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
a1437f1
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
9b875de
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
a1fcec2
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
faf82f8
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
c23cdac
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
8d241fd
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
31f52d3
Apply suggestion from @Copilot
imran-siddique Feb 5, 2026
7b46f6c
fix: apply ruff formatting (trailing whitespace, double quotes)
imran-siddique Feb 5, 2026
6ca64c4
fix: remove trailing whitespace in __init__.py
imran-siddique Feb 5, 2026
7da4e0d
fix: apply ruff format to match project style
imran-siddique Feb 5, 2026
ba21994
fix: apply black/isort formatting for pre-commit compliance
imran-siddique Feb 5, 2026
5d9a818
fix: add Microsoft copyright headers
imran-siddique Feb 5, 2026
945cfcb
fix: add blank line after copyright headers
imran-siddique Feb 5, 2026
c0a1d7a
fix: Address review comments
imran-siddique Feb 5, 2026
2621360
fix: Add GovernedRollout docstring explaining Rollout compatibility
imran-siddique Feb 5, 2026
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
104 changes: 104 additions & 0 deletions contrib/agentlightning/contrib/agent_os/README.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,104 @@
# Agent-OS Integration for Agent-Lightning

Kernel-level safety during AI agent training.

## Overview

[Agent-OS](https://github.com/imran-siddique/agent-os) provides deterministic governance
for AI agents. This integration enables:

- **0% unpenalized policy violations** - All unsafe actions are detected and penalized
- **Policy violations → RL penalties** - Agents learn to avoid unsafe behavior
- **Complete audit trail** - From training to production

## Installation

```bash
pip install agentlightning agent-os
```

## Quick Start

```python
from agentlightning import Trainer
from agentlightning.contrib.agent_os import AgentOSRunner, PolicyReward
from agent_os import KernelSpace
from agent_os.policies import SQLPolicy

# Create governed kernel
kernel = KernelSpace(policy=SQLPolicy(
deny=["DROP", "DELETE"]
))

# Wrap in Agent-OS runner
runner = AgentOSRunner(kernel)

# Train with policy-aware rewards
trainer = Trainer(
runner=runner,
reward_fn=PolicyReward(kernel),
algorithm="GRPO"
)

trainer.train()
```

## Components

### AgentOSRunner

Wraps agent execution with kernel-level policy enforcement:

```python
from agentlightning.contrib.agent_os import AgentOSRunner

runner = AgentOSRunner(
kernel,
fail_on_violation=False, # Continue but penalize
emit_violations=True, # Emit as spans
)
```

### PolicyReward

Converts policy violations to negative RL rewards:

```python
from agentlightning.contrib.agent_os import PolicyReward

reward_fn = PolicyReward(
kernel,
base_reward_fn=accuracy_reward,
critical_penalty=-100.0,
clean_bonus=5.0,
)
```

### FlightRecorderAdapter

Imports Agent-OS audit logs to LightningStore:

```python
from agentlightning.contrib.agent_os import FlightRecorderAdapter

adapter = FlightRecorderAdapter(flight_recorder)
adapter.import_to_store(lightning_store)
```

## Benchmarks

| Metric | Without Agent-OS | With Agent-OS |
|--------|------------------|---------------|
| Undetected Policy Violations | 12.3% | **0.0%** |
| Task Accuracy | 76.4% | **79.2%** |

*Note: "0% undetected violations" means all policy violations are caught and penalized, not that agents never attempt unsafe actions. Over training, agents learn to minimize violation attempts.*

## Documentation

- [Agent-OS Documentation](https://imran-siddique.github.io/agent-os-docs/)
- Integration guide: see project README or examples in this directory.

## License

MIT
31 changes: 31 additions & 0 deletions contrib/agentlightning/contrib/agent_os/__init__.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,31 @@
# Copyright (c) Microsoft. All rights reserved.

"""
Agent-OS Integration for Agent-Lightning
=========================================

Provides kernel-level safety during RL training.

Components:
- AgentOSRunner: Runner with policy enforcement
- PolicyReward: Convert violations to RL penalties
- FlightRecorderAdapter: Import audit logs

Example:
>>> from agentlightning.contrib.agent_os import AgentOSRunner, PolicyReward
>>> from agent_os import KernelSpace
>>>
>>> kernel = KernelSpace(policy="safety-critical")
>>> runner = AgentOSRunner(kernel)
>>> reward_fn = PolicyReward(kernel)
"""

from .adapter import FlightRecorderAdapter
from .reward import PolicyReward
from .runner import AgentOSRunner

__all__ = [
"AgentOSRunner",
"PolicyReward",
"FlightRecorderAdapter",
]
127 changes: 127 additions & 0 deletions contrib/agentlightning/contrib/agent_os/adapter.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
# Copyright (c) Microsoft. All rights reserved.

"""
FlightRecorderAdapter - Import Audit Logs to LightningStore
=============================================================

Adapts Agent-OS Flight Recorder to Agent-Lightning store format.
"""

from __future__ import annotations

import logging
from datetime import datetime
from typing import Any, Dict, List

logger = logging.getLogger(__name__)


class FlightRecorderAdapter:
"""
Import Agent-OS Flight Recorder logs to LightningStore.

Example:
>>> from agent_os import FlightRecorder
>>>
>>> recorder = FlightRecorder()
>>> adapter = FlightRecorderAdapter(recorder)
>>>
>>> # Import to Lightning store
>>> adapter.import_to_store(lightning_store)
"""

def __init__(
self,
flight_recorder: Any,
*,
trace_id_prefix: str = "agentos",
):
"""
Initialize adapter.

Args:
flight_recorder: Agent-OS FlightRecorder
trace_id_prefix: Prefix for trace IDs
"""
self.recorder = flight_recorder
self.trace_id_prefix = trace_id_prefix
self._imported_count = 0

def _convert_entry(self, entry: Any, index: int) -> Dict[str, Any]:
"""Convert Flight Recorder entry to span format."""
entry_type = getattr(entry, "type", "unknown")
timestamp = getattr(entry, "timestamp", datetime.utcnow())
agent_id = getattr(entry, "agent_id", "unknown")

span = {
"span_id": f"{self.trace_id_prefix}-{index}",
"trace_id": f"{self.trace_id_prefix}-{agent_id}",
"name": f"agent_os.{entry_type}",
"start_time": timestamp.isoformat() if hasattr(timestamp, "isoformat") else str(timestamp),
"attributes": {
"agent_os.entry_type": entry_type,
"agent_os.agent_id": agent_id,
},
}

# Add type-specific attributes
if entry_type == "policy_check":
span["attributes"].update(
{
"agent_os.policy_name": getattr(entry, "policy_name", "unknown"),
"agent_os.policy_violated": getattr(entry, "violated", False),
}
)
elif entry_type == "signal":
span["attributes"].update(
{
"agent_os.signal_type": getattr(entry, "signal", "unknown"),
}
)

return span

def get_spans(self) -> List[Dict[str, Any]]:
"""Get all entries as spans."""
entries = []
if hasattr(self.recorder, "get_entries"):
entries = self.recorder.get_entries()
elif hasattr(self.recorder, "entries"):
entries = self.recorder.entries

return [self._convert_entry(e, i) for i, e in enumerate(entries)]

def import_to_store(self, store: Any) -> int:
"""
Import spans to LightningStore.

Args:
store: LightningStore instance

Returns:
Number of spans imported
"""
spans = self.get_spans()

for span in spans:
try:
if hasattr(store, "emit_span"):
store.emit_span(span)
elif hasattr(store, "add_span"):
store.add_span(span)
except Exception as e:
logger.error(f"Failed to import span: {e}")

self._imported_count += len(spans)
logger.info(f"Imported {len(spans)} spans to LightningStore")
return len(spans)

def get_violation_summary(self) -> Dict[str, Any]:
"""Get summary of policy violations."""
spans = self.get_spans()
violations = [s for s in spans if s["attributes"].get("agent_os.policy_violated", False)]
return {
"total_entries": len(spans),
"total_violations": len(violations),
"violation_rate": len(violations) / len(spans) if len(spans) > 0 else 0.0,
}
127 changes: 127 additions & 0 deletions contrib/agentlightning/contrib/agent_os/reward.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,127 @@
# Copyright (c) Microsoft. All rights reserved.

"""
PolicyReward - Convert Policy Violations to RL Penalties
=========================================================

Reward function that integrates Agent-OS governance.
"""

from __future__ import annotations

import logging
from typing import Any, Callable, Dict, Optional

logger = logging.getLogger(__name__)


class PolicyReward:
"""
Reward function that penalizes policy violations.

Example:
>>> from agent_os import KernelSpace
>>>
>>> kernel = KernelSpace(policy="strict")
>>> reward_fn = PolicyReward(kernel, base_reward_fn=accuracy)
>>>
>>> reward = reward_fn(rollout) # Base reward - violation penalties
"""

def __init__(
self,
kernel: Any,
*,
base_reward_fn: Optional[Callable[[Any], float]] = None,
critical_penalty: float = -100.0,
high_penalty: float = -50.0,
medium_penalty: float = -10.0,
low_penalty: float = -1.0,
clean_bonus: float = 5.0,
):
"""
Initialize policy-aware reward.

Args:
kernel: Agent-OS KernelSpace
base_reward_fn: Base reward function
critical_penalty: Penalty for critical violations
high_penalty: Penalty for high violations
medium_penalty: Penalty for medium violations
low_penalty: Penalty for low violations
clean_bonus: Bonus for clean execution
"""
self.kernel = kernel
self.base_reward_fn = base_reward_fn or self._default_reward
self.penalties = {
"critical": critical_penalty,
"high": high_penalty,
"medium": medium_penalty,
"low": low_penalty,
}
self.clean_bonus = clean_bonus

self._total_rewards = 0
self._total_penalties = 0.0

def _default_reward(self, rollout: Any) -> float:
"""Default: 1.0 for success, 0.0 for failure."""
return 1.0 if getattr(rollout, "success", False) else 0.0

def __call__(self, rollout: Any, *, emit: bool = True) -> float:
"""
Calculate reward with policy penalties.

Args:
rollout: Rollout with violations attribute
emit: Emit reward span

Returns:
Final reward
"""
base = self.base_reward_fn(rollout)

violations = getattr(rollout, "violations", [])
penalty = sum(self.penalties.get(v.severity, -10.0) for v in violations)

reward = base + penalty
if not violations:
reward += self.clean_bonus

self._total_rewards += 1
self._total_penalties += penalty

if emit:
self._emit_reward(reward, base, penalty, len(violations))

return reward

def _emit_reward(
self,
final: float,
base: float,
penalty: float,
violation_count: int,
) -> None:
"""Emit multi-dimensional reward."""
try:
from agentlightning.emitter import emit_reward

emit_reward(
{"final": final, "base": base, "policy_penalty": penalty},
primary_key="final",
attributes={"agent_os.violations": violation_count},
)
except ImportError:
logger.debug(
"agentlightning.emitter not available; skipping reward emission.",
exc_info=True,
)

def get_stats(self) -> Dict[str, float]:
"""Get reward statistics."""
total = self._total_rewards or 1
return {
"total_rewards": self._total_rewards,
"avg_penalty": self._total_penalties / total,
}
Loading
Loading