# JARVIS Reactor

An Advanced AI/ML Training & Serving Engine for AGI OS

JARVIS Reactor (formerly Reactor Core) is the "nervous system" of the JARVIS AGI ecosystem, providing enterprise-grade ML training, model serving, and real-time event coordination across distributed AI systems.
## Overview

JARVIS Reactor is a production-grade ML infrastructure stack combining:
- Advanced Training Methods: DPO, RLHF, Constitutional AI, Curriculum Learning
- Model Serving: Hot-reload model server with multi-backend support (vLLM, llama.cpp, MLX)
- Async Infrastructure: Circuit breakers, backpressure, bulkheads, dead letter queues
- API Platform: FastAPI server with telemetry, scheduling, model registry, health monitoring
- Trinity Orchestration: Multi-repo coordination with heartbeat monitoring and state sync
- Event Streaming: Real-time WebSocket/Redis pub-sub across JARVIS ecosystem
- GCP Integration: Spot VM resilience, Cloud SQL storage, auto-checkpointing
- MLForge C++ Core: High-performance ML primitives (optional submodule)
## Table of Contents

- Architecture
- Key Features
- Installation
- Quick Start
- Advanced Features
- Integration Architecture
- API Documentation
- Configuration
- Development
- Version History
- Links
## Architecture

```text
                       REACTOR CORE v77.1
                     (AGI OS Nervous System)

┌──────────────────────────────────────────────────────────────┐
│                  UNIFIED API SERVER (v77.0)                  │
│  Telemetry Collector · Night Scheduler · Model Registry      │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│               HOT-RELOAD MODEL SERVER (v77.1)                │
│  • Multi-backend support (vLLM, llama.cpp, MLX, Transformers)│
│  • Zero-downtime model swaps                                 │
│  • LRU cache + semantic response caching                     │
│  • Priority request queue                                    │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│               ADVANCED TRAINING ENGINE (v76.0)               │
│  Experience Buffer → Data Selector → Training Router         │
│  • DPO Trainer: preference learning, memory efficient        │
│  • RLHF Pipeline: PPO, reward modeling, value functions      │
│  • Constitutional AI: self-supervised safety alignment       │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│                 ASYNC INFRASTRUCTURE (v76.1)                 │
│  • CircuitBreaker   • Backpressure      • DeadLetterQueue    │
│  • Bulkhead         • HealthMonitor     • AdaptiveRateLimiter│
│  • TimeoutPolicy    • MetricsCollector  • AsyncRetry         │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│                 TRINITY ORCHESTRATOR (v75.0)                 │
│  • Multi-repo heartbeat monitoring                           │
│  • Command routing with load balancing                       │
│  • State reconciliation                                      │
│  • Dead letter queue for failed commands                     │
│  • Atomic file I/O (v73.0)                                   │
└──────────────────────────────────────────────────────────────┘
┌──────────────────────────────────────────────────────────────┐
│                   EVENT STREAMING (v10.3)                    │
│  • WebSocket real-time events                                │
│  • Redis pub/sub (optional)                                  │
│  • Safety audit trail                                        │
│  • Cost tracking & budget alerts                             │
└──────────────────────────────────────────────────────────────┘
       ▼                    ▼                    ▼
┌──────────────┐     ┌──────────────┐     ┌──────────────┐
│ MLForge C++  │     │  Cloud SQL   │     │ GCP Storage  │
│  (Optional)  │     │ (Events DB)  │     │(Checkpoints) │
└──────────────┘     └──────────────┘     └──────────────┘
```
### Repository Structure

```text
JARVIS-Reactor/
├── reactor_core/
│   ├── training/                      # Advanced training methods
│   │   ├── advanced_training.py       # DPO, RLHF, Constitutional AI (2,899 lines)
│   │   ├── unified_pipeline.py        # End-to-end training orchestration
│   │   ├── trainer.py                 # Base trainer class
│   │   └── lora.py                    # LoRA/QLoRA implementations
│   ├── serving/                       # Model serving infrastructure
│   │   ├── model_server.py            # Hot-reload model server (1,545 lines)
│   │   └── inference_engine.py        # Multi-backend inference (1,891 lines)
│   ├── api/                           # REST API server
│   │   ├── server.py                  # FastAPI endpoints (2,252 lines)
│   │   ├── telemetry.py               # Metrics & observability (1,128 lines)
│   │   ├── scheduler.py               # Night Shift scheduler (1,030 lines)
│   │   ├── model_registry.py          # Model versioning (1,301 lines)
│   │   └── health_aggregator.py       # Health monitoring (999 lines)
│   ├── orchestration/                 # Trinity coordination
│   │   └── trinity_orchestrator.py    # Multi-repo orchestrator
│   ├── utils/                         # Core utilities
│   │   ├── async_helpers.py           # Async patterns (1,746 lines)
│   │   └── dependencies.py            # Dependency injection (913 lines)
│   ├── integration/                   # Cross-repo integration
│   │   ├── event_bridge.py            # Event streaming
│   │   ├── cost_bridge.py             # Cost tracking
│   │   ├── jarvis_connector.py        # JARVIS integration
│   │   └── prime_connector.py         # Prime integration
│   ├── eval/                          # Model evaluation
│   │   └── advanced_evaluation.py     # Comprehensive eval suite (1,536 lines)
│   ├── data/                          # Data loading & preprocessing
│   ├── gcp/                           # GCP Spot VM support
│   └── config/                        # Configuration management
├── run_supervisor.py                  # AGI OS unified supervisor (1,635 lines)
├── mlforge/                           # C++ ML core (submodule)
├── docker/                            # Docker configurations
├── scripts/                           # Utility scripts
└── tests/                             # Test suite
```
Total: 18,996+ lines of production code added in v75.0-v77.1
## Key Features

### Advanced Training (v76.0)

- DPO (Direct Preference Optimization): Preference learning without reward models
- RLHF (Reinforcement Learning from Human Feedback): Full PPO pipeline
- Constitutional AI: Self-supervised safety alignment
- Curriculum Learning: Progressive difficulty scheduling
- Memory Management: Dynamic batch sizing, gradient checkpointing, CPU offloading
- FSDP Support: Fully Sharded Data Parallel for large models
- Experience Replay: Priority-based sampling from interaction logs
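To make the curriculum-learning idea concrete, the sketch below admits progressively harder samples each epoch. The class name, `difficulty_fn`, and threshold schedule are illustrative assumptions, not reactor_core's actual API:

```python
from typing import Callable, Iterable, List


class CurriculumScheduler:
    """Illustrative curriculum: each epoch admits samples whose difficulty
    falls under a threshold that grows over time (hypothetical API)."""

    def __init__(self, samples: Iterable, difficulty_fn: Callable[[object], float],
                 start: float = 0.3, step: float = 0.2):
        # Sort once so each epoch is a prefix of increasing difficulty.
        self.samples = sorted(samples, key=difficulty_fn)
        self.difficulty_fn = difficulty_fn
        self.threshold = start
        self.step = step

    def next_epoch(self) -> List:
        # Admit everything at or below the current threshold, then raise it.
        batch = [s for s in self.samples if self.difficulty_fn(s) <= self.threshold]
        self.threshold = min(1.0, self.threshold + self.step)
        return batch


# Example: difficulty proxied by normalized sequence length.
texts = ["hi", "a longer sentence", "an even longer training example sentence"]
max_len = max(len(t) for t in texts)
sched = CurriculumScheduler(texts, lambda t: len(t) / max_len)
epoch1 = sched.next_epoch()  # only the easiest samples
epoch2 = sched.next_epoch()  # threshold raised, more samples admitted
```

The same shape generalizes to any scalar difficulty signal (loss, length, label noise estimates).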
### Async Infrastructure (v76.1)

- CircuitBreaker: Automatic failure detection and recovery
- Backpressure: Adaptive load management with queue shedding
- Bulkhead: Failure isolation between components
- DeadLetterQueue: Failed operation tracking and replay
- HealthMonitor: Real-time component health tracking
- AdaptiveRateLimiter: Dynamic rate limiting based on success rates
- TimeoutPolicy: Configurable timeouts with fallback strategies
- MetricsCollector: Comprehensive observability
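As an illustration of the CircuitBreaker pattern listed above, here is a minimal synchronous sketch; the class, thresholds, and state handling are simplified stand-ins for reactor_core's async implementation:

```python
import time


class CircuitBreaker:
    """Minimal circuit breaker: open after `max_failures` consecutive
    errors, reject calls while open, and allow a trial call after
    `reset_timeout` seconds (half-open)."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_timeout:
                raise RuntimeError("circuit open: call rejected")
            self.opened_at = None  # half-open: allow one trial call
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0  # any success closes the circuit again
        return result
```

Wrapping a flaky backend call in `breaker.call(...)` turns a cascade of timeouts into fast, local rejections until the backend recovers.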
### API Platform (v77.0)

- FastAPI Server: Production-grade REST API with auto-docs
- Telemetry Collector: Real-time metrics ingestion with WebSocket streaming
- Night Shift Scheduler: Automated training during off-peak hours
- Model Registry: Version management, A/B testing, rollback support
- Health Aggregator: Multi-service health dashboard
- Cost Tracking: Budget alerts and spend analytics
- WebSocket Events: Real-time training progress streaming
### Model Serving (v77.1)

- Hot-Reload: Zero-downtime model updates via file watcher
- Multi-Backend Support: vLLM, llama.cpp, MLX, Transformers
- LRU Model Cache: Memory-aware model eviction
- Priority Queue: Request prioritization for SLA compliance
- Semantic Caching: Hash-based response deduplication
- Circuit Breaker: Backend failure protection
- Async Loading: Non-blocking model initialization
- Version Management: Seamless model version switching
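The LRU caching and hash-based response deduplication listed above can be sketched together. The `ResponseCache` class below is a hypothetical stand-in, not the server's actual cache:

```python
import hashlib
from collections import OrderedDict


class ResponseCache:
    """LRU response cache keyed by a hash of (model_id, prompt, params),
    illustrating hash-based deduplication with bounded memory."""

    def __init__(self, max_entries: int = 1024):
        self.max_entries = max_entries
        self._store = OrderedDict()  # insertion order doubles as LRU order

    @staticmethod
    def _key(model_id: str, prompt: str, max_tokens: int) -> str:
        payload = f"{model_id}|{prompt}|{max_tokens}".encode()
        return hashlib.sha256(payload).hexdigest()

    def get(self, model_id: str, prompt: str, max_tokens: int):
        key = self._key(model_id, prompt, max_tokens)
        if key in self._store:
            self._store.move_to_end(key)  # mark as recently used
            return self._store[key]
        return None

    def put(self, model_id: str, prompt: str, max_tokens: int, response) -> None:
        key = self._key(model_id, prompt, max_tokens)
        self._store[key] = response
        self._store.move_to_end(key)
        if len(self._store) > self.max_entries:
            self._store.popitem(last=False)  # evict least recently used
```

A production cache would also weigh entries by memory footprint rather than count; the eviction point is the same.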
### Trinity Orchestration (v75.0)

- Multi-Repo Coordination: Heartbeat monitoring across JARVIS, Prime, Reactor
- Command Routing: Intelligent load balancing with priority queues
- State Reconciliation: Consistent state across distributed system
- Dead Letter Queue: Failed command tracking and retry
- Atomic File I/O: Zero-corruption file operations (v73.0)
- Self-Heartbeat: Liveness monitoring (v72.0)
- Circuit Breakers: Fault tolerance with automatic recovery
### Event Streaming (v10.3)

- WebSocket Streaming: Real-time event broadcasting
- Redis Pub/Sub: Optional Redis backend for scale
- Event Deduplication: Hash-based duplicate prevention
- Priority System: Safety-critical event prioritization
- Safety Audit Trail: Comprehensive action logging
- Cost Events: Budget tracking with alerts
- Multi-Transport: WebSocket, file-watching, Redis
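Priority-ordered event dispatch can be sketched in-process as follows; the `EventBus` and `Event` names are hypothetical, and the real bridge runs over WebSocket/Redis transports:

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass(order=True)
class Event:
    priority: int  # 0 = safety-critical; larger numbers = less urgent
    seq: int       # tie-breaker that preserves arrival order
    topic: str = field(compare=False)
    payload: Any = field(compare=False)


class EventBus:
    """In-process stand-in for the event bridge: per-topic subscribers,
    with queued events drained in priority order."""

    def __init__(self):
        self._queue: List[Event] = []
        self._subs: Dict[str, List[Callable]] = {}
        self._seq = itertools.count()

    def subscribe(self, topic: str, handler: Callable) -> None:
        self._subs.setdefault(topic, []).append(handler)

    def publish(self, topic: str, payload: Any, priority: int = 10) -> None:
        heapq.heappush(self._queue, Event(priority, next(self._seq), topic, payload))

    def drain(self) -> None:
        # Safety-critical events (priority 0) are delivered first.
        while self._queue:
            event = heapq.heappop(self._queue)
            for handler in self._subs.get(event.topic, []):
                handler(event.payload)
```

The same priority/sequence pair works unchanged when the queue is backed by Redis sorted sets instead of an in-process heap.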
### GCP Integration

- Spot VM Resilience: Auto-resume from preemption
- Cloud SQL Storage: Event and metric persistence
- GCS Checkpointing: Distributed checkpoint storage
- Auto-Detection: M1 local vs GCP remote environment detection
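Environment auto-detection (M1 local vs GCP remote) might use heuristics like the sketch below; the DMI file check and the `REACTOR_ENV` override are illustrative assumptions, not the repository's actual detector:

```python
import os
import platform


def detect_environment() -> str:
    """Return 'gcp' when running on a GCE VM, else 'local'.

    Heuristics (illustrative): GCE exposes a DMI product name of
    'Google Compute Engine'; Apple-silicon Macs report Darwin/arm64.
    An environment-variable override wins, which keeps tests hermetic.
    """
    override = os.environ.get("REACTOR_ENV")  # hypothetical override knob
    if override in ("local", "gcp"):
        return override
    try:
        with open("/sys/class/dmi/id/product_name") as f:
            if "Google Compute Engine" in f.read():
                return "gcp"
    except OSError:
        pass  # file absent on macOS and non-GCE Linux
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "local"  # M1/M2 Mac
    return "local"


env = detect_environment()
```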
## Installation

### From PyPI

```bash
pip install jarvis-reactor
```

### From Source

```bash
# Clone with submodules
git clone --recursive https://github.com/drussell23/JARVIS-Reactor.git
cd JARVIS-Reactor

# Install dependencies (requires CMake and pybind11)
pip install pybind11 cmake

# Build and install
pip install -e .
```

### Optional Extras

```bash
# For local development (M1 Mac)
pip install jarvis-reactor[local]

# For GCP training (32GB+ VM)
pip install jarvis-reactor[gcp]

# For full development (includes testing, linting, docs)
pip install -e ".[dev]"
```

### Docker

```bash
# Build Docker image
docker-compose build

# Run API server
docker-compose up api

# Run model server
docker-compose up model-server

# Run unified supervisor
docker-compose up supervisor
```

## Quick Start

### Basic Training

```python
from reactor_core import Trainer, TrainingConfig
from reactor_core.gcp import SpotVMCheckpointer

# Configure training
config = TrainingConfig(
    model_name="llama-2-7b",
    use_lora=True,
    lora_rank=16,
    num_epochs=3,
    batch_size=4,
    gradient_checkpointing=True,
)

# Auto-detect environment (M1 local vs GCP remote)
trainer = Trainer(config)

# Train with auto-resume on Spot VM preemption
trainer.train("./data/train.jsonl")
```

### DPO Training

```python
from reactor_core.training.advanced_training import (
    DPOTrainer,
    DPOConfig,
    PreferenceDataset,
)

# Configure DPO
dpo_config = DPOConfig(
    model_name="llama-2-7b",
    beta=0.1,  # KL divergence penalty
    learning_rate=5e-7,
    max_length=512,
    batch_size=4,
)

# Initialize DPO trainer
dpo_trainer = DPOTrainer(dpo_config)

# Train on preference pairs
await dpo_trainer.train(
    preference_dataset=PreferenceDataset(
        chosen_responses=chosen_data,
        rejected_responses=rejected_data,
    ),
    num_epochs=3,
)
```

### Hot-Reload Model Serving

```python
from reactor_core.serving.model_server import ModelServer, ModelServerConfig

# Configure model server
config = ModelServerConfig(
    models_dir="/path/to/models",
    enable_hot_reload=True,
    backend="vllm",  # or "transformers", "llamacpp", "mlx"
    max_cached_models=3,
)

# Initialize server
server = ModelServer(config)
await server.start()

# Serve inference requests
response = await server.predict(
    prompt="What is machine learning?",
    model_id="llama-2-7b",
    max_tokens=256,
)
print(response.text)

# Hot-reload: just update the model file and the server auto-reloads
```

### API Server

```bash
# Start API server
uvicorn reactor_core.api.server:app --host 0.0.0.0 --port 8003 --reload
```

```python
import requests

# Trigger training via API
response = requests.post(
    "http://localhost:8003/training/trigger",
    json={
        "model_name": "llama-2-7b",
        "training_type": "dpo",
        "config": {
            "num_epochs": 3,
            "batch_size": 4,
            "learning_rate": 5e-7,
        },
    },
)

# Schedule nightly training
response = requests.post(
    "http://localhost:8003/scheduler/schedule",
    json={
        "name": "nightly_dpo_training",
        "schedule_type": "cron",
        "cron_expression": "0 2 * * *",  # 2 AM daily
        "job_config": {
            "training_type": "dpo",
            "model_name": "llama-2-7b",
        },
    },
)
```

### Trinity Orchestration

```python
from reactor_core.orchestration.trinity_orchestrator import (
    initialize_orchestrator,
    get_orchestrator,
)

# Initialize orchestrator
orchestrator = await initialize_orchestrator()

# Dispatch command to JARVIS/Prime
await orchestrator.dispatch_command(
    intent="start_surveillance",
    payload={
        "app_name": "Chrome",
        "trigger_text": "bouncing ball",
    },
    target_components=["jarvis"],
)

# Check component health
health = await orchestrator.get_health_status()
print(f"JARVIS: {health['jarvis'].status}")
print(f"Prime: {health['prime'].status}")
print(f"Reactor: {health['reactor'].status}")
```

### Unified Supervisor

```bash
# Start entire AGI OS ecosystem
python3 run_supervisor.py

# With specific components
python3 run_supervisor.py --components jarvis,prime,reactor

# Development mode (verbose logging)
python3 run_supervisor.py --dev --log-level DEBUG
```

## Documentation

- Advanced Training: comprehensive documentation for DPO, RLHF, Constitutional AI, and Curriculum Learning, with code examples for memory management, experience replay, and multi-GPU training.
- Async Infrastructure: production-ready async patterns including circuit breakers, backpressure management, dead letter queues, health monitoring, and adaptive rate limiting.
- API Platform: FastAPI server with telemetry collection, Night Shift scheduling, model registry, health aggregation, and real-time WebSocket streaming.
- Model Serving: zero-downtime model serving with hot-reload, multi-backend support (vLLM, llama.cpp, MLX, Transformers), LRU caching, and semantic response caching.
- Trinity Orchestration: multi-repo coordination with heartbeat monitoring, command routing, state reconciliation, dead letter queue, and atomic file I/O.

(See full documentation in sections below.)
## Integration Architecture

```text
                     JARVIS AGI ECOSYSTEM

┌──────────────────┐                    ┌──────────────────┐
│ JARVIS-AI-Agent  │◄───── Events ────►│   JARVIS Prime   │
│  (Claude Body)   │                    │    (LLM Mind)    │
│ • Computer Use   │                    │ • Local LLM      │
│ • macOS Control  │                    │ • Reasoning      │
│ • Voice Auth     │                    │ • Context        │
└─────────┬────────┘                    └────────┬─────────┘
          │             Event Bridge             │
          │           (WebSocket/Redis)          │
          ▼                                      ▼
┌──────────────────────────────────────────────────────────┐
│              Reactor Core (Nervous System)               │
│                                                          │
│  Trinity Orchestrator: heartbeat monitoring, command     │
│  routing, state reconciliation                           │
│                                                          │
│  Training & Serving: DPO, RLHF, Constitutional AI,       │
│  hot-reload model server, Night Shift scheduler          │
│                                                          │
│  Event Streaming: safety audit trail, cost tracking,     │
│  telemetry collection                                    │
└─────────────────────────┬────────────────────┬───────────┘
                          ▼                    ▼
                ┌──────────────────┐ ┌──────────────────┐
                │    Cloud SQL     │ │   GCP Storage    │
                │   (Events DB)    │ │  (Checkpoints)   │
                └──────────────────┘ └──────────────────┘
```
## Version History

### v77.1: Hot-Reload Model Serving

- Hot-reload model server with zero-downtime updates (1,545 lines)
- Multi-backend inference engine: vLLM, llama.cpp, MLX, Transformers (1,891 lines)
- Unified supervisor for one-command AGI OS startup (1,635 lines)
- LRU model cache with memory-aware eviction
- Priority request queue for SLA compliance
- Semantic response caching with hash-based deduplication

### v77.0: Unified API Platform

- Telemetry collection system with WebSocket streaming (1,128 lines)
- Night Shift scheduler for automated training (1,030 lines)
- Model registry with versioning and A/B testing (1,301 lines)
- Health aggregator with multi-service dashboard (999 lines)
- Enhanced FastAPI server (2,252 lines)

### v76.1: Async Infrastructure

- Advanced async patterns library (1,746 lines)
- Circuit breaker, backpressure, and bulkhead patterns
- Dead letter queue, health monitor, adaptive rate limiter
- Dependency injection system (913 lines)

### v76.0: Advanced Training

- DPO, RLHF, Constitutional AI, Curriculum Learning (2,899 lines)
- Memory manager with dynamic batch sizing
- Advanced evaluation suite (1,536 lines)

### v75.0 and Earlier: Orchestration & Reliability

- DLQ for failed/expired commands
- Automatic retry with exponential backoff
- Zero-corruption file operations via atomic renames
- Safety audit trail and kill switch mechanism
- Real-time event streaming across JARVIS ecosystem

### Core Framework

- PyTorch-first ML training framework
- LoRA/QLoRA, DPO, FSDP support
- GCP Spot VM resilience
## Links

- GitHub: https://github.com/drussell23/JARVIS-Reactor
- MLForge C++ Core: https://github.com/drussell23/MLForge
- JARVIS-AI-Agent: https://github.com/drussell23/JARVIS-AI-Agent
- JARVIS Prime: https://github.com/drussell23/jarvis-prime
## License

MIT License - see the LICENSE file for details.

Built with ❤️ for the JARVIS AGI Ecosystem