Automated parameter tuning for LLM inference engines (SGLang, vLLM) that maximizes performance while respecting SLOs and hardware constraints.
Quantization and parameter tuning can unlock 60%+ performance gains. LLM inference engines like SGLang and vLLM ship with conservative defaults that work everywhere but are optimized for nowhere.
Benchmarks were run on an NVIDIA RTX 4090 (24GB) with typical production workloads (mixed prefill/decode).
See detailed benchmarks: Baseline Benchmarks
| What You Get | Manual Tuning | Autotuner |
|---|---|---|
| Time to optimal config | Hours to Days | Minutes |
| Parameter combinations tested | ~10 (limited by patience) | 50-100+ (automated) |
| Performance gain | Unknown (untested) | 60%+ throughput (quantization + tuning) |
| Reproducibility | Low (manual errors) | High (versioned configs) |
| Cross-hardware portability | Manual rework | Re-run task (one command) |
- Task: A tuning job containing model config, parameter ranges, SLOs, and optimization strategy
- Experiment: Individual trial with specific parameter values; multiple experiments per task
- ARQ Worker: Background processor that deploys models, runs benchmarks, and scores results
- Multiple Deployment Modes: Docker, Local (direct GPU), OME (Kubernetes)
- Web UI: React frontend with real-time monitoring
- Agent Assistant: LLM-powered assistant for task management and troubleshooting
- Optimization Strategies: Grid search, Bayesian optimization
- SLO-Aware Scoring: Exponential penalties for constraint violations
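
As a rough illustration of SLO-aware scoring (a sketch, not the project's exact formula): raw throughput can be discounted by a factor that decays exponentially with the relative size of the SLO violation, so configs that barely miss an SLO stay competitive while flagrant violators are effectively disqualified:

```python
import math

def slo_penalized_score(throughput: float, latency_ms: float,
                        slo_ms: float, k: float = 4.0) -> float:
    """Illustrative scoring sketch (not the project's actual formula):
    raw throughput, discounted exponentially once latency exceeds the SLO.
    Within the SLO, no penalty applies."""
    violation = max(0.0, (latency_ms - slo_ms) / slo_ms)  # relative overshoot
    return throughput * math.exp(-k * violation)

# A config that meets the SLO keeps its full score...
print(slo_penalized_score(throughput=1200, latency_ms=450, slo_ms=500))  # 1200.0
# ...while a 50% overshoot is penalized heavily.
print(slo_penalized_score(throughput=1500, latency_ms=750, slo_ms=500))  # ~203
```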
→ Get started in 5 minutes with Docker
```bash
# Install
pip install -r requirements.txt && pip install genai-bench

# Run
python src/run_autotuner.py examples/docker_task.yaml --mode docker
```

To run the web UI:

```bash
# Start backend + worker
./scripts/start_dev.sh

# Start frontend (separate terminal)
cd frontend && npm run dev
```

Access the UI at http://localhost:5173.
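
The task file drives everything: it names the model, the engine, the parameter ranges to explore, and the SLOs to score against. A minimal sketch of what such a file might contain is shown below; the field names and values are illustrative assumptions, not the project's actual schema (see examples/docker_task.yaml for that):

```yaml
# Illustrative sketch only -- field names are assumptions;
# consult examples/docker_task.yaml for the real schema.
model: meta-llama/Llama-3.1-8B-Instruct  # hypothetical model
engine: sglang                           # inference engine under test
strategy: bayesian                       # or: grid
parameters:
  tp_size: [1, 2]                        # ranges the tuner explores
  mem_fraction_static: [0.7, 0.8, 0.9]
slos:
  ttft_ms: 500                           # time-to-first-token ceiling
  e2e_p99_ms: 2000                       # end-to-end p99 latency ceiling
```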
- ROADMAP.md - Product roadmap with completed milestones and future plans
- Installation Guide - Complete installation guide
- Quick Start - Quick start tutorial
- Docker Mode - Docker deployment guide
- Kubernetes/OME - Kubernetes/OME setup
- SLO Scoring - SLO-aware scoring with exponential penalties
- Parallel Execution - Parallel experiment execution
- WebSocket Implementation - Real-time updates via WebSocket
- Quantization Parameters - Quantization configuration
- Parameter Presets - Parameter preset system
- Bayesian Optimization - Bayesian optimization strategy
- GPU Tracking - Intelligent GPU scheduling and tracking
- Troubleshooting - Common issues and solutions
See DEVELOPMENT for development guidelines and project architecture.