A privacy-first, local voice-enabled AI assistant SDK. Connect to Claude, Gemini, or ChatGPT using entirely local, open-source components for speech processing.
- Privacy-first: Audio never leaves your device for STT/TTS processing
- Low latency: Sub-second response initiation for natural conversation flow
- Model-agnostic: Voice layer decoupled from LLM backend selection
- Simple deployment: Minimal dependencies, runs on consumer hardware
| Package | Description |
|---|---|
| `dulcet` | Python SDK with FastAPI WebSocket server |
| `@dulcet/client` | TypeScript browser client |
Docker provides a complete development environment with all dependencies pre-installed:
```bash
# Start the server (includes FFmpeg, models, everything)
docker compose up server

# Run Python tests
docker compose run --rm python-test

# Run client tests
docker compose run --rm client-test

# Open a Python dev shell
docker compose run --rm python-dev

# Open a client dev shell (watch mode)
docker compose run --rm client-dev
```

Configure API keys:
```bash
cp .env.example .env
# Edit .env with your API keys
docker compose up server
```
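The key names below are an assumption based on each provider SDK's conventional environment variables; `.env.example` in the repo is the source of truth for what dulcet actually reads:

```bash
# Hypothetical key names -- confirm against .env.example
ANTHROPIC_API_KEY=...   # Claude
GOOGLE_API_KEY=...      # Gemini
OPENAI_API_KEY=...      # ChatGPT
```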
Install FFmpeg (required for speech processing):

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg libavcodec-dev libavformat-dev libavdevice-dev
```

Then install the package. Using uv (recommended):
```bash
uv pip install "dulcet[speech]"
```

Or with pip:
```bash
pip install "dulcet[speech]"
```

Download the speech models (~500MB):
```bash
dulcet download
```

Start the server:
```python
from dulcet import VoicePipeline, run_server

pipeline = VoicePipeline()
run_server(pipeline)
```

Or via CLI:
```bash
dulcet serve --provider claude
```

Install the TypeScript client:

```bash
npm install @dulcet/client
```

Basic usage:

```typescript
import { DulcetClient } from "@dulcet/client";
const client = new DulcetClient({ url: "ws://localhost:8000/ws" });
client.on("transcript", ({ text }) => console.log("You:", text));
client.on("response", ({ text }) => console.log("Assistant:", text));
await client.connect();
await client.startListening();
```
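Since `sendText` bypasses STT entirely, the same client also works without a microphone. A minimal text-only sketch using only the calls and events documented in this README:

```typescript
import { DulcetClient } from "@dulcet/client";

const client = new DulcetClient({ url: "ws://localhost:8000/ws" });

// Partial response chunks arrive with isFinal: false; print only complete ones.
client.on("response", ({ text, isFinal }) => {
  if (isFinal) console.log("Assistant:", text);
});

await client.connect();
client.sendText("Give me a one-sentence summary of what you can do.");
```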
All constructor options:

```typescript
const client = new DulcetClient({
  url: "ws://localhost:8000/ws",

  // LLM settings (optional, can also be set server-side)
  provider: "claude", // "claude" | "gemini" | "openai"
  model: "claude-sonnet-4-20250514",
  systemPrompt: "You are a helpful assistant.",

  // TTS voice
  voice: "en_US-lessac-medium",

  // Audio settings
  sampleRate: 16000, // Default: 16000

  // Reconnection settings
  autoReconnect: true, // Default: true
  maxReconnectAttempts: 5, // Default: 5
  reconnectDelay: 1000, // Default: 1000ms
  maxReconnectDelay: 30000, // Default: 30000ms

  // Debug mode
  debug: false, // Default: false
});
```

Methods:

```typescript
await client.connect(); // Connect to server
client.disconnect(); // Disconnect from server

await client.startListening(); // Start microphone capture
client.stopListening(); // Stop microphone capture

client.sendText("Hello"); // Send text directly (bypass STT)
client.interrupt(); // Stop current TTS playback

client.configure({ // Update settings at runtime
  provider: "openai",
  model: "gpt-4o",
  voice: "en_GB-alba-medium",
  systemPrompt: "New prompt",
});

// Properties
client.isConnected; // boolean
client.isListening; // boolean
client.status; // "disconnected" | "connecting" | "connected" | "reconnecting"
```
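Because capture is started and stopped explicitly, a push-to-talk UI is just a pair of key handlers around `startListening`/`stopListening`. A browser-side sketch, continuing from the client created above (the spacebar binding is illustrative, not part of the SDK):

```typescript
// Hold the spacebar to talk; release to stop capturing.
document.addEventListener("keydown", async (event) => {
  // keydown repeats while held, so gate on the documented isListening flag.
  if (event.code === "Space" && client.isConnected && !client.isListening) {
    await client.startListening();
  }
});

document.addEventListener("keyup", (event) => {
  if (event.code === "Space" && client.isListening) {
    client.stopListening();
  }
});
```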
client.on("disconnected", (reason) => {});
client.on("reconnecting", (attempt) => {});
client.on("transcript", ({ text, isFinal }) => {}); // User speech transcription
client.on("response", ({ text, isFinal }) => {}); // LLM response text
client.on("status", ({ state }) => {}); // "listening" | "processing" | "speaking"
client.on("audioStart", () => {}); // TTS playback started
client.on("audioEnd", () => {}); // TTS playback ended
client.on("error", ({ message, code }) => {});┌─────────────────────────────────────────────────────────────────┐
How the pieces fit together:

```
┌─────────────────────────────────────────────────────────────────┐
│                       Frontend (Browser)                        │
│                                                                 │
│ ┌─────────┐      ┌─────────┐      ┌─────────────────────┐       │
│ │   Mic   │ ───▶ │   VAD   │ ───▶ │  WebSocket Client   │       │
│ └─────────┘      └─────────┘      └──────────┬──────────┘       │
│                                              │                  │
│ ┌─────────┐                                  │                  │
│ │ Speaker │ ◀────────────────────────────────┘                  │
│ └─────────┘                                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ WebSocket
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Backend (Python/FastAPI)                     │
│                                                                 │
│ ┌───────────────┐    ┌───────────────┐    ┌──────────────┐      │
│ │ faster-whisper│───▶│  LLM Router   │───▶│    Piper     │      │
│ │     (STT)     │    │               │    │    (TTS)     │      │
│ └───────────────┘    └───────┬───────┘    └──────────────┘      │
│                              │                                  │
│                  ┌───────────┼───────────┐                      │
│                  ▼           ▼           ▼                      │
│                Claude      Gemini     ChatGPT                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
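From the client's perspective, one trip through this pipeline shows up as an ordered event sequence (status changes, then transcript, then response, then audio). Logging that sequence is a handy sanity check while wiring things up:

```typescript
// Trace one conversational turn: VAD/STT -> LLM -> TTS.
client.on("status", ({ state }) => console.log("[state]", state)); // listening | processing | speaking
client.on("transcript", ({ text, isFinal }) => {
  if (isFinal) console.log("[stt]", text);
});
client.on("response", ({ text, isFinal }) => {
  if (isFinal) console.log("[llm]", text);
});
client.on("audioStart", () => console.log("[tts] playback started"));
client.on("audioEnd", () => console.log("[tts] playback finished"));
```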
CLI reference:

```bash
# Download speech models (required before first run)
dulcet download
dulcet download --stt-model large-v3 # Use larger Whisper model
dulcet download --tts-voice en_GB-alba-medium # Different voice
# Start the server
dulcet serve
dulcet serve --provider openai --port 3000
dulcet serve --reload # Auto-reload for development
# Validate API keys
dulcet validate
```

Speech components (all local):

- STT: faster-whisper - a CTranslate2 reimplementation of Whisper, up to 4x faster
- TTS: Piper - Fast, local neural TTS
- VAD: Silero VAD - Voice activity detection
Minimum hardware (CPU-only):

- 4-core CPU (Intel i5 / AMD Ryzen 5 or better)
- 8 GB RAM
- 2 GB disk space for models

Recommended (GPU-accelerated):

- NVIDIA GPU with 6+ GB VRAM (RTX 3060 or better)
- 16 GB RAM
- 4 GB disk space for models
MIT