A privacy-first, local voice-enabled AI assistant SDK. Connect to Claude, Gemini, or ChatGPT using entirely local, open-source components for speech processing.
- Privacy-first: Audio never leaves your device for STT/TTS processing
- Low latency: Sub-second response initiation for natural conversation flow
- Model-agnostic: Voice layer decoupled from LLM backend selection
- Simple deployment: Minimal dependencies, runs on consumer hardware
| Package | Description |
|---|---|
| `dulcet` | Python SDK with FastAPI WebSocket server |
| `@dulcet/client` | TypeScript browser client |
Docker provides a complete development environment with all dependencies pre-installed:
```bash
# Start the server (includes FFmpeg, models, everything)
docker compose up server

# Run Python tests
docker compose run --rm python-test

# Run client tests
docker compose run --rm client-test

# Open a Python dev shell
docker compose run --rm python-dev

# Open a client dev shell (watch mode)
docker compose run --rm client-dev
```

Configure API keys:
```bash
cp .env.example .env
# Edit .env with your API keys
docker compose up server
```
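The key names below are an assumption based on each provider SDK's conventional environment variables; `.env.example` in the repo is the source of truth for what dulcet actually reads:

```bash
# Hypothetical key names -- confirm against .env.example
ANTHROPIC_API_KEY=...   # Claude
GOOGLE_API_KEY=...      # Gemini
OPENAI_API_KEY=...      # ChatGPT
```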
Install FFmpeg (required for speech processing):

```bash
# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt-get install ffmpeg libavcodec-dev libavformat-dev libavdevice-dev
```

Then install the package. Using uv (recommended):
```bash
uv pip install "dulcet[speech]"
```

Or with pip:
```bash
pip install "dulcet[speech]"
```

Download the speech models (~500MB):
```bash
dulcet download
```

Start the server:
```python
from dulcet import VoicePipeline, run_server

pipeline = VoicePipeline()
run_server(pipeline)
```

Or via CLI:
```bash
dulcet serve --provider claude
```

Install the TypeScript client:

```bash
npm install @dulcet/client
```

Basic usage:

```typescript
import { DulcetClient } from "@dulcet/client";
const client = new DulcetClient({ url: "ws://localhost:8000/ws" });
client.on("transcript", ({ text }) => console.log("You:", text));
client.on("response", ({ text }) => console.log("Assistant:", text));
await client.connect();
await client.startListening();
```
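Since `sendText` bypasses STT entirely, the same client also works without a microphone. A minimal text-only sketch using only the calls and events documented in this README:

```typescript
import { DulcetClient } from "@dulcet/client";

const client = new DulcetClient({ url: "ws://localhost:8000/ws" });

// Partial response chunks arrive with isFinal: false; print only complete ones.
client.on("response", ({ text, isFinal }) => {
  if (isFinal) console.log("Assistant:", text);
});

await client.connect();
client.sendText("Give me a one-sentence summary of what you can do.");
```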
All constructor options:

```typescript
const client = new DulcetClient({
  url: "ws://localhost:8000/ws",

  // LLM settings (optional, can also be set server-side)
  provider: "claude", // "claude" | "gemini" | "openai"
  model: "claude-sonnet-4-20250514",
  systemPrompt: "You are a helpful assistant.",

  // TTS voice
  voice: "en_US-lessac-medium",

  // Audio settings
  sampleRate: 16000, // Default: 16000

  // Reconnection settings
  autoReconnect: true, // Default: true
  maxReconnectAttempts: 5, // Default: 5
  reconnectDelay: 1000, // Default: 1000ms
  maxReconnectDelay: 30000, // Default: 30000ms

  // Debug mode
  debug: false, // Default: false
});
```

Methods:

```typescript
await client.connect(); // Connect to server
client.disconnect(); // Disconnect from server

await client.startListening(); // Start microphone capture
client.stopListening(); // Stop microphone capture

client.sendText("Hello"); // Send text directly (bypass STT)
client.interrupt(); // Stop current TTS playback

client.configure({ // Update settings at runtime
  provider: "openai",
  model: "gpt-4o",
  voice: "en_GB-alba-medium",
  systemPrompt: "New prompt",
});

// Properties
client.isConnected; // boolean
client.isListening; // boolean
client.status; // "disconnected" | "connecting" | "connected" | "reconnecting"
```
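Because capture is started and stopped explicitly, a push-to-talk UI is just a pair of key handlers around `startListening`/`stopListening`. A browser-side sketch, continuing from the client created above (the spacebar binding is illustrative, not part of the SDK):

```typescript
// Hold the spacebar to talk; release to stop capturing.
document.addEventListener("keydown", async (event) => {
  // keydown repeats while held, so gate on the documented isListening flag.
  if (event.code === "Space" && client.isConnected && !client.isListening) {
    await client.startListening();
  }
});

document.addEventListener("keyup", (event) => {
  if (event.code === "Space" && client.isListening) {
    client.stopListening();
  }
});
```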
client.on("disconnected", (reason) => {});
client.on("reconnecting", (attempt) => {});
client.on("transcript", ({ text, isFinal }) => {}); // User speech transcription
client.on("response", ({ text, isFinal }) => {}); // LLM response text
client.on("status", ({ state }) => {}); // "listening" | "processing" | "speaking"
client.on("audioStart", () => {}); // TTS playback started
client.on("audioEnd", () => {}); // TTS playback ended
client.on("error", ({ message, code }) => {});┌─────────────────────────────────────────────────────────────────┐
How the pieces fit together:

```
┌─────────────────────────────────────────────────────────────────┐
│                       Frontend (Browser)                        │
│                                                                 │
│ ┌─────────┐      ┌─────────┐      ┌─────────────────────┐       │
│ │   Mic   │ ───▶ │   VAD   │ ───▶ │  WebSocket Client   │       │
│ └─────────┘      └─────────┘      └──────────┬──────────┘       │
│                                              │                  │
│ ┌─────────┐                                  │                  │
│ │ Speaker │ ◀────────────────────────────────┘                  │
│ └─────────┘                                                     │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
                                 │
                                 │ WebSocket
                                 ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Backend (Python/FastAPI)                     │
│                                                                 │
│ ┌───────────────┐    ┌───────────────┐    ┌──────────────┐      │
│ │ faster-whisper│───▶│  LLM Router   │───▶│    Piper     │      │
│ │     (STT)     │    │               │    │    (TTS)     │      │
│ └───────────────┘    └───────┬───────┘    └──────────────┘      │
│                              │                                  │
│                  ┌───────────┼───────────┐                      │
│                  ▼           ▼           ▼                      │
│                Claude      Gemini     ChatGPT                   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘
```
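From the client's perspective, one trip through this pipeline shows up as an ordered event sequence (status changes, then transcript, then response, then audio). Logging that sequence is a handy sanity check while wiring things up:

```typescript
// Trace one conversational turn: VAD/STT -> LLM -> TTS.
client.on("status", ({ state }) => console.log("[state]", state)); // listening | processing | speaking
client.on("transcript", ({ text, isFinal }) => {
  if (isFinal) console.log("[stt]", text);
});
client.on("response", ({ text, isFinal }) => {
  if (isFinal) console.log("[llm]", text);
});
client.on("audioStart", () => console.log("[tts] playback started"));
client.on("audioEnd", () => console.log("[tts] playback finished"));
```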
CLI reference:

```bash
# Download speech models (required before first run)
dulcet download
dulcet download --stt-model large-v3 # Use larger Whisper model
dulcet download --tts-voice en_GB-alba-medium # Different voice
# Start the server
dulcet serve
dulcet serve --provider openai --port 3000
dulcet serve --reload # Auto-reload for development
# Validate API keys
dulcet validate
```

Speech components (all local):

- STT: faster-whisper - a CTranslate2 reimplementation of Whisper, up to 4x faster
- TTS: Piper - Fast, local neural TTS
- VAD: Silero VAD - Voice activity detection
Minimum hardware (CPU-only):

- 4-core CPU (Intel i5 / AMD Ryzen 5 or better)
- 8 GB RAM
- 2 GB disk space for models

Recommended (GPU-accelerated):

- NVIDIA GPU with 6+ GB VRAM (RTX 3060 or better)
- 16 GB RAM
- 4 GB disk space for models
MIT