Local AI speech transcription with a transparent overlay for Linux
Real-time or on-demand transcription, entirely on your device.
Note: Sonori is under active development. You may encounter bugs or instability as new features are added.
- Local AI Processing - All transcription happens on your device, no cloud services required
- Multi-Backend Support - Choose between CTranslate2, Whisper.cpp, or Moonshine backends
- Dual Transcription Modes - Real-time continuous transcription or manual on-demand sessions
- Voice Activity Detection - Uses Silero VAD for accurate speech detection
- Automatic Model Download - Models are downloaded automatically on first run
- Transparent Overlay - Non-intrusive overlay at the bottom of your screen
- CLI Mode - Run without GUI using the `--cli` flag for headless/terminal usage
- Audio Visualization - Spectrogram display shows audio input in real-time
- System Tray Integration - Quick access with window control and status display
- Typewriter Effect - Character-by-character text reveal animation when transcription completes
- GPU Acceleration - Vulkan-based acceleration (Whisper.cpp backend only)
- Global Shortcuts - System-wide hotkeys via XDG Desktop Portal (e.g., Super+\ to toggle recording)
- Auto-Paste - Automatic text injection via XDG Desktop Portal, with wtype/dotool fallback for compositors without portal support
- Sound Feedback - Audio cues for recording state changes
- Magic Mode - Post-process transcriptions through a local LLM to clean up grammar, remove filler words, and improve readability
Planned:
- Better error handling and UI improvements
- CUDA support for GPU acceleration
- Additional local AI backends
- Optional cloud API support (Deepgram, OpenAI)
Not Planned:
- GUI framework (custom wgpu/wgsl implementation by design)
- Windows/macOS support (contributions welcome)
Platform: Linux x86_64 only
Tested on: NixOS with KDE Plasma/KWin and niri (Wayland)
| Protocol | Required | Purpose |
|---|---|---|
| `zwlr_layer_shell_v1` | Yes | Transparent overlay rendering |
| XDG Portal: GlobalShortcuts | No | System-wide hotkeys |
| XDG Portal: RemoteDesktop | No | Auto-paste via portal (fallback: wtype/dotool) |
Compositor Compatibility:
| Compositor | Status |
|---|---|
| KDE Plasma (KWin) | ✅ Full support |
| niri | ✅ Full support (use IPC for keybindings) |
| Hyprland | ✅ Should work |
| Sway | ✅ Should work |
| GNOME (Mutter) | ❌ No layer shell (use CLI mode) |
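
The compatibility table can be checked on a running session. The sketch below is one way to do that, assuming `wayland-info` (from the wayland-utils package) is installed; the function names `has_layer_shell` and `suggest_mode` are illustrative and not part of Sonori.

```shell
# Sketch: decide whether the overlay can work in the current session.
# Function names here are hypothetical helpers, not Sonori commands.

# Reads a list of Wayland globals on stdin (the format printed by
# `wayland-info`) and succeeds if zwlr_layer_shell_v1 is advertised.
has_layer_shell() {
    grep -q "zwlr_layer_shell_v1"
}

# Prints "overlay" when the overlay should work, otherwise "cli".
# $1: session type (normally $XDG_SESSION_TYPE)
# $2: text listing the compositor's globals
suggest_mode() {
    local session_type="$1" globals="$2"
    if [ "$session_type" = "wayland" ] \
        && printf '%s\n' "$globals" | has_layer_shell; then
        echo "overlay"
    else
        echo "cli"
    fi
}

# Example call on a live session:
#   suggest_mode "$XDG_SESSION_TYPE" "$(wayland-info 2>/dev/null)"
```

A result of `cli` suggests starting Sonori with `--cli`, as recommended for GNOME (Mutter) in the table.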
- GPU: Vulkan-capable with appropriate drivers
- Audio: Working microphone, PipeWire or PulseAudio
# Download from GitHub Releases
chmod +x Sonori-*-x86_64.AppImage
./Sonori-*-x86_64.AppImage

tar -xzf sonori-*-x86_64-linux.tar.gz
./sonori-*/sonori

# Try without installing
nix run github:0xPD33/sonori
# Install to profile
nix profile install github:0xPD33/sonori

Or add to your flake:
{
inputs.sonori.url = "github:0xPD33/sonori";
# Then add: inputs.sonori.packages.${system}.default
}

Prerequisites: Rust and distribution-specific dependencies.
Ubuntu/Debian 24.04+
# Install system dependencies
sudo apt-get update
sudo apt-get install -y build-essential portaudio19-dev libclang-dev pkg-config \
libxkbcommon-dev libwayland-dev libx11-dev libxcursor-dev libxi-dev libxrandr-dev \
libasound2-dev libssl-dev libfftw3-dev curl cmake libvulkan-dev libopenblas-dev glslc
# Install ONNX Runtime (not in repos)
ONNX_VERSION=1.22.0
wget https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_VERSION}/onnxruntime-linux-x64-${ONNX_VERSION}.tgz
tar -xzf onnxruntime-linux-x64-${ONNX_VERSION}.tgz
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/include/* /usr/local/include/
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib/
sudo mkdir -p /usr/local/lib64
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib64/
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/onnxruntime.conf
echo "/usr/local/lib64" | sudo tee -a /etc/ld.so.conf.d/onnxruntime.conf
sudo ldconfig

Set environment variables before building:
export BLAS_INCLUDE_DIRS=/usr/include/x86_64-linux-gnu
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system
export ORT_LIB_LOCATION=/usr/local/lib

Fedora/RHEL
sudo dnf install gcc gcc-c++ portaudio-devel clang-devel pkg-config \
libxkbcommon-devel wayland-devel libX11-devel libXcursor-devel libXi-devel libXrandr-devel \
alsa-lib-devel openssl-devel fftw-devel curl cmake vulkan-loader-devel vulkan-headers \
openblas-devel shaderc onnxruntime-devel

Set environment variables before building:
export BLAS_INCLUDE_DIRS=/usr/include/openblas
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system

Arch/Manjaro
sudo pacman -S base-devel portaudio clang pkgconf \
libxkbcommon wayland libx11 libxcursor libxi libxrandr alsa-lib openssl fftw curl cmake \
vulkan-headers vulkan-tools openblas shaderc
# Install onnxruntime from AUR (e.g., yay -S onnxruntime)

Set environment variables before building:
export BLAS_INCLUDE_DIRS=/usr/include/openblas
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system

NixOS
nix develop # All dependencies included

Build:
git clone https://github.com/0xPD33/sonori
cd sonori
# Ensure environment variables are set (see distro-specific instructions above)
cargo build --release
./target/release/sonori

NixOS: Automatic via Nix flake.
Other distributions:
./install-desktop.sh --user # User installation (recommended)
sudo ./install-desktop.sh --system # System-wide installation

See desktop/README.md for details.

sonori

- A transparent overlay appears at the bottom of your screen
- Real-time mode: Recording starts automatically
- Manual mode: Press Record to start/stop sessions
- Use overlay buttons to copy text, clear history, switch modes, or exit

sonori --cli

- Transcription appears directly in terminal
- Real-time mode: auto-starts recording
- Manual mode: use spacebar to start/stop
- Ctrl+C to exit
| Option | Description |
|---|---|
| `--cli` | Run in CLI mode without GUI |
| `--mode <realtime\|manual>` | Set transcription mode (default: manual) |
| `--manual` | Shorthand for `--mode manual` |
| `--help` | Show help information |
| `--version` | Display version |
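
The options above can be combined from a small launcher script. The following is a minimal sketch, assuming `--cli` and `--mode` may be passed together (the table lists them separately and does not state this); `sonori_command` is a hypothetical helper name that only prints the command line it would run.

```shell
# Sketch: build a sonori command line from two settings.
# sonori_command is an illustrative helper, not part of Sonori.
# $1: "gui" or "cli"
# $2: "manual" or "realtime" (defaults to manual, matching the table)
sonori_command() {
    local ui="$1" mode="${2:-manual}"

    # Reject anything other than the two documented modes.
    case "$mode" in
        manual|realtime) ;;
        *) echo "unknown mode: $mode" >&2; return 1 ;;
    esac

    local cmd="sonori"
    if [ "$ui" = "cli" ]; then
        cmd="$cmd --cli"
    fi
    echo "$cmd --mode $mode"
}

# Example: print the command for a headless real-time session.
#   sonori_command cli realtime
```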
Control a running Sonori instance via CLI subcommands. Useful for compositor keybindings on niri, sway, etc. where XDG GlobalShortcuts portal isn't available.
sonori toggle # Toggle recording on/off
sonori start # Start recording session
sonori stop # Stop recording session
sonori cancel # Cancel session without processing
sonori status # Get current status (JSON)
sonori switch-mode manual|realtime

Example niri keybinding (~/.config/niri/config.kdl):
binds {
Mod+backslash { spawn "sonori" "toggle"; }
}

Sonori uses config.toml for configuration. Defaults work well for most users.
Quick Setup: Choose a preset from the Configuration Guide:
- Fast & Lightweight - Good for older computers
- Balanced Performance - Recommended for most users
- High Quality - For powerful computers with GPU
- Real-Time - Live transcription as you speak
- Multilingual - For non-English languages
- Moonshine - ONNX-based backend with fast real-time performance
Sonori uses zwlr_layer_shell_v1 for the transparent overlay.
- Verify Wayland session: `echo $XDG_SESSION_TYPE` should return `wayland`
- Check Compositor Compatibility table above
- GNOME/Mutter doesn't support layer shell - use CLI mode (`--cli`)
Required for UI rendering and optional GPU-accelerated transcription.
- Install Vulkan libraries: `vulkan-loader`, `vulkan-headers`
- Vendor-specific packages may be needed (e.g., `mesa-vulkan-drivers` on Ubuntu)
- Test with: `vulkaninfo` or `vkcube`
- For GPU transcription: enable `gpu_enabled = true` in `[backend_config]`
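
The Vulkan test step can be scripted. This is a rough sketch, assuming a `vulkaninfo` build that supports `--summary` and prints one `deviceName = ...` line per device; the function names are illustrative.

```shell
# Sketch: list the Vulkan devices the loader can see.
# vulkan_device_names and check_vulkan are hypothetical helper names.

# Reads `vulkaninfo --summary` output on stdin and prints one device
# name per line (the text after "deviceName = ").
vulkan_device_names() {
    awk -F'= ' '/deviceName/ { print $2 }'
}

# Runs vulkaninfo and reports the devices found, or fails with a hint.
check_vulkan() {
    if ! command -v vulkaninfo >/dev/null 2>&1; then
        echo "vulkaninfo not found; install your distribution's Vulkan tools" >&2
        return 1
    fi
    local names
    names="$(vulkaninfo --summary 2>/dev/null | vulkan_device_names)"
    if [ -z "$names" ]; then
        echo "No Vulkan devices reported; check your GPU driver packages" >&2
        return 1
    fi
    printf '%s\n' "$names" | sed 's/^/Vulkan device: /'
}

# Example:
#   check_vulkan
```

If only a software rasterizer such as llvmpipe is listed, the vendor driver package is probably missing.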
Global Shortcuts (global_shortcuts_enabled):
- Requires KDE Plasma 6+ or GNOME 45+
- Accept permission dialog on first run
- Check portal is running: `systemctl --user status xdg-desktop-portal`
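
A running portal service does not guarantee that the backend implements the GlobalShortcuts interface. The sketch below checks for it over D-Bus, assuming `busctl` (from systemd) is available; `portal_has_interface` is an illustrative helper name.

```shell
# Sketch: test whether the desktop portal exposes a given interface.
# portal_has_interface is a hypothetical helper, not part of Sonori.

# Reads D-Bus introspection output on stdin and succeeds if the named
# portal interface is listed.
# $1: short interface name, e.g. GlobalShortcuts or RemoteDesktop
portal_has_interface() {
    grep -qwF "org.freedesktop.portal.$1"
}

# Example on a live session:
#   busctl --user introspect org.freedesktop.portal.Desktop \
#       /org/freedesktop/portal/desktop \
#       | portal_has_interface GlobalShortcuts && echo "GlobalShortcuts available"
```

The same check with `RemoteDesktop` indicates whether portal-based auto-paste can be used.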
Auto-Paste (portal_input_enabled):
- Uses XDG RemoteDesktop portal for keyboard injection (KDE Plasma)
- Falls back to `wtype` when portal is unavailable (sway, Hyprland, niri, river, labwc, COSMIC)
- Falls back to `dotool` if wtype also fails (works on all compositors via uinput; requires `input` group membership)
- Copies text to clipboard via `wl-copy`, then simulates the configured paste shortcut
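
To see which auto-paste path is likely to apply on a given machine, the fallback order above can be approximated in a script. This is only a sketch: Sonori falls back when a method fails at runtime, whereas this version checks which tools are installed, and `pick_paste_method` is a hypothetical name.

```shell
# Sketch: guess which text-injection method is usable, in the order
# portal -> wtype -> dotool. This approximates the documented fallback
# chain by checking tool availability; it is not Sonori's own logic.
# $1: "yes" if the RemoteDesktop portal is available
pick_paste_method() {
    local portal_available="$1"
    if [ "$portal_available" = "yes" ]; then
        echo "portal"
    elif command -v wtype >/dev/null 2>&1; then
        echo "wtype"
    elif command -v dotool >/dev/null 2>&1 \
        && id -nG | tr ' ' '\n' | grep -qx "input"; then
        # dotool writes to uinput, so the user must be in the input group.
        echo "dotool"
    else
        echo "none"
    fi
}

# Example:
#   pick_paste_method no
```

A result of `none` means neither fallback tool is installed, or `dotool` is installed but the user is not in the `input` group.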
Automatic conversion fails:
# NixOS
nix-shell model-conversion/shell.nix
ct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json
# Other distros
pip install -U ctranslate2 huggingface_hub torch transformers
ct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json

30-second truncation: Whisper's 30-second window with a 448-token limit can truncate dense speech. Solutions:
- Keep recordings under 25 seconds
- Adjust `chunk_duration_seconds` (15-25) in `[manual_mode_config]`
- Try CTranslate2 backend
Moonshine model layout: Moonshine uses ONNX merged models (auto-downloaded) and expects a model name like tiny or base. If you see decoder input errors, set [moonshine_options].enable_cache = false and retry.
- Not all Wayland compositors are supported (tested primarily on KDE Plasma/KWin)
- Transcription accuracy depends on Whisper model quality
- CPU usage can be high when idle (buffer size related)
Contributions welcome! Whether fixing bugs, adding features, improving docs, or testing on different distributions.
Getting Started:
- See ARCHITECTURE.md to understand the codebase
- Check planned features and known issues above
- Test on your distribution
- Open an issue or PR
- Rust
- CTranslate2 / Faster Whisper
- whisper.cpp / whisper-rs
- ONNX Runtime
- OpenAI Whisper
- Moonshine
- Silero VAD
- CPAL
- Winit Fork
- WGPU
MIT License - see LICENSE for details.
