Sonori

Local AI speech transcription with a transparent overlay for Linux

Real-time or on-demand transcription, entirely on your device.

Note: Active development. You may encounter bugs or instability as new features are added.

Features

Core

Local AI Processing - All transcription happens on your device, no cloud services required
Multi-Backend Support - Choose between CTranslate2, Whisper.cpp, or Moonshine backends
Dual Transcription Modes - Real-time continuous transcription or manual on-demand sessions
Voice Activity Detection - Uses Silero VAD for accurate speech detection
Automatic Model Download - Models are downloaded automatically on first run

Interface

Transparent Overlay - Non-intrusive overlay at the bottom of your screen
CLI Mode - Run without GUI using --cli flag for headless/terminal usage
Audio Visualization - Spectrogram display shows audio input in real-time
System Tray Integration - Quick access with window control and status display
Typewriter Effect - Character-by-character text reveal animation when transcription completes

Optional Features

GPU Acceleration - Vulkan-based acceleration (Whisper.cpp backend only)
Global Shortcuts - System-wide hotkeys via XDG Desktop Portal (e.g., Super+\ to toggle recording)
Auto-Paste - Automatic text injection via XDG Desktop Portal, with wtype/dotool fallback for compositors without portal support
Sound Feedback - Audio cues for recording state changes
Magic Mode - Post-process transcriptions through a local LLM to clean up grammar, remove filler words, and improve readability

Roadmap

Planned:

Better error handling and UI improvements
CUDA support for GPU acceleration
Additional local AI backends
Optional cloud API support (Deepgram, OpenAI)

Not Planned:

GUI framework (custom wgpu/wgsl implementation by design)
Windows/macOS support (contributions welcome)

System Requirements

Platform: Linux x86_64 only

Tested on: NixOS with KDE Plasma/KWin and niri (Wayland)

Compositor (Wayland)

Protocol	Required	Purpose
`zwlr_layer_shell_v1`	Yes	Transparent overlay rendering
XDG Portal: GlobalShortcuts	No	System-wide hotkeys
XDG Portal: RemoteDesktop	No	Auto-paste via portal (fallback: wtype/dotool)

Compositor Compatibility:

Compositor	Status
KDE Plasma (KWin)	✅ Full support
niri	✅ Full support (use IPC for keybindings)
Hyprland	✅ Should work
Sway	✅ Should work
GNOME (Mutter)	❌ No layer shell (use CLI mode)

Hardware

GPU: Vulkan-capable with appropriate drivers
Audio: Working microphone, PipeWire or PulseAudio

Installation

AppImage (Recommended)

# Download from GitHub Releases
chmod +x Sonori-*-x86_64.AppImage
./Sonori-*-x86_64.AppImage

Release Tarball

tar -xzf sonori-*-x86_64-linux.tar.gz
./sonori-*/sonori

NixOS

# Try without installing
nix run github:0xPD33/sonori

# Install to profile
nix profile install github:0xPD33/sonori

Or add to your flake:

{
  inputs.sonori.url = "github:0xPD33/sonori";
  # Then add: inputs.sonori.packages.${system}.default
}

Building from Source

Prerequisites: Rust and distribution-specific dependencies.

Ubuntu/Debian 24.04+

# Install system dependencies
sudo apt-get update
sudo apt-get install -y build-essential portaudio19-dev libclang-dev pkg-config \
  libxkbcommon-dev libwayland-dev libx11-dev libxcursor-dev libxi-dev libxrandr-dev \
  libasound2-dev libssl-dev libfftw3-dev curl cmake libvulkan-dev libopenblas-dev glslc

# Install ONNX Runtime (not in repos)
ONNX_VERSION=1.22.0
wget https://github.com/microsoft/onnxruntime/releases/download/v${ONNX_VERSION}/onnxruntime-linux-x64-${ONNX_VERSION}.tgz
tar -xzf onnxruntime-linux-x64-${ONNX_VERSION}.tgz
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/include/* /usr/local/include/
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib/
sudo mkdir -p /usr/local/lib64
sudo cp -r onnxruntime-linux-x64-${ONNX_VERSION}/lib/* /usr/local/lib64/
echo "/usr/local/lib" | sudo tee /etc/ld.so.conf.d/onnxruntime.conf
echo "/usr/local/lib64" | sudo tee -a /etc/ld.so.conf.d/onnxruntime.conf
sudo ldconfig

Set environment variables before building:

export BLAS_INCLUDE_DIRS=/usr/include/x86_64-linux-gnu
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system
export ORT_LIB_LOCATION=/usr/local/lib

Fedora/RHEL

sudo dnf install gcc gcc-c++ portaudio-devel clang-devel pkg-config \
  libxkbcommon-devel wayland-devel libX11-devel libXcursor-devel libXi-devel libXrandr-devel \
  alsa-lib-devel openssl-devel fftw-devel curl cmake vulkan-loader-devel vulkan-headers \
  openblas-devel shaderc onnxruntime-devel

Set environment variables before building:

export BLAS_INCLUDE_DIRS=/usr/include/openblas
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system

Arch/Manjaro

sudo pacman -S base-devel portaudio clang pkgconf \
  libxkbcommon wayland libx11 libxcursor libxi libxrandr alsa-lib openssl fftw curl cmake \
  vulkan-headers vulkan-tools openblas shaderc
# Install onnxruntime from AUR (e.g., yay -S onnxruntime)

Set environment variables before building:

export BLAS_INCLUDE_DIRS=/usr/include/openblas
export OPENBLAS_PATH=/usr
export ORT_STRATEGY=system

NixOS

nix develop  # All dependencies included

Build:

git clone https://github.com/0xPD33/sonori
cd sonori
# Ensure environment variables are set (see distro-specific instructions above)
cargo build --release
./target/release/sonori

Desktop Integration

NixOS: Automatic via Nix flake.

Other distributions:

./install-desktop.sh --user        # User installation (recommended)
sudo ./install-desktop.sh --system # System-wide installation

See desktop/README.md for details.

Usage

GUI Mode (Default)

sonori

A transparent overlay appears at the bottom of your screen
Real-time mode: Recording starts automatically
Manual mode: Press Record to start/stop sessions
Use overlay buttons to copy text, clear history, switch modes, or exit

CLI Mode

sonori --cli

Transcription appears directly in terminal
Real-time mode: auto-starts recording
Manual mode: use spacebar to start/stop
Ctrl+C to exit

Command Line Options

Option	Description
`--cli`	Run in CLI mode without GUI
`--mode <realtime\|manual>`	Set transcription mode (default: manual)
`--manual`	Shorthand for `--mode manual`
`--help`	Show help information
`--version`	Display version

IPC Commands (External Control)

Control a running Sonori instance via CLI subcommands. Useful for compositor keybindings on niri, sway, etc. where XDG GlobalShortcuts portal isn't available.

sonori toggle      # Toggle recording on/off
sonori start       # Start recording session
sonori stop        # Stop recording session
sonori cancel      # Cancel session without processing
sonori status      # Get current status (JSON)
sonori switch-mode manual|realtime

Example niri keybinding (~/.config/niri/config.kdl):

binds {
    Mod+backslash { spawn "sonori" "toggle"; }
}

Configuration

Sonori uses config.toml for configuration. Defaults work well for most users.

Quick Setup: Choose a preset from the Configuration Guide:

Fast & Lightweight - Good for older computers
Balanced Performance - Recommended for most users
High Quality - For powerful computers with GPU
Real-Time - Live transcription as you speak
Multilingual - For non-English languages
Moonshine - ONNX-based backend with fast real-time performance

Troubleshooting

Wayland / Layer Shell

Sonori uses zwlr_layer_shell_v1 for the transparent overlay.

Verify Wayland session: echo $XDG_SESSION_TYPE should return wayland
Check Compositor Compatibility table above
GNOME/Mutter doesn't support layer shell - use CLI mode (--cli)

Vulkan / GPU

Required for UI rendering and optional GPU-accelerated transcription.

Install Vulkan libraries: vulkan-loader, vulkan-headers
Vendor-specific packages may be needed (e.g., mesa-vulkan-drivers on Ubuntu)
Test with: vulkaninfo or vkcube
For GPU transcription: enable gpu_enabled = true in [backend_config]

XDG Desktop Portal Features

Global Shortcuts (global_shortcuts_enabled):

Requires KDE Plasma 6+ or GNOME 45+
Accept permission dialog on first run
Check portal is running: systemctl --user status xdg-desktop-portal

Auto-Paste (portal_input_enabled):

Uses XDG RemoteDesktop portal for keyboard injection (KDE Plasma)
Falls back to wtype when portal is unavailable (sway, Hyprland, niri, river, labwc, COSMIC)
Falls back to dotool if wtype also fails (works on all compositors via uinput — requires input group membership)
Copies text to clipboard via wl-copy, then simulates the configured paste shortcut

Model Issues

Automatic conversion fails:

# NixOS
nix-shell model-conversion/shell.nix
ct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json

# Other distros
pip install -U ctranslate2 huggingface_hub torch transformers
ct2-transformers-converter --model your-model --output_dir ~/.cache/whisper/your-model --copy_files preprocessor_config.json tokenizer.json

30-second truncation: Whisper's 30-second window with 448 token limit can truncate dense speech. Solutions:

Keep recordings under 25 seconds
Adjust chunk_duration_seconds (15-25) in [manual_mode_config]
Try CTranslate2 backend

Moonshine model layout: Moonshine uses ONNX merged models (auto-downloaded) and expects a model name like tiny or base. If you see decoder input errors, set [moonshine_options].enable_cache = false and retry.

Known Issues

Not all Wayland compositors supported (tested primarily on KDE Plasma/KWin)
Transcription accuracy depends on Whisper model quality
CPU usage can be high when idle (buffer size related)

Contributing

Contributions welcome! Whether fixing bugs, adding features, improving docs, or testing on different distributions.

Getting Started:

See ARCHITECTURE.md to understand the codebase
Check planned features and known issues above
Test on your distribution
Open an issue or PR

Credits

License

MIT License - see LICENSE for details.

Name		Name	Last commit message	Last commit date
Latest commit History 154 Commits
.github/workflows		.github/workflows
assets		assets
desktop		desktop
model-conversion		model-conversion
src		src
.gitignore		.gitignore
AGENTS.md		AGENTS.md
ARCHITECTURE.md		ARCHITECTURE.md
CLAUDE.md		CLAUDE.md
CONFIGURATION.md		CONFIGURATION.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
RELEASING.md		RELEASING.md
cliff.toml		cliff.toml
config.toml		config.toml
flake.lock		flake.lock
flake.nix		flake.nix
install-desktop.sh		install-desktop.sh
rust-toolchain.toml		rust-toolchain.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Sonori

Features

Core

Interface

Optional Features

Roadmap

System Requirements

Compositor (Wayland)

Hardware

Installation

AppImage (Recommended)

Release Tarball

NixOS

Building from Source

Desktop Integration

Usage

GUI Mode (Default)

CLI Mode

Command Line Options

IPC Commands (External Control)

Configuration

Troubleshooting

Wayland / Layer Shell

Vulkan / GPU

XDG Desktop Portal Features

Model Issues

Known Issues

Contributing

Credits

License

About

Uh oh!

Releases 23

Packages

Uh oh!

Languages

License

0xPD33/sonori

Folders and files

Latest commit

History

Repository files navigation

Sonori

Features

Core

Interface

Optional Features

Roadmap

System Requirements

Compositor (Wayland)

Hardware

Installation

AppImage (Recommended)

Release Tarball

NixOS

Building from Source

Desktop Integration

Usage

GUI Mode (Default)

CLI Mode

Command Line Options

IPC Commands (External Control)

Configuration

Troubleshooting

Wayland / Layer Shell

Vulkan / GPU

XDG Desktop Portal Features

Model Issues

Known Issues

Contributing

Credits

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 23

Packages 0

Uh oh!

Languages

Packages