Dialogos is a real-time Automatic Speech Recognition (ASR) system for the Russian language, designed for low latency and minimal VRAM usage. It consists of a Docker-based backend running the T-one ASR engine (streaming over WebSocket, with endpointing) and a Windows client that captures audio via WASAPI and streams it to the backend over WebSocket.
- Low Latency: Real-time streaming ASR with minimal delay
- Lightweight: Optimized for low VRAM consumption
- Multi-client: Supports multiple simultaneous connections
- Endpointing: Automatic phrase detection and segmentation
- GPU Accelerated: Leverages NVIDIA GPU for fast inference
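
To illustrate the kind of framing a streaming client performs before sending audio over WebSocket, here is a minimal sketch in Go. It is not taken from the Dialogos client: the audio format (16 kHz mono, 16-bit PCM), the 20 ms frame size, and the `splitFrames` helper are all assumptions made for the example, since this README does not specify the wire format.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Assumed audio format (hypothetical; not specified by this README).
const (
	sampleRate   = 16000                           // samples per second, assumed
	frameMillis  = 20                              // duration of one streamed frame, assumed
	frameSamples = sampleRate * frameMillis / 1000 // 320 samples per frame
)

// splitFrames cuts captured PCM samples into fixed-size frames and encodes
// each frame as little-endian 16-bit bytes, so that each frame could be sent
// as one binary WebSocket message. A trailing partial frame is zero-padded.
func splitFrames(samples []int16) [][]byte {
	var frames [][]byte
	for start := 0; start < len(samples); start += frameSamples {
		end := start + frameSamples
		if end > len(samples) {
			end = len(samples)
		}
		// make() zero-initialises the buffer, so any padding is silence.
		buf := make([]byte, frameSamples*2)
		for i, s := range samples[start:end] {
			binary.LittleEndian.PutUint16(buf[i*2:], uint16(s))
		}
		frames = append(frames, buf)
	}
	return frames
}

func main() {
	// 50 ms of audio at the assumed rate is 800 samples,
	// which yields 3 frames (320 + 320 + 160 padded to 320).
	samples := make([]int16, 800)
	frames := splitFrames(samples)
	fmt.Println(len(frames), len(frames[0])) // prints: 3 640
}
```

Small fixed-size frames keep the delay between capture and transmission bounded, which is one common way to keep end-to-end latency low. The actual frame size and encoding used by the client may differ.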
For detailed documentation, see the docs/ directory.

Prerequisites:
- Docker installed on your system
- Go 1.19+ for building the client
- Audio input device (microphone)
- Build the client:

      make build-client

- Run the server:

      docker-compose up -d

- Run the client:

      make run-client

Project structure:
- asr-client/ - Windows client application for audio capture and streaming
- asr-server/ - WebSocket server implementation for speech recognition
- docs/ - Project documentation
- Dockerfile - Server Docker image definition
- docker-compose.yml - Multi-container deployment configuration

This project is licensed under the Apache License, Version 2.0. See LICENSE for details.