xT10r/dialogos

Dialogos: Real-time Russian ASR System

Dialogos is a real-time Automatic Speech Recognition (ASR) system for Russian, designed for low latency and low VRAM usage. It consists of a Docker-based backend that runs the T-one ASR engine (streaming over WebSocket, with endpointing) and a Windows client that captures audio via WASAPI and streams it to the backend over WebSocket.
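
To make the client-to-server flow more concrete, here is a minimal sketch of how captured audio could be cut into fixed-size frames and serialized before being written to a WebSocket. It is not taken from the Dialogos client: the names `splitFrames`, `encodeLE`, and `frameSamples`, the 16 kHz mono 16-bit PCM format, and the 20 ms frame size are all assumptions made for illustration, and the actual wire format expected by the server may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameSamples is a hypothetical frame size: 20 ms of 16 kHz mono audio.
// The real sample rate and frame size used by Dialogos are not stated in this README.
const frameSamples = 320

// splitFrames cuts a buffer of 16-bit PCM samples into frames of at most
// size samples each. The final frame may be shorter than size.
func splitFrames(samples []int16, size int) [][]int16 {
	if size <= 0 {
		return nil
	}
	frames := make([][]int16, 0, (len(samples)+size-1)/size)
	for start := 0; start < len(samples); start += size {
		end := start + size
		if end > len(samples) {
			end = len(samples)
		}
		frames = append(frames, samples[start:end])
	}
	return frames
}

// encodeLE serializes one frame as little-endian bytes, a common layout
// for raw PCM (assumed here, not confirmed for Dialogos).
func encodeLE(frame []int16) []byte {
	out := make([]byte, 2*len(frame))
	for i, s := range frame {
		binary.LittleEndian.PutUint16(out[2*i:], uint16(s))
	}
	return out
}

func main() {
	// 700 samples of silence stand in for audio captured from the microphone.
	samples := make([]int16, 700)
	for i, f := range splitFrames(samples, frameSamples) {
		// In a real client, each payload would be sent as one binary WebSocket message.
		fmt.Printf("frame %d: %d samples, %d bytes\n", i, len(f), len(encodeLE(f)))
	}
}
```

Small, fixed-size frames keep per-message latency predictable, which is why streaming clients usually send audio in short chunks rather than whole recordings.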

Features

  • Low Latency: Real-time streaming ASR with minimal delay
  • Lightweight: Optimized for low VRAM consumption
  • Multi-client: Supports multiple simultaneous connections
  • Endpointing: Automatic phrase detection and segmentation
  • GPU Accelerated: Leverages NVIDIA GPU for fast inference
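
As an illustration of what endpointing means in practice, the sketch below marks the end of a phrase once a run of quiet frames follows speech. It is a simple energy-based approach chosen for clarity and is not the method used by the T-one engine, whose endpointing logic is not described here. The `Endpointer` type and its `Threshold` and `HangoverFrames` fields are hypothetical names and tuning knobs.

```go
package main

import (
	"fmt"
	"math"
)

// rms returns the root-mean-square amplitude of one frame of 16-bit PCM.
func rms(frame []int16) float64 {
	if len(frame) == 0 {
		return 0
	}
	var sum float64
	for _, s := range frame {
		v := float64(s)
		sum += v * v
	}
	return math.Sqrt(sum / float64(len(frame)))
}

// Endpointer tracks speech/silence state across frames.
// Threshold and HangoverFrames are hypothetical parameters for this sketch.
type Endpointer struct {
	Threshold      float64 // frames with RMS at or above this count as speech
	HangoverFrames int     // quiet frames required after speech to end a phrase
	inSpeech       bool
	silentRun      int
}

// Push consumes one frame and reports whether a phrase has just ended.
func (e *Endpointer) Push(frame []int16) bool {
	if rms(frame) >= e.Threshold {
		e.inSpeech = true
		e.silentRun = 0
		return false
	}
	if !e.inSpeech {
		return false // leading silence: nothing to end yet
	}
	e.silentRun++
	if e.silentRun >= e.HangoverFrames {
		e.inSpeech = false
		e.silentRun = 0
		return true
	}
	return false
}

func main() {
	loud := []int16{1000, -1000, 1000, -1000}
	quiet := []int16{0, 0, 0, 0}
	ep := &Endpointer{Threshold: 500, HangoverFrames: 2}
	for i, f := range [][]int16{quiet, loud, loud, quiet, quiet, quiet} {
		if ep.Push(f) {
			fmt.Println("phrase ended at frame", i)
		}
	}
}
```

The hangover count trades responsiveness for robustness: a larger value tolerates short pauses inside a phrase but delays the moment the segment is finalized.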

Documentation

For detailed documentation, see the docs/ directory.

Quick Start

Prerequisites

  • Docker and Docker Compose for running the server
  • Go 1.19+ and make for building the client
  • A Windows machine with an audio input device (microphone) for the client

Building and Running

  1. Build the client:

    make build-client

  2. Run the server:

    docker-compose up -d

  3. Run the client:

    make run-client

Project Structure

  • asr-client/ - Windows client application for audio capture and streaming
  • asr-server/ - WebSocket server implementation for speech recognition
  • docs/ - Project documentation
  • Dockerfile - Server Docker image definition
  • docker-compose.yml - Multi-container deployment configuration

License

This project is licensed under the Apache License, Version 2.0. See LICENSE for details.
