Avano - Persian Multi-Speaker Voice Transcription Service

Avano is a Persian speech-to-text service that transcribes the speech of multiple speakers within a single audio session.

Model Information

Avano uses the vhdm/whisper-large-fa-v1 model, a Whisper model fine-tuned specifically for Persian speech recognition. The model achieves a Word Error Rate (WER) of 14.07% on clean Persian speech data.
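For readers unfamiliar with the metric, WER is the word-level edit distance between the reference transcript and the model output, divided by the number of reference words. The following is a minimal sketch of that standard definition. It is not the evaluation script behind the 14.07% figure, and it applies no text normalization, which real evaluations usually do; the function name `word_error_rate` is illustrative.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference words processed
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution and one deletion against a four-word reference.
    print(word_error_rate("a b c d", "a x c"))
```

A WER of 14.07% therefore means roughly 14 word-level errors per 100 reference words.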

Key Features of the Model

🎯 Fine-tuned on high-quality Persian speech data
🚀 Based on OpenAI's Whisper Large V3 Turbo architecture
📊 14.07% Word Error Rate (WER)
💪 Optimized for Persian voice transcription

Installation Guide

Prerequisites

Python 3.10 or higher
CUDA-compatible GPU (recommended)
Docker and Docker Compose (optional)

Option 1: Using Docker (Recommended)

Clone the repository:

git clone https://github.com/ma14ch/avano.git
cd avano

Start the service using Docker Compose:

docker-compose up --build

The service will be available at http://localhost:5016.

Option 2: Manual Installation

Clone the repository:

git clone https://github.com/ma14ch/avano.git
cd avano

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # On Windows use: venv\Scripts\activate

Install dependencies:

pip install -r requirements.txt

Run the service:

python src/main.py

The service will be available at http://localhost:5016.

Environment Configuration

The service automatically detects GPU availability
Default port is 5016 (can be modified in src/main.py)
Model files are stored in the models/ directory
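The detection logic in src/main.py is not shown here. As a rough sketch of how GPU detection is commonly done in PyTorch-based services, and assuming the service uses PyTorch, the check could look like the following. The function name `pick_device` is hypothetical.

```python
def pick_device() -> str:
    """Return "cuda" when a CUDA GPU is usable, otherwise "cpu"."""
    try:
        import torch  # assumption: the service runs on PyTorch
    except ImportError:
        # Without PyTorch installed there is nothing to run on a GPU.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Inference would run on: {pick_device()}")
```

Running this before starting the service is a quick way to confirm whether the CUDA-compatible GPU listed in the prerequisites is visible to Python.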

API Usage Examples with `curl`

Basic API Status Check

Check if the API is running:

curl -X GET http://localhost:5016/

Speech-to-Text Transcription with Speaker Diarization

Send an audio file for transcription:

curl -X POST http://localhost:5016/api/inference/ \
  -F "audio_file=@/path/to/your/audio/file.mp3" \
  -F "num_speakers=2"

Parameters

audio_file: The audio file to transcribe (required)
num_speakers: Number of speakers to identify (optional)
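The same request can be sent from Python. The sketch below uses only the standard library and mirrors the curl call above: it posts `audio_file` and, optionally, `num_speakers` as multipart/form-data to `/api/inference/`. The helper names `build_multipart` and `transcribe` are illustrative and not part of Avano, and the sketch assumes the endpoint answers with the JSON body described under Response Format.

```python
from __future__ import annotations

import json
import mimetypes
import urllib.request
import uuid
from pathlib import Path

API_URL = "http://localhost:5016/api/inference/"


def build_multipart(
    fields: dict[str, str], file_field: str, filename: str, file_bytes: bytes
) -> tuple[bytes, str]:
    """Encode text fields plus one file as multipart/form-data.

    Returns the request body and the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    parts = []
    for name, value in fields.items():
        parts.append(
            (
                f"--{boundary}\r\n"
                f'Content-Disposition: form-data; name="{name}"\r\n\r\n'
                f"{value}\r\n"
            ).encode()
        )
    parts.append(
        (
            f"--{boundary}\r\n"
            f'Content-Disposition: form-data; name="{file_field}"; '
            f'filename="{filename}"\r\n'
            f"Content-Type: {file_type}\r\n\r\n"
        ).encode()
        + file_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def transcribe(audio_path: str, num_speakers: int | None = None, url: str = API_URL) -> dict:
    """POST an audio file to the inference endpoint and return the parsed JSON."""
    path = Path(audio_path)
    # num_speakers is optional, so it is only sent when the caller provides it.
    fields = {} if num_speakers is None else {"num_speakers": str(num_speakers)}
    body, content_type = build_multipart(fields, "audio_file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

With the service running, `transcribe("/path/to/your/audio/file.mp3", num_speakers=2)` corresponds to the curl command above.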

Check Model Status

Check if the models are loaded correctly:

curl -X GET http://localhost:5016/debug/models
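Both status checks can also be scripted, for example to wait until the service is up before submitting audio. The sketch below treats any HTTP 2xx answer as healthy; that is an assumption, since the response bodies of `/` and `/debug/models` are not documented here. The function name `endpoint_ok` is illustrative.

```python
import urllib.error
import urllib.request

BASE_URL = "http://localhost:5016"


def endpoint_ok(path: str, base_url: str = BASE_URL, timeout: float = 5.0) -> bool:
    """Return True if a GET to base_url + path answers with an HTTP 2xx status."""
    try:
        with urllib.request.urlopen(base_url + path, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and 4xx/5xx responses
        # (urllib raises HTTPError, a URLError subclass, for those).
        return False


if __name__ == "__main__":
    print("API reachable:    ", endpoint_ok("/"))
    print("Models endpoint OK:", endpoint_ok("/debug/models"))
```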

Response Format

The API returns a JSON response with transcribed segments:

{
  "segments": [
    {
      "speaker": "SPEAKER_0",
      "start": 0.5,
      "end": 5.2,
      "transcription": "متن تبدیل‌شده برای گوینده اول"
    },
    {
      "speaker": "SPEAKER_1",
      "start": 5.8,
      "end": 10.3,
      "transcription": "متن تبدیل‌شده برای گوینده دوم"
    }
  ]
}
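As an illustration of consuming this response, the sketch below turns the `segments` list into a readable transcript and totals the speaking time per speaker. It relies only on the four fields shown above (`speaker`, `start`, `end`, `transcription`); the function names are illustrative, and the sample text is an English placeholder rather than real service output.

```python
from __future__ import annotations


def format_transcript(response: dict) -> str:
    """Render one line per segment, ordered by start time."""
    lines = []
    for seg in sorted(response["segments"], key=lambda s: s["start"]):
        lines.append(
            f'[{seg["start"]:.1f}s-{seg["end"]:.1f}s] '
            f'{seg["speaker"]}: {seg["transcription"]}'
        )
    return "\n".join(lines)


def speaking_time(response: dict) -> dict[str, float]:
    """Sum segment durations (in seconds) for each speaker label."""
    totals: dict[str, float] = {}
    for seg in response["segments"]:
        duration = seg["end"] - seg["start"]
        totals[seg["speaker"]] = totals.get(seg["speaker"], 0.0) + duration
    return totals


if __name__ == "__main__":
    sample = {
        "segments": [
            {"speaker": "SPEAKER_0", "start": 0.5, "end": 5.2, "transcription": "first speaker text"},
            {"speaker": "SPEAKER_1", "start": 5.8, "end": 10.3, "transcription": "second speaker text"},
        ]
    }
    print(format_transcript(sample))
    print(speaking_time(sample))
```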

Model Limitations

Optimized for clean, high-quality audio
Not designed for real-time streaming ASR
May occasionally produce hallucinations (a common limitation in Whisper models)
Performs best on standard Persian speech; accuracy may be reduced with heavy accents or dialects

License

This project is licensed under the MIT License - see the LICENSE file for details.
