DavidTbilisi/TTS

TTS_ka 🚀 Ultra-Fast Text-to-Speech

Ultra-Fast Text-to-Speech CLI tool with maximum speed generation, smart chunking, and parallel processing. Auto-optimized by default - no complex flags needed! Converts text to high-quality speech in Georgian (🇬🇪), Russian (🇷🇺), and English (🇬🇧) languages.

Simplified UX: Auto-optimization is now enabled by default. Just specify --lang and go!

Python 3.6+ MIT License

✨ Features

  • 🚀 Ultra-Fast Generation: 6-15 seconds for 1000 words (vs 25+ seconds traditional)
  • 🔊 Streaming Playback: Audio starts playing while still generating (NEW!)
  • 🧠 Smart Chunking: Automatic text splitting for optimal performance
  • Parallel Processing: Multi-threaded generation with up to 8 workers
  • 📋 Clipboard Integration: Direct clipboard-to-speech workflow
  • 🎯 Auto-Optimization: Turbo mode automatically optimizes all settings
  • 🎵 High-Quality Voices: Premium neural voices for all languages
  • 📁 File Support: Process text files directly
  • 🔄 Real-time Playback: Automatic audio playback with system player

🎯 Quick Start

1. Installation

# Install from PyPI (recommended)
pip install TTS_ka

# Or install from source
git clone https://github.com/DavidTbilisi/TTS.git
cd TTS
pip install -e .

2. Basic Usage (Auto-Optimized by Default)

# Ultra-fast generation with auto-optimization (default behavior)
python -m TTS_ka "Hello, how are you today?" --lang en

# Georgian text with automatic optimization
python -m TTS_ka "გამარჯობა, როგორ ხართ?" --lang ka

# Russian text with smart chunking
python -m TTS_ka "Привет, как дела?" --lang ru

3. Clipboard Workflow (FASTEST)

# Copy any text, then run (fastest workflow):
python -m TTS_ka clipboard --lang en

# For different languages:
python -m TTS_ka clipboard --lang ka  # Georgian
python -m TTS_ka clipboard --lang ru  # Russian

4. File Processing

# Process text files directly (auto-optimized)
python -m TTS_ka document.txt --lang en

# Long files with custom settings
python -m TTS_ka large_file.txt --chunk-seconds 30 --parallel 6 --lang ru

📖 Complete Usage Guide

Command Syntax

python -m TTS_ka [TEXT_SOURCE] [OPTIONS]

Text Sources

  • Direct text: "Your text here"
  • Clipboard: clipboard (copy text first)
  • File path: file.txt, document.md, etc.

Essential Options

| Option | Description | Example |
|---|---|---|
| --lang | Language: ka (Georgian), ru (Russian), en (English) | --lang ka |
| --stream | 🆕 Enable streaming playback (audio starts while generating) | --stream |
| --chunk-seconds | Chunk size in seconds (0=auto, 20-60 optimal) | --chunk-seconds 30 |
| --parallel | Workers (0=auto, 2-8 recommended) | --parallel 6 |
| --no-play | Skip automatic audio playback | --no-play |
| --no-turbo | Disable auto-optimization (legacy mode) | --no-turbo |
| --help-full | Show comprehensive help with examples | --help-full |
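
To give a feel for what a chunk size in seconds means for the text itself, here is a rough sketch that groups sentences into chunks of about `chunk_seconds` of speech. The function name `split_into_chunks` and the speaking rate of 150 words per minute are assumptions for illustration; TTS_ka's real chunking logic may differ.

```python
import re
from typing import List


def split_into_chunks(text: str, chunk_seconds: int, words_per_minute: int = 150) -> List[str]:
    """Group sentences into chunks of roughly chunk_seconds of speech.

    words_per_minute is an assumed speaking rate, not a TTS_ka setting.
    """
    max_words = max(1, chunk_seconds * words_per_minute // 60)
    # Split after sentence-ending punctuation, dropping empty pieces
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks, current, count = [], [], 0
    for sentence in sentences:
        n = len(sentence.split())
        # Start a new chunk when this sentence would exceed the word budget
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


sample = "One two three. Four five six. Seven eight nine."
print(split_into_chunks(sample, chunk_seconds=2))
```

Under the assumed rate, --chunk-seconds 30 corresponds to about 75 words per chunk. Splitting on sentence boundaries avoids cutting a chunk mid-sentence, at the cost of slightly uneven chunk lengths.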

🏃‍♂️ Performance Examples

Speed Comparison (1000 words)

  • Traditional TTS: 25-40 seconds
  • TTS_ka Direct: 15-25 seconds
  • TTS_ka Turbo: 8-15 seconds
  • TTS_ka Chunked: 6-12 seconds ⚡
  • TTS_ka Streaming: 🔊 2-3 seconds to first audio (NEW!)

🆕 Streaming Playback - Audio Starts Immediately!

The new streaming feature starts playing audio within 2-3 seconds while the rest continues generating in the background. For a text that would otherwise take 20-30 seconds to generate, that is roughly an 85-90% reduction in perceived wait time.

Quick Usage:

# Basic streaming - audio starts almost instantly!
python -m TTS_ka "Your long text..." --lang en --stream

# From file with streaming
python -m TTS_ka article.txt --lang ka --stream

# Clipboard with streaming (fastest workflow)
python -m TTS_ka clipboard --stream

How It Works:

  1. Text is split into chunks (if needed)
  2. Chunks generate in parallel (2-8 workers)
  3. First chunk plays immediately (~2-3 seconds)
  4. Remaining chunks continue generating in background
  5. Final merged audio file is saved
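
The five steps above can be sketched with a thread pool. This is an illustration under stated assumptions, not the tool's code: `stream_chunks` and `fake_synthesize` are hypothetical names, the synthesizer is a placeholder that returns bytes, and joining bytes stands in for real audio merging.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List


def stream_chunks(
    chunks: List[str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    workers: int = 4,
) -> bytes:
    """Generate chunks in parallel, play them in order, return merged audio."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Step 2: every chunk is submitted for generation right away
        futures = [pool.submit(synthesize, chunk) for chunk in chunks]
        parts = []
        for future in futures:
            # Steps 3-4: wait only for the next chunk in order;
            # later chunks keep generating in the background meanwhile
            audio = future.result()
            play(audio)
            parts.append(audio)
    # Step 5: stand-in for the final merge (real MP3 merging needs an audio library)
    return b"".join(parts)


def fake_synthesize(chunk: str) -> bytes:
    """Placeholder for a real TTS call; returns the text as bytes."""
    return chunk.encode("utf-8")


played = []
merged = stream_chunks(["Hello ", "streaming ", "world"], fake_synthesize, played.append)
print(len(played), merged.decode("utf-8"))
```

Iterating over the futures in submission order keeps playback in reading order even when a later chunk finishes first. In this sketch `play` blocks the loop; a real player would more likely run in its own thread.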

Performance:

  • Without streaming: Wait 10-30+ seconds for all audio
  • With streaming: Hear audio in 2-3 seconds ⚡
  • Platform support: Windows, Linux, macOS
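
As a quick check of these numbers, the perceived wait reduction is one minus the ratio of time-to-first-audio to the full generation time. The 2.5 s and 20 s inputs below are assumed mid-range values picked from the ranges above, not measurements.

```python
def perceived_wait_reduction(first_audio_s: float, full_wait_s: float) -> float:
    """Percentage by which the wait before hearing audio shrinks."""
    return (1 - first_audio_s / full_wait_s) * 100


# Assumed mid-range values: first audio after 2.5 s instead of 20 s
print(perceived_wait_reduction(2.5, 20))  # 87.5
```

For short texts the gain is smaller: 3 seconds against a 10-second full wait works out to 70%.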

Advanced Streaming:

# Custom chunking for optimal streaming
python -m TTS_ka longtext.txt --stream --chunk-seconds 25 --parallel 6

# Streaming without final playback
python -m TTS_ka text.txt --stream --no-play

Real-World Examples

# 1. Quick phrases (instant generation)
python -m TTS_ka "Thank you very much!" --lang en
# ⚡ Completed in 2.3s (optimized)

# 2. Medium text (paragraph)
python -m TTS_ka "Lorem ipsum dolor sit amet..." --lang en  
# ⚡ Completed in 5.7s (direct)

# 3. Long document (chunked processing)
python -m TTS_ka large_document.txt --lang en
# Strategy: chunked generation, 6 workers
# ⚡ Completed in 12.4s (chunked)

# 4. Clipboard workflow (daily usage)
python -m TTS_ka clipboard --lang ka
# OPTIMIZED MODE - Georgian
# Processing: 45 words, 287 characters
# ⚡ Completed in 4.1s

🌍 Language Support

| Language | Code | Voice Quality | Speed | Example |
|---|---|---|---|---|
| Georgian 🇬🇪 | ka | Premium Neural | Fast | --lang ka |
| Russian 🇷🇺 | ru | High Quality | Very Fast | --lang ru |
| English 🇬🇧 | en | Premium Neural | Maximum | --lang en |

Voice Details

  • Georgian: ka-GE-EkaNeural - Premium female voice
  • Russian: ru-RU-SvetlanaNeural - High-quality female voice
  • English: en-GB-SoniaNeural - British English neural voice
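
For scripts that call edge-tts directly, the voice list above could be held in a small lookup table. The sketch below assumes this mapping mirrors what the CLI does internally, which the README does not confirm; `VOICES`, `voice_for`, and `synthesize` are illustrative names. `edge_tts.Communicate(text, voice)` and its `save()` coroutine come from the edge-tts package.

```python
VOICES = {
    "ka": "ka-GE-EkaNeural",
    "ru": "ru-RU-SvetlanaNeural",
    "en": "en-GB-SoniaNeural",
}


def voice_for(lang: str) -> str:
    """Look up the voice name for a --lang code."""
    try:
        return VOICES[lang]
    except KeyError:
        raise ValueError(f"Unsupported language: {lang!r}") from None


async def synthesize(text: str, lang: str, out_path: str) -> None:
    """Save speech to out_path with edge-tts (needs network access)."""
    import edge_tts  # third-party; imported lazily so the lookup works without it

    await edge_tts.Communicate(text, voice_for(lang)).save(out_path)


print(voice_for("ka"))
# To generate audio: import asyncio; asyncio.run(synthesize("გამარჯობა", "ka", "hello.mp3"))
```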

⚙️ Advanced Usage

Custom Optimization

# Manual chunking for very long texts
python -m TTS_ka book_chapter.txt --chunk-seconds 45 --parallel 4 --lang en

# Maximum parallelization (for powerful systems)
python -m TTS_ka large_text.txt --parallel 8 --lang ru

# Batch processing (no audio playback)  
python -m TTS_ka document.txt --no-play --lang ka

# Legacy mode (disable auto-optimization)
python -m TTS_ka "text" --no-turbo --lang en

Workflow Integration

# Create alias for daily use
alias speak='python -m TTS_ka clipboard --lang en'

# Windows batch file (speak.bat)
@echo off
python -m TTS_ka clipboard --lang en

# Read web articles (with browser copy)
# 1. Copy article text
# 2. Run: python -m TTS_ka clipboard --lang en

🔧 Installation & Requirements

System Requirements

  • Python: 3.6+ (3.8+ recommended)
  • OS: Windows, macOS, Linux
  • Memory: 256MB+ available RAM
  • Network: Internet connection for voice synthesis

Dependencies

Required:

# Quote version specifiers so the shell does not treat ">" as a redirect
pip install "edge-tts>=6.1.9"      # Core TTS engine
pip install "pydub>=0.25.1"        # Audio processing
pip install "tqdm>=4.65.0"         # Progress bars
pip install "pyperclip>=1.8.2"     # Clipboard support

System Requirements:

  • FFmpeg: Required for audio processing
    • Windows: Download from ffmpeg.org
    • macOS: brew install ffmpeg
    • Ubuntu: sudo apt install ffmpeg

Complete Installation

# Method 1: PyPI installation (simplest)
pip install TTS_ka

# Method 2: Development installation
git clone https://github.com/DavidTbilisi/TTS.git
cd TTS
pip install -e .

# Method 3: Manual dependencies
pip install edge-tts pydub tqdm pyperclip

# Verify installation
python -m TTS_ka "Installation successful!" --lang en

🎮 AutoHotkey Integration (Windows)

Quick Setup

  1. Install AutoHotkey v2
  2. Create tts_hotkeys.ahk:
; Ultra-fast TTS hotkeys
!e::  ; Alt+E - English
{
    Run("cmd /k python -m TTS_ka clipboard --lang en")
}

!r::  ; Alt+R - Russian  
{
    Run("cmd /k python -m TTS_ka clipboard --lang ru")
}

!x::  ; Alt+X - Georgian
{
    Run("cmd /k python -m TTS_ka clipboard --lang ka")
}
  3. Double-click the script to run it, then:
    • Copy text → Alt+E for English
    • Copy text → Alt+R for Russian
    • Copy text → Alt+X for Georgian

Daily Workflow

  1. Browse web → Copy interesting text
  2. Press Alt+E → Instant speech
  3. Continue browsing while listening

🔍 Troubleshooting

Common Issues

1. "No module named 'edge_tts'"

pip install "edge-tts>=6.1.9"

2. "FFmpeg not found"

# Windows: Download and add to PATH
# macOS: brew install ffmpeg  
# Linux: sudo apt install ffmpeg

3. Slow generation

# Auto-optimization is enabled by default
python -m TTS_ka "text" --lang en

# Reduce parallel workers if the network is slow or unreliable
python -m TTS_ka "text" --parallel 2 --lang en

# Use legacy mode only if needed
python -m TTS_ka "text" --no-turbo --lang en

4. Empty clipboard

# Ensure text is copied first
# Then run: python -m TTS_ka clipboard --lang en

Performance Optimization

For Maximum Speed:

# Auto-optimization is on by default; to pin the settings explicitly, use:
python -m TTS_ka clipboard --chunk-seconds 30 --parallel 6 --lang en

For System with Limited Resources:

# Use fewer workers and longer chunks (fewer requests overall)
python -m TTS_ka "text" --parallel 2 --chunk-seconds 60 --lang en

📊 Performance Benchmarks

Text Length vs Generation Time

| Words | Direct Mode | Turbo Mode | Chunked (6 workers) |
|---|---|---|---|
| 10-50 | 2-4s | 1-3s | 2-4s |
| 100-300 | 8-12s | 5-8s | 4-6s |
| 500-1000 | 18-25s | 12-15s | 8-12s |
| 1000+ | 30-45s | 18-25s | 10-18s |

Optimal Settings by Text Length

# Short text (< 100 words): Direct generation (auto-optimized)
python -m TTS_ka "short text" --lang en

# Medium text (100-500 words): Auto-optimized mode
python -m TTS_ka medium_text.txt --lang en  

# Long text (500+ words): Chunked processing (auto-detected)
python -m TTS_ka long_text.txt --chunk-seconds 30 --parallel 6 --lang en
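
A wrapper script could pick these settings from the word count. The sketch below follows the thresholds in the comments above; `build_command` is a hypothetical helper, and since auto-optimization is already the default, the explicit flags for long text are optional.

```python
import sys
from typing import List


def build_command(source: str, word_count: int, lang: str = "en") -> List[str]:
    """Build a TTS_ka command line following the thresholds above."""
    cmd = [sys.executable, "-m", "TTS_ka", source, "--lang", lang]
    if word_count >= 500:
        # Long text: explicit chunking and workers, as in the last example
        cmd += ["--chunk-seconds", "30", "--parallel", "6"]
    # Short and medium text rely on the default auto-optimization
    return cmd


print(build_command("long_text.txt", 1200)[3:])
```

The returned list can be passed straight to `subprocess.run`, which avoids shell quoting problems with non-ASCII text.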

🚀 Examples & Use Cases

Daily Workflows

1. Article Reading

# Copy web article → instant speech
python -m TTS_ka clipboard --lang en

2. Document Processing

# Process research papers, books, etc.
python -m TTS_ka research_paper.pdf.txt --lang en

3. Language Learning

# Practice pronunciation with different languages
python -m TTS_ka "სწავლობდი ქართულს" --lang ka
python -m TTS_ka "Learning Russian язык" --lang ru

4. Accessibility

# Screen reader alternative
python -m TTS_ka clipboard --no-play --lang en > audio_file.mp3

Batch Processing

# Process multiple files
for file in *.txt; do
    python -m TTS_ka "$file" --no-play --lang en
done

# Windows (cmd.exe prompt; inside a .bat file write %%f instead of %f)
for %f in (*.txt) do python -m TTS_ka "%f" --no-play --lang en
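
The same loop can be written once in Python so that it behaves identically on Windows, macOS, and Linux. This is a sketch: `batch_commands` and `run_batch` are hypothetical helper names, and only options documented in this README are used.

```python
import subprocess
import sys
from pathlib import Path
from typing import List


def batch_commands(folder: str, lang: str = "en") -> List[List[str]]:
    """One TTS_ka command per .txt file in folder, in sorted order."""
    return [
        [sys.executable, "-m", "TTS_ka", str(path), "--no-play", "--lang", lang]
        for path in sorted(Path(folder).glob("*.txt"))
    ]


def run_batch(folder: str, lang: str = "en") -> None:
    """Run the commands one after another, stopping at the first failure."""
    for cmd in batch_commands(folder, lang):
        subprocess.run(cmd, check=True)


# Preview the commands for the current directory without running them
for cmd in batch_commands("."):
    print(" ".join(cmd))
```

Call `run_batch(".")` to actually process the files once the preview looks right.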

🛠️ Advanced Configuration

Environment Variables

# Set default language
export TTS_DEFAULT_LANG=ka

# Set default mode  
export TTS_DEFAULT_MODE=turbo

# Custom output directory
export TTS_OUTPUT_DIR=/path/to/audio/files

Configuration File

Create ~/.tts_config.json:

{
    "default_lang": "en",
    "turbo_mode": true,
    "chunk_seconds": 30,
    "parallel_workers": 6,
    "auto_play": true
}
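
To show how these keys relate to the command-line options, the sketch below loads the file and translates it into equivalent flags for a wrapper script. The key-to-flag mapping and the fallback to `en` are assumptions based on the option table in this README; `load_config` and `config_to_args` are hypothetical helpers, not part of TTS_ka.

```python
import json
from pathlib import Path
from typing import List, Optional


def config_to_args(config: dict) -> List[str]:
    """Translate config keys into the CLI options documented in this README."""
    args = ["--lang", str(config.get("default_lang", "en"))]
    if "chunk_seconds" in config:
        args += ["--chunk-seconds", str(config["chunk_seconds"])]
    if "parallel_workers" in config:
        args += ["--parallel", str(config["parallel_workers"])]
    # Optimization and playback are on by default, so only "off" needs a flag
    if not config.get("turbo_mode", True):
        args.append("--no-turbo")
    if not config.get("auto_play", True):
        args.append("--no-play")
    return args


def load_config(path: Optional[Path] = None) -> dict:
    """Read the JSON file, or return an empty dict if it does not exist."""
    path = path or Path.home() / ".tts_config.json"
    return json.loads(path.read_text(encoding="utf-8")) if path.is_file() else {}


example = {"default_lang": "en", "turbo_mode": True, "chunk_seconds": 30,
           "parallel_workers": 6, "auto_play": True}
print(config_to_args(example))
```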

🔌 API Integration

Python Script Integration

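#!/usr/bin/env python3
import subprocess
import sys

def text_to_speech(text, lang="en", turbo=True):
    """Convert text to speech by invoking the TTS_ka CLI."""
    # sys.executable runs TTS_ka with the same interpreter as this script
    cmd = [sys.executable, "-m", "TTS_ka", text, "--lang", lang]
    if not turbo:
        # Auto-optimization is on by default; --no-turbo switches it off
        cmd.append("--no-turbo")

    # check=True raises CalledProcessError if the CLI exits with an error
    subprocess.run(cmd, check=True)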

# Usage
text_to_speech("Hello world!", "en")
text_to_speech("გამარჯობა!", "ka")

Web Integration

# URL to speech (with curl + TTS_ka)
curl -s "https://example.com/article" | \
python -m TTS_ka /dev/stdin --lang en

📱 Mobile & Remote Usage

SSH/Remote Usage

# Generate audio on remote server
ssh user@server "python -m TTS_ka 'Remote generation' --no-play"

# Download and play locally
scp user@server:data.mp3 ./remote_audio.mp3

Docker Usage

FROM python:3.9
RUN pip install TTS_ka
RUN apt-get update && apt-get install -y ffmpeg
ENTRYPOINT ["python", "-m", "TTS_ka"]

# Build the image from the Dockerfile above, then run it (shell commands)
docker build -t tts_container .
docker run tts_container "Hello Docker!" --lang en

🎯 Tips & Best Practices

Performance Tips

  1. Auto-optimization is enabled by default - no flags needed!
  2. Use clipboard workflow for fastest daily usage
  3. Chunk long texts with --chunk-seconds 30
  4. Optimize workers with --parallel 4-6 for most systems
  5. Pre-install FFmpeg for best audio processing

Quality Tips

  1. Georgian text: Use --lang ka for best quality
  2. Mixed languages: Process separately for optimal results
  3. Technical text: Use shorter chunks (--chunk-seconds 20)
  4. Clean input: Remove extra whitespace and formatting
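
For the last tip, a minimal whitespace cleanup might look like the sketch below. `clean_text` is a hypothetical helper; it only normalizes whitespace and does not strip markup or other formatting.

```python
import re


def clean_text(text: str) -> str:
    """Collapse whitespace runs and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


print(clean_text("  Hello,\n\n   world!\t"))  # Hello, world!
```

Note that this also flattens paragraph breaks into single spaces, which may or may not be desirable for long documents.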

Workflow Tips

  1. Create aliases for frequent commands
  2. Use hotkeys (AutoHotkey on Windows)
  3. Batch process large document collections
  4. Test settings with small text first

📄 File Format Support

Supported Input Formats

  • Text files: .txt, .md, .rst
  • Code files: .py, .js, .html (extracts text)
  • Clipboard: Any copied text
  • Direct input: Command-line strings

Output Format

  • Audio: MP3 (high quality, compressed)
  • Bitrate: 128kbps (optimal size/quality balance)
  • Sample Rate: 24kHz (neural voice quality)
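
The bitrate gives a quick way to estimate output size: bytes ≈ duration in seconds × bitrate in bits per second ÷ 8. The sketch below assumes a constant bitrate and ignores headers and tags; `mp3_size_bytes` is an illustrative helper.

```python
def mp3_size_bytes(duration_s: float, bitrate_kbps: int = 128) -> int:
    """Approximate size of a constant-bitrate MP3 (ignores headers and tags)."""
    return int(duration_s * bitrate_kbps * 1000 / 8)


print(mp3_size_bytes(60))  # 960000 bytes, about 0.96 MB per minute
```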

🔄 Updates & Maintenance

Keeping Updated

# Update to latest version
pip install --upgrade TTS_ka

# Check current version  
python -m TTS_ka --version

# Update dependencies
pip install --upgrade edge-tts pydub tqdm pyperclip

Health Check

# Test installation
python -m TTS_ka "System check" --lang en

# Verify FFmpeg  
ffmpeg -version

# Check Python version
python --version  # Should be 3.6+
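
The checks above can also be bundled into one script. This is a sketch using only the standard library; `health_check` is a hypothetical helper, and the module names are the import names of the packages listed under Dependencies.

```python
import importlib.util
import shutil
import sys
from typing import Dict


def health_check() -> Dict[str, bool]:
    """Report whether each prerequisite from this README is available."""
    modules = ["edge_tts", "pydub", "tqdm", "pyperclip"]
    report = {
        "python>=3.6": sys.version_info >= (3, 6),
        "ffmpeg": shutil.which("ffmpeg") is not None,  # is ffmpeg on PATH?
    }
    for name in modules:
        # find_spec returns None when a top-level module is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


for item, ok in health_check().items():
    print(f"{'OK     ' if ok else 'MISSING'}  {item}")
```

This only confirms that the pieces are present; it does not test network access to the voice service.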

🤝 Contributing

We welcome contributions! See our GitHub repository for:

  • Bug reports and feature requests
  • Code contributions and pull requests
  • Documentation improvements
  • Language support additions

Development Setup

git clone https://github.com/DavidTbilisi/TTS.git
cd TTS
pip install -e ".[dev]"
pytest  # Run tests

📞 Support

Getting Help

  1. Documentation: Use --help-full for comprehensive help
  2. Issues: Report bugs on GitHub Issues
  3. Discussions: Join GitHub Discussions

Quick Diagnostics

# Check system compatibility  
python -m TTS_ka --help-full

# Test with minimal command
python -m TTS_ka "test" --lang en

# Verify FFmpeg installation
ffmpeg -version

📜 License & Credits

License: MIT License - see LICENSE file

Credits:

  • Edge-TTS: Microsoft's edge-tts library for voice synthesis
  • PyDub: Audio processing and manipulation
  • FFmpeg: Audio encoding and format conversion

Author: David Chincharashvili (davidchincharashvili@gmail.com)


Star this project on GitHub if you find it useful!
🐛 Report issues to help improve the tool
🤝 Contribute to make it even better
