Loosely based on LucaVitali's work; with some extensions

sjackson0109/TextToSpeech-Generator

TextToSpeech Generator v3.2

An enterprise-grade modular application for converting text to speech using multiple TTS providers. Features advanced architecture, security framework, performance monitoring, bulk CSV processing, and comprehensive testing infrastructure.

Attribution & licence

This project is inspired by Luca Vitali's original work, AzureTTSVoiceGeneratorGUI.

  • Original Work: Luca Vitali (2019, MIT licence)

  • Enhanced Version: Simon Jackson (2024-2025, MIT licence)

    See ATTRIBUTION.md and licence.md for details.

Both the original and derivative works are licensed under the MIT licence, allowing free use, modification, and distribution with proper attribution.

Features

  • Enterprise Modular Architecture - 6 dedicated modules with proper separation of concerns
  • Advanced Configuration - JSON-based multi-environment profiles (Development/Production/Testing)
  • Security Framework - Certificate-based encryption and secure credential storage
  • Performance Monitoring - Real-time system metrics and intelligent caching
  • Testing Infrastructure - Comprehensive Pester test suites with automated validation
  • Error Recovery - Intelligent provider-specific recovery strategies with exponential backoff
  • Encrypted Storage - Certificate-based encryption for sensitive API keys
  • Audit Trails - Comprehensive logging of all configuration changes and security events
  • Input Validation - Enterprise-grade sanitization and validation frameworks
  • Error Classification - Provider-specific error codes with detailed resolution guidance
  • Configuration Migration - Seamless upgrade path from legacy XML to modern JSON
  • Microsoft Azure Cognitive Services - Premium neural voices with SSML support and regional deployment
  • AWS Polly - Neural engine with lifelike speech synthesis and custom lexicons
  • ElevenLabs - Ultra-realistic AI voices with emotion control and voice cloning
  • Google Cloud TTS - WaveNet technology with advanced prosody control
  • Murf AI - Gen2 and Falcon models with 150+ ultra-realistic voices across 20+ languages
  • OpenAI TTS - GPT-4 powered voices (alloy, echo, fable, onyx, nova, shimmer) with HD quality
  • Telnyx - WebSocket streaming with 266+ voices (KokoroTTS, Natural, NaturalHD models)
  • Twilio - Telephony-optimised TTS for communication workflows
  • Intelligent Threading - Auto-optimised parallel processing (3-6x speed improvement)
  • Memory Management - Automatic garbage collection and memory threshold monitoring
  • Caching System - LRU cache with TTL for improved response times
  • Progress Tracking - Real-time updates with thread-safe UI integration
  • Bulk Processing - CSV batch processing with intelligent load balancing
  • Professional Interface - Contemporary dark theme with intuitive controls
  • Multi-Environment Support - Switch between Development/Production/Testing profiles
  • Real-time Validation - Instant feedback on configuration and API connectivity
  • Comprehensive Testing - Built-in test modes and validation tools
  • Legacy Migration - Automatic upgrade from older configuration formats
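
The Error Recovery feature above mentions exponential backoff. The sketch below shows the general idea only: `Invoke-WithBackoff` and its parameters are hypothetical names chosen for illustration and are not taken from the project's ErrorRecovery module. It assumes the wrapped call signals failure with a terminating error.

```powershell
# Hypothetical helper (not part of the project): retries a script block,
# doubling the wait between attempts.
function Invoke-WithBackoff {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 4,
        [int] $BaseDelayMs = 500
    )
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Give up and rethrow once the final attempt has failed.
            if ($attempt -eq $MaxAttempts) { throw }
            # Delay doubles on each retry: 500 ms, 1000 ms, 2000 ms, ...
            $delayMs = $BaseDelayMs * [math]::Pow(2, $attempt - 1)
            Start-Sleep -Milliseconds $delayMs
        }
    }
}
```

Only terminating errors reach the `catch` block, so a cmdlet called inside `$Action` would typically need `-ErrorAction Stop` for its failures to trigger a retry.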

Requirements

  • Operating System: Windows 10/11 or Windows Server 2016+
  • PowerShell: Version 5.1 or higher
  • .NET Framework: 4.7.2 or higher (required for GUI/XAML support; PowerShell 5.1 uses .NET Framework by default)
    • For full GUI functionality and XAML window rendering, you must run on Windows with .NET Framework 4.7.2 or newer. WPF/XAML features are not available on PowerShell Core or non-Windows environments.
  • API Access: Valid API keys for your chosen TTS provider(s)
  • Subscription/Billing: The ability to pay for your consumption of each TTS provider's API. Make sure you read the chosen provider's documentation.

Note: Using external APIs sometimes involves costs. Simon Jackson and Luca Vitali will not be held responsible for any charges you incur with a provider.

Quick Preview

  • Main Console: <Main Console>
  • API Dropdown (list of TTS providers supported): <Provider List>
  • API Config (unique per provider): <API Config, Provider Specific>

Installation

New Installation (v3.2+)

  1. Clone or Download:

    git clone https://github.com/sjackson0109/TextToSpeech-Generator.git
  2. Navigate to Directory:

    cd TextToSpeech-Generator
  3. Run Application:

    .\StartTTS.ps1

    Advanced Options:

    .\StartTTS.ps1 -TestMode                    # System validation only
    .\StartTTS.ps1 -RunTests -GenerateReport   # Run tests with reporting
    .\StartTTS.ps1 -ConfigProfile "Production" # Use production settings

Upgrading from v3.1 or Earlier

If you have an existing installation with TextToSpeech-Generator.xml configuration:

  1. Backup Your Configuration (automatic during migration):

    # Your existing XML file will be automatically backed up
  2. Run Migration Utility:

    .\MigrateLegacyConfig.ps1
  3. Verify Migration:

    .\StartTTS.ps1 -TestMode
  4. Update Your Workflow:

    • Use .\StartTTS.ps1 instead of .\TextToSpeech-Generator.ps1
    • Configure providers using the new JSON-based system
    • Take advantage of multi-environment profiles (Development/Production/Testing)

API Configuration

All providers support two methods for supplying credentials and configuration:

  1. Configuration File (Default.json/config.json):
    • Enter credentials in the GUI or JSON config file (not recommended for sensitive production keys).
  2. Environment Variables (Recommended for Testing/Security):
    • Set environment variables in your PowerShell session before launching the app. No credentials are ever written to disk.

Supported Environment Variables

| Provider | Required Environment Variables |
|---|---|
| Azure | AZURE_SPEECH_KEY, AZURE_SPEECH_REGION |
| AWS Polly | AWS_POLLY_ACCESS_KEY, AWS_POLLY_SECRET_KEY, AWS_POLLY_REGION |
| Google Cloud | GOOGLE_CLOUD_API_KEY |
| Twilio | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN |
| VoiceForge | VOICEFORGE_API_KEY |
| VoiceWare | VOICEWARE_API_KEY, VOICEWARE_REGION, VOICEWARE_VOICE |

Example (PowerShell):

$env:AZURE_SPEECH_KEY = 'your-azure-key'
$env:AZURE_SPEECH_REGION = 'your-azure-region'
$env:AWS_POLLY_ACCESS_KEY = 'your-aws-access-key'
$env:AWS_POLLY_SECRET_KEY = 'your-aws-secret-key'
$env:AWS_POLLY_REGION = 'your-aws-region'
$env:GOOGLE_CLOUD_API_KEY = 'your-google-api-key'
$env:TWILIO_ACCOUNT_SID = 'your-twilio-sid'
$env:TWILIO_AUTH_TOKEN = 'your-twilio-token'
$env:VOICEFORGE_API_KEY = 'your-voiceforge-key'
$env:VOICEWARE_API_KEY = 'your-voiceware-key'
$env:VOICEWARE_REGION = 'your-voiceware-region'
$env:VOICEWARE_VOICE = 'your-voiceware-voice'

Compatibility Note: All code in this project is designed to work with both PowerShell v5 and v7. If you encounter any compatibility issues, please report them via GitHub Issues.

Set these variables in your session before running .\StartTTS.ps1. The app will automatically use them if config fields are missing or empty.
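
The fallback described above (a config value wins, otherwise the environment variable is used) could look roughly like this. `Resolve-ProviderSetting` is a hypothetical name used for illustration; the application's actual lookup code may differ.

```powershell
# Hypothetical helper (not part of the project): prefer the configured value,
# fall back to an environment variable when the config field is missing or empty.
function Resolve-ProviderSetting {
    param(
        [string] $ConfigValue,
        [Parameter(Mandatory)] [string] $EnvName
    )
    if (-not [string]::IsNullOrWhiteSpace($ConfigValue)) {
        return $ConfigValue
    }
    # Returns $null when the variable is not set in this process.
    return [Environment]::GetEnvironmentVariable($EnvName)
}

# Example: the config field is empty, so the session variable is used if present.
$azureKey = Resolve-ProviderSetting -ConfigValue '' -EnvName 'AZURE_SPEECH_KEY'
```

A `$null` result means neither source supplied a value, which is a reasonable point to stop and report the missing credential before calling the provider.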

Important:

  • For Azure, your API key and region must match, and you must select a valid Azure voice or TTS generation will fail.
  • For AWS Polly, your access/secret keys and region must match, and you must select a valid Polly voice or TTS generation will fail.
  • For Google Cloud, your API key, region, and voice must be valid and match your project setup.
  • For Twilio, you must provide a valid account SID, auth token, and select a supported Twilio voice.
  • For VoiceForge, you must provide a valid API key and select a supported VoiceForge voice.
  • For VoiceWare, your API key must match the selected region, and you must select a valid VoiceWare voice or TTS generation will fail.

Provider Details

  • Azure Cognitive Services
    • Status: Full production implementation with real API calls
    • Quality: Premium neural voices with natural prosody and SSML support
    • Free Tier: 5,000 transactions/month
    • Languages: 140+ languages, 400+ voices
    • Complete Setup Guide →
  • Google Cloud Text-to-Speech
    • Status: Full production implementation with real API calls
    • Quality: WaveNet technology for human-like speech with advanced options
    • Free Tier: 1M WaveNet characters/month
    • Languages: 40+ languages, 220+ voices
    • Complete Setup Guide →
  • AWS Polly
    • Status: Full production implementation with real API calls
    • Quality: Neural and standard voices with AWS Signature V4 authentication
    • Free Tier: 1M characters/month for speech synthesis
    • Languages: 60+ languages, 570+ voices including neural options
    • Complete Setup Guide →
  • Twilio
    • Status: Full production implementation with real API calls
    • Quality: TTS integration within telephony and IVR workflows
    • Features: TwiML generation, call API integration, multi-language support
    • Languages: 11+ languages with voice selection across providers
    • Complete Setup Guide →
  • VoiceForge
    • Status: Full production implementation with real API calls
    • Quality: Character-style and novelty voices for creative applications
    • Features: High-quality synthesis, SSML processing, multiple audio formats
    • Languages: Multi-language support with specialised voice characters
    • Complete Setup Guide →
  • VoiceWare
    • Status: Experimental integration
    • Quality: Neural and expressive voices
    • Features: SSML support, multiple audio formats, regional selection
    • Languages: Multi-language support
    • Complete Setup Guide →

Important:

  • Your VoiceWare API key must match the selected region, or authentication will fail.
  • You must select a valid VoiceWare voice in your configuration/profile, or TTS generation will fail.

Usage Guide

Single Script Processing

  1. Select Mode: Choose "Single-Script" radio button
  2. Enter Text: Type your text in the input box
  3. Configure Settings: Choose provider, voice, and output folder
  4. Generate: Click "Go!" or press F5

Bulk Processing from CSV

  1. Prepare CSV File: Create a properly formatted CSV file

    SCRIPT,FILENAME
    "Hello world, this is a test.",test_audio_1
    "Welcome to our service.",welcome_message

    Complete CSV Format Guide → - Detailed specifications and examples

  2. Select Mode: Choose "Bulk-Scripts" radio button

  3. Load File: Click browse button or press Ctrl+O

  4. Configure Settings: Set provider, voice, and output folder

  5. Process: Click "Go!" or press F5
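
Before loading a file in Bulk-Scripts mode, it can help to check it from the console. The following is a minimal sketch that assumes only the two column names shown in the sample above (SCRIPT and FILENAME); `Test-BulkCsv` is a hypothetical helper, not a function shipped with the application.

```powershell
# Hypothetical pre-flight check (not part of the project) for a bulk CSV file.
function Test-BulkCsv {
    param([Parameter(Mandatory)] [string] $Path)

    # @() forces an array even when the file has a single data row.
    $rows = @(Import-Csv -Path $Path)
    if ($rows.Count -eq 0) {
        throw "CSV file '$Path' contains no data rows."
    }

    # Column names come from the header row of the file.
    $columns = $rows[0].PSObject.Properties.Name
    foreach ($required in @('SCRIPT', 'FILENAME')) {
        if ($columns -notcontains $required) {
            throw "CSV file '$Path' is missing the '$required' column."
        }
    }

    # Emit only rows where both fields are non-empty.
    $rows | Where-Object { $_.SCRIPT -and $_.FILENAME }
}
```

Calling `Test-BulkCsv -Path .\scripts.csv` returns the usable rows as objects, so `(Test-BulkCsv -Path .\scripts.csv).Count` gives a quick count of entries that would be processed.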

Keyboard Shortcuts

  • F5 or Ctrl+R: Start generation process
  • Ctrl+S: Save configuration
  • Ctrl+O: Open input file browser
  • Escape: Clear log window

Security Considerations

Secure Credential Storage

The application offers secure API key storage using Windows Credential Manager:

  • Keys are encrypted and stored securely by Windows
  • Plain text storage is avoided when possible
  • Automatic detection of stored credentials

Input Validation

  • CSV structure validation before processing
  • File path sanitization to prevent traversal attacks
  • HTML encoding for script content
  • API key format validation
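
As an illustration of the path sanitization item above, the sketch below reduces a user-supplied name (for example a FILENAME value from a CSV) to a bare file name. `Get-SafeFileName` is a hypothetical helper and an assumption about one possible approach, not the project's actual validation code.

```powershell
# Hypothetical helper (not part of the project): strip directory components
# and invalid characters from a user-supplied output file name.
function Get-SafeFileName {
    param([Parameter(Mandatory)] [string] $Name)

    # Keep only the last path segment, so '../../evil' becomes 'evil'.
    $leaf = ($Name -split '[\\/]')[-1]

    # Replace characters that are invalid in file names on this platform.
    foreach ($ch in [IO.Path]::GetInvalidFileNameChars()) {
        $leaf = $leaf.Replace([string]$ch, '_')
    }

    # Reject names that are empty or still refer to a directory.
    if ([string]::IsNullOrWhiteSpace($leaf) -or ($leaf -in @('.', '..'))) {
        throw "'$Name' does not yield a usable file name."
    }
    $leaf
}
```

Joining the result with the chosen output folder via `Join-Path` then keeps generated audio inside that folder.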

File Structure

TextToSpeech-Generator/
├─ StartTTS.ps1                                         # Main application launcher (v3.2+)
├─ config.json                                          # Modern JSON configuration
├─ MigrateLegacyConfig.ps1                              # XML to JSON migration utility
├─ TextToSpeech-Generator.ps1                           # Legacy GUI component (transitional)
├─ Modules/                                             # Modular architecture
│  ├─ Logging/Logging.psm1                              # Enterprise logging system
│  ├─ Security/EnhancedSecurity.psm1                    # Certificate-based encryption
│  ├─ Configuration/AdvancedConfiguration.psm1          # Multi-environment profiles
│  ├─ TTSProviders/TTSProviders.psm1                    # Modular TTS provider implementations
│  ├─ Utilities/UtilityFunctions.psm1                   # Supporting utility functions
│  ├─ ErrorRecovery/ErrorRecovery.psm1                  # Intelligent error recovery strategies
│  └─ PerformanceMonitoring/PerformanceMonitoring.psm1  # Performance metrics & caching
├─ Tests/                                               # Comprehensive test suites
│  ├─ Unit/                                             # Unit tests for individual modules
│  ├─ Integration/                                      # Integration tests for system components
│  └─ Performance/                                      # Performance benchmarking tests
├─ README.md                                            # Project overview (this file)
├─ licence                                              # MIT licence
└─ GUI-Timeline/                                        # Development timeline screenshots
   └─ 20210922 - Single-Mode File-Save issue.PNG

Default Configuration

Configuration is stored in config.json with provider-specific settings organised by environment profiles (Development/Production/Testing). Each provider section contains authentication, voice selection, and audio format preferences.

Audio Formats

| Provider | Supported Formats | Default Format |
|---|---|---|
| Azure Cognitive Services | WAV, MP3, OGG, WEBM, FLAC | riff-16khz-16bit-mono-pcm |
| Google Cloud TTS | LINEAR16, MP3, OGG_OPUS, MULAW, ALAW | LINEAR16 |
| AWS Polly | PCM, MP3, OGG_VORBIS, JSON | mp3 |
| Twilio | WAV, MP3 | mp3 |
| VoiceForge | WAV, MP3, OGG | mp3 |
| VoiceWare | WAV, MP3, OGG | mp3 |

Voice Options

| Provider | Voice Count | Languages | Voice Types | Sample Voices |
|---|---|---|---|---|
| Azure Cognitive Services | 400+ voices | 140+ languages | Neural, Standard | en-US-JennyNeural, en-GB-RyanNeural, fr-FR-DeniseNeural |
| AWS Polly | 570+ voices | 60+ languages | Neural, Standard | Joanna, Matthew, Emma, Brian, Celine |
| Google Cloud TTS | 220+ voices | 40+ languages | WaveNet, Neural2, Standard | en-US-Wavenet-D, en-GB-Neural2-A, fr-FR-Wavenet-E |
| Twilio | 50+ voices | 11+ languages | Telephony-optimised | alice, man, woman (provider-specific) |
| VoiceForge | 200+ voices | 15+ languages | Character, Novelty | Robot, Alien, Wizard, Princess, Monster |
| VoiceWare | 100+ voices | 20+ languages | Neural, Expressive | en-US-Standard, en-GB-Premium, fr-FR-Neural |

Troubleshooting

Quick Fixes

Authentication Errors:

  • Verify API key is correct and active
  • Check datacenter region matches your subscription
  • See provider-specific setup: AWS Polly | ElevenLabs | Google Cloud | Microsoft Azure (docs/providers/Microsoft Azure.md) | Murf AI | Twilio

File Processing Errors:

  • Validate CSV format - see CSV Format Guide
  • Check output folder write permissions
  • Verify file paths don't contain invalid characters

Network Issues:

  • Check internet connectivity
  • Verify firewall isn't blocking HTTPS requests
  • Try different datacenter region if available

Complete Troubleshooting Guide → - Comprehensive problem-solving resource

Logging

Application logs are saved to application.log in the application directory:

2025-10-10 14:30:15 [INFO] Starting TextToSpeech Generator v3.2
2025-10-10 14:30:20 [INFO] Loaded 400 voices from Azure Cognitive Services
2025-10-10 14:30:45 [INFO] Generated: welcome_message
2025-10-10 14:30:50 [ERROR] Authentication failed: Invalid API key

Log levels: INFO, WARNING, ERROR, DEBUG
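
For anyone writing companion scripts that should log in the same shape, the sample lines above follow a `timestamp [LEVEL] message` layout. The sketch below reproduces that layout; `Format-LogLine` is a hypothetical helper inferred from the sample output, not the project's Logging module.

```powershell
# Hypothetical helper (not part of the project): builds a line in the
# 'yyyy-MM-dd HH:mm:ss [LEVEL] message' layout shown in the sample log.
function Format-LogLine {
    param(
        [Parameter(Mandatory)] [string] $Message,
        [ValidateSet('INFO', 'WARNING', 'ERROR', 'DEBUG')] [string] $Level = 'INFO',
        [datetime] $Timestamp = (Get-Date)
    )
    # Invariant culture keeps the separators stable regardless of locale.
    $stamp = $Timestamp.ToString('yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
    '{0} [{1}] {2}' -f $stamp, $Level, $Message
}
```

A line can then be appended to a file with, for example, `Format-LogLine -Message 'Generated: welcome_message' | Add-Content -Path .\my-script.log`.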

Contributing

Contributions are welcome! Please read our contributing guidelines:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes with proper documentation
  4. Test thoroughly
  5. Submit a pull request

Authors

  • Luca Vitali - Original concept and implementation
  • Simon Jackson - Enhanced security, error handling, and additional features

Acknowledgments

  • Azure Cognitive Services team
  • Google Cloud Text-to-Speech team
  • PowerShell community for WPF guidance

Support

  • Issues: Please log issues via GitHub Issues
  • Documentation: README.md
  • Email: NOT AVAILABLE FOR THIS PROJECT
