An enterprise-grade modular application for converting text to speech using multiple TTS providers. Features advanced architecture, security framework, performance monitoring, bulk CSV processing, and comprehensive testing infrastructure.
This project is inspired by Luca Vitali's original work, AzureTTSVoiceGeneratorGUI.
- Original Work: Luca Vitali (2019, MIT licence)
- Enhanced Version: Simon Jackson (2024-2025, MIT licence)
See ATTRIBUTION.md and licence.md for details.
Both the original and derivative works are licensed under the MIT licence, allowing free use, modification, and distribution with proper attribution.
- Enterprise Modular Architecture - 6 dedicated modules with proper separation of concerns
- Advanced Configuration - JSON-based multi-environment profiles (Development/Production/Testing)
- Security Framework - Certificate-based encryption and secure credential storage
- Performance Monitoring - Real-time system metrics and intelligent caching
- Testing Infrastructure - Comprehensive Pester test suites with automated validation
- Error Recovery - Intelligent provider-specific recovery strategies with exponential backoff
- Encrypted Storage - Certificate-based encryption for sensitive API keys
- Audit Trails - Comprehensive logging of all configuration changes and security events
- Input Validation - Enterprise-grade sanitization and validation frameworks
- Error Classification - Provider-specific error codes with detailed resolution guidance
- Configuration Migration - Seamless upgrade path from legacy XML to modern JSON
- Microsoft Azure Cognitive Services - Premium neural voices with SSML support and regional deployment
- AWS Polly - Neural engine with lifelike speech synthesis and custom lexicons
- ElevenLabs - Ultra-realistic AI voices with emotion control and voice cloning
- Google Cloud TTS - WaveNet technology with advanced prosody control
- Murf AI - Gen2 and Falcon models with 150+ ultra-realistic voices across 20+ languages
- OpenAI TTS - GPT-4 powered voices (alloy, echo, fable, onyx, nova, shimmer) with HD quality
- Telnyx - WebSocket streaming with 266+ voices (KokoroTTS, Natural, NaturalHD models)
- Twilio - Telephony-optimised TTS for communication workflows
- Intelligent Threading - Auto-optimised parallel processing (3-6x speed improvement)
- Memory Management - Automatic garbage collection and memory threshold monitoring
- Caching System - LRU cache with TTL for improved response times
- Progress Tracking - Real-time updates with thread-safe UI integration
- Bulk Processing - CSV batch processing with intelligent load balancing
- Professional Interface - Contemporary dark theme with intuitive controls
- Multi-Environment Support - Switch between Development/Production/Testing profiles
- Real-time Validation - Instant feedback on configuration and API connectivity
- Comprehensive Testing - Built-in test modes and validation tools
- Legacy Migration - Automatic upgrade from older configuration formats
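The Error Recovery feature listed above mentions exponential backoff. As a minimal sketch of that technique (not the project's actual ErrorRecovery module), the hypothetical `Invoke-WithBackoff` helper below retries a script block and doubles the wait after each failure. The function name, parameter names, and default values are all assumptions made for illustration.

```powershell
# Hypothetical helper: retries a script block, doubling the wait after each failure.
function Invoke-WithBackoff {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 4,
        [double] $InitialDelaySeconds = 1
    )

    $delay = $InitialDelaySeconds
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Out of attempts: rethrow the last error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }
            Write-Warning "Attempt $attempt failed: $($_.Exception.Message). Retrying in $delay s."
            Start-Sleep -Milliseconds ([int]($delay * 1000))
            # Exponential growth: 1 s, 2 s, 4 s, ...
            $delay *= 2
        }
    }
}
```

Only terminating errors reach the `catch` block, so a call such as `Invoke-WithBackoff -Action { Invoke-RestMethod -Uri $uri }` would retry on a failed HTTP request, while non-terminating errors inside the script block would pass through untouched.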
- Operating System: Windows 10/11 or Windows Server 2016+
- PowerShell: Version 5.1 or higher
- .NET Framework: 4.7.2 or higher (required for GUI/XAML support; PowerShell 5.1 uses .NET Framework by default)
  - For full GUI functionality and XAML window rendering, you must run on Windows with .NET Framework 4.7.2 or newer. WPF/XAML features are not available on PowerShell Core or non-Windows environments.
- API Access: Valid API keys for chosen TTS provider(s)
- Subscription/Billing: The ability to pay for your consumption of the TTS providers' APIs. Make sure you read the chosen provider's documentation.
Note: Simon Jackson and Luca Vitali will not be held responsible for you not understanding that there are sometimes costs involved with using external APIs.
- Clone or Download:

      git clone https://github.com/sjackson0109/TextToSpeech-Generator.git

- Navigate to Directory:

      cd TextToSpeech-Generator

- Run Application:

      .\StartTTS.ps1

Advanced Options:

    .\StartTTS.ps1 -TestMode                      # System validation only
    .\StartTTS.ps1 -RunTests -GenerateReport      # Run tests with reporting
    .\StartTTS.ps1 -ConfigProfile "Production"    # Use production settings
If you have an existing installation with TextToSpeech-Generator.xml configuration:
- Backup Your Configuration (automatic during migration):

      # Your existing XML file will be automatically backed up

- Run Migration Utility:

      .\MigrateLegacyConfig.ps1

- Verify Migration:

      .\StartTTS.ps1 -TestMode

- Update Your Workflow:
  - Use .\StartTTS.ps1 instead of .\TextToSpeech-Generator.ps1
  - Configure providers using the new JSON-based system
  - Take advantage of multi-environment profiles (Development/Production/Testing)
All providers support two methods for supplying credentials and configuration:
- Configuration File (Default.json/config.json):
  - Enter credentials in the GUI or JSON config file (not recommended for sensitive production keys).
- Environment Variables (Recommended for Testing/Security):
  - Set environment variables in your PowerShell session before launching the app. No credentials are ever written to disk.
| Provider | Required Environment Variables |
|---|---|
| Azure | AZURE_SPEECH_KEY, AZURE_SPEECH_REGION |
| AWS Polly | AWS_POLLY_ACCESS_KEY, AWS_POLLY_SECRET_KEY, AWS_POLLY_REGION |
| Google Cloud | GOOGLE_CLOUD_API_KEY |
| Twilio | TWILIO_ACCOUNT_SID, TWILIO_AUTH_TOKEN |
| VoiceForge | VOICEFORGE_API_KEY |
| VoiceWare | VOICEWARE_API_KEY, VOICEWARE_REGION, VOICEWARE_VOICE |
Example (PowerShell):
$env:AZURE_SPEECH_KEY = 'your-azure-key'
$env:AZURE_SPEECH_REGION = 'your-azure-region'
$env:AWS_POLLY_ACCESS_KEY = 'your-aws-access-key'
$env:AWS_POLLY_SECRET_KEY = 'your-aws-secret-key'
$env:AWS_POLLY_REGION = 'your-aws-region'
$env:GOOGLE_CLOUD_API_KEY = 'your-google-api-key'
$env:TWILIO_ACCOUNT_SID = 'your-twilio-sid'
$env:TWILIO_AUTH_TOKEN = 'your-twilio-token'
$env:VOICEFORGE_API_KEY = 'your-voiceforge-key'
$env:VOICEWARE_API_KEY = 'your-voiceware-key'
$env:VOICEWARE_REGION = 'your-voiceware-region'
$env:VOICEWARE_VOICE = 'your-voiceware-voice'

Compatibility Note: All code in this project is designed to work with both PowerShell v5 and v7. If you encounter any compatibility issues, please report them via GitHub Issues.
Set these variables in your session before running .\StartTTS.ps1. The app will automatically use them if config fields are missing or empty.
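The fallback described above (use the environment variable when a config field is missing or empty) could look roughly like the sketch below. `Resolve-ProviderSetting` is a hypothetical name, and how the application actually resolves settings internally is not documented here, so treat this as an illustration of the behaviour rather than the real implementation.

```powershell
# Hypothetical helper: prefer the value from the loaded config, otherwise
# fall back to the named environment variable.
function Resolve-ProviderSetting {
    param(
        [string] $ConfigValue,
        [Parameter(Mandatory)] [string] $EnvironmentVariable
    )

    if (-not [string]::IsNullOrWhiteSpace($ConfigValue)) {
        return $ConfigValue
    }
    # Returns $null when the variable is not set in this session.
    return [Environment]::GetEnvironmentVariable($EnvironmentVariable)
}
```

For example, with `$env:AZURE_SPEECH_REGION` set and an empty config field, `Resolve-ProviderSetting -ConfigValue '' -EnvironmentVariable 'AZURE_SPEECH_REGION'` returns the value from the environment.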
Important:
- For Azure, your API key and region must match, and you must select a valid Azure voice or TTS generation will fail.
- For AWS Polly, your access/secret keys and region must match, and you must select a valid Polly voice or TTS generation will fail.
- For Google Cloud, your API key, region, and voice must be valid and match your project setup.
- For Twilio, you must provide a valid account SID, auth token, and select a supported Twilio voice.
- For VoiceForge, you must provide a valid API key and select a supported VoiceForge voice.
- For VoiceWare, your API key must match the selected region, and you must select a valid VoiceWare voice or TTS generation will fail.
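Since generation fails when a required value is missing, it can help to check the session before launching. The sketch below uses the variable names from the environment variable table; the `Get-MissingProviderVariable` function and the `$RequiredVariables` lookup are hypothetical and not part of the project. It only checks that each variable is set, not that the key, region, and voice are valid for the provider.

```powershell
# Variable names come from the environment variable table; everything else is illustrative.
$RequiredVariables = @{
    'Azure'        = @('AZURE_SPEECH_KEY', 'AZURE_SPEECH_REGION')
    'AWS Polly'    = @('AWS_POLLY_ACCESS_KEY', 'AWS_POLLY_SECRET_KEY', 'AWS_POLLY_REGION')
    'Google Cloud' = @('GOOGLE_CLOUD_API_KEY')
    'Twilio'       = @('TWILIO_ACCOUNT_SID', 'TWILIO_AUTH_TOKEN')
    'VoiceForge'   = @('VOICEFORGE_API_KEY')
    'VoiceWare'    = @('VOICEWARE_API_KEY', 'VOICEWARE_REGION', 'VOICEWARE_VOICE')
}

function Get-MissingProviderVariable {
    param([Parameter(Mandatory)] [string] $Provider)

    if (-not $RequiredVariables.ContainsKey($Provider)) {
        throw "Unknown provider '$Provider'."
    }
    # Emit every required variable that is unset or empty in this session.
    $RequiredVariables[$Provider] | Where-Object {
        [string]::IsNullOrWhiteSpace([Environment]::GetEnvironmentVariable($_))
    }
}
```

Calling `Get-MissingProviderVariable -Provider 'Azure'` returns nothing when both Azure variables are set, and the names of the missing ones otherwise.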
| Provider | Details |
|---|---|
| Azure Cognitive Services | Status: Full production implementation with real API calls<br>Quality: Premium neural voices with natural prosody and SSML support<br>Free Tier: 5,000 transactions/month<br>Languages: 140+ languages, 400+ voices<br>Complete Setup Guide → |
| Google Cloud Text-to-Speech | Status: Full production implementation with real API calls<br>Quality: WaveNet technology for human-like speech with advanced options<br>Free Tier: 1M WaveNet characters/month<br>Languages: 40+ languages, 220+ voices<br>Complete Setup Guide → |
| AWS Polly | Status: Full production implementation with real API calls<br>Quality: Neural and standard voices with AWS Signature V4 authentication<br>Free Tier: 1M characters/month for speech synthesis<br>Languages: 60+ languages, 570+ voices including neural options<br>Complete Setup Guide → |
| Twilio | Status: Full production implementation with real API calls<br>Quality: TTS integration within telephony and IVR workflows<br>Features: TwiML generation, call API integration, multi-language support<br>Languages: 11+ languages with voice selection across providers<br>Complete Setup Guide → |
| VoiceForge | Status: Full production implementation with real API calls<br>Quality: Character-style and novelty voices for creative applications<br>Features: High-quality synthesis, SSML processing, multiple audio formats<br>Languages: Multi-language support with specialised voice characters<br>Complete Setup Guide → |
| VoiceWare | Status: Experimental integration<br>Quality: Neural and expressive voices<br>Features: SSML support, multiple audio formats, regional selection<br>Languages: Multi-language support<br>Complete Setup Guide → |
Important:
- Your VoiceWare API key must match the selected region, or authentication will fail.
- You must select a valid VoiceWare voice in your configuration/profile, or TTS generation will fail.
- Select Mode: Choose "Single-Script" radio button
- Enter Text: Type your text in the input box
- Configure Settings: Choose provider, voice, and output folder
- Generate: Click "Go!" or press F5
- Prepare CSV File: Create properly formatted CSV file

      SCRIPT,FILENAME
      "Hello world, this is a test.",test_audio_1
      "Welcome to our service.",welcome_message

  Complete CSV Format Guide → - Detailed specifications and examples

- Select Mode: Choose "Bulk-Scripts" radio button
- Load File: Click browse button or press Ctrl+O
- Configure Settings: Set provider, voice, and output folder
- Process: Click "Go!" or press F5
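Before loading a file in bulk mode, the CSV layout can be checked from the console. The sketch below is a hypothetical pre-flight check (`Test-BulkCsv` is not part of the project) that assumes only what the sample shows: a header row with SCRIPT and FILENAME columns, and a value for both in every row. The application's own validation may apply further rules.

```powershell
# Hypothetical pre-flight check: confirms the SCRIPT and FILENAME columns
# exist and that no row leaves either one empty.
function Test-BulkCsv {
    param([Parameter(Mandatory)] [string] $Path)

    $rows = @(Import-Csv -LiteralPath $Path)
    if ($rows.Count -eq 0) { return $false }

    # Column names are exposed as properties on each imported row.
    $columns = $rows[0].PSObject.Properties.Name
    foreach ($required in 'SCRIPT', 'FILENAME') {
        if ($columns -notcontains $required) { return $false }
    }

    foreach ($row in $rows) {
        if ([string]::IsNullOrWhiteSpace($row.SCRIPT) -or
            [string]::IsNullOrWhiteSpace($row.FILENAME)) {
            return $false
        }
    }
    return $true
}
```

`Test-BulkCsv -Path .\scripts.csv` returns `$true` for a file shaped like the sample; `scripts.csv` is a placeholder name.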
- F5 or Ctrl+R: Start generation process
- Ctrl+S: Save configuration
- Ctrl+O: Open input file browser
- Escape: Clear log window
The application offers secure API key storage using Windows Credential Manager:
- Keys are encrypted and stored securely by Windows
- Plain text storage is avoided when possible
- Automatic detection of stored credentials
- CSV structure validation before processing
- File path sanitization to prevent traversal attacks
- HTML encoding for script content
- API key format validation
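To illustrate the file path sanitization item, here is a minimal sketch of one common approach: keep only the final path segment and replace characters the file system rejects. `ConvertTo-SafeFileName` is a hypothetical helper, and the project's own sanitization may work differently.

```powershell
# Hypothetical helper: reduces a user-supplied name to a safe leaf file name.
function ConvertTo-SafeFileName {
    param([Parameter(Mandatory)] [string] $Name)

    # Keep only the last path segment, so '..\..\secret' becomes 'secret'.
    $leaf = ($Name -split '[\\/]')[-1]

    # Replace characters the file system rejects with an underscore.
    $invalid = [System.IO.Path]::GetInvalidFileNameChars()
    $clean = -join ($leaf.ToCharArray() | ForEach-Object {
        if ($invalid -contains $_) { '_' } else { $_ }
    })

    # '.' and '..' contain only valid characters but still refer to directories.
    if ([string]::IsNullOrWhiteSpace($clean) -or $clean -in '.', '..') {
        throw "Cannot derive a safe file name from '$Name'."
    }
    return $clean
}
```

Applied to the FILENAME column of a bulk CSV, this would keep an entry such as `..\..\Windows\evil` from writing outside the chosen output folder, because only `evil` survives.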
TextToSpeech-Generator/
├─ StartTTS.ps1 # Main application launcher (v3.2+)
├─ config.json # Modern JSON configuration
├─ MigrateLegacyConfig.ps1 # XML to JSON migration utility
├─ TextToSpeech-Generator.ps1 # Legacy GUI component (transitional)
├─ Modules/ # Modular architecture
│ ├─ Logging/Logging.psm1 # Enterprise logging system
│ ├─ Security/EnhancedSecurity.psm1 # Certificate-based encryption
│ ├─ Configuration/AdvancedConfiguration.psm1 # Multi-environment profiles
│ ├─ TTSProviders/TTSProviders.psm1 # Modular TTS provider implementations
│ ├─ Utilities/UtilityFunctions.psm1 # Supporting utility functions
│ ├─ ErrorRecovery/ErrorRecovery.psm1 # Intelligent error recovery strategies
│ └─ PerformanceMonitoring/PerformanceMonitoring.psm1 # Performance metrics & caching
├─ Tests/ # Comprehensive test suites
│ ├─ Unit/ # Unit tests for individual modules
│ ├─ Integration/ # Integration tests for system components
│ └─ Performance/ # Performance benchmarking tests
├─ README.md # Project overview (this file)
├─ licence # MIT licence
├─ GUI-Timeline/ # Development timeline screenshots
│ └─ 20210922 - Single-Mode File-Save issue.PNG
Configuration is stored in config.json with provider-specific settings organised by environment profiles (Development/Production/Testing). Each provider section contains authentication, voice selection, and audio format preferences.
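The exact schema of config.json is not reproduced in this README, so the sketch below uses invented key names (`Profiles`, `Providers`, `Region`, `Voice`, `Format`) purely to show how a profile-organised JSON file can be read from PowerShell. Check the real file for the actual structure before relying on any of these names.

```powershell
# Illustrative structure only: the key names below are assumptions,
# not the project's actual schema.
$json = @'
{
  "Profiles": {
    "Development": {
      "Providers": {
        "Azure": { "Region": "your-azure-region", "Voice": "en-US-JennyNeural", "Format": "riff-16khz-16bit-mono-pcm" }
      }
    },
    "Production": {
      "Providers": {
        "Azure": { "Region": "your-azure-region", "Voice": "en-GB-RyanNeural", "Format": "riff-16khz-16bit-mono-pcm" }
      }
    }
  }
}
'@

$config      = $json | ConvertFrom-Json
$profileName = 'Production'

# Select the provider section for the chosen environment profile.
$azure = $config.Profiles.$profileName.Providers.Azure
"{0} voice: {1}" -f $profileName, $azure.Voice
```

With a real file, the here-string would be replaced by `Get-Content .\config.json -Raw`.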
| Provider | Supported Formats | Default Format |
|---|---|---|
| Azure Cognitive Services | WAV, MP3, OGG, WEBM, FLAC | riff-16khz-16bit-mono-pcm |
| Google Cloud TTS | LINEAR16, MP3, OGG_OPUS, MULAW, ALAW | LINEAR16 |
| AWS Polly | PCM, MP3, OGG_VORBIS, JSON | mp3 |
| Twilio | WAV, MP3 | mp3 |
| VoiceForge | WAV, MP3, OGG | mp3 |
| VoiceWare | WAV, MP3, OGG | mp3 |
| Provider | Voice Count | Languages | Voice Types | Sample Voices |
|---|---|---|---|---|
| Azure Cognitive Services | 400+ voices | 140+ languages | Neural, Standard | en-US-JennyNeural, en-GB-RyanNeural, fr-FR-DeniseNeural |
| AWS Polly | 570+ voices | 60+ languages | Neural, Standard | Joanna, Matthew, Emma, Brian, Celine |
| Google Cloud TTS | 220+ voices | 40+ languages | WaveNet, Neural2, Standard | en-US-Wavenet-D, en-GB-Neural2-A, fr-FR-Wavenet-E |
| Twilio | 50+ voices | 11+ languages | Telephony-optimised | alice, man, woman (provider-specific) |
| VoiceForge | 200+ voices | 15+ languages | Character, Novelty | Robot, Alien, Wizard, Princess, Monster |
| VoiceWare | 100+ voices | 20+ languages | Neural, Expressive | en-US-Standard, en-GB-Premium, fr-FR-Neural |
Authentication Errors:
- Verify API key is correct and active
- Check datacenter region matches your subscription
- See provider-specific setup: AWS Polly | ElevenLabs | Google Cloud | [Microsoft Azure](docs/providers/Microsoft%20Azure.md) | Murf AI | Twilio
File Processing Errors:
- Validate CSV format - see CSV Format Guide
- Check output folder write permissions
- Verify file paths don't contain invalid characters
Network Issues:
- Check internet connectivity
- Verify firewall isn't blocking HTTPS requests
- Try different datacenter region if available
Complete Troubleshooting Guide → - Comprehensive problem-solving resource
Application logs are saved to application.log in the application directory:
2025-10-10 14:30:15 [INFO] Starting TextToSpeech Generator v3.2
2025-10-10 14:30:20 [INFO] Loaded 400 voices from Azure Cognitive Services
2025-10-10 14:30:45 [INFO] Generated: welcome_message
2025-10-10 14:30:50 [ERROR] Authentication failed: Invalid API key
Log levels: INFO, WARNING, ERROR, DEBUG
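Because each log line follows a `date time [LEVEL] message` layout, the file can be filtered from PowerShell. The sketch below is based only on the sample lines shown above; `Get-LogEntry` is a hypothetical helper, and real logs may contain lines that do not match the pattern (these are skipped).

```powershell
# Hypothetical parser for the "<date> <time> [LEVEL] message" layout.
function Get-LogEntry {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [AllowEmptyString()]
        [string] $Line
    )
    process {
        $pattern = '^(?<Timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<Level>[A-Z]+)\] (?<Message>.*)$'
        if ($Line -match $pattern) {
            [pscustomobject]@{
                Timestamp = [datetime]::ParseExact($Matches.Timestamp, 'yyyy-MM-dd HH:mm:ss', [cultureinfo]::InvariantCulture)
                Level     = $Matches.Level
                Message   = $Matches.Message
            }
        }
    }
}

# Sample lines copied from the log excerpt above.
$sample = @(
    '2025-10-10 14:30:45 [INFO] Generated: welcome_message'
    '2025-10-10 14:30:50 [ERROR] Authentication failed: Invalid API key'
)
$sample | Get-LogEntry | Where-Object { $_.Level -eq 'ERROR' }
```

To run the same filter against the real log, replace `$sample` with `Get-Content .\application.log`.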
Contributions are welcome! Please read our contributing guidelines:
- Fork the repository
- Create a feature branch
- Make your changes with proper documentation
- Test thoroughly
- Submit a pull request
- Luca Vitali - Original concept and implementation
- Simon Jackson - Enhanced security, error handling, and additional features
- Azure Cognitive Services team
- Google Cloud Text-to-Speech team
- PowerShell community for WPF guidance
- Issues: Please log issues via GitHub Issues
- Documentation: README.md
- Email: Not available for this project

