TextToSpeech Generator v3.2

An enterprise-grade modular application for converting text to speech using multiple TTS providers. It includes a security framework, performance monitoring, bulk CSV processing, and a Pester-based test suite.

Attribution & licence

This project is inspired by Luca Vitali's original work, AzureTTSVoiceGeneratorGUI.

Original Work: Luca Vitali (2019, MIT licence)
Enhanced Version: Simon Jackson (2024-2025, MIT licence)

See ATTRIBUTION.md and licence.md for details.

Both the original and derivative works are licensed under the MIT licence, which allows free use, modification, and distribution with proper attribution.

Features

Enterprise Modular Architecture - 6 dedicated modules with proper separation of concerns
Advanced Configuration - JSON-based multi-environment profiles (Development/Production/Testing)
Security Framework - Certificate-based encryption and secure credential storage
Performance Monitoring - Real-time system metrics and intelligent caching
Testing Infrastructure - Comprehensive Pester test suites with automated validation
Error Recovery - Intelligent provider-specific recovery strategies with exponential backoff
Encrypted Storage - Certificate-based encryption for sensitive API keys
Audit Trails - Comprehensive logging of all configuration changes and security events
Input Validation - Enterprise-grade sanitization and validation frameworks
Error Classification - Provider-specific error codes with detailed resolution guidance
Configuration Migration - Seamless upgrade path from legacy XML to modern JSON
Microsoft Azure Cognitive Services - Premium neural voices with SSML support and regional deployment
AWS Polly - Neural engine with lifelike speech synthesis and custom lexicons
ElevenLabs - Ultra-realistic AI voices with emotion control and voice cloning
Google Cloud TTS - WaveNet technology with advanced prosody control
Murf AI - Gen2 and Falcon models with 150+ ultra-realistic voices across 20+ languages
OpenAI TTS - Built-in voices (alloy, echo, fable, onyx, nova, shimmer) with an HD quality option
Telnyx - WebSocket streaming with 266+ voices (KokoroTTS, Natural, NaturalHD models)
Twilio - Telephony-optimised TTS for communication workflows
Intelligent Threading - Auto-optimised parallel processing (3-6x speed improvement)
Memory Management - Automatic garbage collection and memory threshold monitoring
Caching System - LRU cache with TTL for improved response times
Progress Tracking - Real-time updates with thread-safe UI integration
Bulk Processing - CSV batch processing with intelligent load balancing
Professional Interface - Contemporary dark theme with intuitive controls
Multi-Environment Support - Switch between Development/Production/Testing profiles
Real-time Validation - Instant feedback on configuration and API connectivity
Comprehensive Testing - Built-in test modes and validation tools
Legacy Migration - Automatic upgrade from older configuration formats
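
The Error Recovery entry above mentions exponential backoff. As a rough illustration of that technique only (not the project's ErrorRecovery module), the sketch below retries a script block and doubles the wait after each failure. The function name `Invoke-WithBackoff` and its parameters are hypothetical.

```powershell
# Hypothetical helper: retries a script block, doubling the delay after each failure.
function Invoke-WithBackoff {
    param(
        [Parameter(Mandatory)] [scriptblock] $Action,
        [int] $MaxAttempts = 4,
        [int] $InitialDelayMs = 500
    )

    $delayMs = $InitialDelayMs
    for ($attempt = 1; $attempt -le $MaxAttempts; $attempt++) {
        try {
            return & $Action
        }
        catch {
            # Out of retries: rethrow the last error to the caller.
            if ($attempt -eq $MaxAttempts) { throw }
            Start-Sleep -Milliseconds $delayMs
            # Double the wait before the next attempt (500 ms, 1 s, 2 s, ...).
            $delayMs *= 2
        }
    }
}
```

Only terminating errors trigger a retry here, so a web request wrapped this way would typically be called with `-ErrorAction Stop`.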

Requirements

- Operating System: Windows 10/11 or Windows Server 2016+
- PowerShell: Version 5.1 or higher
- .NET Framework: 4.7.2 or higher (required for GUI/XAML support; PowerShell 5.1 uses .NET Framework by default)
- API Access: Valid API keys for your chosen TTS provider(s)
- Subscription/Billing: The ability to pay for your consumption of each TTS provider's API. Make sure you read the chosen provider's documentation.

For full GUI functionality and XAML window rendering, you must run on Windows with .NET Framework 4.7.2 or newer. WPF/XAML features are not available on PowerShell Core or non-Windows environments.

Note: Simon Jackson and Luca Vitali will not be held responsible for you not understanding that there are sometimes costs involved with using external APIs.

Quick Preview

Main Console: (screenshot)
API Dropdown (list of supported TTS providers): (screenshot)
API Config (unique per provider): (screenshot)

Installation

New Installation (v3.2+)

Clone or Download:

```
git clone https://github.com/sjackson0109/TextToSpeech-Generator.git
```

Navigate to Directory:
```
cd TextToSpeech-Generator
```

Run Application:

```
.\StartTTS.ps1
```

Advanced Options:

```
.\StartTTS.ps1 -TestMode                    # System validation only
.\StartTTS.ps1 -RunTests -GenerateReport    # Run tests with reporting
.\StartTTS.ps1 -ConfigProfile "Production"  # Use production settings
```

Upgrading from v3.1 or Earlier

If you have an existing installation with TextToSpeech-Generator.xml configuration:

Back Up Your Configuration: This happens automatically during migration. Your existing XML file is backed up before it is converted.

Run Migration Utility:
```
.\MigrateLegacyConfig.ps1
```
Verify Migration:
```
.\StartTTS.ps1 -TestMode
```
Update Your Workflow:
- Use .\StartTTS.ps1 instead of .\TextToSpeech-Generator.ps1
- Configure providers using the new JSON-based system
- Take advantage of multi-environment profiles (Development/Production/Testing)

API Configuration

All providers support two methods for supplying credentials and configuration:

Configuration File (Default.json/config.json):
- Enter credentials in the GUI or directly in the JSON config file. This is not recommended for sensitive production keys.

Environment Variables (Recommended for Testing/Security):
- Set environment variables in your PowerShell session before launching the app. With this method, no credentials are written to disk.

Supported Environment Variables

| Provider | Required Environment Variables |
|---|---|
| Azure | `AZURE_SPEECH_KEY`, `AZURE_SPEECH_REGION` |
| AWS Polly | `AWS_POLLY_ACCESS_KEY`, `AWS_POLLY_SECRET_KEY`, `AWS_POLLY_REGION` |
| Google Cloud | `GOOGLE_CLOUD_API_KEY` |
| Twilio | `TWILIO_ACCOUNT_SID`, `TWILIO_AUTH_TOKEN` |
| VoiceForge | `VOICEFORGE_API_KEY` |
| VoiceWare | `VOICEWARE_API_KEY`, `VOICEWARE_REGION`, `VOICEWARE_VOICE` |

Example (PowerShell):

```
$env:AZURE_SPEECH_KEY = 'your-azure-key'
$env:AZURE_SPEECH_REGION = 'your-azure-region'
$env:AWS_POLLY_ACCESS_KEY = 'your-aws-access-key'
$env:AWS_POLLY_SECRET_KEY = 'your-aws-secret-key'
$env:AWS_POLLY_REGION = 'your-aws-region'
$env:GOOGLE_CLOUD_API_KEY = 'your-google-api-key'
$env:TWILIO_ACCOUNT_SID = 'your-twilio-sid'
$env:TWILIO_AUTH_TOKEN = 'your-twilio-token'
$env:VOICEFORGE_API_KEY = 'your-voiceforge-key'
$env:VOICEWARE_API_KEY = 'your-voiceware-key'
$env:VOICEWARE_REGION = 'your-voiceware-region'
$env:VOICEWARE_VOICE = 'your-voiceware-voice'
```

Compatibility Note: All code in this project is designed to work with both PowerShell v5 and v7. If you encounter any compatibility issues, please report them via GitHub Issues.

Set these variables in your session before running .\StartTTS.ps1. The app will automatically use them if config fields are missing or empty.
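
The fallback described above (use the environment variable when the config field is missing or empty) might look roughly like the sketch below. It assumes nothing about the app's internals: `Resolve-Credential` is a hypothetical helper name, and only the `AZURE_SPEECH_KEY` variable name comes from the table above.

```powershell
# Hypothetical helper: prefer the config value, fall back to an environment variable.
function Resolve-Credential {
    param(
        [string] $ConfigValue,
        [Parameter(Mandatory)] [string] $EnvName
    )

    if (-not [string]::IsNullOrWhiteSpace($ConfigValue)) {
        return $ConfigValue
    }
    # Returns $null when the variable is not set for this process.
    return [System.Environment]::GetEnvironmentVariable($EnvName)
}

# An empty config field falls through to the session variable, if one is set.
$azureKey = Resolve-Credential -ConfigValue '' -EnvName 'AZURE_SPEECH_KEY'
if (-not $azureKey) {
    Write-Warning 'AZURE_SPEECH_KEY is not set and no key is configured.'
}
```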

Important:

For Azure, your API key and region must match, and you must select a valid Azure voice or TTS generation will fail.

For AWS Polly, your access/secret keys and region must match, and you must select a valid Polly voice or TTS generation will fail.

For Google Cloud, your API key, region, and voice must be valid and match your project setup.

For Twilio, you must provide a valid account SID, auth token, and select a supported Twilio voice.

For VoiceForge, you must provide a valid API key and select a supported VoiceForge voice.

For VoiceWare, your API key must match the selected region, and you must select a valid VoiceWare voice or TTS generation will fail.

| Provider | Details |
|---|---|
| Azure Cognitive Services | Status: Full production implementation with real API calls<br>Quality: Premium neural voices with natural prosody and SSML support<br>Free Tier: 5,000 transactions/month<br>Languages: 140+ languages, 400+ voices<br>Complete Setup Guide → |
| Google Cloud Text-to-Speech | Status: Full production implementation with real API calls<br>Quality: WaveNet technology for human-like speech with advanced options<br>Free Tier: 1M WaveNet characters/month<br>Languages: 40+ languages, 220+ voices<br>Complete Setup Guide → |
| AWS Polly | Status: Full production implementation with real API calls<br>Quality: Neural and standard voices with AWS Signature V4 authentication<br>Free Tier: 1M characters/month for speech synthesis<br>Languages: 60+ languages, 570+ voices including neural options<br>Complete Setup Guide → |
| Twilio | Status: Full production implementation with real API calls<br>Quality: TTS integration within telephony and IVR workflows<br>Features: TwiML generation, call API integration, multi-language support<br>Languages: 11+ languages with voice selection across providers<br>Complete Setup Guide → |
| VoiceForge | Status: Full production implementation with real API calls<br>Quality: Character-style and novelty voices for creative applications<br>Features: High-quality synthesis, SSML processing, multiple audio formats<br>Languages: Multi-language support with specialised voice characters<br>Complete Setup Guide → |
| VoiceWare | Status: Experimental integration<br>Quality: Neural and expressive voices<br>Features: SSML support, multiple audio formats, regional selection<br>Languages: Multi-language support<br>Complete Setup Guide → |

Important:

Your VoiceWare API key must match the selected region, or authentication will fail.

You must select a valid VoiceWare voice in your configuration/profile, or TTS generation will fail.

Usage Guide

Single Script Processing

Select Mode: Choose "Single-Script" radio button
Enter Text: Type your text in the input box
Configure Settings: Choose provider, voice, and output folder
Generate: Click "Go!" or press F5

Bulk Processing from CSV

Prepare CSV File: Create properly formatted CSV file
```
SCRIPT,FILENAME
"Hello world, this is a test.",test_audio_1
"Welcome to our service.",welcome_message
```
See the Complete CSV Format Guide for detailed specifications and examples.
Select Mode: Choose "Bulk-Scripts" radio button
Load File: Click browse button or press Ctrl+O
Configure Settings: Set provider, voice, and output folder
Process: Click "Go!" or press F5
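
Before loading a file in Bulk-Scripts mode, it can help to check it against the two-column layout shown above. The following is a minimal sketch of such a check, assuming only the `SCRIPT` and `FILENAME` headers from the example; `Test-BulkCsv` is a hypothetical name, and the application's own validation may be stricter.

```powershell
# Sketch: check that a bulk CSV has SCRIPT and FILENAME columns and no blank cells.
function Test-BulkCsv {
    param([Parameter(Mandatory)] [string] $Path)

    $rows = @(Import-Csv -Path $Path)
    if ($rows.Count -eq 0) { return $false }   # header only, or empty file

    # Column names come from the header row of the file.
    $columns = $rows[0].PSObject.Properties.Name
    foreach ($required in 'SCRIPT', 'FILENAME') {
        if ($columns -notcontains $required) { return $false }
    }

    foreach ($row in $rows) {
        if ([string]::IsNullOrWhiteSpace($row.SCRIPT) -or
            [string]::IsNullOrWhiteSpace($row.FILENAME)) {
            return $false
        }
    }
    return $true
}
```

For example, `Test-BulkCsv -Path .\scripts.csv` (a placeholder file name) returns `$true` or `$false`.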

Keyboard Shortcuts

F5 or Ctrl+R: Start generation process
Ctrl+S: Save configuration
Ctrl+O: Open input file browser
Escape: Clear log window

Security Considerations

Secure Credential Storage

The application offers secure API key storage using Windows Credential Manager:

Keys are encrypted and stored securely by Windows
Plain text storage is avoided when possible
Automatic detection of stored credentials

Input Validation

CSV structure validation before processing
File path sanitization to prevent traversal attacks
HTML encoding for script content
API key format validation
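
To illustrate the path sanitisation item above, here is a minimal sketch of one common approach: reject invalid file-name characters, then confirm the resolved path still sits inside the output folder. `Get-SafeOutputPath` is a hypothetical name, and this is not necessarily how the application implements its check.

```powershell
# Sketch: resolve a user-supplied file name inside an output folder and reject escapes.
function Get-SafeOutputPath {
    param(
        [Parameter(Mandatory)] [string] $OutputFolder,
        [Parameter(Mandatory)] [string] $FileName
    )

    # Reject characters that are invalid in file names (on Windows this includes \ and /).
    if ($FileName.IndexOfAny([System.IO.Path]::GetInvalidFileNameChars()) -ge 0) {
        throw "Invalid characters in file name: $FileName"
    }

    $root = [System.IO.Path]::GetFullPath($OutputFolder).TrimEnd([char[]]'\/')
    $full = [System.IO.Path]::GetFullPath((Join-Path $root $FileName))

    # A name such as '..' has no invalid characters, so also confirm the result stays under the root.
    $prefix = $root + [System.IO.Path]::DirectorySeparatorChar
    if (-not $full.StartsWith($prefix, [System.StringComparison]::OrdinalIgnoreCase)) {
        throw "Path escapes the output folder: $FileName"
    }
    return $full
}
```

Pass an absolute output folder where possible, because `GetFullPath` resolves relative paths against the process working directory rather than the current PowerShell location.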

File Structure

```
TextToSpeech-Generator/
├─ StartTTS.ps1                             # Main application launcher (v3.2+)
├─ config.json                              # Modern JSON configuration
├─ MigrateLegacyConfig.ps1                  # XML to JSON migration utility
├─ TextToSpeech-Generator.ps1               # Legacy GUI component (transitional)
├─ Modules/                                 # Modular architecture
│  ├─ Logging/Logging.psm1                  # Enterprise logging system
│  ├─ Security/EnhancedSecurity.psm1        # Certificate-based encryption
│  ├─ Configuration/AdvancedConfiguration.psm1 # Multi-environment profiles
│  ├─ TTSProviders/TTSProviders.psm1        # Modular TTS provider implementations
│  ├─ Utilities/UtilityFunctions.psm1       # Supporting utility functions
│  ├─ ErrorRecovery/ErrorRecovery.psm1      # Intelligent error recovery strategies
│  └─ PerformanceMonitoring/PerformanceMonitoring.psm1 # Performance metrics & caching
├─ Tests/                                   # Comprehensive test suites
│  ├─ Unit/                                 # Unit tests for individual modules
│  ├─ Integration/                          # Integration tests for system components
│  └─ Performance/                          # Performance benchmarking tests
├─ README.md                                # Project overview (this file)
├─ licence                                  # MIT licence
└─ GUI-Timeline/                            # Development timeline screenshots
   └─ 20210922 - Single-Mode File-Save issue.PNG
```

Default Configuration

Configuration is stored in config.json with provider-specific settings organised by environment profiles (Development/Production/Testing). Each provider section contains authentication, voice selection, and audio format preferences.
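
To make the idea of per-environment provider settings concrete, the sketch below parses a small JSON document and selects one profile. The key names (`Profiles`, `Region`, `Voice`, `Format`) and the region value are assumptions made for illustration; they are not taken from the project's actual config.json schema.

```powershell
# Hypothetical schema: the key names below are assumptions, not the project's actual config.json.
$json = @'
{
  "Profiles": {
    "Development": {
      "Azure": { "Region": "uksouth", "Voice": "en-GB-RyanNeural", "Format": "riff-16khz-16bit-mono-pcm" }
    },
    "Production": {
      "Azure": { "Region": "uksouth", "Voice": "en-US-JennyNeural", "Format": "riff-16khz-16bit-mono-pcm" }
    }
  }
}
'@

$config      = $json | ConvertFrom-Json
$profileName = 'Production'

# Pick the provider section for the chosen profile.
$azure = $config.Profiles.$profileName.Azure
'{0}: {1} ({2})' -f $profileName, $azure.Voice, $azure.Region
```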

Audio Formats

| Provider | Supported Formats | Default Format |
|---|---|---|
| Azure Cognitive Services | WAV, MP3, OGG, WEBM, FLAC | `riff-16khz-16bit-mono-pcm` |
| Google Cloud TTS | LINEAR16, MP3, OGG_OPUS, MULAW, ALAW | `LINEAR16` |
| AWS Polly | PCM, MP3, OGG_VORBIS, JSON | `mp3` |
| Twilio | WAV, MP3 | `mp3` |
| VoiceForge | WAV, MP3, OGG | `mp3` |
| VoiceWare | WAV, MP3, OGG | `mp3` |

Voice Options

| Provider | Voice Count | Languages | Voice Types | Sample Voices |
|---|---|---|---|---|
| Azure Cognitive Services | 400+ voices | 140+ languages | Neural, Standard | en-US-JennyNeural, en-GB-RyanNeural, fr-FR-DeniseNeural |
| AWS Polly | 570+ voices | 60+ languages | Neural, Standard | Joanna, Matthew, Emma, Brian, Celine |
| Google Cloud TTS | 220+ voices | 40+ languages | WaveNet, Neural2, Standard | en-US-Wavenet-D, en-GB-Neural2-A, fr-FR-Wavenet-E |
| Twilio | 50+ voices | 11+ languages | Telephony-optimised | alice, man, woman (provider-specific) |
| VoiceForge | 200+ voices | 15+ languages | Character, Novelty | Robot, Alien, Wizard, Princess, Monster |
| VoiceWare | 100+ voices | 20+ languages | Neural, Expressive | en-US-Standard, en-GB-Premium, fr-FR-Neural |

Troubleshooting

Quick Fixes

Authentication Errors:

Verify API key is correct and active
Check datacenter region matches your subscription
See the provider-specific setup guides: AWS Polly | ElevenLabs | Google Cloud | [Microsoft Azure](docs/providers/Microsoft%20Azure.md) | Murf AI | Twilio

File Processing Errors:

Validate CSV format - see CSV Format Guide
Check output folder write permissions
Verify file paths don't contain invalid characters

Network Issues:

Check internet connectivity
Verify firewall isn't blocking HTTPS requests
Try different datacenter region if available

See the Complete Troubleshooting Guide for a comprehensive problem-solving resource.

Logging

Application logs are saved to application.log in the application directory:

```
2025-10-10 14:30:15 [INFO] Starting TextToSpeech Generator v3.2
2025-10-10 14:30:20 [INFO] Loaded 400 voices from Azure Cognitive Services
2025-10-10 14:30:45 [INFO] Generated: welcome_message
2025-10-10 14:30:50 [ERROR] Authentication failed: Invalid API key
```

Log levels: INFO, WARNING, ERROR, DEBUG
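
Because each entry follows a `timestamp [LEVEL] message` layout, the log is straightforward to filter. The sketch below assumes every line matches the sample entries shown above; `ConvertFrom-LogLine` is a hypothetical helper, not part of the project's Logging module.

```powershell
# Sketch: parse lines in the "timestamp [LEVEL] message" layout used by the sample entries.
function ConvertFrom-LogLine {
    param([Parameter(Mandatory)] [AllowEmptyString()] [string] $Line)

    $pattern = '^(?<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[(?<level>[A-Z]+)\] (?<message>.*)$'
    if ($Line -match $pattern) {
        $culture = [System.Globalization.CultureInfo]::InvariantCulture
        [pscustomobject]@{
            Time    = [datetime]::ParseExact($Matches.time, 'yyyy-MM-dd HH:mm:ss', $culture)
            Level   = $Matches.level
            Message = $Matches.message
        }
    }
    # Lines that do not match the layout produce no output.
}

# Keep only the ERROR entries from two of the sample lines.
$errors = @(
    '2025-10-10 14:30:45 [INFO] Generated: welcome_message'
    '2025-10-10 14:30:50 [ERROR] Authentication failed: Invalid API key'
) | ForEach-Object { ConvertFrom-LogLine -Line $_ } | Where-Object { $_.Level -eq 'ERROR' }
```

To run the same filter against the real file, replace the sample array with `Get-Content .\application.log`.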

Contributing

Contributions are welcome! Please read our contributing guidelines:

Fork the repository
Create a feature branch
Make your changes with proper documentation
Test thoroughly
Submit a pull request

Authors

Luca Vitali - Original concept and implementation
Simon Jackson - Enhanced security, error handling, and additional features

Acknowledgments

Azure Cognitive Services team
Google Cloud Text-to-Speech team
PowerShell community for WPF guidance

Support

Issues: Please log issues via GitHub Issues
Documentation: README.md
Email: NOT AVAILABLE FOR THIS PROJECT
