feat: Add comprehensive Azure MSTTS support with automatic namespace injection #105
- Implement automatic detection of MSTTS tags in generated SSML
- Conditionally inject the xmlns:mstts namespace only when needed
- Override addSpeakTag() in MicrosoftAzureSsmlFormatter
- Add containsMsttsTag() helper method with regex detection
- Update test expectations for newscaster feature
- All 657 tests passing
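A minimal sketch of the conditional namespace injection described above, assuming a regex scan of the finished SSML string. The names containsMsttsTag and addSpeakTag come from the commit message, but these standalone bodies are hypothetical and are not the actual MicrosoftAzureSsmlFormatter code; the other attributes a real Azure <speak> element carries are omitted for brevity.

```typescript
// Hypothetical standalone sketch, not the real MicrosoftAzureSsmlFormatter methods.
const MSTTS_NAMESPACE = "https://www.w3.org/2001/mstts";

// Matches an opening or closing tag in the mstts: namespace,
// e.g. <mstts:express-as ...> or </mstts:express-as>.
const MSTTS_TAG = /<\/?mstts:[a-z-]+/i;

// True when the generated SSML uses any mstts: element.
function containsMsttsTag(ssml: string): boolean {
  return MSTTS_TAG.test(ssml);
}

// Wraps the body in <speak>, declaring xmlns:mstts only when it is actually needed.
// (version, xmlns and xml:lang attributes are left out of this sketch.)
function addSpeakTag(body: string): string {
  const nsAttr = containsMsttsTag(body) ? ` xmlns:mstts="${MSTTS_NAMESPACE}"` : "";
  return `<speak${nsAttr}>${body}</speak>`;
}
```

Scanning the finished string keeps the check independent of which modifier produced the mstts: tag, at the cost of one extra pass over the output.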
- Implement excited, disappointed, friendly, cheerful, sad, angry, fearful, empathetic, calm, lyrical, hopeful, shouting, whispering, terrified, unfriendly, gentle, serious, depressed, embarrassed, affectionate, envious, chat, and customerservice styles
- Add styledegree attribute support (0.01-2.0 range) with validation
- Update test expectations for Azure's behavior with invalid values
- All 669 tests passing
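The styledegree range check above can be sketched as follows. This is a hypothetical illustration: the function names are invented, and the choice to drop an invalid degree while keeping the style is an assumption, since the commit only says test expectations were updated for Azure's behavior with invalid values.

```typescript
// Hypothetical sketch, not the actual MicrosoftAzureSsmlFormatter code.
// Azure accepts styledegree values from 0.01 to 2 (inclusive).
const MIN_STYLE_DEGREE = 0.01;
const MAX_STYLE_DEGREE = 2.0;

// True when raw parses as a finite number inside the allowed range.
function isValidStyleDegree(raw: string): boolean {
  const value = Number(raw);
  return isFinite(value) && value >= MIN_STYLE_DEGREE && value <= MAX_STYLE_DEGREE;
}

// Assumption: an invalid degree is omitted and the style is kept.
function expressAs(text: string, style: string, degree?: string): string {
  const degreeAttr =
    degree !== undefined && isValidStyleDegree(degree) ? ` styledegree="${degree}"` : "";
  return `<mstts:express-as style="${style}"${degreeAttr}>${text}</mstts:express-as>`;
}
```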
…erage
- Document all 27 express-as styles (emotional and scenario-specific)
- Add styledegree attribute documentation with examples
- Document automatic namespace injection feature
- Add Azure example to main README showcasing express-as with styledegree
- Note unsupported features (role, mstts:silence, etc.) with workarounds
- Update platform documentation to reflect current implementation
…atforms
- Compare Azure's 27 express-as styles with Alexa's 2 emotions and Google's 0
- Highlight Azure's numeric intensity control (0.01-2.0) versus Alexa's 3 levels
- Document the automatic namespace injection advantage
- Show that Azure has the most comprehensive emotional/stylistic control
- List advantages and parity for each platform comparison
- Add all Azure styles to textModifierKey and sectionModifierKey in the grammar
- Update MicrosoftAzureSsmlFormatter to handle all 27 styles in both text and section modifiers
- Add special handling for newscaster -> newscast style mapping
- Include poetry-reading, narration-professional, newscast-casual styles
- All 669 tests passing, including live Azure MSTTS validation
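The newscaster -> newscast special case amounts to an alias step before looking the style up. A hypothetical sketch: aliasFor and toAzureStyle are invented names, and only a handful of the supported styles are listed.

```typescript
// Hypothetical sketch; only a subset of the Azure styles is listed here.
const AZURE_STYLES: string[] = [
  "cheerful", "sad", "excited", "newscast",
  "newscast-casual", "narration-professional", "poetry-reading",
];

// Speech Markdown keywords whose Azure style name differs.
function aliasFor(keyword: string): string {
  switch (keyword) {
    case "newscaster":
      return "newscast";
    default:
      return keyword;
  }
}

// Returns the Azure style name, or undefined when the keyword is not a known style.
function toAzureStyle(keyword: string): string | undefined {
  const style = aliasFor(keyword);
  return AZURE_STYLES.indexOf(style) >= 0 ? style : undefined;
}
```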
Note: WIP. Let me check after Claude has done a ton of this.
- Add advertisement_upbeat, documentary-narration, narration-relaxed, newscast-formal, sports_commentary, sports_commentary_excited styles
- Implement lang modifier support for Azure platform (xml:lang attribute)
- Update test expectations for Azure lang support
- Update documentation with all 33 Azure styles
- Document multi-speaker dialog (mstts:dialog/mstts:turn) and role attributes as requiring raw SSML
- Add .env to .gitignore for security
- Total Azure styles now 33 (up from 27)
- All 669 tests passing
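The lang modifier maps directly onto an SSML <lang xml:lang="..."> element. A hypothetical sketch (langTag and escapeXml are invented names) that also escapes the text, since a raw & or < would produce invalid XML:

```typescript
// Hypothetical sketch of lang-modifier output; not the real formatter code.
// Escapes the characters that are unsafe in XML text and double-quoted attributes.
function escapeXml(text: string): string {
  return text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// (Paris)[lang:"fr-FR"] -> <lang xml:lang="fr-FR">Paris</lang>
function langTag(text: string, locale: string): string {
  return `<lang xml:lang="${escapeXml(locale)}">${escapeXml(text)}</lang>`;
}
```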
- Add detailed support matrix table showing all Azure SSML elements
- Document which elements are fully supported, partially supported, or not supported
- Reorganize unsupported features section with clear explanations
- Add workarounds for each unsupported feature
- Clarify why certain features are disabled (emphasis, expletive, interjection, unit)
- Document all advanced MSTTS features and their support status
- Improve documentation structure and clarity
- Enable emphasis element with all 4 levels (moderate, strong, reduced, none)
- Add bookmark support (generates <bookmark mark='...'> for Azure SDK)
- Update all tests to expect proper SSML tags
- Update documentation to reflect correct support status
- All 669 tests passing
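A hypothetical sketch of the emphasis and bookmark output now enabled for Azure. The function names are invented, and rejecting unknown emphasis levels is an assumption about how the formatter might guard its input; the bookmark is written as an empty element so the Azure Speech SDK can report it through its bookmark-reached event.

```typescript
// Hypothetical sketch; not the actual formatter code.
const EMPHASIS_LEVELS: string[] = ["strong", "moderate", "reduced", "none"];

// Wraps text in <emphasis>; assumption: unknown levels are rejected.
function emphasisTag(text: string, level: string): string {
  if (EMPHASIS_LEVELS.indexOf(level) < 0) {
    throw new Error(`Unknown emphasis level: ${level}`);
  }
  return `<emphasis level="${level}">${text}</emphasis>`;
}

// Empty bookmark element marking a position in the audio stream.
function bookmarkTag(mark: string): string {
  return `<bookmark mark="${mark}"/>`;
}
```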
…ttributes
- Add style and role keywords to grammar
- Implement semicolon-delimited multiple attribute syntax
- Refactor Azure formatter to collect and combine express-as attributes
- Add comprehensive tests for style+role combinations
- Update documentation with role attribute examples
- All 672 tests passing
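Collecting attributes means style, styledegree and role end up on one <mstts:express-as> element instead of nested wrappers. The sketch below assumes a modifier list shaped like style:"cheerful";role:"YoungAdultFemale"; that exact grammar, plus parseModifiers and buildExpressAs, are hypothetical illustrations rather than the real Speech Markdown parser.

```typescript
// Hypothetical sketch; the modifier syntax shown here is an assumption.
interface ExpressAsAttributes {
  style?: string;
  styledegree?: string;
  role?: string;
}

// Splits on ';' and keeps only key:"value" pairs for the express-as attributes.
function parseModifiers(modifiers: string): ExpressAsAttributes {
  const attrs: ExpressAsAttributes = {};
  for (const part of modifiers.split(";")) {
    const match = /^\s*(style|styledegree|role)\s*:\s*"([^"]*)"\s*$/.exec(part);
    if (match) {
      const key = match[1] as keyof ExpressAsAttributes;
      attrs[key] = match[2];
    }
  }
  return attrs;
}

// Renders all collected attributes on a single element, in a fixed order.
function buildExpressAs(text: string, attrs: ExpressAsAttributes): string {
  const order: (keyof ExpressAsAttributes)[] = ["style", "styledegree", "role"];
  const rendered = order
    .filter((key) => attrs[key] !== undefined)
    .map((key) => ` ${key}="${attrs[key]}"`)
    .join("");
  return `<mstts:express-as${rendered}>${text}</mstts:express-as>`;
}
```

The fixed attribute order keeps the output deterministic regardless of how the modifiers were written, which makes string-comparison tests stable.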
Ready for review now @arjan. This gives way more feature support to Azure TTS and its various intricacies. I'd say it's now more feature-packed than any other platform.
… metadata
- Update voice data script to include voice metadata for downstream uses
- Add id, displayName/name, and languages/language/locale fields
- Maintain backward compatibility with voice.name for SSML generation
- Filter metadata fields in voiceTagNamed to prevent invalid SSML tags
- Regenerate all voice data files (Azure, Google, Polly, Watson)
- All 672 tests passing
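Filtering metadata keeps catalog-only fields from leaking into the <voice> element as bogus attributes. A hypothetical sketch: the VoiceEntry shape follows the field names in the commit message, but the real data-file layout and the voiceTagNamed logic may differ.

```typescript
// Hypothetical voice-catalog entry; field names follow the commit message.
interface VoiceEntry {
  name: string; // written to <voice name="...">
  id?: string;
  displayName?: string;
  languages?: string[];
  language?: string;
  locale?: string;
  isHD?: boolean;
  [attr: string]: unknown; // any other attributes in the entry
}

// Fields used for lookup and display only; never emitted as SSML attributes.
const METADATA_FIELDS: string[] = ["id", "displayName", "languages", "language", "locale", "isHD"];

function voiceTag(entry: VoiceEntry, text: string): string {
  const attrs = Object.keys(entry)
    .filter((key) => METADATA_FIELDS.indexOf(key) < 0)
    .map((key) => ` ${key}="${String(entry[key])}"`)
    .join("");
  return `<voice${attrs}>${text}</voice>`;
}
```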
- Updated Azure formatter to use voiceTag() consistently for voice lookups
- Added getVoiceTagFallback() method to Azure formatter for unknown voices
- Voice data now supports lookup by both display name (e.g., 'Jenny') and voice ID (e.g., 'en-US-JennyNeural')
- SSML output always uses the correct voice ID from the catalog
- Added comprehensive tests for display name lookup functionality
- Updated existing tests to reflect new voice ID resolution behavior
- All 677 tests passing
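Resolving either a display name or a voice ID to the catalog ID can be sketched as a single scan. This is hypothetical: the two-entry catalog, case-insensitive matching, and passing unknown names through unchanged (standing in for getVoiceTagFallback) are all assumptions.

```typescript
// Hypothetical catalog shape and contents.
interface CatalogVoice {
  id: string;
  displayName: string;
}

const CATALOG: CatalogVoice[] = [
  { id: "en-US-JennyNeural", displayName: "Jenny" },
  { id: "en-US-AvaNeural", displayName: "Ava" },
];

// Returns the catalog voice ID for a display name or ID; first match wins.
function resolveVoiceId(requested: string): string {
  const wanted = requested.toLowerCase();
  for (const voice of CATALOG) {
    if (voice.id.toLowerCase() === wanted || voice.displayName.toLowerCase() === wanted) {
      return voice.id;
    }
  }
  // Assumed fallback for voices not in the catalog: use the requested name as-is.
  return requested;
}
```

Because display names are not guaranteed unique across locales, a real catalog may need a tie-break rule; this sketch simply takes the first match.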
- Created azure-comprehensive.spec.ts with 10 test cases covering Azure TTS features
- Tests include bookmarks, style/degree, role adjustments, language changes, pitch, emphasis, and audio
- 7 tests passing, 3 skipped (voice names with colons and the effect attribute are not yet supported)
- All 684 existing tests still passing
- Verified that Speech Markdown correctly generates Azure SSML for common use cases
- Changed voice names from HD format (en-US-Ava:DragonHDLatestNeural) to standard neural format (en-US-AvaNeural)
- HD voices with colon syntax are a separate Azure feature not currently supported by the Speech Markdown parser
- Updated 'Simple azure Voice name' test to use en-US-AvaNeural
- Updated 'Multi Voices' test to use en-US-AvaNeural and en-US-AndrewNeural
- Fixed XML entity escaping expectation to match the actual output
- Fixed whitespace formatting in multi-voice test
- Now 9 tests passing, 1 skipped (audio effects)
- All 686 existing tests still passing
- Added 30 Azure HD voices to voice data with dash syntax (e.g., en-US-Ava-DragonHDLatestNeural)
- HD voices use dash syntax in Speech Markdown and are converted to colon syntax in SSML (e.g., en-US-Ava:DragonHDLatestNeural)
- Added isHD metadata field to voice entries and filtered it from SSML output
- Updated comprehensive tests to use HD voices
- All 686 tests passing (1 skipped)

HD voices are premium high-definition voices with enhanced features:
- Human-like speech generation with automatic emotion detection
- Conversational patterns with natural pauses
- Prosody variations for realism
- Higher fidelity audio
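The dash syntax sidesteps the colon, which the Speech Markdown parser cannot accept in a voice name, and the SSML side restores the colon. A hypothetical sketch that assumes every HD name ends in a -DragonHD... segment; since the PR adds an isHD field, the real code may key the conversion off catalog metadata instead of a regex.

```typescript
// Hypothetical sketch: assumes HD voice names end in a "-DragonHD..." segment.
const HD_SUFFIX = /-(DragonHD[A-Za-z]*)$/;

// en-US-Ava-DragonHDLatestNeural -> en-US-Ava:DragonHDLatestNeural;
// non-HD names are returned unchanged.
function toSsmlVoiceName(markdownName: string): string {
  return markdownName.replace(HD_SUFFIX, ":$1");
}
```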
- Created comprehensive test suite for Google Cloud TTS with 17 tests
- Added support for the google:style tag
- Maps to the google:style SSML tag
- No namespace declaration needed per Google documentation
- 16 tests passing, 1 skipped (voice sections not yet supported)
- All 702 existing tests still passing
So note: we've fixed and tested quite a bit in this. We've added tests directly using SSML snippets from the Azure and Google Cloud docs. We've also done quite a bit to add languages to the voice lists and voice IDs, so a user can use either the ID or the name and we replace it correctly in the SSML with the ID.
This looks pretty "comprehensive" indeed, Will! Nice work.
TY @arjan. I'll just keep an eye out for a release. (NB: I've been working on a PR for the editor; I'll hold off doing much more on that until this is released. I have grand plans for it but will probably keep the "grand" plans for a completely separate PR you may not want to release! Second NB: https://github.com/willwade/js-tts-wrapper is a wrapper supporting Speech Markdown across as many TTS systems as possible, so we can give a live preview of the output.)
Release done ✅ |
Overview
This PR adds comprehensive Azure MSTTS (Microsoft Text-to-Speech) support to Speech Markdown, including automatic namespace injection, 33 express-as styles with intensity control, and language switching support.
Key Features
1. Automatic Azure SSML Namespace Injection
- Detects when MSTTS tags (such as <mstts:express-as>) are present in generated SSML
- Injects the xmlns:mstts="https://www.w3.org/2001/mstts" namespace only when needed

2. Complete Express-As Style Support (33 styles)
3. Style Degree (Intensity Control)
- (text)[excited:"1.5"] generates <mstts:express-as style="excited" styledegree="1.5">

4. Language Switching Support
- Supports the lang modifier: (Paris)[lang:"fr-FR"] generates <lang xml:lang="fr-FR">Paris</lang>

5. Docs
- Updated docs/platforms/azure.md with all 33 styles
- Documented multi-speaker dialog (mstts:dialog and mstts:turn) with raw SSML examples

Examples
Basic Express-As Style
Generates:
Style with Intensity
Generates:
Language Switching
Generates:
Section-Level Style
Generates:
Ready for review! This PR brings Azure MSTTS support from 27 to 33 styles, adds language switching, and provides comprehensive documentation for all Azure-specific features.