@litmudoc litmudoc commented Jan 4, 2026

Context

The Chatterbox TTS model supports multilingual text-to-speech, but the tokenizer selection code did not configure MTLTokenizer for multilingual models. When a model with "multilingual": true in its config.json was loaded, it incorrectly used the English tokenizer (EnTokenizer) instead of the multilingual tokenizer (MTLTokenizer).

Description

Modified the _init_tokenizers(), from_pretrained(), and post_load_hook() methods in chatterbox.py to:

  1. Check for the existence of config.json in the model directory
  2. Read the "multilingual" configuration flag from the JSON config
  3. Select and instantiate MTLTokenizer when multilingual is enabled, or EnTokenizer for English-only models
  4. Log a console message indicating which tokenizer was loaded
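
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: the `EnTokenizer` and `MTLTokenizer` classes here are stand-ins with an assumed constructor, and `select_tokenizer` and the `tokenizer.json` file name are hypothetical names chosen for the example. Only the `config.json` lookup, the `"multilingual"` flag, and the two log messages come from the description.

```python
import json
from pathlib import Path


class EnTokenizer:
    """Stand-in for the English tokenizer (constructor signature is assumed)."""

    def __init__(self, vocab_path):
        self.vocab_path = vocab_path


class MTLTokenizer:
    """Stand-in for the multilingual tokenizer (constructor signature is assumed)."""

    def __init__(self, vocab_path):
        self.vocab_path = vocab_path


def select_tokenizer(model_dir, vocab_name="tokenizer.json"):
    """Hypothetical helper: pick a tokenizer class based on config.json."""
    model_dir = Path(model_dir)
    config_path = model_dir / "config.json"

    # Steps 1 and 2: look for config.json and read the "multilingual" flag.
    # A missing file or missing key falls back to English-only behavior.
    multilingual = False
    if config_path.exists():
        with open(config_path, encoding="utf-8") as f:
            config = json.load(f)
        multilingual = bool(config.get("multilingual", False))

    # Steps 3 and 4: instantiate the matching tokenizer and report the choice.
    vocab_path = model_dir / vocab_name
    if multilingual:
        tokenizer = MTLTokenizer(vocab_path)
        print("Loaded multilingual tokenizer (MTLTokenizer)")
    else:
        tokenizer = EnTokenizer(vocab_path)
        print("Loaded English tokenizer (EnTokenizer)")
    return tokenizer
```

Defaulting to `EnTokenizer` when the file or the flag is absent is one way to keep existing English-only models working unchanged; whether the merged code uses the same fallback is an assumption.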

Changes in the codebase

  • Tokenizer Selection Logic: Added conditional logic to check config["multilingual"] and instantiate the appropriate tokenizer class
  • Import Updates: Added MTLTokenizer to imports alongside existing EnTokenizer
  • User Feedback: Added print statements showing which tokenizer was loaded ("Loaded multilingual tokenizer (MTLTokenizer)" or "Loaded English tokenizer (EnTokenizer)")
  • Three Locations Modified: _init_tokenizers(), load() method for S3 checkpoints, and load() method for model loading

Additional information

This change enables Chatterbox multilingual TTS models to properly tokenize input text, which is required for text-to-speech generation in multiple languages. The implementation follows the existing pattern of checking model configuration files.

Checklist

  • Code tested with multilingual Chatterbox models
  • Documentation updated for multilingual model usage
  • No breaking changes to existing English-only models

@litmudoc litmudoc left a comment

I tried to use MTLTokenizer for Chatterbox Multilingual support. Thank you.

This comment was marked as duplicate.

@litmudoc litmudoc changed the title use MTLTokenizer for multilingual. use chatterbox MTLTokenizer for multilingual. Jan 4, 2026

Blaizzy commented Jan 4, 2026

Hey @litmudoc

Awesome, do you have any audio samples?


litmudoc commented Jan 5, 2026

mlx_audio.tts.generate \
    --model litmudoc/Chatterbox-Multilingual-MLX-v2-fp16 \
    --exaggeration 0.6 \
    --cfg_scale 0.35 \
    --temperature 0.8 \
    --lang_code ko \
    --text ", 한국말이 너무 자연스러워요\! 감격하고, 또 감격 했습니다." \
    --ref_audio ko.wav \
    --ref_text "우리는 정말로 허름한 호텔에 묵었지만, 그래도 행복했다." \
    --verbose --play

This is a sample created with the Chatterbox multilingual model on a local Mac.
audio_000.wav

@litmudoc litmudoc marked this pull request as draft January 5, 2026 14:34
@litmudoc litmudoc marked this pull request as ready for review January 5, 2026 15:13

@Blaizzy Blaizzy left a comment

LGTM, thanks!

@Blaizzy Blaizzy force-pushed the Edit-chatterbox-can-use-MTLTokenizer branch from 6b92122 to f3c387d Compare January 5, 2026 17:53
@Blaizzy Blaizzy merged commit 9220f2c into Blaizzy:main Jan 5, 2026
10 checks passed
@litmudoc litmudoc deleted the Edit-chatterbox-can-use-MTLTokenizer branch January 6, 2026 02:18