feat: add language auto-detection for Whisper multilingual models#1541
Open
jhlee111 wants to merge 1 commit intohuggingface:mainfrom
Open
feat: add language auto-detection for Whisper multilingual models#1541jhlee111 wants to merge 1 commit intohuggingface:mainfrom
jhlee111 wants to merge 1 commit intohuggingface:mainfrom
Conversation
Implement detect_language() on WhisperForConditionalGeneration that runs a single encoder-decoder forward pass with just the <|startoftranscript|> token, masks logits to language tokens only, and returns the argmax. When language is not specified on a multilingual model, generate() now calls detect_language() automatically instead of defaulting to English. Closes huggingface#302
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #302
Summary
Implement automatic language detection for Whisper multilingual models. Previously, when no language was specified, the model silently defaulted to English. Now it runs a detection forward pass matching the Python
transformerslibrary behavior.How it works
detect_language(input_features, generation_config)runs a single forward pass:<|startoftranscript|>to the decodergeneration_config.lang_to_id)generate()callsdetect_language()automatically when:is_multilingualis truelanguageis specifiedinputs(audio features) are availableThe detected language is set on
generation_config.language, so_retrieve_init_tokens()picks it up normally. The init_tokens structure stays the same (SOT + lang + task + notimestamps), just with the detected language instead of hardcoded English.Usage
Changes
packages/transformers/src/models/whisper/modeling_whisper.js— 1 filedetect_language()method (~40 lines)generate()when language is not specified (~10 lines)// TODO: Implement language detectioncommentNo breaking changes — when
languageis explicitly provided, behavior is identical to before.Checklist
pnpm build)pnpm format:check)