Skip to content

Comments

feat: add language auto-detection for Whisper multilingual models#1541

Open
jhlee111 wants to merge 1 commit intohuggingface:mainfrom
jhlee111:feat/whisper-language-detection
Open

feat: add language auto-detection for Whisper multilingual models#1541
jhlee111 wants to merge 1 commit intohuggingface:mainfrom
jhlee111:feat/whisper-language-detection

Conversation

@jhlee111
Copy link

Closes #302

Summary

Implement automatic language detection for Whisper multilingual models. Previously, when no language was specified, the model silently defaulted to English. Now it runs a detection forward pass matching the Python transformers library behavior.

How it works

  1. detect_language(input_features, generation_config) runs a single forward pass:

    • Encodes the audio through the encoder
    • Feeds only <|startoftranscript|> to the decoder
    • Masks logits to language token IDs only (from generation_config.lang_to_id)
    • Returns the argmax — the most likely language token ID
  2. generate() calls detect_language() automatically when:

    • is_multilingual is true
    • No language is specified
    • inputs (audio features) are available

The detected language is set on generation_config.language, so _retrieve_init_tokens() picks it up normally. The init_tokens structure stays the same (SOT + lang + task + notimestamps), just with the detected language instead of hardcoded English.

Usage

// Auto-detect (no language specified)
const output = await model.generate({
  inputs: features,
  return_timestamps: true,
});

// Or call detect_language() directly
const langTokenId = await model.detect_language(features, generation_config);

Changes

  • packages/transformers/src/models/whisper/modeling_whisper.js — 1 file
    • Add detect_language() method (~40 lines)
    • Call it from generate() when language is not specified (~10 lines)
    • Remove // TODO: Implement language detection comment

No breaking changes — when language is explicitly provided, behavior is identical to before.

Checklist

  • Build passes (pnpm build)
  • TypeScript type check passes
  • Prettier formatting passes (pnpm format:check)
  • No breaking changes to existing functionality
  • Tests — happy to add if desired

Implement detect_language() on WhisperForConditionalGeneration that runs
a single encoder-decoder forward pass with just the <|startoftranscript|>
token, masks logits to language tokens only, and returns the argmax.

When language is not specified on a multilingual model, generate() now
calls detect_language() automatically instead of defaulting to English.

Closes huggingface#302
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Feature request] Whisper Language Detection

1 participant