feat: add initial prompt (prompt_ids) support for Whisper generation #1540

Open

jhlee111 wants to merge 1 commit into huggingface:main from jhlee111:feat/whisper-prompt-ids

Conversation

@jhlee111

Closes #923
Closes #1028

Summary

Add prompt_ids support to WhisperForConditionalGeneration.generate(), enabling initial prompt conditioning for Whisper transcription. This is a long-requested feature (both issues are from 2024) that matches the behavior of the Python transformers library.

What this does

When prompt_ids (an array of token IDs, typically starting with <|startofprev|>) is provided via the generation config, it is prepended to init_tokens, following the Whisper training format:

[<|startofprev|>, ...prompt_text..., <|startoftranscript|>, <|lang|>, <|task|>, ...]

After generation, the prompt tokens are stripped from the output sequences so they don't appear in the transcription results.
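
Internally, the change boils down to two steps around the existing generation call. A simplified sketch of the logic (condensed from the description above, not the exact diff; variable names other than init_tokens are illustrative):

// Before generation: prepend the prompt to the forced decoder tokens
if (prompt_ids) {
  init_tokens = [...prompt_ids, ...init_tokens];
}

// ...generation runs as usual...

// After generation: drop the prompt tokens from each output sequence
const num_prompt_tokens = prompt_ids?.length ?? 0;
const stripped_sequences = sequences.map((seq) => seq.slice(num_prompt_tokens));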

Usage

// Encode prompt (e.g., domain-specific terms)
const prompt_ids = tokenizer.encode("<|startofprev|> " + text, { add_special_tokens: false });

// Pass to pipeline or model.generate()
const output = await model.generate({
  inputs: features,
  prompt_ids,
  language: "en",
  return_timestamps: true,
});
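
The same option should also work through the speech-recognition pipeline, per the comment above about passing it "to pipeline or model.generate()". A sketch of that path (the package import and model checkpoint are illustrative, not part of this PR):

// Pipeline usage sketch; the checkpoint name is just an example
import { pipeline } from "@huggingface/transformers";

const transcriber = await pipeline("automatic-speech-recognition", "onnx-community/whisper-tiny");
const result = await transcriber(audio, {
  prompt_ids,
  language: "en",
  return_timestamps: true,
});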

Changes

  • packages/transformers/src/models/whisper/modeling_whisper.js — 1 file, ~20 lines added
    • Prepend prompt_ids to init_tokens when provided
    • Strip prompt tokens from output sequences after generation

No breaking changes — when prompt_ids is not provided, behavior is identical to before.

Checklist

  • Build passes (pnpm build)
  • Prettier formatting passes (pnpm format:check)
  • No breaking changes to existing functionality
  • Tests — no existing Whisper generation tests found in the repo; happy to add one if desired (see the sketch below)
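
One possible shape for such a test, as a rough sketch (assumes a Jest-style runner with tokenizer, model, and features prepared in setup; none of this code is in the PR):

// Verify that prompt tokens do not leak into the decoded output
it("strips prompt_ids from generated sequences", async () => {
  const prompt_ids = tokenizer.encode("<|startofprev|> Hugging Face", { add_special_tokens: false });
  const output = await model.generate({ inputs: features, prompt_ids });
  const [decoded] = tokenizer.batch_decode(output, { skip_special_tokens: false });
  expect(decoded).not.toContain("<|startofprev|>");
});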

Commit message

Implement prompt_ids handling in WhisperForConditionalGeneration.generate() to support initial prompt conditioning, matching the Python transformers library behavior.

When prompt_ids is provided via generation config, it is prepended to
init_tokens following the Whisper training format:
[<|startofprev|>, ...prompt_text..., <|startoftranscript|>, <|lang|>, <|task|>, ...]

The prompt tokens are stripped from output sequences after generation
to prevent them from appearing in transcription results.

Closes huggingface#923
Closes huggingface#1028
