
Conversation

@notoookay

Hi, I found that you use tokenizer.apply_chat_template when generating with the instruct model, but don't add the generation prompt that tells the model to start its response. I have made the following modifications:

  • Add the generation prompt indicating the start of a bot response.
  • Skip special tokens when decoding after generation.
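The two changes can be sketched as below. This is a minimal illustration assuming a ChatML-style template (the format Qwen instruct models use); the function names and token strings here are illustrative stand-ins, not the PR's actual code, which would call the real `tokenizer.apply_chat_template(..., add_generation_prompt=True)` and `tokenizer.decode(..., skip_special_tokens=True)`:

```python
# Illustrative ChatML-style special tokens (as used by Qwen instruct models).
IM_START, IM_END = "<|im_start|>", "<|im_end|>"


def apply_chat_template(messages, add_generation_prompt=False):
    """Mimics tokenizer.apply_chat_template for a ChatML-style model."""
    parts = [f"{IM_START}{m['role']}\n{m['content']}{IM_END}\n" for m in messages]
    if add_generation_prompt:
        # The generation prompt marks where the assistant's reply begins,
        # so the model continues as the assistant rather than the user.
        parts.append(f"{IM_START}assistant\n")
    return "".join(parts)


def decode(text, skip_special_tokens=True):
    """Strip special tokens from generated text, as the second change does."""
    if skip_special_tokens:
        for tok in (IM_START, IM_END, "<|endoftext|>"):
            text = text.replace(tok, "")
    return text.strip()


prompt = apply_chat_template(
    [{"role": "user", "content": "Who acted as Sophie in 'The Love Punch'?"}],
    add_generation_prompt=True,
)
print(prompt.endswith("<|im_start|>assistant\n"))  # True
```

Without the generation prompt, the model may continue the user turn instead of answering; without `skip_special_tokens`, the decoded answer carries template tokens along with it.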

@notoookay
Author

The eos_token differs between the instruct model and the pre-trained model, and the pre-trained model sometimes keeps generating after its eos_token has been produced. So for the pre-trained model I decode without skipping special tokens, for debugging.

Below is an example from testing qwen-0.5b:

"Q: Who acted as Sophie in the movie 'The Love Punch'?\nA:",

" The answer is Sophie Turner<|endoftext|>Human: What is the answer to that question? The answer to that question is: Sophie Turner.<|endoftext|>Human: What is the answer to that question? What is the name of the 1999 film that was based on the novel by the same name?
A: The answer to that question is: \"The 1999 film based on the novel by the same name is \"The 1999 Movie.\"<|endoftext|>You are an AI"

I think it would be better to handle the eos_token for the pre-trained model as well.
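One way to handle this is to truncate the pre-trained model's continuation at its eos_token during post-processing, so everything generated past the first `<|endoftext|>` (as in the qwen-0.5b example above) is dropped. This is a hedged sketch; the helper name is illustrative, and the token value assumes Qwen's `<|endoftext|>`:

```python
def truncate_at_eos(text: str, eos_token: str = "<|endoftext|>") -> str:
    """Keep only the completion up to the first eos_token, if any."""
    idx = text.find(eos_token)
    return text if idx == -1 else text[:idx]


completion = " The answer is Sophie Turner<|endoftext|>Human: What is the answer?"
print(truncate_at_eos(completion))  # " The answer is Sophie Turner"
```

Alternatively, generation could be stopped at the right token up front, e.g. by passing the pre-trained tokenizer's own `eos_token_id` to `model.generate`; the post-hoc truncation above is just the simplest fix when the raw continuation is already in hand.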
