@ggerganov
Member

@ggerganov ggerganov commented Feb 2, 2026

fix #19231

For the spec-simple method, we don't need to keep track of the last length to rate-limit the generations. We can simply use an incremental counter. This makes the speculator work when regenerating the last message ("Regenerate") or when branching the conversation from an earlier message.
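
To illustrate the difference, here is a minimal sketch of the two rate-limiting strategies. All names (`length_rate_limiter`, `counter_rate_limiter`, `should_check`) are hypothetical and are not taken from the llama.cpp sources. The length-based variant assumes the previous logic stored the context length seen at the last check, which is one plausible reading of "keep track of the last length".

```cpp
#include <cstddef>

// Hypothetical sketch, not the actual llama.cpp implementation.

// Length-based limiter: remembers the context length at the last check.
// If the context later shrinks (regenerate, branching), cur_len stays
// below last_len + check_rate and the check is skipped indefinitely.
struct length_rate_limiter {
    std::size_t check_rate;
    std::size_t last_len;

    explicit length_rate_limiter(std::size_t rate) : check_rate(rate), last_len(0) {}

    bool should_check(std::size_t cur_len) {
        if (cur_len < last_len + check_rate) {
            return false;
        }
        last_len = cur_len;
        return true;
    }
};

// Counter-based limiter: counts calls and fires on every check_rate-th
// call, independent of the absolute context length.
struct counter_rate_limiter {
    std::size_t check_rate;
    std::size_t n_calls;

    explicit counter_rate_limiter(std::size_t rate) : check_rate(rate), n_calls(0) {}

    bool should_check() {
        if (++n_calls < check_rate) {
            return false;
        }
        n_calls = 0;
        return true;
    }
};
```

In this sketch, a length-based limiter that last fired at length 102 never fires again once the conversation is cut back to length 50, whereas the counter-based limiter keeps firing on every `check_rate`-th call regardless of how the context length changes.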

Also removed struct common_ngram_simple_state, as it seemed a bit redundant.

@ggerganov ggerganov force-pushed the gg/spec-simple-freq-check branch from dee323f to b3fa165 Compare February 2, 2026 07:04
@easyfab

easyfab commented Feb 2, 2026

Thank you very much, it works perfectly now with this fix.

@ggerganov ggerganov requested a review from srogmann February 3, 2026 06:19
Collaborator

@srogmann srogmann left a comment

Should we remove the config parameter --spec-ngram-check-rate completely? This parameter was introduced when we didn't have the hash maps in ngram-map-* and ngram-mod. The ngram-simple implementation would get a bit simpler, with less risk of bugs like the one in #19231.

@ggerganov ggerganov merged commit d838c22 into master Feb 4, 2026
74 of 78 checks passed
@ggerganov
Member Author

> Should we remove the config parameter --spec-ngram-check-rate completely?

Yes, let's remove it. Feel free to PR the change.

@ggerganov ggerganov deleted the gg/spec-simple-freq-check branch February 4, 2026 08:40

Development

Successfully merging this pull request may close these issues.

Misc. bug: Speculative decoding only works once with /v1/chat/completions

4 participants