Feat: Adding token healing support for auto complete #19238
+89 −1
Hey all. I'm brand new to this repo and saw this issue was sitting in the backlog for a while. Looks like it had some traction and a potential draft PR, but nothing has happened on it since mid last year. I figured I'd get my feet wet with it.
This change adds token healing support for the `/completion` server endpoint. The idea is simple: when enabled (by passing the `n_token_healing_enabled` parameter in the API request), the server removes the last token from the input it sends to the model (I call this the *healer token*). When sampling, it then checks the candidate output tokens so that the first generated token starts with the text of the healer token. This way, if the user's input is cut off partway through a token during autocomplete, the completion continues seamlessly instead of interrupting the user's flow.

Some example tests I've run:
A decision I had to make here was whether to put this in the server logic itself or to create a new sampler function. Since only autocomplete tools would use this, I think it makes sense for it to live in the server logic. That said, feel free to tell me if this isn't the right place for it, and I'd welcome any recommendations on this or any other aspect of the code.
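For reviewers unfamiliar with the technique, here is a rough Python sketch of the two healing steps described above: drop the last prompt token, then restrict the first sampled token to those whose text starts with the healer text. This is an illustration only, not the C++ code in this PR. The function names, the list-of-strings vocabulary and the greedy argmax (standing in for the server's real sampler) are all hypothetical.

```python
from typing import List, Optional, Tuple


def heal_prompt(prompt_ids: List[int], vocab: List[str]) -> Tuple[List[int], Optional[str]]:
    """Hypothetical helper: remove the last prompt token (the healer token) and return its text."""
    if not prompt_ids:
        # Nothing to heal on an empty prompt.
        return prompt_ids, None
    return prompt_ids[:-1], vocab[prompt_ids[-1]]


def allowed_first_tokens(healer: str, vocab: List[str]) -> List[int]:
    """Ids of every token whose text starts with the healer token's text."""
    return [tid for tid, text in enumerate(vocab) if text.startswith(healer)]


def sample_first_token(logits: List[float], healer: Optional[str], vocab: List[str]) -> int:
    """Greedily pick the first output token, constrained by the healer text if there is one.

    Greedy argmax is an assumption for illustration; a real server would apply
    the same constraint before its normal sampling chain.
    """
    if healer is None:
        candidates = range(len(logits))
    else:
        # The healer token itself always matches, so this list is never empty.
        candidates = allowed_first_tokens(healer, vocab)
    return max(candidates, key=lambda tid: logits[tid])
```

Consider a toy vocabulary `["http", ":", "://", "s", "https", " "]` with prompt ids `[0, 1]` (the text `http:`). The healer text is `":"`, so the first token is limited to `":"` or `"://"`. A model whose highest logit is on `"https"` would therefore continue with `"://"` instead.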