Feat: Adding token healing support for auto complete #19238

agent-enemy-2 · 2026-02-01T04:03:31Z

Hey all. I'm brand new to this repo and saw this issue was sitting in the backlog for a while. Looks like it had some traction and a potential draft PR, but nothing has happened on it since mid last year. I figured I'd get my feet wet with it.

This change adds some token handling support for the /completion server endpoint. The idea is simple. When enabled (by providing n_token_healing_enabled parameter in API requests), the server will remove the last token from the input (calling it the healer token) that it sends to the model. It then compares the output tokens when sampling to ensure that the first output token has the same prefix as the healer token. This ensures that if there's any cutoff when doing auto complete that it won't interrupt the flow of the user.

Some example tests I've run:

Input:

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Five, Four, Thr",
    "n_token_healing_enabled": true,
    "n_token_healing_max_retries": 5,
    "n_predict": 5,
    "stop": ["\n", "."]
  }' 2>/dev/null | jq -r '.content'

Output:
ee, Two, One

Input:

curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "My favorite rock band is Green Da",
    "n_token_healing_enabled": true,
    "n_token_healing_max_retries": 5,
    "n_predict": 5,
    "stop": ["\n", "."]
  }' 2>/dev/null | jq -r '.content'

Output:
y

Input:
curl -X POST http://localhost:8080/completion \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Green Days best album was Ameri",
    "n_token_healing_enabled": true,
    "n_token_healing_max_retries": 5,
    "n_predict": 5,
    "stop": ["\n", "."]
  }' 2>/dev/null | jq -r '.content'

Output:
can Idiot

A decision I had to make here was whether to throw this in the server logic itself or to create a new sampler function. I think due to the fact that only auto-complete tools would be using this, it makes sense to be inside the server logic, but do feel free to tell me this was not the right place to put it and any recommendations you might have with this or any other aspects of the code itself.

jchavis06 added 2 commits January 31, 2026 22:09

feat: add token healing support

19b67ed

Minor tweaks

a67f73c

agent-enemy-2 requested review from ggerganov and ngxson as code owners February 1, 2026 04:03

github-actions bot added examples server labels Feb 1, 2026

agent-enemy-2 mentioned this pull request Feb 1, 2026

server : add "token healing" support #5765

Open

4 tasks

loci-dev mentioned this pull request Feb 1, 2026

UPSTREAM PR #19238: Feat: Adding token handling auto complete auroralabs-loci/llama.cpp#1116

Open

agent-enemy-2 changed the title ~~Feat: Adding token handling auto complete~~ Feat: Adding token healing support for auto complete Feb 1, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Feat: Adding token healing support for auto complete #19238

Feat: Adding token healing support for auto complete #19238

agent-enemy-2 commented Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Feat: Adding token healing support for auto complete #19238

Are you sure you want to change the base?

Feat: Adding token healing support for auto complete #19238

Conversation

agent-enemy-2 commented Feb 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants