fix: only reset LoRA configs when they have changed from previous batch #19280

Open

agent-enemy-2 wants to merge 1 commit into ggml-org:master from agent-enemy-2:bug-fix/memory-leak

Conversation

agent-enemy-2 commented Feb 3, 2026

Overview

Fix for #19217

Currently we set the LoRA config for every token request, even if the configuration has not changed between batches. It appears that in this PR, llama-context.cpp line 1078 added scheduler reserving whenever new LoRA configs are set, so now, whenever a LoRA config is in use, we reserve the scheduler for every batch we decode.

This change adds a field to the server slot that holds the previous batch's LoRA config. For each batch, we check whether the config is the same as the previous batch's and only set it when it has changed; otherwise we proceed without touching it.
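
For illustration, a rough sketch of that idea is below. The lora_configs_equal helper and the slot.lora / slot.prev_lora names are illustrative assumptions, not necessarily the exact code in this PR; the comparison is assumed to cover adapter path and scale.

// Sketch only: compare the slot's requested LoRA config against the one
// applied for the previous batch, and only re-apply it (and thereby trigger
// a scheduler reserve) when something actually changed.
static bool lora_configs_equal(const std::vector<common_adapter_lora_info> & a,
                               const std::vector<common_adapter_lora_info> & b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (size_t i = 0; i < a.size(); ++i) {
        if (a[i].path != b[i].path || a[i].scale != b[i].scale) {
            return false;
        }
    }
    return true;
}

// per-batch, before decoding (slot.lora and slot.prev_lora are illustrative):
if (!lora_configs_equal(slot.lora, slot.prev_lora)) {
    common_set_adapter_lora(ctx, slot.lora); // this is what triggers the reserve
    slot.prev_lora = slot.lora;              // remember for the next batch
}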

Testing

I was able to reproduce the issue relatively easily with some debug logs:

./llama-server -hf bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q8_0 --lora ~/Downloads/LoRA-Llama-3.1-8B-MultiReflection-f16.gguf -c 1024 --parallel 1 --host 0.0.0.0 --port 8080 --verbose

and could see the following repeating endlessly, along with huge memory spikes:

Setting sched_need_reserve to TRUE in the set_adapter_lora function
set_embeddings: value = 0
Reserving memory during decoding

sched_reserve: reserving ...
sched_reserve: max_nodes = 3692
srv    operator(): http: streamed chunk: data: {"choices":[{"finish_reason":null,"index":0,"delta":{"content":" requests"}}],"created":1770077607,"id":"chatcmpl-1gqsKPcqCDHRRMZFNqfWqnl557ARXVc9","model":"bartowski/Meta-Llama-3.1-8B-Instruct-GGUF:Q8_0","system_fingerprint":"b7916-0dfcd3b60","object":"chat.completion.chunk","timings":{"cache_n":0,"prompt_n":109,"prompt_ms":628.118,"prompt_per_token_ms":5.762550458715597,"prompt_per_second":173.5342722227352,"predicted_n":2,"predicted_ms":180.088,"predicted_per_token_ms":90.044,"predicted_per_second":11.105681666740704}}


sched_reserve: reserving full memory module
sched_reserve: worst-case: n_tokens = 512, n_seqs = 1, n_outputs = 1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
graph_reserve: reserving a graph for ubatch with n_tokens =    1, n_seqs =  1, n_outputs =    1
graph_reserve: reserving a graph for ubatch with n_tokens =  512, n_seqs =  1, n_outputs =  512
sched_reserve:       MTL0 compute buffer size =   509.12 MiB
sched_reserve:        CPU compute buffer size =    18.01 MiB
sched_reserve: graph nodes  = 1903
sched_reserve: graph splits = 2

After these changes, the issue no longer appears and memory remains stable.

ggerganov (Member) left a comment

Better to move the check inside llama_context::set_adapter_lora() and not set the sched_need_reserve flag if the same adapters are passed.

agent-enemy-2 force-pushed the bug-fix/memory-leak branch 2 times, most recently from e810f4d to 8edc85d on February 4, 2026 at 05:34
fix: updating LoRA config check, moving it into common.cpp

Removing dangling unused import
agent-enemy-2 (Author) commented Feb 4, 2026

I had to implement the check in common_set_adapter_lora because that method was clearing the LoRAs every time and then setting the sched_need_reserve flag as well.

I added some helpers to check whether the LoRA configs are the same; if they are, we skip both the clearing and the scheduler reserve.
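
As a rough sketch of that shape (not the exact code in this PR): the loras_unchanged helper name is made up, and the loop body is based on the upstream common_set_adapter_lora as I understand it.

// Sketch only: compare the requested adapters against what was applied for
// the previous call and bail out early, so llama_clear_adapter_lora() and
// llama_set_adapter_lora() are never called and sched_need_reserve stays
// untouched when nothing changed.
void common_set_adapter_lora(struct llama_context * ctx, std::vector<common_adapter_lora_info> & lora) {
    if (loras_unchanged(ctx, lora)) {
        return; // same adapters and scales -> no clear, no reserve
    }

    llama_clear_adapter_lora(ctx);
    for (auto & la : lora) {
        if (la.scale != 0.0f) {
            llama_set_adapter_lora(ctx, la.ptr, la.scale);
        }
    }
}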

ggerganov (Member) commented Feb 4, 2026

> I had to implement the check in common_set_adapter_lora because that method was clearing the LoRAs every time and then setting the sched_need_reserve flag as well.

Can't we update the logic in llama_set_adapter_lora() to first check if they are the same and, if they are, not do anything? We already have such logic in other functions, such as:

void llama_context::set_causal_attn(bool value) {
    LLAMA_LOG_DEBUG("%s: value = %d\n", __func__, value);

    if (cparams.causal_attn == value) {
        return;
    }

    cparams.causal_attn = value;

    sched_need_reserve = true;
}
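
Applied to the adapter setter, the same early-return pattern might look roughly like this. This is a hypothetical sketch only: the loras member (adapter pointer to scale) and the exact signature are assumptions, not the actual llama.cpp code.

// Hypothetical sketch of the suggested early-return check.
void llama_context::set_adapter_lora(llama_adapter_lora * adapter, float scale) {
    // assumed member: a map from adapter pointer to its current scale
    auto it = loras.find(adapter);
    if (it != loras.end() && it->second == scale) {
        return; // same adapter with the same scale -> nothing to do, no reserve
    }

    loras[adapter] = scale;

    sched_need_reserve = true;
}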

agent-enemy-2 (Author) commented

Apologies if I'm missing something.

In the common_set_adapter_lora function, we call llama_clear_adapter_lora, which would prevent any check in llama_context::set_adapter_lora from telling whether the LoRA adapter being passed in is the same as what is currently in the context, since there would no longer be a current adapter.

We could remove that call to clear the LoRAs (or move it into llama_context::set_adapter_lora), but I wasn't sure whether there is a reason we want it in the common functionality.

My latest change attempts to do what you're looking for, just in the common_set_adapter_lora function, prior to clearing the context LoRAs.
