fix: only reset LoRa configs when they have changed from previous batch #19280
agent-enemy-2 wants to merge 1 commit into ggml-org:master
Conversation
ggerganov
left a comment
Better to move the check inside llama_context::set_adapter_lora() and not set the sched_need_reserve flag if the same adapters are passed.
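A minimal sketch of that suggestion, assuming the context tracks active adapters in a map from adapter pointer to scale (the `loras` member name is an assumption about the surrounding state; `sched_need_reserve` is the flag mentioned above):

```cpp
// Sketch only: skip the scheduler re-reserve when the adapter is already
// active with the same scale. The `loras` map is an assumption about how
// the context stores active adapters, not the actual implementation.
void llama_context::set_adapter_lora(llama_adapter_lora * adapter, float scale) {
    auto it = loras.find(adapter);
    if (it != loras.end() && it->second == scale) {
        return; // same adapter, same scale - nothing changed
    }

    loras[adapter] = scale;

    // the adapter set actually changed, so a new reserve is needed
    sched_need_reserve = true;
}
```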
e810f4d to 8edc85d
fix: moving the LoRa config check to common.cpp; removing a dangling unused import
8edc85d to b131ad4
Had to implement the check in the common.cpp code. Added some helpers to validate whether the LoRA configs are the same, and if they are, skip clearing and skip reserving the scheduler.
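Roughly, a comparison helper along these lines (the name `lora_configs_equal` is made up for illustration, and it compares only the adapter pointer and scale; `common_adapter_lora_info` is the existing struct from common.h):

```cpp
#include <vector>
#include "common.h"

// Illustrative helper: two LoRA configs are considered the same if they
// reference the same adapters with the same scales, in the same order.
static bool lora_configs_equal(
        const std::vector<common_adapter_lora_info> & a,
        const std::vector<common_adapter_lora_info> & b) {
    if (a.size() != b.size()) {
        return false;
    }
    for (size_t i = 0; i < a.size(); ++i) {
        if (a[i].ptr != b[i].ptr || a[i].scale != b[i].scale) {
            return false;
        }
    }
    return true;
}
```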
Can't we update the logic in src/llama-context.cpp, lines 991 to 1001 (at 6ab881b)?
Apologies because I might be missing something. In the `common_set_adapter_lora` function, we call `llama_clear_adapter_lora` before setting the adapters. We could just remove that call to clear the LoRAs (or move it into `llama_context::set_adapter_lora`), but I wasn't sure if there is some reason we wanted that in the common functionality. My latest change attempts to do what you're looking for, just in the common.cpp code.
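Combining that with the illustrative helper above, the early-out at the common level could look roughly like this (the function name and the `prev_lora` parameter used to cache the previously applied config are purely illustrative; the clear-then-set body mirrors the existing behaviour described above):

```cpp
// Sketch: skip the clear + re-apply (and therefore the scheduler reserve)
// when the requested config matches what was applied for the previous batch.
static void set_adapter_lora_if_changed(struct llama_context * ctx,
                                        std::vector<common_adapter_lora_info> & lora,
                                        std::vector<common_adapter_lora_info> & prev_lora) {
    if (lora_configs_equal(lora, prev_lora)) {
        return; // nothing changed since the last batch
    }

    llama_clear_adapter_lora(ctx);
    for (auto & la : lora) {
        if (la.scale != 0.0f) {
            llama_set_adapter_lora(ctx, la.ptr, la.scale);
        }
    }

    prev_lora = lora; // remember what is now applied
}
```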
Overview
Fix for #19217
Currently we set the LoRA config for every token request, even when the configuration has not changed between batches. It appears that in this PR, llama-context.cpp line 1078 added scheduler reserving whenever a new LoRA config is set, so now, whenever a LoRA config is in use, we reserve the scheduler for every batch we decode.
This change adds a field to the server slot that holds the previous batch's LoRA config. For each batch, we check whether the incoming config matches the previous one, and only if it differs do we set it again; otherwise we skip the reset and proceed, roughly as sketched below.
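A sketch of that per-slot check, reusing the illustrative `lora_configs_equal` helper from the conversation above and assuming a new `lora_prev` field on the slot (the surrounding server names are approximations, not the actual server.cpp code):

```cpp
// Sketch: only reset the adapters when the requested config differs from
// the one used for the previous batch handled by this slot.
if (!lora_configs_equal(slot.lora, slot.lora_prev)) {
    common_set_adapter_lora(ctx, slot.lora); // clears and re-applies adapters
    slot.lora_prev = slot.lora;              // cache for the next batch
}
// otherwise: same adapters and scales as last time - skip the reset and
// avoid triggering another scheduler reserve
```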
Testing
I was able to reproduce the issue relatively easily with some debug logs and could see the following repeated endlessly, along with huge memory spikes:
After these changes, the issue no longer appears and memory remains stable.