Fix compiler toolkit CI by removing duplicated buffer registration #2435
yiming0416 wants to merge 1 commit into main
Conversation
@tianyu-l This fixes the compiler toolkit CI failure after your config system change. Let me know if you are okay with changing the model code; otherwise I can fix it from the compiler toolkit experiment.
@wconstab Originally the cache was registered as a non-persistent buffer, which won't appear in the state_dict, so I assume it shouldn't affect checkpointing?
@fegin could you take a look? thanks! |
fegin left a comment
LGTM, I don't think checkpointing is a problem.
```diff
  self.config = config
  # Buffer registered later in init_weights
- self.register_buffer("cache", self._precompute(), persistent=False)
+ self.cache: torch.Tensor = self._precompute()
```
One other consideration: IIUC, when registered as a non-persistent buffer, `cache` will at least be moved to the device when the module is moved with `.to(device)`. Having it as a plain tensor will not do this. Will this break our initialization flow (starting on `meta` and moving to `cuda`, for example)?
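The distinction raised here can be demonstrated with a minimal sketch (the `WithBuffer`/`PlainAttr` module names are hypothetical, not from this PR; a dtype conversion stands in for a device move so the example runs on CPU):

```python
import torch
import torch.nn as nn

class WithBuffer(nn.Module):
    """Holds the cache as a non-persistent buffer."""
    def __init__(self):
        super().__init__()
        # Non-persistent: excluded from state_dict, but still tracked by .to()
        self.register_buffer("cache", torch.zeros(4), persistent=False)

class PlainAttr(nn.Module):
    """Holds the cache as a plain tensor attribute."""
    def __init__(self):
        super().__init__()
        # Plain attribute: Module.to() does not convert or move it
        self.cache = torch.zeros(4)

b = WithBuffer().to(torch.float64)  # dtype move stands in for a device move
p = PlainAttr().to(torch.float64)
print(b.cache.dtype)  # torch.float64 -- the buffer followed .to()
print(p.cache.dtype)  # torch.float32 -- the plain attribute was untouched
print("cache" in b.state_dict())  # False -- non-persistent, so no checkpoint impact
```

So a plain attribute does opt out of `.to()` tracking; whether that matters depends on whether `_precompute()` already runs on the right device in the init flow.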
`self.rope.cache` and `self.freqs_cis` are the same tensor object registered as buffers on two different modules. Tracing would see them as two distinct graph inputs for the same underlying data.

This PR removes the `register_buffer` from RoPE and just stores the cache as a plain tensor attribute there, keeping only the Decoder-level `register_buffer("freqs_cis", ...)`.

This fixes the compiler toolkit CI:
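The duplicated registration described above can be reproduced in a minimal sketch (the `RoPE`/`Decoder` shapes and structure here are simplified stand-ins, not the actual model code). Listing buffers without deduplication shows why a tracer sees two inputs for one tensor:

```python
import torch
import torch.nn as nn

class RoPE(nn.Module):
    """Simplified stand-in: registers the shared cache as its own buffer."""
    def __init__(self, freqs: torch.Tensor):
        super().__init__()
        self.register_buffer("cache", freqs, persistent=False)

class Decoder(nn.Module):
    """Simplified stand-in for the Decoder-level registration."""
    def __init__(self):
        super().__init__()
        freqs = torch.randn(8, 4)
        self.register_buffer("freqs_cis", freqs, persistent=False)
        self.rope = RoPE(freqs)  # same tensor object, registered a second time

m = Decoder()
# remove_duplicate=False surfaces every registration site; a tracer that
# collects buffers this way sees two graph inputs backed by one tensor.
names = [n for n, _ in m.named_buffers(remove_duplicate=False)]
print(names)                        # two buffer names for one tensor
print(m.freqs_cis is m.rope.cache)  # True -- same underlying object
```

With the PR's fix, `RoPE` would keep `freqs` as a plain attribute, so only `freqs_cis` remains a registered buffer and the tracer sees a single input.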