Make tensor of indices a LongTensor #2431
Indices are multiplied by strides in operations like aten::index_put. With a 32-bit index, that product can overflow silently, since TorchInductor does not do any overflow checking on its own. This commit fixes GitHub issue pytorch#2430.
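To illustrate the overflow described above, here is a minimal pure-Python sketch of the arithmetic. It is only a model of signed 32-bit wraparound, not the code Inductor generates; the helper names `wrap_int32` and `flat_offset`, the row index 600,000, and the stride 4096 are all hypothetical values chosen for the example.

```python
INT32_MAX = 2**31 - 1


def wrap_int32(value: int) -> int:
    """Return the value a signed 32-bit integer would hold (two's complement)."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def flat_offset(index: int, stride: int, index_bits: int) -> int:
    """Model index * stride for a 32-bit or 64-bit index type.

    Hypothetical helper: with 32-bit indices the product wraps silently,
    with 64-bit indices it is assumed to fit for these sizes.
    """
    product = index * stride
    return wrap_int32(product) if index_bits == 32 else product


# Assumed example: row 600_000 of a tensor whose row stride is 4096 elements.
row, stride = 600_000, 4096

offset64 = flat_offset(row, stride, index_bits=64)
offset32 = flat_offset(row, stride, index_bits=32)

print(offset64)  # 2457600000, which exceeds INT32_MAX
print(offset32)  # -1837367296, a wrapped (negative) offset
```

In PyTorch terms, the remedy in this PR amounts to making the index tensor 64-bit before it reaches the indexing op, for example with `indices.long()` or `indices.to(torch.int64)`, so that the product with the stride is computed in 64-bit arithmetic.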
tianyu-l
left a comment
sounds reasonable to me, but why does it only happen with compile and not eager?
The kernel that gets launched by I think they are converted in
tianyu-l
left a comment
have you measured the perf/memory impact of this change? sounds necessary to me anyway
We measured on a Qwen3 MoE model with a moderate local batch size (16); the performance impact was negligible (<1% TPS).