
Make tensor of indices a LongTensor #2431

Merged
tianyu-l merged 3 commits into pytorch:main from hirschsn:fix-moe-integer-overflow
Feb 25, 2026
Conversation

@hirschsn
Contributor

Indices get multiplied with strides in operations like aten::index_put. A 32-bit index can lead to a silent overflow, since TorchInductor does not do any overflow checking on its own.

This commit fixes GitHub issue #2430

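The failure mode can be reproduced without PyTorch at all. A minimal sketch of the silent index*stride wraparound in 32-bit signed arithmetic (the stride and index values here are hypothetical, chosen only to cross the int32 limit):

```python
# Simulate how a linear offset is formed when index arithmetic is done
# in 32-bit signed integers: the product wraps around instead of erroring.
def linear_offset_32bit(index: int, stride: int) -> int:
    """Compute index * stride with 32-bit signed wraparound semantics."""
    product = (index * stride) & 0xFFFFFFFF
    if product >= 0x80000000:      # reinterpret the top bit as the sign bit
        product -= 0x100000000
    return product

stride = 8192            # hypothetical row stride of a large table
index = 300_000          # a perfectly valid row index

offset64 = index * stride                     # correct 64-bit offset
offset32 = linear_offset_32bit(index, stride) # silently wraps negative

print(offset64)  # 2457600000
print(offset32)  # -1837367296
```

With a 64-bit (Long) index tensor the multiply stays in range; with a 32-bit one the compiled kernel would address memory at a garbage (here negative) offset with no error raised.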
@meta-cla bot added the CLA Signed label Feb 24, 2026
@tianyu-l
Contributor

Sounds reasonable to me, but why does it only happen for compile, not eager?

@tianyu-l tianyu-l requested a review from xmfan February 24, 2026 19:06
@hirschsn
Contributor Author

hirschsn commented Feb 25, 2026

According to a Torch profile, the kernel launched by index_put in eager mode is void at::native::index_elementwise_kernel</* ... */, at::native::gpu_index_kernel<at::native::index_put_kernel_impl //... . The kernel that does the actual indexing is gpu_index_kernel (aten/src/ATen/native/cuda/IndexKernel.cu:101ff), and it receives the indices as 64-bit integers.

I think they are converted in index_put_with_sort_kernel (which seems to be the dispatch target for CUDA): there the indices go through makeLinearIndex (aten/src/ATen/native/cuda/Indexing.cu:615), which in turn does:

  for (auto & i : indices) {
    if (i.defined() && i.dtype() == at::kInt) {
      i = i.to(at::kLong);
    }
  }
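In other words, eager mode widens any 32-bit index tensor to 64 bits before the linear index is formed, so the index*stride multiply is always done in 64-bit arithmetic. A pure-Python sketch of that promotion (dtypes tracked as strings for illustration; no PyTorch required):

```python
# Pure-Python analog of the ATen loop quoted above: any defined index
# array tagged int32 is widened to int64 before linear indexing.
def promote_indices(indices):
    """Return the index list with every int32 entry retagged as int64."""
    return [(vals, "int64") if vals is not None and dtype == "int32"
            else (vals, dtype)
            for vals, dtype in indices]

indices = [([0, 300_000], "int32"), ([5, 7], "int64")]
print([dtype for _, dtype in promote_indices(indices)])  # ['int64', 'int64']
```

The compiled path does not run through this helper, which is consistent with the overflow only appearing under compile.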

@tianyu-l
Contributor

Have you measured the perf/memory impact of this change? It sounds necessary to me anyway.

@hirschsn
Contributor Author

We measured on a Qwen3 MoE model with a moderate local batch size (16); the performance impact was negligible (<1% TPS).

@tianyu-l tianyu-l merged commit ac359c0 into pytorch:main Feb 25, 2026
9 of 11 checks passed