
ingyukoh commented Feb 2, 2026

Replace std::thread::hardware_concurrency() with cpu_get_num_math() when --threads is set to -1 or 0 (auto-detect mode).

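hardware_concurrency() returns the number of logical processors (hardware threads, so hyperthreads are counted). On a CPU with 2-way SMT this is twice the physical core count, so auto-detect mode oversubscribes the cores and performance degrades:

  • 100% CPU usage instead of the optimal ~50% (one thread per physical core)
  • 3.6x slower generation (2.5 tok/s vs 9 tok/s reported)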

cpu_get_num_math() returns physical cores and also handles Intel hybrid CPUs by skipping efficiency cores for math workloads.
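
For reviewers who want the auto-detect rule spelled out, here is a minimal sketch of the intended behavior. It is not the code in this PR: `resolve_threads`, `physical_cores_estimate`, and `auto_threads_from_host` are hypothetical names, and dividing the logical count by an assumed SMT width is only a stand-in for cpu_get_num_math(), which inspects the CPU topology instead of dividing.

```cpp
#include <thread>

// Hypothetical stand-in for cpu_get_num_math(): estimate physical cores from
// the logical processor count and an ASSUMED SMT width (2 for hyperthreading).
// The real helper queries the topology; this division is illustrative only.
int physical_cores_estimate(unsigned logical, unsigned smt_width) {
    if (logical == 0) {
        return 1; // hardware_concurrency() may return 0 when the count is unknown
    }
    if (smt_width == 0) {
        smt_width = 1; // guard against division by zero
    }
    unsigned cores = logical / smt_width;
    return cores > 0 ? static_cast<int>(cores) : 1;
}

// Hypothetical resolver: a positive request is honored as-is, while
// -1 or 0 (auto-detect mode) falls back to the physical core estimate.
int resolve_threads(int requested, unsigned logical, unsigned smt_width) {
    if (requested > 0) {
        return requested;
    }
    return physical_cores_estimate(logical, smt_width);
}

// Convenience wrapper that reads the logical count from the host,
// assuming 2-way SMT (an assumption, not something detected here).
int auto_threads_from_host(int requested) {
    return resolve_threads(requested, std::thread::hardware_concurrency(), 2);
}
```

Under these assumptions, `resolve_threads(-1, 16, 2)` yields 8 threads on a machine with 16 logical processors, whereas using hardware_concurrency() directly would start 16.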

Fixes #19110



ingyukoh commented Feb 4, 2026

CI Status:

  • RISC-V jobs: Failing at ~30s (infrastructure timeout, unrelated to this PR)
  • Vulkan job: Failing on GPU path (this PR only touches CPU thread detection)
  • CPU jobs: ✅ 73 passing

This PR changes 4 lines in CPU thread detection logic. The failing jobs are unrelated to the code path modified.
Happy to re-run flaky jobs if needed.

Development

Successfully merging this pull request may close these issues.

--threads -1 double counts via std::thread::hardware_concurrency() due to hyper-threading