
Conversation

@umangyadav (Member) commented Jan 8, 2026

Motivation

  • Creates the dialect registry only once.
  • Uses the same thread pool across all compilation threads.
  • Renames MLIR_ENABLE_THREADS to LLVM_ENABLE_THREADS. There is no compile-time flag named MLIR_ENABLE_THREADS; rocm/llvm has it incorrectly, while llvm/llvm uses LLVM_ENABLE_THREADS.
  • Copies host buffers to the GPU only once and reuses the same buffers when benchmarking all perf configs.

Testing

rocmlir-gen -m 1500 -n 1152 -k 896 --operation gemm --arch gfx942 --num_cu 80

Tuning Mode    develop     tuningImprovements    Improvement
Quick          1.50s       1.53s                 -2% (slower)
Full           19.5s       19.5s                 ~0% (same)
Exhaustive     4m 2.6s     3m 59.5s              ~1.3% faster

// Create context with threading disabled internally, attach shared pool
ctx = std::make_unique<MLIRContext>(registry,
                                    MLIRContext::Threading::DISABLED);
ctx->setThreadPool(getSharedThreadPool());
Contributor

Does this mean that the compilation is parallelized internally as well? We could be oversubscribing threads because we are already parallelizing at a higher level. Can we control the number of threads in the pool?

Member Author

Yes, it looks like it would use threading internally as well, but I'm not sure exactly how it works. I copied the logic from MIGraphX's compilation:
https://github.com/ROCm/AMDMIGraphX/blob/4d968f79f02de4de5aa3c36f12a179183c12c04e/src/targets/gpu/mlir.cpp#L286

Member Author

Threading is enabled by default inside MLIRContext unless it is disabled explicitly:

MLIRContextImpl(bool threadingIsEnabled)

Passing the thread pool explicitly is meant to reduce the overhead of creating a separate thread pool in each parallel compilation thread. It does not appear to affect runtime noticeably, but it is good practice.
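
For reference, roughly what the shared-pool setup looks like end to end. This is a sketch, not the PR's exact code: it assumes a recent LLVM where llvm::DefaultThreadPool implements llvm::ThreadPoolInterface (which MLIRContext::setThreadPool accepts), and makeContext / getSharedThreadPool are illustrative names (the latter mirrors the helper referenced in the diff above).

#include "mlir/IR/DialectRegistry.h"
#include "mlir/IR/MLIRContext.h"
#include "llvm/Support/ThreadPool.h"

#include <memory>

// One process-wide pool, shared by every per-config MLIRContext.
static llvm::DefaultThreadPool &getSharedThreadPool() {
  static llvm::DefaultThreadPool pool;
  return pool;
}

static std::unique_ptr<mlir::MLIRContext>
makeContext(const mlir::DialectRegistry &registry) {
  // Disable the context's own threading so it does not spin up a private
  // pool, then attach the shared one.
  auto ctx = std::make_unique<mlir::MLIRContext>(
      registry, mlir::MLIRContext::Threading::DISABLED);
  ctx->setThreadPool(getSharedThreadPool());
  return ctx;
}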

Contributor @mirza-halilcevic commented Jan 8, 2026

This is a good optimization.

In that case we are for sure oversubscribing. I think we can optimize further by tuning the thread count: maybe leave 50% of the compile threads for the compilation workers and the other 50% for the thread pool. There is probably a way to tell the thread pool how many threads to use. The 50% split is arbitrary here; a different distribution may work better. That could also be why you don't see it affect runtime: the CPU is already oversaturated with threads.
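
If we do want an explicit split, the pool size can be capped when the shared pool is constructed. A sketch under the same assumptions as above (getSharedThreadPool is the hypothetical shared-pool helper, and the 50/50 split is just the suggestion from this comment):

#include "llvm/Support/ThreadPool.h"
#include "llvm/Support/Threading.h"

#include <algorithm>
#include <thread>

// Give roughly half of the hardware threads to the shared MLIR pool and
// leave the rest for the compilation workers.
static llvm::DefaultThreadPool &getSharedThreadPool() {
  static llvm::DefaultThreadPool pool(llvm::hardware_concurrency(
      std::max(1u, std::thread::hardware_concurrency() / 2)));
  return pool;
}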


 static bool isThreadingGloballyDisabled() {
-#if MLIR_ENABLE_THREADS != 0
+#if LLVM_ENABLE_THREADS != 0
Contributor

Can you create an upstream PR to fix this?

Member Author

The bug is in rocm/llvm. I'll post a PR there.


// Copy host buffers to GPU once (reused across all config benchmarks)
for (size_t i = 0; i < bufferLengths.size(); i++) {
  HIPCHECK(hipMemcpy(gpuBuffers[i], hostBuffers[i], bufferLengths[i],
                     hipMemcpyHostToDevice));
}
Contributor

Why change from hipMemcpyAsync to hipMemcpy?

Contributor

We should be careful with this. I believe that host-to-device copies are still async unless the host memory is allocated page-locked (allocated with hipHostMalloc). It just stages it for DMA transfer and does not wait for the copy to finish.

CUDA behaves like this, I would suppose that HIP does as well: https://docs.nvidia.com/cuda/cuda-driver-api/api-sync-behavior.html
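
If we want the completion point to be unambiguous regardless of how HIP handles the copy under the hood, an async copy on an explicit stream followed by a stream synchronize does it. A sketch only; copyAndWait is a hypothetical helper, not something in this PR:

#include <hip/hip_runtime.h>
#include <cstdlib>

// Issue the copy on an explicit stream and block until it has finished, so
// the benchmark can never overlap an in-flight transfer.
static void copyAndWait(void *dst, const void *src, size_t numBytes,
                        hipStream_t stream) {
  if (hipMemcpyAsync(dst, src, numBytes, hipMemcpyHostToDevice, stream) !=
      hipSuccess)
    std::abort();
  if (hipStreamSynchronize(stream) != hipSuccess)
    std::abort();
}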

Contributor

We probably should be using hipHostMalloc anyway to speed up the memory transfer.
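
Something like this sketch; stageInput is a hypothetical helper and HIPCHECK here stands in for whatever error-checking macro the tuning driver already uses:

#include <hip/hip_runtime.h>
#include <cstdlib>
#include <cstring>

// Stand-in for the driver's existing error-checking macro.
#define HIPCHECK(expr)                                                         \
  do {                                                                         \
    if ((expr) != hipSuccess)                                                  \
      std::abort();                                                            \
  } while (0)

// Page-locked (pinned) host memory lets the host-to-device copy go straight
// over DMA instead of through an intermediate pageable staging buffer.
static void stageInput(void *gpuBuffer, const void *hostData, size_t numBytes) {
  void *pinnedHost = nullptr;
  HIPCHECK(hipHostMalloc(&pinnedHost, numBytes, hipHostMallocDefault));
  std::memcpy(pinnedHost, hostData, numBytes); // fill the pinned staging buffer
  HIPCHECK(hipMemcpy(gpuBuffer, pinnedHost, numBytes, hipMemcpyHostToDevice));
  HIPCHECK(hipHostFree(pinnedHost));
}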

Member Author

It was changed from hipMemcpyAsync to hipMemcpy because it doesn't require a stream, i.e., it uses the default stream.

Contributor

Ah, I see. At this point you don't have a stream?

gpuBuffers.push_back(gpuBuffer);
}

// Copy host buffers to GPU once (reused across all config benchmarks)
Contributor

What if we are using atomics? The results would be different in every iteration because we don't initialize the output tensor with the same values. I think that shouldn't affect runtime, but asking just in case.

Member Author

We don't care about the results here, just the benchmarking.

Contributor

Can we add a comment, just in case, so the reader is aware of this?

Contributor

Can you put this change in a separate commit with "[external]..."?
