MLP with MPI through MLIR#50

Merged
fschlimb merged 15 commits into llvm:main from fschlimb:mlp-mpi
Feb 23, 2026
Conversation

@fschlimb
Contributor

@fschlimb fschlimb commented Feb 4, 2026

This demonstrates the distributed sharding infrastructure of MLIR using a single MLP, lowering sharding annotations to MPI.
Currently this uses the lower end of the pipeline; sharding-propagation is not fit for it yet.
Two different sharding policies can be used by selecting a 1d or a 2d device grid.

Note: some fixes to make this work have not yet landed on MLIR main.

@rolfmorel @tkarna @rengolin Is there any CI? If so, what is the recommended way of integrating this?

@rolfmorel
Contributor

For CI, I think you will want:

Add an optional Python dependency group for MPI, with mpi4py and probably mpich in it (though maybe there should be multiple mutually incompatible dependency groups, as we have for torch, to also target running with OpenMPI): e.g. like

[project.optional-dependencies]
ingress_torch_mlir = [
"torch-mlir==20260125.703",
"ml_dtypes",
]
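A sketch of what such an MPI group could look like, modeled on the torch group above (the `runtime_mpi` name matches the group the PR summary later mentions; leaving the packages unpinned is an assumption):

```toml
[project.optional-dependencies]
runtime_mpi = [
    "mpi4py",
    "mpich",
]
```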

Add an mpi/mpi-mpich "feature" to the llvm-lit config, like:

lighthouse/lit.cfg.py

Lines 23 to 24 in d32d1cf

if importlib.util.find_spec("torch"):
config.available_features.add("torch")
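Generalizing the existing torch check above, the detection logic can be sketched as a small helper; in lit.cfg.py the result would feed `config.available_features` (the feature name `mpi` and the helper itself are illustrative, not part of the PR):

```python
import importlib.util


def detect_features(module_to_feature):
    """Return the lit feature names whose Python package is importable."""
    features = set()
    for module, feature in module_to_feature.items():
        if importlib.util.find_spec(module):
            features.add(feature)
    return features


# In lit.cfg.py: config.available_features |= detect_features(...)
print(sorted(detect_features({"torch": "torch", "mpi4py": "mpi"})))
```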

Add a REQUIRES: mpi (or REQUIRES: mpi-mpich) line to your python file that is to be CHECKed. E.g., like

# RUN: %PYTHON %s
# REQUIRES: torch

If the CI machines need to have certain (system) libraries installed, you will want to modify the following file:

sudo apt-get install -y llvm-dev # Obtain FileCheck, used in testing.

Note that if Ubuntu packages exist, this is easy. If not, this becomes quite a bit more involved.
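For MPICH, the corresponding workflow step could gain a line like the following (package names assumed from the standard Ubuntu repositories):

```shell
sudo apt-get install -y mpich libmpich-dev  # MPI runtime, launcher, and headers
```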

@fschlimb
Contributor Author

@rolfmorel @tkarna @rengolin this is ready for review

Once llvm/llvm-project#180962 has been merged, we should update the mlir dep and also run the 2d-grid case in CI.


Copilot AI left a comment


Pull request overview

This PR integrates MPI-based distributed computing capabilities into the MLIR infrastructure, enabling multi-rank execution of neural network operations. It adds support for running a Multi-Layer Perceptron (MLP) across multiple MPI processes with different sharding strategies.

Changes:

  • Updated MLIR and torch-mlir dependencies to newer versions
  • Added MPI runtime dependencies (mpi4py, mpich) as optional extras
  • Implemented a distributed MLP example with weight-stationary sharding strategies
  • Extended lit test infrastructure to detect and run MPI-enabled tests

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
pyproject.toml Updated dependency versions and added runtime_mpi optional dependency group
lit.cfg.py Extended test detection to support MPI packages and added VIRTUAL_ENV substitution
lighthouse/workload/runner.py Modified shared library path resolution to support absolute paths
examples/mlp-mpi/mlp_weight_stationary.mlir New MLIR template for distributed MLP with sharding annotations
examples/mlp-mpi/mlp-mpi.py New Python implementation of distributed MLP workload with MPI execution
examples/mlp-mpi/README.md Documentation for running the MPI-based MLP example
.github/workflows/examples.yml Added CI workflow step for MPI-enabled examples
Comments suppressed due to low confidence (1)

examples/mlp-mpi/mlp-mpi.py:1

  • The hardcoded library name 'libmpi.so.12' is brittle and will fail on systems with different MPI versions or implementations (e.g., libmpi.so.40 for newer MPICH, or different naming for OpenMPI). Consider using a more flexible approach such as detecting the library at runtime or using a configurable variable.
# REQUIRES: mpi4py
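One way to address the hardcoded `libmpi.so.12` would be an environment override plus a system library lookup; a minimal sketch (the `LIGHTHOUSE_MPI_LIB` variable name is hypothetical):

```python
import ctypes.util
import os

# Prefer an explicit override, then fall back to the system's library
# search; find_library returns None when no MPI library can be located.
libmpi = os.environ.get("LIGHTHOUSE_MPI_LIB") or ctypes.util.find_library("mpi")
print(libmpi)
```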


Comment on lines 190 to 191
self.M * self.N * self.K * 2 + self.M * self.K * 4
) # matmuls + sigmoid

Copilot AI Feb 11, 2026


The FLOP count calculation is incorrect. The MLP performs two matmuls: A@B (M×K by K×N = 2MNK FLOPs) and result@C (M×N by N×K = 2MNK FLOPs), totaling 4MNK FLOPs. The sigmoid operation on M×N elements requires approximately 5MN FLOPs (not 4MK). The formula should be '4 * self.M * self.N * self.K + 5 * self.M * self.N'.

Suggested change
self.M * self.N * self.K * 2 + self.M * self.K * 4
) # matmuls + sigmoid
4 * self.M * self.N * self.K + 5 * self.M * self.N
) # 2 matmuls (4MNK) + sigmoid (≈5MN)
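A quick numeric sanity check of the corrected count, assuming the conventional 2·M·N·K FLOPs per matmul and ~5 FLOPs per sigmoid element as stated above (sizes are illustrative):

```python
M, N, K = 4, 3, 2  # small illustrative sizes

flops_mm1 = 2 * M * N * K   # A (MxK) @ B (KxN)
flops_mm2 = 2 * M * N * K   # (MxN) @ C (NxK)
flops_sigmoid = 5 * M * N   # elementwise on the MxN intermediate

total = flops_mm1 + flops_mm2 + flops_sigmoid
assert total == 4 * M * N * K + 5 * M * N
print(total)  # → 156
```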

Contributor

@rolfmorel rolfmorel left a comment


Looks fine to me.

The CI changes look good. Maybe @tkarna can check the workload usage and take a more detailed look at the shard & MPI specifics.

Comment on lines 359 to 373
# rprint(" Execute 2 ".center(60, "-"))
# execute(wload, verbose=1)

# rprint(" Benchmark ".center(60, "-"))
# times = benchmark(wload)
# times *= 1e6 # convert to microseconds
# compute statistics
# mean = np.mean(times)
# min = np.min(times)
# max = np.max(times)
# std = np.std(times)
# rprint(f"Timings (us): mean={mean:.2f}+/-{std:.2f} min={min:.2f} max={max:.2f}")
# flop_count = wload.get_complexity()[0]
# gflops = flop_count / (mean * 1e-6) / 1e9
# rprint(f"Throughput: {gflops:.2f} GFLOPS")
Contributor


Suggested change
# rprint(" Execute 2 ".center(60, "-"))
# execute(wload, verbose=1)
# rprint(" Benchmark ".center(60, "-"))
# times = benchmark(wload)
# times *= 1e6 # convert to microseconds
# compute statistics
# mean = np.mean(times)
# min = np.min(times)
# max = np.max(times)
# std = np.std(times)
# rprint(f"Timings (us): mean={mean:.2f}+/-{std:.2f} min={min:.2f} max={max:.2f}")
# flop_count = wload.get_complexity()[0]
# gflops = flop_count / (mean * 1e-6) / 1e9
# rprint(f"Throughput: {gflops:.2f} GFLOPS")

Comment on lines 310 to 311
# test_analysis_only=True,
# print_conflicts=True,
Contributor


Suggested change
# test_analysis_only=True,
# print_conflicts=True,

"split_mm1_c": "[[], [0]]",
}
)
txt = txt.format_map(format_values)
Contributor


Any reason to prefer this textual manipulation over generating the payload with Python and having the payload-generating function be suitably parameterized?

Contributor Author


I am fed up with using the python mlir builders. It is more work than writing plain MLIR.
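For context, the textual approach quoted above boils down to `str.format_map` over an MLIR template string; a minimal runnable sketch (the template content is illustrative, and note that literal MLIR braces must be doubled):

```python
# Named placeholders like {m} are substituted by format_map; the
# literal braces of the MLIR function body are escaped as {{ and }}.
template = (
    "func.func @mlp(%arg0: tensor<{m}x{k}xf32>) {{\n"
    "  return\n"
    "}}\n"
)
txt = template.format_map({"m": 16, "k": 8})
print(txt)
```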

Comment on lines 310 to 311
# test_analysis_only=True,
# print_conflicts=True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# test_analysis_only=True,
# print_conflicts=True,

Comment on lines 359 to 373
# rprint(" Execute 2 ".center(60, "-"))
# execute(wload, verbose=1)

# rprint(" Benchmark ".center(60, "-"))
# times = benchmark(wload)
# times *= 1e6 # convert to microseconds
# compute statistics
# mean = np.mean(times)
# min = np.min(times)
# max = np.max(times)
# std = np.std(times)
# rprint(f"Timings (us): mean={mean:.2f}+/-{std:.2f} min={min:.2f} max={max:.2f}")
# flop_count = wload.get_complexity()[0]
# gflops = flop_count / (mean * 1e-6) / 1e9
# rprint(f"Throughput: {gflops:.2f} GFLOPS")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# rprint(" Execute 2 ".center(60, "-"))
# execute(wload, verbose=1)
# rprint(" Benchmark ".center(60, "-"))
# times = benchmark(wload)
# times *= 1e6 # convert to microseconds
# compute statistics
# mean = np.mean(times)
# min = np.min(times)
# max = np.max(times)
# std = np.std(times)
# rprint(f"Timings (us): mean={mean:.2f}+/-{std:.2f} min={min:.2f} max={max:.2f}")
# flop_count = wload.get_complexity()[0]
# gflops = flop_count / (mean * 1e-6) / 1e9
# rprint(f"Throughput: {gflops:.2f} GFLOPS")

Contributor

@tkarna tkarna left a comment


Thanks for the updates. This is a neat example!

@fschlimb fschlimb merged commit 3336c60 into llvm:main Feb 23, 2026
3 checks passed