hmyuuu commented Jan 7, 2026

Closes #1

Summary

  • Add tropical-gemm-metal crate for GPU acceleration on macOS
  • Support MaxPlus, MinPlus, MaxMul semirings (f32)
  • Tiled algorithm with shared memory (32x32 blocks)
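
For readers unfamiliar with the three semirings, the following is a minimal CPU reference sketch, not code from this PR. The trait `Semiring`, the marker types, and the function `tropical_gemm` are hypothetical names chosen for illustration, and the `MaxMul` zero of `0.0` assumes non-negative matrix entries.

```rust
/// A semiring over f32: `add` plays the role of "+", `mul` the role of "x".
/// (Hypothetical trait for illustration; not the crate's actual API.)
pub trait Semiring {
    /// Identity element of `add`, used to initialise the output.
    const ZERO: f32;
    fn add(a: f32, b: f32) -> f32;
    fn mul(a: f32, b: f32) -> f32;
}

pub struct MaxPlus;
pub struct MinPlus;
pub struct MaxMul;

impl Semiring for MaxPlus {
    const ZERO: f32 = f32::NEG_INFINITY;
    fn add(a: f32, b: f32) -> f32 { a.max(b) }
    fn mul(a: f32, b: f32) -> f32 { a + b }
}

impl Semiring for MinPlus {
    const ZERO: f32 = f32::INFINITY;
    fn add(a: f32, b: f32) -> f32 { a.min(b) }
    fn mul(a: f32, b: f32) -> f32 { a + b }
}

impl Semiring for MaxMul {
    // Assumption: entries are non-negative, so 0.0 is the identity of max.
    const ZERO: f32 = 0.0;
    fn add(a: f32, b: f32) -> f32 { a.max(b) }
    fn mul(a: f32, b: f32) -> f32 { a * b }
}

/// Naive semiring matrix product C = A (x) B.
/// `a` is m x k, `b` is k x n, both row-major; the result is m x n, row-major.
pub fn tropical_gemm<S: Semiring>(
    a: &[f32],
    b: &[f32],
    m: usize,
    k: usize,
    n: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![S::ZERO; m * n];
    for i in 0..m {
        for p in 0..k {
            let aip = a[i * k + p];
            for j in 0..n {
                // c[i][j] = add(c[i][j], mul(a[i][p], b[p][j]))
                c[i * n + j] = S::add(c[i * n + j], S::mul(aip, b[p * n + j]));
            }
        }
    }
    c
}
```

A reference like this is mainly useful for checking GPU output on small inputs, for example `tropical_gemm::<MaxPlus>(&a, &b, m, k, n)`.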

Benchmark (Apple M1 Pro)

GPU vs CPU Speedup

| Size | CPU (ms) | Metal (ms) | Speedup |
|------|----------|------------|---------|
| 256  | 15.9     | 2.2        | 7x      |
| 512  | 110.8    | 7.0        | 16x     |
| 1024 | 1061.0   | 21.7       | 49x     |
| 2048 | 26911.0  | 103.9      | 259x    |

All Semirings (Metal Kernel Time)

| Size | MaxPlus (ms) | MinPlus (ms) | MaxMul (ms) |
|------|--------------|--------------|-------------|
| 256  | 2.5          | 2.3          | 2.0         |
| 512  | 4.9          | 4.9          | 4.7         |
| 1024 | 18.3         | 19.3         | 17.9        |
| 2048 | 106.1        | 106.2        | 113.5       |

hmyuuu added 2 commits January 8, 2026 07:22
- Add tropical-gemm-metal crate with Metal shaders
- Support MaxPlus, MinPlus, MaxMul semirings (f32)
- Tiled algorithm with 32x32 blocks and shared memory
- API matches CUDA backend interface
- Benchmarks show 259x speedup over CPU at 2048x2048

Tested on Apple M1 Pro.
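
The tiling idea mentioned in the commit message can be sketched on the CPU as loop blocking. This is an illustrative sketch only, not the Metal shader from this PR: the function name `tiled_maxplus_gemm` is hypothetical, it is specialised to MaxPlus for brevity, and on the GPU each 32x32 tile would instead be staged in threadgroup (shared) memory by a block of threads.

```rust
/// Tile edge length; 32 matches the block size quoted in the PR.
const TILE: usize = 32;

/// MaxPlus matrix product computed tile by tile.
/// `a` is m x k, `b` is k x n, both row-major; the result is m x n, row-major.
/// (Hypothetical CPU illustration of blocking, not the PR's kernel.)
pub fn tiled_maxplus_gemm(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    // -inf is the identity of max, so untouched entries stay neutral.
    let mut c = vec![f32::NEG_INFINITY; m * n];

    for i0 in (0..m).step_by(TILE) {
        for j0 in (0..n).step_by(TILE) {
            // Walk the shared dimension one tile at a time, accumulating
            // into the same output tile c[i0.., j0..].
            for p0 in (0..k).step_by(TILE) {
                // Clamp tile bounds so sizes need not be multiples of TILE.
                let i1 = (i0 + TILE).min(m);
                let j1 = (j0 + TILE).min(n);
                let p1 = (p0 + TILE).min(k);
                for i in i0..i1 {
                    for p in p0..p1 {
                        let aip = a[i * k + p];
                        for j in j0..j1 {
                            let idx = i * n + j;
                            c[idx] = c[idx].max(aip + b[p * n + j]);
                        }
                    }
                }
            }
        }
    }
    c
}
```

Because max is associative and commutative, accumulating over the shared dimension in tile-sized chunks gives the same result as the untiled loop; the blocking only changes the memory access pattern.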
@GiggleLiu
Member

Thank you for the PR. The CI is failing.

@GiggleLiu
Member

Also, please note that this package is not registered yet, and it is undergoing refactoring.

@hmyuuu
Author

hmyuuu commented Jan 8, 2026

CI fixed.

> and it is undergoing refactoring.

OK, we can leave this PR on hold until the refactoring is finished.

Development

Successfully merging this pull request may close these issues.

Feature Request: Metal backend for Apple GPU
