@GiggleLiu
Member

Summary

  • Add new tropical-gemm-metal crate for GPU-accelerated tropical GEMM on macOS
  • Support MaxPlus, MinPlus, and MaxMul semirings (f32)
  • Include argmax tracking and backward pass kernels for neural network training
  • Follow the same architecture as the CUDA backend for consistency
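
For readers unfamiliar with the three semirings, the following is a minimal CPU reference sketch of what a tropical GEMM computes. It is not the crate's API or its Metal shader code: the `Semiring` enum and `tropical_matmul_ref` are hypothetical names introduced here, inputs are assumed to be row-major slices, and the MaxMul identity of `0.0` assumes non-negative entries.

```rust
/// Hypothetical illustration type: each semiring replaces (+, *) with (add, mul).
#[derive(Clone, Copy)]
enum Semiring {
    MaxPlus, // add = max, mul = +
    MinPlus, // add = min, mul = +
    MaxMul,  // add = max, mul = *
}

impl Semiring {
    /// Identity element of the semiring "addition".
    fn zero(self) -> f32 {
        match self {
            Semiring::MaxPlus => f32::NEG_INFINITY,
            Semiring::MinPlus => f32::INFINITY,
            Semiring::MaxMul => 0.0, // assumes non-negative inputs
        }
    }

    fn add(self, x: f32, y: f32) -> f32 {
        match self {
            Semiring::MinPlus => x.min(y),
            Semiring::MaxPlus | Semiring::MaxMul => x.max(y),
        }
    }

    fn mul(self, x: f32, y: f32) -> f32 {
        match self {
            Semiring::MaxMul => x * y,
            Semiring::MaxPlus | Semiring::MinPlus => x + y,
        }
    }
}

/// C[i, j] = add over p of mul(A[i, p], B[p, j]), with A (m x k) and B (k x n)
/// stored row-major. Returns C (m x n), also row-major.
fn tropical_matmul_ref(
    s: Semiring,
    a: &[f32],
    m: usize,
    k: usize,
    b: &[f32],
    n: usize,
) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![s.zero(); m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                let v = s.mul(a[i * k + p], b[p * n + j]);
                c[i * n + j] = s.add(c[i * n + j], v);
            }
        }
    }
    c
}
```

A triple loop like this is also a convenient oracle when checking GPU results on small matrices.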

Implementation Details

The Metal backend provides:

  • MetalContext: Compiles and caches Metal shaders, manages command queue
  • GpuMatrix: GPU memory management with row-major ↔ column-major conversion
  • MetalKernel trait: Type-safe kernel dispatch for each semiring type
  • Global context caching: Avoids shader recompilation overhead
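
The row-major ↔ column-major conversion mentioned for GpuMatrix can be sketched on the host as a plain transpose of the storage order. The function names below are hypothetical and this is only an illustration of the index mapping, not the crate's implementation, which may perform the conversion differently.

```rust
/// Convert a rows x cols matrix from row-major to column-major storage.
/// Element (i, j) moves from index i * cols + j to index j * rows + i.
fn row_to_col_major(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols);
    let mut out = vec![0.0f32; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            out[j * rows + i] = data[i * cols + j];
        }
    }
    out
}

/// Inverse mapping: column-major storage back to row-major.
fn col_to_row_major(data: &[f32], rows: usize, cols: usize) -> Vec<f32> {
    assert_eq!(data.len(), rows * cols);
    let mut out = vec![0.0f32; rows * cols];
    for i in 0..rows {
        for j in 0..cols {
            out[i * cols + j] = data[j * rows + i];
        }
    }
    out
}
```

Applying one function and then the other with the same `rows` and `cols` returns the original buffer.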

API Example

use tropical_gemm_metal::{tropical_matmul_metal, MetalContext};
use tropical_gemm::types::TropicalMaxPlus;

// One-shot API with cached global context
let c = tropical_matmul_metal::<TropicalMaxPlus<f32>>(&a, m, k, &b, n)?;

// Or with explicit context for repeated operations
let ctx = MetalContext::new()?;
tropical_gemm_metal::<TropicalMaxPlus<f32>>(&ctx, &a_gpu, &b_gpu, &mut c_gpu)?;

Test plan

  • All 6 unit tests pass on Apple Silicon
  • MaxPlus, MinPlus, MaxMul semirings tested
  • Argmax tracking verified for backpropagation
  • Larger matrix (64x128x64) test passes

Closes #1

🤖 Generated with Claude Code

Add a new `tropical-gemm-metal` crate that provides GPU-accelerated tropical
matrix multiplication using Metal on macOS. This addresses issue #1 by enabling
GPU acceleration on Apple Silicon.

Features:
- Support for MaxPlus, MinPlus, and MaxMul semirings (f32)
- Argmax tracking for backpropagation
- Backward pass kernels for gradient computation
- Global context caching for efficient repeated operations
- Row-major to column-major layout conversion
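
To illustrate how argmax tracking feeds the backward pass, here is a minimal CPU sketch for the MaxPlus case. The function names are hypothetical, inputs are assumed row-major, and ties are assumed to resolve to the smallest index; the actual kernels may differ on all three points.

```rust
/// Forward MaxPlus GEMM that also records, for each C[i, j], the inner index p
/// that attained the maximum of A[i, p] + B[p, j].
fn maxplus_with_argmax(
    a: &[f32],
    m: usize,
    k: usize,
    b: &[f32],
    n: usize,
) -> (Vec<f32>, Vec<u32>) {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut c = vec![f32::NEG_INFINITY; m * n];
    let mut argmax = vec![0u32; m * n];
    for i in 0..m {
        for j in 0..n {
            for p in 0..k {
                let v = a[i * k + p] + b[p * n + j];
                // Strict comparison keeps the first maximizer on ties.
                if v > c[i * n + j] {
                    c[i * n + j] = v;
                    argmax[i * n + j] = p as u32;
                }
            }
        }
    }
    (c, argmax)
}

/// Backward pass: each output gradient flows only to the A and B entries
/// selected by argmax, since d(max)/d(input) is 1 for the winner and 0 otherwise.
fn maxplus_backward(
    grad_c: &[f32],
    argmax: &[u32],
    m: usize,
    k: usize,
    n: usize,
) -> (Vec<f32>, Vec<f32>) {
    assert_eq!(grad_c.len(), m * n);
    assert_eq!(argmax.len(), m * n);
    let mut grad_a = vec![0.0f32; m * k];
    let mut grad_b = vec![0.0f32; k * n];
    for i in 0..m {
        for j in 0..n {
            let p = argmax[i * n + j] as usize;
            grad_a[i * k + p] += grad_c[i * n + j];
            grad_b[p * n + j] += grad_c[i * n + j];
        }
    }
    (grad_a, grad_b)
}
```

The same scatter pattern applies to MinPlus with the comparison reversed. MaxMul would additionally scale each gradient by the other operand, which this sketch does not cover.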

The implementation follows the same architecture as the CUDA backend:
- MetalContext: Shader compilation and kernel management
- GpuMatrix: GPU memory management with layout conversion
- MetalKernel trait: Type-safe kernel dispatch
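
The global context caching listed under Features can be sketched with `std::sync::OnceLock`. This assumes a stand-in `FakeContext` type and made-up kernel names; the real `MetalContext` compiles Metal shaders and its constructor is fallible, so its caching code will look different.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (simulated) shader compilation ran.
static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for a context that owns compiled kernels.
struct FakeContext {
    kernels: Vec<&'static str>,
}

impl FakeContext {
    /// Simulates the expensive step (shader compilation) that caching avoids.
    fn new() -> Self {
        COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
        FakeContext {
            // Kernel names are illustrative only.
            kernels: vec!["maxplus_f32", "minplus_f32", "maxmul_f32"],
        }
    }
}

/// Returns the process-wide context, building it on first use only.
fn global_context() -> &'static FakeContext {
    static CTX: OnceLock<FakeContext> = OnceLock::new();
    CTX.get_or_init(FakeContext::new)
}
```

Every call after the first returns the same reference, so a one-shot API built on it pays the compilation cost once per process.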

Closes #1

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Development

Successfully merging this pull request may close these issues.

Feature Request: Metal backend for Apple GPU