
@yeahdongcn commented on Jan 28, 2026

Summary

This PR adds automatic patching of ctypes.CDLL so that, when a MUSA library is loaded, CUDA-style function names (CUDA runtime, NCCL, cuBLAS, cuRAND) are transparently translated to their MUSA equivalents (MUSA runtime, MCCL, muBLAS, muRAND). This lets projects like sglang, which use ctypes to call CUDA runtime functions directly, run on MUSA without any code changes.

Motivation

Projects like sglang use ctypes.CDLL to load CUDA libraries and call functions such as cudaMalloc, cudaIpcOpenMemHandle, and ncclAllReduce. On MUSA, both the library names (libmusart.so instead of libcudart.so) and the function names (musaMalloc instead of cudaMalloc) differ.

Previously, projects needed to add explicit name translation logic (see sglang PR #17499). With this change, they just need to import torchada and their existing code works unchanged.

Changes

Core Implementation (src/torchada/_patch.py):

  • Added _CDLLWrapper class that wraps ctypes.CDLL instances for MUSA libraries
  • Intercepts __getattr__ to automatically translate function names:
    • cudaXxx → musaXxx (for libmusart.so)
    • ncclXxx → mcclXxx (for libmccl.so)
    • cublasXxx → mublasXxx (for libmublas.so)
    • curandXxx → murandXxx (for libmurand.so)
  • Added _patch_ctypes_cdll() patch function that replaces ctypes.CDLL with a version that returns wrapped instances for MUSA libraries

Runtime Utilities (src/torchada/_runtime.py):

  • Added utility functions for manual name conversion (exported but not required for typical use):
    • cuda_to_musa_name()
    • nccl_to_mccl_name()
    • cublas_to_mublas_name()
    • curand_to_murand_name()
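The four helpers listed above could be as simple as a prefix swap. The function names come from this PR; their bodies, and the choice to return names without the expected prefix unchanged, are assumptions for illustration.

```python
def _swap_prefix(name, src, dst):
    """Replace a leading `src` prefix with `dst`; return other names unchanged."""
    if name.startswith(src):
        return dst + name[len(src):]
    return name


def cuda_to_musa_name(name):
    """cudaXxx -> musaXxx (MUSA runtime, libmusart.so)."""
    return _swap_prefix(name, "cuda", "musa")


def nccl_to_mccl_name(name):
    """ncclXxx -> mcclXxx (libmccl.so)."""
    return _swap_prefix(name, "nccl", "mccl")


def cublas_to_mublas_name(name):
    """cublasXxx -> mublasXxx (libmublas.so)."""
    return _swap_prefix(name, "cublas", "mublas")


def curand_to_murand_name(name):
    """curandXxx -> murandXxx (libmurand.so)."""
    return _swap_prefix(name, "curand", "murand")
```

Helpers like these could be useful when a library handle is obtained without going through ctypes.CDLL, e.g. getattr(handle, cuda_to_musa_name("cudaIpcOpenMemHandle")).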

Tests (tests/test_mappings.py):

  • Added TestCDLLWrapper class with real library tests (not mocks):
    • test_libmusart_cuda_to_musa_translation - Loads actual libmusart.so, verifies cudaMalloc/cudaFree translation
    • test_libmccl_nccl_to_mccl_translation - Loads actual libmccl.so, verifies ncclAllReduce translation
    • test_libmublas_cublas_to_mublas_translation - Loads actual libmublas.so
    • test_libmurand_curand_to_murand_translation - Loads actual libmurand.so
    • test_non_musa_lib_no_translation - Verifies non-MUSA libraries aren't wrapped
    • test_sglang_cuda_wrapper_pattern - End-to-end test simulating sglang's exact usage pattern, actually allocates/frees GPU memory
  • Added TestRuntimeNameConversion class for utility function tests

Documentation:

  • Updated README.md and README_CN.md with:
    • New entry in "Supported Features" table
    • New "ctypes Library Loading" example section
    • Updated API Reference with runtime utility functions

Usage

import torchada  # Apply patches automatically
import ctypes

# Load MUSA runtime library - use CUDA function names!
lib = ctypes.CDLL("libmusart.so")
func = lib.cudaMalloc  # Automatically translates to musaMalloc

# Works with MCCL too
nccl_lib = ctypes.CDLL("libmccl.so")
func = nccl_lib.ncclAllReduce  # Automatically translates to mcclAllReduce
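Resolving a symbol is only the first step; calling it through ctypes usually also means declaring argtypes and restype. The sketch below allocates and frees device memory using CUDA names only. It assumes the MUSA runtime mirrors the CUDA signatures of cudaMalloc and cudaFree and returns 0 on success; try_alloc_free is a hypothetical helper that returns None when torchada or libmusart.so is unavailable.

```python
import ctypes


def try_alloc_free(nbytes):
    """Allocate and free device memory via CUDA-named symbols on MUSA.

    Returns True on success, or None if torchada or libmusart.so is unavailable.
    """
    try:
        import torchada  # noqa: F401  (applies the ctypes.CDLL patch)
        lib = ctypes.CDLL("libmusart.so")
    except (ImportError, OSError):
        return None

    # Assumed signatures, mirroring the CUDA runtime:
    #   cudaError_t cudaMalloc(void **devPtr, size_t size);
    #   cudaError_t cudaFree(void *devPtr);
    malloc = lib.cudaMalloc  # resolved to musaMalloc by the wrapper
    malloc.argtypes = [ctypes.POINTER(ctypes.c_void_p), ctypes.c_size_t]
    malloc.restype = ctypes.c_int
    free = lib.cudaFree  # resolved to musaFree by the wrapper
    free.argtypes = [ctypes.c_void_p]
    free.restype = ctypes.c_int

    ptr = ctypes.c_void_p()
    err = malloc(ctypes.byref(ptr), nbytes)
    if err != 0:
        raise RuntimeError(f"cudaMalloc failed with error code {err}")
    err = free(ptr)
    if err != 0:
        raise RuntimeError(f"cudaFree failed with error code {err}")
    return True
```

Binding each function object to a local name before setting argtypes and restype avoids relying on the wrapper returning the same object on every attribute access.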

Notes

  • Non-MUSA libraries (e.g., libc.so) are not wrapped and work normally.
  • The patch only applies on MUSA platforms (is_musa_platform() == True).

Testing

All tests pass in both torch_musa 2.7.0 and 2.7.1 containers:

251 passed, 2 skipped

Signed-off-by: Xiaodong Ye <xiaodong.ye@mthreads.com>
@yeahdongcn merged commit aa4e851 into main on Jan 28, 2026