GEMM and Attention experiment

Status

Very early stage. It works, but it is not optimized.

Architectures

- A64FX
  - SVE intrinsics
  - SVE assembly
- Ryzen
- Vulkan Compute
  - Cooperative matrix (tested on RDNA4)
- Maybe CUDA