
enable torchao int4 on cpu side #69

Open
airMeng wants to merge 2 commits into mingfeima:cpu_opt_ww11 from airMeng:ao_int


airMeng commented Apr 23, 2025

Motivation

Modifications

Checklist

mingfeima and others added 2 commits April 22, 2025 23:29
1. brgemm impl: move brgemm out of inner loop
2. avx512 impl: move scaling out of inner loop
3. fp8_scaled_mm: change BLOCK_M to 128 to reduce access to B
4. cvt_fp8_bf16: ignore NaN handling
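
Item 2 ("move scaling out of inner loop") can be illustrated with a minimal pure-Python sketch. This is not the kernel code from the commits: the function names, the list-of-lists layout, and the per-(K-block, column) scale layout are all assumptions made for illustration. The first function multiplies every element of B by its scale inside the innermost loop; the second accumulates an unscaled partial sum per K-block and applies the scale once per block.

```python
def gemm_scale_inside(a, b_q, scales, block_k):
    """Reference: apply the block scale to every B element in the inner loop."""
    m, k, n = len(a), len(b_q), len(b_q[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for p in range(k):
                # One extra multiply per (i, j, p) triple.
                acc += a[i][p] * (b_q[p][j] * scales[p // block_k][j])
            out[i][j] = acc
    return out


def gemm_scale_hoisted(a, b_q, scales, block_k):
    """Accumulate each K-block unscaled, then apply its scale once."""
    m, k, n = len(a), len(b_q), len(b_q[0])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for j in range(n):
            acc = 0.0
            for kb in range(0, k, block_k):
                partial = 0.0
                for p in range(kb, min(kb + block_k, k)):
                    partial += a[i][p] * b_q[p][j]
                # One multiply per (i, j, K-block) instead of per element.
                acc += partial * scales[kb // block_k][j]
            out[i][j] = acc
    return out
```

The two versions agree up to floating-point rounding, because the scale is constant within a K-block and can be factored out of the block's partial sum.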

```
Comparing:  True  max_diff = 0.01562, asum = 10.562, bsum = 10.375

gemm_bf16(native): 89.812 us, gemm_fp8(opt): 124.585 us

Comparing:  True  max_diff = 0.01562, asum = -32.500, bsum = -32.750

gemm_bf16(native): 83.805 us, gemm_fp8(opt): 125.586 us

Comparing:  True  max_diff = 0.01562, asum = -35.750, bsum = -36.500

gemm_bf16(native): 89.579 us, gemm_fp8(opt): 151.284 us

Comparing:  True  max_diff = 0.03125, asum = 4512.000, bsum = 4512.000

gemm_bf16(native): 262.104 us, gemm_fp8(opt): 615.823 us

```

```
Comparing:  True  max_diff = 0.01562, asum = 10.562, bsum = 10.375

gemm_bf16(native): 86.403 us, gemm_fp8(opt): 95.792 us

Comparing:  True  max_diff = 0.01562, asum = -32.500, bsum = -32.750

gemm_bf16(native): 84.178 us, gemm_fp8(opt): 100.573 us

Comparing:  True  max_diff = 0.01562, asum = -35.750, bsum = -36.500

gemm_bf16(native): 90.365 us, gemm_fp8(opt): 114.198 us

Comparing:  True  max_diff = 0.03125, asum = 4512.000, bsum = 4512.000

gemm_bf16(native): 267.053 us, gemm_fp8(opt): 404.231 us
```
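
To compare the two logs above, the timing lines can be reduced to an fp8-to-bf16 time ratio per case. The helper below is a small sketch that assumes the log format is exactly as printed above; `fp8_over_bf16` is a hypothetical name, not part of the PR.

```python
import re

# Matches lines such as:
#   gemm_bf16(native): 89.812 us, gemm_fp8(opt): 124.585 us
TIMING = re.compile(
    r"gemm_bf16\(native\): ([\d.]+) us, gemm_fp8\(opt\): ([\d.]+) us"
)


def fp8_over_bf16(log_text):
    """Return gemm_fp8 time divided by gemm_bf16 time for each timing line."""
    return [float(fp8) / float(bf16) for bf16, fp8 in TIMING.findall(log_text)]


if __name__ == "__main__":
    sample = "gemm_bf16(native): 89.812 us, gemm_fp8(opt): 124.585 us"
    for ratio in fp8_over_bf16(sample):
        print(f"fp8 / bf16 = {ratio:.2f}")
```

A ratio above 1 means the fp8 path is slower than the native bf16 path for that case.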
2 participants