@krishnasai-mcw
Contributor

@krishnasai-mcw krishnasai-mcw commented Dec 12, 2025

Description

This PR introduces an optimized layer normalization primitive for RV64 architectures using RVV (RISC-V Vector) intrinsics.

Features and Limitations

Supported:

  • Data type: f32
  • Input: plain and dense layouts, with the last dimension contiguous
  • Stats modes:
    • Global stats (mean and variance provided externally)
    • Calculate stats (kernel computes mean and variance internally)
  • Scale & shift
  • Forward propagation - FWD_I & FWD_D

Not supported:

  • Post-ops
  • RMS layer normalization
  • Backward propagation

Implementation Details

The implementation uses an LMUL=1 configuration with 4-way loop unrolling to maximize throughput while keeping register pressure manageable. For statistics calculation, the kernel uses double-precision (f64) accumulators via widening operations to preserve numerical accuracy over large reductions. The normalization pass applies the standard formula ((x - mean) / std) × scale + shift, using fused multiply-add instructions to compute the final output.
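For readers without the diff at hand, the scheme described above can be sketched as a portable scalar reference. This is not the RVV kernel from this PR: it only mirrors the described structure (four independent f64 accumulators standing in for the 4-way unrolled widening reduction, then an FMA-based normalization pass). The names `RowStats`, `compute_row_stats`, and `layer_norm_row` are hypothetical, and the two-pass variance, the biased (divide by n) variance, and the `eps` term added under the square root are assumptions rather than details confirmed by the description.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical container for per-row statistics (not a oneDNN type).
struct RowStats {
    float mean;
    float variance;
};

// "Calculate stats" mode: mean and biased variance of one row, accumulated in
// f64. Four independent accumulators mimic a 4-way unrolled reduction.
RowStats compute_row_stats(const float *x, std::size_t n) {
    assert(n > 0);
    double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
    std::size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; ++i) s0 += x[i]; // tail elements
    const double mean = ((s0 + s1) + (s2 + s3)) / static_cast<double>(n);

    // Second pass (an assumption): sum of squared deviations from the mean.
    double v0 = 0.0, v1 = 0.0, v2 = 0.0, v3 = 0.0;
    for (i = 0; i + 4 <= n; i += 4) {
        const double d0 = x[i] - mean, d1 = x[i + 1] - mean;
        const double d2 = x[i + 2] - mean, d3 = x[i + 3] - mean;
        v0 += d0 * d0;
        v1 += d1 * d1;
        v2 += d2 * d2;
        v3 += d3 * d3;
    }
    for (; i < n; ++i) {
        const double d = x[i] - mean;
        v0 += d * d;
    }
    const double variance = ((v0 + v1) + (v2 + v3)) / static_cast<double>(n);
    return {static_cast<float>(mean), static_cast<float>(variance)};
}

// Normalizes one row. Passing global_stats selects "global stats" mode;
// passing nullptr makes the function compute the statistics itself.
void layer_norm_row(const float *x, float *y, std::size_t n,
        const float *scale, const float *shift, float eps,
        const RowStats *global_stats = nullptr) {
    const RowStats st = global_stats ? *global_stats : compute_row_stats(x, n);
    const float inv_std = 1.0f / std::sqrt(st.variance + eps);
    for (std::size_t i = 0; i < n; ++i) {
        const float normalized = (x[i] - st.mean) * inv_std;
        // ((x - mean) / std) * scale + shift as a single fused multiply-add.
        y[i] = std::fma(normalized, scale[i], shift[i]);
    }
}
```

A scalar reference like this can be useful for cross-checking a vectorized kernel on small rows, since the f64 accumulation keeps it close to the mathematically exact result.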

Checklist

General

  • Do all unit and benchdnn tests (make test and make test_benchdnn_*) pass locally for each commit?
  • Have you formatted the code using clang-format?

BenchDNN test log: test_lnorm_all.log

Performance improvements

  • Have you submitted performance data that demonstrates performance improvements?

Performance Results

Test Platform: Banana Pi F3

| Test Case | Vector | Scalar | Speedup |
|---|---|---|---|
| option_set_all | 86.46 | 1055.51 | 12.21 |
| option_set_fwks_ext_gpu | 62.36 | 765.01 | 12.27 |
| shapes_ci | 2.67 | 29.36 | 11.00 |

@krishnasai-mcw krishnasai-mcw requested review from a team as code owners December 12, 2025 11:20
@krishnasai-mcw krishnasai-mcw force-pushed the rv64-layernorm branch 2 times, most recently from 2645bb4 to e0fcf42 Compare December 12, 2025 14:26
@krishnasai-mcw krishnasai-mcw marked this pull request as draft December 12, 2025 15:01
@krishnasai-mcw krishnasai-mcw marked this pull request as ready for review December 12, 2025 18:45
@zhangjian29
Contributor

Hi @krishnasai-mcw ,

I noticed that you're using f64 data type. Do you think it is necessary to add a check to ensure the platform supports f64?

@krishnasai-mcw
Contributor Author

Hi @zhangjian29 ,
No extra check for f64 is needed. The RISC-V G extension includes the D extension, which provides full double-precision (f64) support. Our CMake setup already targets G by default, so f64 is always available in the default build.
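If a guard were ever wanted for builds that override the default target, a compile-time check along these lines might be enough. This is only a sketch and is not part of this PR: it assumes the toolchain defines the RISC-V C API macros `__riscv_flen` (scalar FP register width) and `__riscv_v_elen_fp` (widest vector FP element), and the names `kScalarF64`, `kVectorF64`, and `describe_f64_support` are hypothetical.

```cpp
// Scalar f64: __riscv_flen is assumed to be 64 or more when D is enabled.
#if defined(__riscv_flen) && (__riscv_flen >= 64)
constexpr bool kScalarF64 = true;
#else
constexpr bool kScalarF64 = false;
#endif

// Vector f64: __riscv_v_elen_fp is assumed to be 64 when the vector unit
// supports double-precision elements (needed for f32 -> f64 widening).
#if defined(__riscv_v_elen_fp) && (__riscv_v_elen_fp >= 64)
constexpr bool kVectorF64 = true;
#else
constexpr bool kVectorF64 = false;
#endif

// Human-readable summary of what the current compile target advertises.
// On non-RISC-V targets neither macro is defined, so this reports "no f64".
const char *describe_f64_support() {
    if (kScalarF64 && kVectorF64) return "scalar and vector f64";
    if (kScalarF64) return "scalar f64 only";
    return "no f64";
}
```

With the default G target the scalar constant would be expected to be true, in which case the guard only matters for custom `-march` settings.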

Thanks

3 participants