@orlmon01
Contributor

  • Modification to the CPU EP to specify channels_last when the data format is NHWC
  • Added a FusedNhwcConv kernel
  • Implementation of the kernel in mlas
  • Added compiler guards so it is only used with KleidiAI (for now; these can be removed if needed)
  • Added unit tests

Description

ONNX Runtime currently uses NCHW as its default data layout. For optimisations and kernels that perform better in NHWC, or where the data is NHWC in the first place, Transpose nodes are added around the affected layers. This patch aims to eliminate those Transposes for convolutions, where they would otherwise cause a performance decrease.
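To illustrate the layout issue described above, here is a minimal pure-Python sketch of what the surrounding Transposes do and what an NHWC kernel consumes directly. It is not ONNX Runtime or MLAS code: the function names `nchw_to_nhwc`, `nhwc_to_nchw`, and `pointwise_conv_nhwc` are hypothetical, and a 1x1 convolution is used only to keep the example short.

```python
def nchw_to_nhwc(data, n, c, h, w):
    """Reorder a flat NCHW buffer into a flat NHWC buffer."""
    out = [0] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    dst = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    out[dst] = data[src]
    return out


def nhwc_to_nchw(data, n, c, h, w):
    """Inverse reorder: flat NHWC buffer back to flat NCHW."""
    out = [0] * (n * c * h * w)
    for ni in range(n):
        for ci in range(c):
            for hi in range(h):
                for wi in range(w):
                    src = ((ni * h + hi) * w + wi) * c + ci  # NHWC offset
                    dst = ((ni * c + ci) * h + hi) * w + wi  # NCHW offset
                    out[dst] = data[src]
    return out


def pointwise_conv_nhwc(data, weights, n, h, w, c_in, c_out):
    """1x1 convolution on a flat NHWC buffer; weights[o][i] (illustrative only)."""
    out = []
    for p in range(n * h * w):
        # In NHWC, all channels of one pixel are contiguous.
        pixel = data[p * c_in:(p + 1) * c_in]
        for o in range(c_out):
            out.append(sum(weights[o][i] * pixel[i] for i in range(c_in)))
    return out


# NCHW graph wrapping an NHWC kernel: Transpose -> Conv -> Transpose.
x_nchw = [1, 2, 3, 4]                      # N=1, C=2, H=1, W=2
x_nhwc = nchw_to_nhwc(x_nchw, 1, 2, 1, 2)  # first Transpose
y_nhwc = pointwise_conv_nhwc(x_nhwc, [[1, 1]], 1, 1, 2, 2, 1)
y_nchw = nhwc_to_nchw(y_nhwc, 1, 1, 1, 2)  # second Transpose
```

If the input is already NHWC and the consumer accepts NHWC, both reorder passes over the tensor are unnecessary, which is the case this patch targets for convolutions.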

Motivation and Context

This feature is implemented specifically for KleidiAI. It only supports regular convolutions for now, with depthwise convolutions to follow. As a result, the filter checks are currently a little strict.

…transposes

* Modification to the CPU EP to specify channels_last when the data format is NHWC
* Added a FusedNhwcConv kernel
* Implementation of the kernel in mlas
* Added compiler guards so it is only used with KleidiAI (for now; these can be removed if needed)
* Added unit tests

Signed-off-by: Orlaith Monahan <orlaith.monahan@arm.com>
@orlmon01
Contributor Author

@microsoft-github-policy-service agree company="Arm"

@orlmon01 orlmon01 marked this pull request as draft December 19, 2025 12:34
@orlmon01 orlmon01 marked this pull request as ready for review December 19, 2025 12:35
@orlmon01
Contributor Author

Feedback appreciated as this PR makes quite a lot of changes to the codebase well outside of the normal KleidiAI scope.

Signed-off-by: Orlaith Monahan <orlaith.monahan@arm.com>
@Rohanjames1997
Contributor

Hi @orlmon01, I imagine that avoiding transposes also improves performance.
Do you have any performance results to share?
TIA!
