
Conversation


@GillisC GillisC commented May 15, 2025

Description

  • Corrects memory alignment and introduces a single abstraction for making aligned memory allocations.
  • Fixes a bug in the blocked GEMM implementation.
  • Ports over the corrected AVX2 and AVX512 implementations.
  • Adds aligned memory allocation in Base64.hpp (there is probably a better way to do this).
  • Applies a small fix to transpose that improves performance when dealing with explicitly 2D tensors.
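The single aligned-allocation abstraction is not shown in this conversation; below is a rough sketch of what such a utility, together with the runtime alignment check mentioned in the review, could look like. All names (aligned_allocate, is_aligned, TENSOR_ALIGNMENT) are hypothetical, not the project's actual API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical stand-in for the project's global alignment setting
// (64 bytes covers both cache lines and AVX512 vectors).
constexpr std::size_t TENSOR_ALIGNMENT = 64;

inline void* aligned_allocate(std::size_t bytes,
                              std::size_t alignment = TENSOR_ALIGNMENT) {
    // std::aligned_alloc (C++17) requires the size to be a multiple
    // of the alignment, so round the request up.
    std::size_t rounded = (bytes + alignment - 1) / alignment * alignment;
    void* p = std::aligned_alloc(alignment, rounded);
    if (!p) throw std::bad_alloc{};
    return p;
}

inline void aligned_deallocate(void* p) noexcept { std::free(p); }

// Runtime alignment check of the kind the review summary mentions.
inline bool is_aligned(const void* p,
                       std::size_t alignment = TENSOR_ALIGNMENT) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

Centralizing allocation this way lets the SIMD kernels use aligned loads and stores without each call site repeating the rounding and checking logic.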

Issue

Closes #219
Closes #222

@GillisC GillisC requested review from Copilot, ehmc123 and willayy May 15, 2025 15:56
@GillisC GillisC self-assigned this May 15, 2025
@GillisC GillisC added the "bug" and "enhancement" labels May 15, 2025
@GillisC GillisC linked an issue May 15, 2025 that may be closed by this pull request

Copilot AI left a comment


Pull Request Overview

This PR applies bug fixes and optimizations across multiple GEMM implementations while standardizing memory alignment.

  • Refactored memory alignment allocation and added runtime alignment checks.
  • Improved the tensor GEMM implementations for blocked, AVX2, and AVX512, including shape validations and error-message adjustments.
  • Updated CMake configurations and utility modules to support the new alignment and optimized settings.

Reviewed Changes

Copilot reviewed 16 out of 16 changed files in this pull request and generated 3 comments.

Summary per file:

  • src/openblas/CMakeLists.txt: Removed the hard-coded memory alignment setting to rely on the global configuration.
  • src/default/datastructures/tensor_gemm.cpp: Minor refactoring; removed an extraneous blank line.
  • src/default/datastructures/tensor.cpp: Added the new get_raw_data() method.
  • src/default/datastructures/mml_array.cpp: Integrated aligned memory allocation and added alignment validation.
  • src/blocked/datastructures/tensor_gemm.cpp: Revised the loop structure for better performance in blocked GEMM.
  • src/blocked/CMakeLists.txt: Removed the legacy memory alignment definition.
  • src/avx512/datastructures/tensor_gemm.cpp: Enhanced shape validation and updated the AVX512 GEMM with corrected error messages.
  • src/avx512/CMakeLists.txt: Updated compile options using target_compile_options for AVX512.
  • src/avx/datastructures/tensor_gemm.cpp: Improved the AVX2 GEMM with shape checks and aligned memory usage; updated error messaging.
  • src/avx/CMakeLists.txt: Updated compile options for AVX2 with additional flags.
  • include/utility/base64.hpp: Modified memory allocation to support aligned tensor data.
  • include/utility/avx_mask_helper.hpp: Added helper functions for mask creation in AVX2/AVX512 GEMM.
  • include/utility/aligned_alloc.hpp: Introduced an aligned memory allocation utility.
  • include/datastructures/tensor.hpp: Declared and documented the new get_raw_data() method.
  • include/datastructures/mml_array.hpp: Updated include dependencies for aligned allocation.
  • CMakeLists.txt: Revised the global tensor alignment configuration with new options; a potential duplicate condition was identified.
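The revised blocked GEMM loop structure itself is not shown in this conversation. As a generic sketch, the kind of cache-blocked i-k-j ordering such a revision typically adopts looks like this; the block size, function names, and row-major layout are assumptions, not the project's actual code.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical sketch of a cache-blocked GEMM (C += A * B) with an
// i-k-j inner ordering; matrices are row-major, C is pre-zeroed.
constexpr std::size_t BLOCK = 64;

void blocked_gemm(const std::vector<float>& A, const std::vector<float>& B,
                  std::vector<float>& C,
                  std::size_t M, std::size_t K, std::size_t N) {
    for (std::size_t ii = 0; ii < M; ii += BLOCK)
        for (std::size_t kk = 0; kk < K; kk += BLOCK)
            for (std::size_t jj = 0; jj < N; jj += BLOCK)
                for (std::size_t i = ii; i < std::min(ii + BLOCK, M); ++i)
                    for (std::size_t k = kk; k < std::min(kk + BLOCK, K); ++k) {
                        // Hoist A[i][k] so the innermost loop streams
                        // over contiguous rows of B and C.
                        float a = A[i * K + k];
                        for (std::size_t j = jj; j < std::min(jj + BLOCK, N); ++j)
                            C[i * N + j] += a * B[k * N + j];
                    }
}
```

Keeping the innermost loop unit-stride over B and C is usually the point of this kind of loop revision: it turns the hot loop into sequential reads and writes that the prefetcher and vectorizer handle well.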


@willayy willayy left a comment


A lot of code and I'm not really up to speed XD, but if it works it's good :D


GillisC commented May 15, 2025

@ehmc123 Quick question: if you look at avx_mask_helper, which currently sits in the default folder, would I be able to just split it up and place the helper functions in the avx and avx512 folders, or is some additional CMake configuration needed?

@GillisC GillisC merged commit 7f0812e into main Jul 2, 2025
3 of 4 checks passed
@GillisC GillisC deleted the 222-feature-request-port-optimizations branch July 2, 2025 13:50