[FEATURE SUPPORT] add triton geru kernel #57

Merged: LoserCheems merged 3 commits into main from add-geru-triton-kernel on Dec 10, 2025

@LoserCheems
Collaborator

Summary

  • Introduces a Triton kernel for GERU, the in-place rank-1 (outer-product) update A ← A + α·x·yᵀ, so that alpha-scaled vector outer products can be applied to a matrix without a separate output buffer.

Root Cause

  • There was previously no efficient way to update strided tiles in place, which made the outer-product update a performance bottleneck.

Changes

  • Added a new Triton kernel for the GERU operation and updated the README to reflect the completion of the Triton implementation.

Reproduction

  • Not applicable; this is a new feature.

Tests

  • Existing tests were updated to include the new Triton kernel implementation.

Compatibility

  • No migration concerns or backward-compatibility issues identified.

Checklist

Enables in-place outer-product updates on matrices via a dedicated Triton kernel, allowing alpha-scaled vector outer products to update strided tiles while masking ragged edges.
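
To make the tiling and masking concrete, here is a minimal pure-Python reference sketch of the same update. It is not the kernel added in this PR: the function names (`geru_tile`, `geru`), the parameter names, and the block sizes are all hypothetical, and plain loops stand in for Triton's vectorized loads and stores. Each `(pid_m, pid_n)` pair plays the role of one program instance that owns one tile, and the bounds check plays the role of the mask on ragged edges.

```python
def geru_tile(a, x, y, alpha, m, n, stride_am, stride_an,
              pid_m, pid_n, block_m, block_n):
    """Update one block_m x block_n tile of A in place: A += alpha * x * y^T.

    `a` is a flat buffer; element (row, col) lives at
    row * stride_am + col * stride_an. All names here are illustrative.
    """
    for i in range(block_m):
        row = pid_m * block_m + i
        for j in range(block_n):
            col = pid_n * block_n + j
            # Mask: lanes past the m x n bounds (ragged edge) do nothing.
            if row < m and col < n:
                a[row * stride_am + col * stride_an] += alpha * x[row] * y[col]


def geru(a, x, y, alpha, m, n, stride_am, stride_an, block_m=4, block_n=4):
    """Launch one tile update per (pid_m, pid_n) cell of a 2D grid."""
    grid_m = (m + block_m - 1) // block_m  # ceiling division
    grid_n = (n + block_n - 1) // block_n
    for pid_m in range(grid_m):
        for pid_n in range(grid_n):
            geru_tile(a, x, y, alpha, m, n, stride_am, stride_an,
                      pid_m, pid_n, block_m, block_n)
    return a


# A 3 x 5 matrix stored row-major with a padded row pitch of 8 elements,
# so neither dimension is a multiple of the (2, 4) block shape.
m, n, pitch = 3, 5, 8
buf = [0] * (m * pitch)
x = [1, 2, 3]
y = [1, 10, 100, 1000, 10000]
geru(buf, x, y, 2, m, n, pitch, 1, block_m=2, block_n=4)
```

In this example the padding columns of `buf` are never written, because the bounds check rejects every lane with `col >= n`. A real Triton kernel would typically express the same idea by passing a boolean mask to its load and store operations rather than branching per element.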
LoserCheems merged commit 952c4af into main on Dec 10, 2025
1 check passed
2 participants