@yunzhongOvO
Collaborator

  1. Expand the tuning space for AMD GPUs
  2. Remove unused load/store masks
  3. Introduce loop unrolling in the GQA scenario

@scxiao
Contributor

scxiao commented Aug 20, 2024

Do you have some perf numbers for this version?



3 participants