
Conversation

@CryVeck CryVeck commented Dec 18, 2024

The main modifications to support Llama 3.1 and 3.2:

  • With Llama 3.2, tie_word_embeddings=True, so the rotation only needs to be applied once, to the input embeddings, because the output (lm_head) embeddings share the same weights (see the sketch after this list).
  • With Llama 3.2, config.num_key_value_heads differs from config.num_attention_heads (grouped-query attention), so the full formula config.hidden_size * config.num_key_value_heads / config.num_attention_heads is needed to get the right k/v projection dimension.
  • Added one Hadamard matrix.
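
Below is a minimal sketch of the three changes, assuming a QuaRot-style rotation pass over a Hugging Face Llama model; the helper names (rotate_embeddings, kv_proj_out_features, sylvester_hadamard) and the exact call sites are illustrative assumptions, not the PR's actual code.

```python
import torch


def rotate_embeddings(model, Q: torch.Tensor, config) -> None:
    """Rotate the token embeddings by the orthogonal matrix Q (sketch)."""
    dtype = model.model.embed_tokens.weight.dtype
    W = model.model.embed_tokens.weight
    W.data = (W.data.to(torch.float64) @ Q).to(dtype)
    # Llama 3.2 ties input and output embeddings (tie_word_embeddings=True):
    # lm_head shares this tensor, so rotating it here again would apply Q twice.
    if not config.tie_word_embeddings:
        head = model.lm_head.weight
        head.data = (head.data.to(torch.float64) @ Q).to(dtype)


def kv_proj_out_features(config) -> int:
    # With grouped-query attention (num_key_value_heads != num_attention_heads,
    # as in Llama 3.2), the k/v projection output dimension is not hidden_size
    # but hidden_size * num_key_value_heads / num_attention_heads.
    return (config.hidden_size * config.num_key_value_heads
            // config.num_attention_heads)


def sylvester_hadamard(n: int) -> torch.Tensor:
    """Orthonormal Hadamard matrix of size n (Sylvester construction, n a power of two)."""
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    H = torch.ones(1, 1, dtype=torch.float64)
    while H.shape[0] < n:
        H = torch.cat([torch.cat([H, H], dim=1),
                       torch.cat([H, -H], dim=1)], dim=0)
    return H / torch.sqrt(torch.tensor(n, dtype=torch.float64))
```

For example, for Llama 3.2 1B (hidden_size 2048, 32 attention heads, 8 KV heads), kv_proj_out_features gives 2048 * 8 / 32 = 512, whereas using hidden_size directly would give the wrong k/v projection size.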


yc2367 commented Jul 9, 2025

This is very helpful for my current work comparing QuaRot on Llama-3.2. Thank you very much! I would appreciate it if the author could review and merge this if applicable. @sashkboos
