Collaborator

@klei22 klei22 commented Oct 4, 2023

This implements softermax and provides it as an option within the model.py config.

Link for softermax paper:
https://arxiv.org/abs/2103.09301

The default remains conventional softmax, which trains a little faster because softermax is not yet incorporated into the faster attention libraries.
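For readers unfamiliar with the paper, here is a minimal sketch of the core idea as I understand it: the exponential base is changed from e to 2, and the normalizer is accumulated in a single pass with a running maximum. This is not the code from this PR; the function name `softermax` and the pure-Python float arithmetic are assumptions for illustration, and the low-precision, hardware-oriented details in the paper are left out.

```python
import math


def softermax(scores):
    """Base-2 softmax with single-pass (online) normalization.

    Illustrative sketch only: uses plain floats, not the low-precision
    arithmetic described in the paper.
    """
    running_max = -math.inf
    running_sum = 0.0
    for x in scores:
        new_max = max(running_max, x)
        # Rescale the sum accumulated so far to the new reference maximum,
        # then add the current term relative to that maximum.
        running_sum = running_sum * 2.0 ** (running_max - new_max) + 2.0 ** (x - new_max)
        running_max = new_max
    # Second pass: normalize each base-2 exponential by the accumulated sum.
    return [2.0 ** (x - running_max) / running_sum for x in scores]


probs = softermax([1.0, 2.0, 3.0])
print(probs)
```

Because the base is 2, each step of 1.0 in the input doubles the weight, so the three inputs above are weighted 1 : 2 : 4 before normalization.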

This implements softermax and provides it as an option within the config.

Link for softermax paper:
https://arxiv.org/abs/2103.09301

Setting the default to still use conventional softmax, since softermax
is not yet incorporated into faster attention implementations like FlashAttention.
Collaborator

@gkielian gkielian left a comment

lgtm

@msaligane msaligane merged commit 770f0e4 into ReaLLMASIC:master Oct 4, 2023
gkielian added a commit that referenced this pull request Jan 26, 2024
Add Randomness Seed Parameter PR #1
gkielian pushed a commit that referenced this pull request Feb 12, 2024
gkielian pushed a commit that referenced this pull request Mar 4, 2024
gkielian pushed a commit that referenced this pull request Aug 6, 2024
3 participants