@klei22 klei22 commented Feb 3, 2026

This pull request introduces support for adding L2-normalized Gaussian noise to token embeddings as a configurable regularization technique. The changes allow users to specify the noise scale during both training and inference, and to sweep over different noise values in experiments. The implementation ensures noise is consistently applied in all relevant embedding pathways and tracked in evaluation outputs.

Embedding Gaussian Noise Regularization:

  • Added embedding_gaussian_noise_std as a new configuration parameter in GPTConfig, CLI arguments (train_args.py, sample.py), and experiment sweep YAML to control the scale of Gaussian noise applied to token embeddings.
  • Implemented the add_embedding_gaussian_noise method in model.py to inject L2-normalized Gaussian noise into token embeddings, and integrated this method into all embedding lookup pathways (forward, embed_tokens).
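
For readers unfamiliar with the technique, here is a minimal sketch of one plausible reading of "L2-normalized Gaussian noise": sample Gaussian noise, rescale each token's noise vector to unit L2 norm along the embedding dimension, then multiply by the configured scale. The helper name add_l2_normalized_noise, its signature, and the tensor shapes are assumptions for illustration; the actual add_embedding_gaussian_noise method in model.py may normalize or scale differently.

```python
import torch
import torch.nn.functional as F


def add_l2_normalized_noise(tok_emb: torch.Tensor, noise_std: float) -> torch.Tensor:
    """Hypothetical helper (not the PR's code): add Gaussian noise whose
    per-token L2 norm equals noise_std."""
    if noise_std <= 0.0:
        # Noise disabled: return the embeddings unchanged.
        return tok_emb
    noise = torch.randn_like(tok_emb)
    # Rescale each token's noise vector to unit L2 norm along the embedding dim.
    noise = F.normalize(noise, p=2, dim=-1)
    return tok_emb + noise_std * noise


torch.manual_seed(0)
emb = torch.randn(2, 4, 8)  # assumed layout: (batch, sequence, n_embd)
noisy = add_l2_normalized_noise(emb, noise_std=0.1)
# L2 norm of the perturbation applied to each token.
delta_norms = (noisy - emb).norm(dim=-1)
```

Under this reading, the perturbation applied to every token has the same L2 norm regardless of embedding width, so the scale parameter would be directly comparable across model sizes.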

Inference and Evaluation Enhancements:

  • Enabled overriding the embedding noise scale at inference time via a new CLI argument in sample.py, ensuring flexibility for evaluation experiments.
  • Updated evaluation output to record the effective noise scale and optionally write results to a specified directory.
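
The inference-time override and the recorded "effective" noise scale could be wired together roughly as follows. This is a sketch under stated assumptions, not the code in sample.py: the flag names --embedding_gaussian_noise_std and --eval_output_dir, the helper functions, and the eval_results.json file name are all hypothetical.

```python
import argparse
import json
import os


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser()
    # Hypothetical flag names; the actual arguments in sample.py may differ.
    parser.add_argument("--embedding_gaussian_noise_std", type=float, default=None)
    parser.add_argument("--eval_output_dir", type=str, default=None)
    return parser


def resolve_noise_std(cli_value, checkpoint_value):
    """Prefer the CLI override; otherwise keep the value stored with the checkpoint."""
    return checkpoint_value if cli_value is None else cli_value


def write_eval_results(results, out_dir):
    """Write results as JSON when an output directory is given; return the path or None."""
    if out_dir is None:
        return None
    os.makedirs(out_dir, exist_ok=True)
    path = os.path.join(out_dir, "eval_results.json")  # assumed file name
    with open(path, "w") as f:
        json.dump(results, f, indent=2)
    return path


args = build_parser().parse_args(["--embedding_gaussian_noise_std", "0.05"])
# 0.1 stands in for whatever value the checkpoint's config was trained with.
effective_std = resolve_noise_std(args.embedding_gaussian_noise_std, checkpoint_value=0.1)
results = {"embedding_gaussian_noise_std": effective_std}
```

Defaulting the override to None rather than 0.0 keeps "flag not given" distinguishable from "explicitly disable noise", which matters when the checkpoint was trained with a nonzero scale.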

Experimentation Support:

  • Added embedding_gaussian_noise_std sweep values to embedding_gaussian_noise_sweep.yaml for systematic experimentation.

@klei22 klei22 requested a review from gkielian February 3, 2026 08:37
@gkielian gkielian requested a review from anandppatel February 5, 2026 22:31
@gkielian gkielian merged commit 01561de into ReaLLMASIC:master Feb 5, 2026
10 checks passed

2 participants