Add angular perturbation study #742
Conversation
Pull request overview
Extends the fake PTQ evaluation demo to add an “embedding-style” Gaussian vector-noise sweep, enabling side-by-side comparison of quantization-induced vs noise-induced weight distortions (loss + angle stats) on the minipile eval; a minimal sketch of the perturbation follows the change list.
Changes:
- Add a new utility script to generate Gaussian-noised checkpoints for multiple noise magnitudes (alphas).
- Update the minipile vector-PTQ demo to run a noise sweep, evaluate each noisy checkpoint, and summarize results into CSV + plots.
- Simplify the demo by removing the per-tensor quantization sweep and focusing on per-vector + noise.
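For orientation, here is a minimal sketch of the norm-preserving ("angular") per-vector perturbation that the study compares against quantization, assuming vectors lie along the last dimension of each weight tensor; the function name and alpha values are illustrative and are not the utility's actual API.

```python
import torch

def perturb_vectors(weights: torch.Tensor, alpha: float, eps: float = 1e-12) -> torch.Tensor:
    """Add norm-scaled Gaussian noise to each vector (last dim), then rescale so
    every vector keeps its original L2 norm, i.e. only its direction changes."""
    noise = torch.randn_like(weights)
    noise = noise / (noise.norm(dim=-1, keepdim=True) + eps)   # unit noise direction per vector
    w_norm = weights.norm(dim=-1, keepdim=True)                # per-vector norms to preserve
    perturbed = weights + alpha * w_norm * noise               # step of size alpha * ||w||
    return perturbed / (perturbed.norm(dim=-1, keepdim=True) + eps) * w_norm

# Example: the induced angle grows with alpha.
w = torch.randn(4, 8)
for a in (0.05, 0.1, 0.2):
    cos = torch.nn.functional.cosine_similarity(w, perturb_vectors(w, a), dim=-1)
    print(a, torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0))).mean().item())
```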
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| quantizations/ptq/embedding_gaussian_noise_ckpt.py | New CLI utility to create alpha-swept noisy checkpoints using per-vector (embedding-dim) noise. |
| demos/fake_ptq_vector_eval_demo_minipile.sh | Adds noise sweep generation/eval and summary/plotting updates; removes the per-tensor sweep. |
```python
alphas = parse_alpha_list(args.alphas)

ckpt_path = os.path.join(args.ckpt_dir, "ckpt.pt")
checkpoint = torch.load(ckpt_path, map_location="cpu")
```
Copilot AI commented on Feb 8, 2026
torch.load(ckpt_path, ...) will unpickle arbitrary Python objects from the checkpoint. If these checkpoints can come from untrusted sources, this is a code-execution risk. Consider using weights_only=True (and/or a --weights-only/--no-weights-only CLI flag like other utilities) when you only need tensors.
Suggested change:
```diff
- checkpoint = torch.load(ckpt_path, map_location="cpu")
+ checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
```
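If the CLI-flag route is taken instead, here is a minimal sketch of a `--weights-only/--no-weights-only` toggle, assuming argparse's `BooleanOptionalAction` (Python 3.9+) and a PyTorch version that supports `weights_only`; the `--ckpt-path` argument and the default shown are illustrative rather than taken from the repository's other utilities.

```python
import argparse

import torch

parser = argparse.ArgumentParser()
parser.add_argument("--ckpt-path", required=True, help="Path to ckpt.pt")
# Safer tensor-only unpickling by default; --no-weights-only opts out for
# checkpoints that store extra Python objects alongside the tensors.
parser.add_argument(
    "--weights-only",
    action=argparse.BooleanOptionalAction,
    default=True,
)
args = parser.parse_args()

checkpoint = torch.load(
    args.ckpt_path,
    map_location="cpu",
    weights_only=args.weights_only,
)
```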
```python
fig.tight_layout()

plot_path = os.path.join(summary_root, "quantization_eval_summary.png")
```
Copilot AI commented on Feb 8, 2026
The output image is still named quantization_eval_summary.png, but it now contains both quantization and Gaussian-noise sweep plots. Renaming the file (or making the name reflect both) would avoid confusion when browsing artifacts.
Suggested change:
```diff
- plot_path = os.path.join(summary_root, "quantization_eval_summary.png")
+ plot_path = os.path.join(summary_root, "quantization_and_noise_eval_summary.png")
```
```python
alpha_tensor = torch.tensor(alphas, dtype=torch.float32)
noisy_state_dicts = [dict() for _ in alphas]
for key, value in state_dict.items():
    if not torch.is_tensor(value) or not torch.is_floating_point(value):
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    outputs = apply_noise_per_vector(
        value, alpha_tensor, embedding_dim, generator=generator
    )
    if outputs is None:
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    for idx, noisy in enumerate(outputs):
        noisy_state_dicts[idx][key] = noisy
```
Copilot AI commented on Feb 8, 2026
build_noisy_state_dicts materializes a full copy of the model state for every alpha (and apply_noise_to_vectors builds an [num_alphas, ...] tensor per weight). For large checkpoints this scales memory ~O(num_alphas * model_size) and can easily OOM. Consider generating and saving one alpha at a time (or streaming tensors per-alpha) to keep peak memory closer to O(model_size), while still reusing a single sampled noise direction per tensor if you want comparable sweeps.
Suggested change (replacing the block above):
```python
"""
Build a list of state dicts, each with Gaussian noise applied to embedding-style
vectors, one per alpha.

This implementation avoids materializing an intermediate tensor of shape
[num_alphas, ...] per weight, keeping peak memory closer to O(model_size)
while still reusing a single sampled noise direction per tensor.
"""
# Pre-allocate one dict per alpha.
noisy_state_dicts = [dict() for _ in alphas]
# Iterate over all parameters in the original state dict.
for key, value in state_dict.items():
    # Non-tensor or non-floating values are copied verbatim to all alphas.
    if not torch.is_tensor(value) or not torch.is_floating_point(value):
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    tensor = value
    # Determine if this tensor should be treated as a set of embedding vectors.
    # Case 1: embedding dimension is the last dimension.
    if tensor.ndim >= 1 and tensor.shape[-1] == embedding_dim:
        vectors = tensor

        def _restore_layout(x: torch.Tensor) -> torch.Tensor:
            # Layout is unchanged in this case.
            return x

    # Case 2: embedding dimension is the first dimension.
    elif tensor.ndim > 1 and tensor.shape[0] == embedding_dim:
        # Move embedding dimension to the last axis for consistent handling.
        vectors = torch.movedim(tensor, 0, -1)

        def _restore_layout(x: torch.Tensor) -> torch.Tensor:
            # Move embedding dimension back to the first axis.
            return torch.movedim(x, -1, 0)

    else:
        # Not an embedding-style tensor; copy verbatim.
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    # Sample a single noise direction for this tensor shape.
    noise = torch.randn(
        vectors.shape,
        generator=generator,
        device=vectors.device,
        dtype=vectors.dtype,
    )
    noise = noise / (_vector_norm(noise) + EPS)
    # Compute the norm of each vector so we can preserve it after perturbation.
    weight_norm = _vector_norm(vectors)
    # For each alpha, compute a perturbed version and store it directly.
    for idx, alpha in enumerate(alphas):
        alpha_tensor = torch.as_tensor(alpha, dtype=vectors.dtype, device=vectors.device)
        # scaled_noise = noise * alpha * weight_norm
        scaled_noise = noise * alpha_tensor.view(*([1] * vectors.ndim))
        scaled_noise = scaled_noise * weight_norm
        # perturbed = vectors + scaled_noise
        perturbed = vectors + scaled_noise
        # Normalize perturbed vectors to keep their norm equal to weight_norm.
        perturbed_norm = _vector_norm(perturbed)
        perturbed = perturbed / (perturbed_norm + EPS) * weight_norm
        # Restore original layout if we had moved dimensions.
        noisy_tensor = _restore_layout(perturbed)
        noisy_state_dicts[idx][key] = noisy_tensor
```
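For comparison with the suggestion above, here is a minimal sketch of the streaming alternative the review comment describes: build and save one noisy state dict per alpha so peak memory stays near a single model copy, re-seeding the generator per alpha so every alpha reuses the same noise draw per tensor. For brevity it only handles tensors whose last dimension equals embedding_dim; the function name, output naming scheme, and seed handling are illustrative, not the utility's actual interface.

```python
import os

import torch

def save_noisy_checkpoints_streaming(state_dict, alphas, embedding_dim, out_dir, seed=0, eps=1e-12):
    """Build and save one noisy state dict per alpha, keeping peak memory near a
    single model copy. Re-seeding the generator per alpha makes every alpha see
    the same noise direction for each tensor, so the sweep stays comparable."""
    os.makedirs(out_dir, exist_ok=True)
    for alpha in alphas:
        generator = torch.Generator().manual_seed(seed)  # same noise draws for every alpha
        noisy = {}
        for key, value in state_dict.items():
            if (
                not torch.is_tensor(value)
                or not torch.is_floating_point(value)
                or value.ndim < 1
                or value.shape[-1] != embedding_dim
            ):
                noisy[key] = value  # copied verbatim, as in the original helper
                continue
            noise = torch.randn(value.shape, generator=generator, dtype=value.dtype)
            noise = noise / (noise.norm(dim=-1, keepdim=True) + eps)
            w_norm = value.norm(dim=-1, keepdim=True)
            perturbed = value + alpha * w_norm * noise
            noisy[key] = perturbed / (perturbed.norm(dim=-1, keepdim=True) + eps) * w_norm
        # One checkpoint file per alpha; only a single noisy copy is alive at a time.
        torch.save(noisy, os.path.join(out_dir, f"ckpt_alpha_{alpha}.pt"))
```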

This pull request extends the fake PTQ evaluation demo to support embedding-style Gaussian noise sweeps, allowing a direct comparison between quantization-induced and noise-induced distortions in model weights. The script now sweeps over different noise magnitudes (alphas), evaluates the perturbed checkpoints, and summarizes the results alongside vector quantization. Additionally, a new utility script, embedding_gaussian_noise_ckpt.py, is introduced to generate noisy checkpoints.

Major enhancements to the evaluation pipeline:
The demo script (demos/fake_ptq_vector_eval_demo_minipile.sh) now performs a sweep over embedding Gaussian noise magnitudes (alphas), generating noisy checkpoints, evaluating them, and collecting angle/loss statistics for each alpha. The summary and plotting logic is updated to include the noise sweep results, producing both CSV summaries and comparison plots for the quantization and noise sweeps.

Removal of per-tensor quantization sweep:
The per-tensor quantization sweep is removed from the demo, which now focuses on per-vector quantization and the noise sweep.

New utility for generating noisy checkpoints:
Adds quantizations/ptq/embedding_gaussian_noise_ckpt.py, a standalone script that applies Gaussian noise to all weight vectors in a checkpoint, with support for multiple alphas and flexible checkpoint formats.

Other improvements: