Conversation

@klei22 klei22 commented Feb 8, 2026

This pull request extends the fake PTQ evaluation demo to support embedding-style Gaussian noise sweeps, enabling a direct comparison between quantization-induced and noise-induced distortions in model weights. The demo now sweeps over a range of noise magnitudes (alphas), evaluates each perturbed checkpoint, and summarizes the results alongside the per-vector quantization results. A new utility script, embedding_gaussian_noise_ckpt.py, is introduced to generate the noisy checkpoints.

Major enhancements to evaluation pipeline:

  • The demo script (demos/fake_ptq_vector_eval_demo_minipile.sh) now sweeps over embedding Gaussian noise magnitudes (alphas), generating noisy checkpoints, evaluating them, and collecting angle/loss statistics for each alpha; a sketch of the angle computation follows this list.

  • The summary and plotting logic is updated to include the noise sweep results, producing both CSV summaries and comparison plots for the quantization and noise sweeps.
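
For illustration, here is a minimal sketch of the angle statistic mentioned above; the helper name and the mean reduction are assumptions, not the demo's exact implementation:

import torch

def mean_angle_deg(w: torch.Tensor, w_noisy: torch.Tensor) -> float:
    # Cosine similarity along the embedding (last) dimension, clamped for
    # numerical safety, then converted to a mean angle in degrees.
    cos = torch.nn.functional.cosine_similarity(w, w_noisy, dim=-1)
    return torch.rad2deg(torch.acos(cos.clamp(-1.0, 1.0))).mean().item()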

Removal of per-tensor quantization sweep:

  • The script and summary logic have been simplified to remove the per-tensor quantization sweep, focusing exclusively on per-vector quantization and noise perturbations.

New utility for generating noisy checkpoints:

  • Adds quantizations/ptq/embedding_gaussian_noise_ckpt.py, a standalone script that applies Gaussian noise to all weight vectors in a checkpoint, with support for multiple alphas and flexible checkpoint formats; a sketch of the core noise scheme follows.
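
A minimal sketch of the norm-preserving per-vector noise scheme, inferred from the suggested-change code later in this review (the function name and eps constant are illustrative):

import torch

def perturb_vectors(vectors: torch.Tensor, alpha: float, eps: float = 1e-12) -> torch.Tensor:
    # Sample a unit noise direction per vector (last dim = embedding dim).
    noise = torch.randn_like(vectors)
    noise = noise / (noise.norm(dim=-1, keepdim=True) + eps)
    # Scale the direction by alpha times each vector's norm, add it, then
    # renormalize so every perturbed vector keeps its original norm.
    norms = vectors.norm(dim=-1, keepdim=True)
    perturbed = vectors + alpha * norms * noise
    return perturbed / (perturbed.norm(dim=-1, keepdim=True) + eps) * norms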

Other improvements:

  • The script now organizes output directories for noise sweeps and updates variable naming and directory management for clarity and extensibility.

Copilot AI left a comment

Pull request overview

Extends the fake PTQ evaluation demo to add an “embedding-style” Gaussian vector-noise sweep, enabling side-by-side comparison of quantization-induced vs noise-induced weight distortions (loss + angle stats) on the minipile eval.

Changes:

  • Add a new utility script to generate Gaussian-noised checkpoints for multiple noise magnitudes (alphas).
  • Update the minipile vector-PTQ demo to run a noise sweep, evaluate each noisy checkpoint, and summarize results into CSV + plots.
  • Simplify the demo by removing the per-tensor quantization sweep and focusing on per-vector + noise.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

  • quantizations/ptq/embedding_gaussian_noise_ckpt.py: New CLI utility to create alpha-swept noisy checkpoints using per-vector (embedding-dim) noise.
  • demos/fake_ptq_vector_eval_demo_minipile.sh: Adds noise sweep generation/eval plus summary/plotting updates; removes the per-tensor sweep.


In quantizations/ptq/embedding_gaussian_noise_ckpt.py:

alphas = parse_alpha_list(args.alphas)

ckpt_path = os.path.join(args.ckpt_dir, "ckpt.pt")
checkpoint = torch.load(ckpt_path, map_location="cpu")
Copilot AI commented Feb 8, 2026:

torch.load(ckpt_path, ...) will unpickle arbitrary Python objects from the checkpoint. If these checkpoints can come from untrusted sources, this is a code-execution risk. Consider using weights_only=True (and/or a --weights-only/--no-weights-only CLI flag like other utilities) when you only need tensors.

Suggested change:

- checkpoint = torch.load(ckpt_path, map_location="cpu")
+ checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=True)
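
A minimal sketch of the --weights-only/--no-weights-only toggle the comment alludes to; the flag wiring below is an assumption, not the script's actual interface:

import argparse
import os
import torch

parser = argparse.ArgumentParser()
parser.add_argument("--ckpt_dir", type=str, required=True)
# BooleanOptionalAction (Python 3.9+) yields both --weights-only and --no-weights-only.
parser.add_argument("--weights-only", action=argparse.BooleanOptionalAction, default=True)
args = parser.parse_args()

ckpt_path = os.path.join(args.ckpt_dir, "ckpt.pt")
# weights_only=True restricts unpickling to tensors and simple containers,
# avoiding arbitrary code execution from untrusted checkpoints.
checkpoint = torch.load(ckpt_path, map_location="cpu", weights_only=args.weights_only)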


In the summary/plotting step:

fig.tight_layout()

plot_path = os.path.join(summary_root, "quantization_eval_summary.png")
Copilot AI commented Feb 8, 2026:

The output image is still named quantization_eval_summary.png, but it now contains both quantization and Gaussian-noise sweep plots. Renaming the file (or making the name reflect both) would avoid confusion when browsing artifacts.

Suggested change:

- plot_path = os.path.join(summary_root, "quantization_eval_summary.png")
+ plot_path = os.path.join(summary_root, "quantization_and_noise_eval_summary.png")

Comment on lines +161 to +176
alpha_tensor = torch.tensor(alphas, dtype=torch.float32)
noisy_state_dicts = [dict() for _ in alphas]
for key, value in state_dict.items():
    if not torch.is_tensor(value) or not torch.is_floating_point(value):
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    outputs = apply_noise_per_vector(
        value, alpha_tensor, embedding_dim, generator=generator
    )
    if outputs is None:
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    for idx, noisy in enumerate(outputs):
        noisy_state_dicts[idx][key] = noisy
Copilot AI commented Feb 8, 2026:

build_noisy_state_dicts materializes a full copy of the model state for every alpha (and apply_noise_to_vectors builds an [num_alphas, ...] tensor per weight). For large checkpoints this scales memory ~O(num_alphas * model_size) and can easily OOM. Consider generating and saving one alpha at a time (or streaming tensors per-alpha) to keep peak memory closer to O(model_size), while still reusing a single sampled noise direction per tensor if you want comparable sweeps.

Suggested change (replacing the snippet quoted above):

"""
Build a list of state dicts, each with Gaussian noise applied to
embedding-style vectors, one per alpha.

This implementation avoids materializing an intermediate tensor of shape
[num_alphas, ...] per weight, keeping peak memory closer to O(model_size)
while still reusing a single sampled noise direction per tensor.
"""
# Pre-allocate one dict per alpha.
noisy_state_dicts = [dict() for _ in alphas]
# Iterate over all parameters in the original state dict.
for key, value in state_dict.items():
    # Non-tensor or non-floating values are copied verbatim to all alphas.
    if not torch.is_tensor(value) or not torch.is_floating_point(value):
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    tensor = value
    # Determine if this tensor should be treated as a set of embedding vectors.
    if tensor.ndim >= 1 and tensor.shape[-1] == embedding_dim:
        # Case 1: embedding dimension is the last dimension; layout unchanged.
        vectors = tensor

        def _restore_layout(x: torch.Tensor) -> torch.Tensor:
            return x

    elif tensor.ndim > 1 and tensor.shape[0] == embedding_dim:
        # Case 2: embedding dimension is the first dimension; move it to the
        # last axis for consistent handling.
        vectors = torch.movedim(tensor, 0, -1)

        def _restore_layout(x: torch.Tensor) -> torch.Tensor:
            return torch.movedim(x, -1, 0)

    else:
        # Not an embedding-style tensor; copy verbatim.
        for idx in range(len(alphas)):
            noisy_state_dicts[idx][key] = value
        continue
    # Sample a single unit noise direction for this tensor.
    noise = torch.randn(
        vectors.shape,
        generator=generator,
        device=vectors.device,
        dtype=vectors.dtype,
    )
    noise = noise / (_vector_norm(noise) + EPS)
    # Compute the norm of each vector so we can preserve it after perturbation.
    weight_norm = _vector_norm(vectors)
    # For each alpha, compute a perturbed version and store it directly.
    for idx, alpha in enumerate(alphas):
        alpha_tensor = torch.as_tensor(alpha, dtype=vectors.dtype, device=vectors.device)
        # Scale the noise direction by alpha times each vector's norm.
        scaled_noise = noise * alpha_tensor.view(*([1] * vectors.ndim)) * weight_norm
        perturbed = vectors + scaled_noise
        # Renormalize so each perturbed vector keeps its original norm.
        perturbed_norm = _vector_norm(perturbed)
        perturbed = perturbed / (perturbed_norm + EPS) * weight_norm
        # Restore the original layout if we moved dimensions.
        noisy_state_dicts[idx][key] = _restore_layout(perturbed)
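
Following the comment's suggestion to generate and save one alpha at a time, a streaming variant might look like the sketch below; build_noisy_state_dict_for_alpha and the output layout are illustrative assumptions, not code from this PR:

import os
import torch

for alpha in alphas:
    # Hypothetical single-alpha variant of the builder above; only one noisy
    # copy of the model state is resident at a time.
    noisy_sd = build_noisy_state_dict_for_alpha(state_dict, alpha)
    out_path = os.path.join(out_dir, f"alpha_{alpha}", "ckpt.pt")
    os.makedirs(os.path.dirname(out_path), exist_ok=True)
    torch.save({"model": noisy_sd}, out_path)
    del noisy_sd  # release the copy before building the next alpha

Re-seeding the generator to the same state before each alpha would keep the sampled noise direction comparable across the sweep.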

klei22 commented Feb 8, 2026

[Attached image: quantization_eval_summary plot]

@gkielian gkielian merged commit 9375faa into ReaLLMASIC:master Feb 9, 2026
16 checks passed