Conversation

@CedricHwong CedricHwong commented Dec 26, 2025

What does this PR do?

Type of change: Bug fix

Overview:

  • Synchronizes MSE calibration amax across distributed groups (DP/EP/TP) after calibration finishes (a rough sketch of the idea follows this list).
  • Adds a multi‑GPU test that verifies amax values match when distributed_sync=True and differ when distributed_sync=False.
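
A rough sketch of the idea behind the sync (the helper name, the element‑wise max reduction, and the group argument are illustrative assumptions, not taken from the actual change):

  import torch
  import torch.distributed as dist

  def sync_amax_across_group(amax: torch.Tensor, group=None) -> torch.Tensor:
      # After per-rank MSE calibration, reduce amax across the process group so
      # every rank ends up with the same value. An element-wise max is one
      # plausible reduction; the actual implementation may reduce differently.
      if dist.is_available() and dist.is_initialized():
          dist.all_reduce(amax, op=dist.ReduceOp.MAX, group=group)
      return amax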

Usage

  import copy
  import modelopt.torch.quantization as mtq

  # Build a quantization config that uses MSE calibration
  cfg = copy.deepcopy(mtq.INT8_DEFAULT_CFG)
  cfg["algorithm"] = {
      "method": "mse",
      "distributed_sync": True,
  }
  # Run quantization + calibration (forward_loop feeds calibration data)
  model = mtq.quantize(model, cfg, forward_loop)
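
A quick spot-check after calibration is to compare per-quantizer amax values across ranks (this assumes the quantizer modules expose an amax attribute; the attribute name is an assumption, not something introduced by this change):

  # Hypothetical spot-check: print each quantizer's amax so the values can be
  # compared across ranks. The `amax` attribute name is assumed here.
  for name, module in model.named_modules():
      amax = getattr(module, "amax", None)
      if amax is not None:
          print(name, amax)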

Testing

PYTHONPATH=/root/epfs/workspace/code/personal_repos/Model-Optimizer pytest -q tests/gpu/torch/quantization/test_mse_calibrate_sync.py
  Result: 3 passed, 1 skipped (the single‑GPU case is skipped).

Additional Information

  • New test validates distributed amax synchronization for MSE calibration under NCCL (sketched below).
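
A minimal sketch of the assertion the test performs (the helper name and the use of all_gather are illustrative assumptions; the test file is the source of truth):

  import torch
  import torch.distributed as dist

  def assert_amax_synced(local_amax: torch.Tensor):
      # Gather every rank's calibrated amax and assert they are identical.
      # How local_amax is extracted from the quantized model is test-specific.
      world_size = dist.get_world_size()
      gathered = [torch.empty_like(local_amax) for _ in range(world_size)]
      dist.all_gather(gathered, local_amax)
      for other in gathered[1:]:
          assert torch.equal(gathered[0], other), "amax differs across ranks"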

@CedricHwong CedricHwong requested a review from a team as a code owner December 26, 2025 11:23
@CedricHwong CedricHwong requested a review from ajrasane December 26, 2025 11:23

copy-pr-bot bot commented Dec 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

Signed-off-by: CedricHwong <997630814@qq.com>