Copilot AI commented Feb 9, 2026

Fix NotImplementedError with AffineQuantizedTensor on CUDA

Problem

After quantizing a model with torchao, moving it to CUDA fails with NotImplementedError: direct .to(device) calls do not work with AffineQuantizedTensor on some torch versions.

Root Cause

The code moves the model with direct .to(device) calls BEFORE quantization, but after torch.compile() and quantize_() there is no explicit device movement through the safe _recursive_to_device() method, which handles AffineQuantizedTensor correctly.
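The failure mode can be illustrated with plain-Python stand-ins (no torch involved). This is a sketch under the assumption that the error comes from the module-level in-place move path calling `aten._has_compatible_shallow_copy_type`, an op `AffineQuantizedTensor` does not implement, while an out-of-place tensor-level `.to()` still works; `Leaf`, `QuantLeaf`, and `Container` are hypothetical mocks, not real torch classes:

```python
# Illustrative mocks of the dispatch failure described above.

class Leaf:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        # Out-of-place move: returns a fresh leaf on the target device.
        return type(self)(device)

    def _has_compatible_shallow_copy_type(self, other):
        return True

class QuantLeaf(Leaf):
    """Mimics AffineQuantizedTensor: the shallow-copy check is unimplemented."""
    def _has_compatible_shallow_copy_type(self, other):
        raise NotImplementedError(
            "func=aten._has_compatible_shallow_copy_type not implemented"
        )

class Container:
    """Mimics a module whose .to() runs a shallow-copy check per leaf."""
    def __init__(self, leaves):
        self.leaves = leaves

    def to(self, device):
        for name, leaf in self.leaves.items():
            moved = leaf.to(device)
            leaf._has_compatible_shallow_copy_type(moved)  # raises for QuantLeaf
            self.leaves[name] = moved
        return self

model = Container({"w": QuantLeaf()})
move_failed = False
try:
    model.to("cuda")  # container-level move hits the unimplemented check
except NotImplementedError as exc:
    move_failed = True
    print("container-level move failed:", exc)

# Per-leaf, out-of-place reassignment sidesteps the compatibility check:
model.leaves["w"] = model.leaves["w"].to("cuda")
print(model.leaves["w"].device)  # cuda
```

This is why moving each quantized parameter individually succeeds even when a whole-module `.to(device)` raises.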

Plan

  • Investigate the issue and understand the codebase
  • Identify root cause: direct .to() calls don't work with AffineQuantizedTensor
  • Replace direct .to(device) calls with _recursive_to_device() for model initialization
  • Ensure model is on correct device after quantization
  • Test the fix with quantized models
  • Run code review and security checks
  • Verify the solution works
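The replacement step above can be sketched as follows. This `recursive_to_device()` is a hypothetical reconstruction of what the codebase's `_recursive_to_device()` helper could look like (its real signature and internals are assumptions), shown on plain-Python stand-ins rather than real torch modules:

```python
# Sketch of a recursive device-move helper: walk the module tree and
# reassign each parameter from a tensor-level, out-of-place .to(),
# bypassing the module-wide in-place move path that fails for
# AffineQuantizedTensor.

class Param:
    def __init__(self, device="cpu"):
        self.device = device

    def to(self, device):
        return Param(device)  # out-of-place move, safe for tensor subclasses

class Node:
    def __init__(self, params=None, children=None):
        self.params = params or {}      # name -> leaf object exposing .to()
        self.children = children or {}  # name -> Node

def recursive_to_device(node, device):
    for name, p in node.params.items():
        node.params[name] = p.to(device)  # reassign instead of mutating in place
    for child in node.children.values():
        recursive_to_device(child, device)

model = Node(params={"weight": Param()},
             children={"block": Node(params={"bias": Param()})})
recursive_to_device(model, "cuda")
print(model.params["weight"].device,
      model.children["block"].params["bias"].device)  # cuda cuda
```

The key design point is that every parameter is replaced by a freshly constructed tensor on the target device, so no in-place shallow-copy compatibility check is ever dispatched.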
Original prompt

This section describes the original issue to resolve.

<issue_title>BUG - commit #17825ee(?) (NotImplementedError related to AffineQuantizedTensor when attempting to move the quantized model to CUDA)</issue_title>
<issue_description>Describe the bug
After updating to the latest commit (7aa2737 - "Revert 'fix: implement platform-specific audio playback reset logic'"), music generation fails with a NotImplementedError related to AffineQuantizedTensor when attempting to move the quantized model to CUDA. The error occurs in torchao's quantization layer during the model.to(device) operation, specifically with the aten._has_compatible_shallow_copy_type operator not being implemented for AffineQuantizedTensor types.

I can confirm that reverting to c3dcf14 resolves this issue; I am quite sure the issue was introduced somewhere in 17825ee.

To Reproduce
Steps to reproduce the behavior:

  • Fresh install of ACE-Step-1.5 on Ubuntu 24.04 with RTX 50 Series GPU (working ~8-10 hours ago)
  • Pull latest updates from repository (git pull)
  • Launch the Gradio UI
  • Attempt to generate music with any text prompt
  • See error: NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: func=<OpOverload(op='aten._has_compatible_shallow_copy_type', overload='default')>

Expected behavior
Music generation should proceed normally as it did before the update. The model should successfully move to CUDA device and generate audio output.

Desktop (please complete the following information):

  • OS: Ubuntu 24.04
  • GPU: NVIDIA RTX 50 Series
  • Python: 3.11.11
  • Installation: Fresh install less than 10 hours ago, was working until git pull update

Additional context
The issue appears to have been introduced in commit #17825ee ("Merge mainline commits as of 2026/02/08 05:16 UTC with MPS optimizations and do optimization checks"). The installation was fully functional approximately 8 hours ago, before pulling the latest updates. The error occurs specifically when torchao's AffineQuantizedTensor (used for model quantization) is moved to CUDA, suggesting a compatibility issue between the quantization implementation and PyTorch's device-transfer mechanisms.

2026-02-08 09:44:21.118 | INFO     | acestep.handler:_load_model_context:880 - [_load_model_context] Offloaded vae to CPU in 0.1504s
2026-02-08 09:44:21.119 | ERROR    | acestep.handler:generate_music:3509 - [generate_music] Generation failed
Traceback (most recent call last):

  File "/home/ubuntuuser/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
    │    └ <function Thread._bootstrap_inner at 0x7e0db48bf880>
    └ <WorkerThread(AnyIO worker thread, started daemon 138580555499200)>
  File "/home/ubuntuuser/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/threading.py", line 1045, in _bootstrap_inner
    self.run()
    │    └ <function WorkerThread.run at 0x7e09cd6d4680>
    └ <WorkerThread(AnyIO worker thread, started daemon 138580555499200)>
  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/.venv/lib/python3.11/site-packages/anyio/_backends/_asyncio.py", line 986, in run
    result = context.run(func, *args)
             │       │   │      └ (<generator object setup_event_handlers.<locals>.generation_wrapper at 0x7e09d5e057e0>,)
             │       │   └ <function run_sync_iterator_async at 0x7e09d9805800>
             │       └ <method 'run' of '_contextvars.Context' objects>
             └ <_contextvars.Context object at 0x7e0960e66300>
  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/.venv/lib/python3.11/site-packages/gradio/utils.py", line 835, in run_sync_iterator_async
    return next(iterator)
                └ <generator object setup_event_handlers.<locals>.generation_wrapper at 0x7e09d5e057e0>
  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/.venv/lib/python3.11/site-packages/gradio/utils.py", line 1019, in gen_wrapper
    response = next(iterator)
                    └ <generator object setup_event_handlers.<locals>.generation_wrapper at 0x7e0960ec3ca0>

  File "/media/ubuntuuser/Encrypted1/ACE-Step-1.5/acestep/gradio_ui/events/__init__.py", line 539, in generation_wrapper
    yield from res_h.generate_with_batch_management(dit_handler, llm_handler, *args)
               │     │                              │            │             └ ('A brief, clean melodic phrase played on a bright, metallic mallet percussion instrument, resembling a xylophone or marimba....
               │     │                              │            └ <acestep.llm_inference.LLMHandler object at 0x7e09d785d650>
               │     │                              └ <acestep.handler.AceStepHandler object at 0x7e09d7a1ff50>
               │     └ ...





- Fixes ace-step/ACE-Step-1.5#334


@ChuxiJ ChuxiJ closed this Feb 9, 2026
Copilot AI requested a review from ChuxiJ February 9, 2026 20:02
Copilot stopped work on behalf of ChuxiJ due to an error February 9, 2026 20:02