Description
When running the model microsoft/VibeVoice-ASR in a multi-GPU environment with the configuration parameter model_device=auto, generation fails with a device-mismatch RuntimeError (full log and traceback below).
Python packages:
transformers==4.57.6
accelerate==1.12.0
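For context, model_device=auto presumably resolves to loading the model sharded across all visible GPUs via accelerate's device_map="auto" (the accelerate hook in the traceback below suggests this). A minimal sketch of that setup; the class name VibeVoiceASRForConditionalGeneration is an assumption based on the module path in the traceback, not a verified export:

```python
import torch

# Class name assumed from vibevoice/modular/modeling_vibevoice_asr.py;
# substitute the actual export from the package.
from vibevoice.modular.modeling_vibevoice_asr import VibeVoiceASRForConditionalGeneration

# device_map="auto" lets accelerate place submodules on different GPUs,
# which is how parts of the model can end up on cuda:0 and cuda:6 at once.
model = VibeVoiceASRForConditionalGeneration.from_pretrained(
    "microsoft/VibeVoice-ASR",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```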
Log:
Model loaded successfully on cuda:0
================================================================================
Processing 1 audio(s)
============================================================
Processing batch 1/1
Processing batch of 1 audio(s)...
Input IDs shape: torch.Size([1, 15760])
Speech tensors shape: torch.Size([1, 50232329])
Attention mask shape: torch.Size([1, 15760])
Traceback (most recent call last):
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 580, in
main()
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 560, in main
all_results = asr.transcribe_with_batching(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 240, in transcribe_with_batching
batch_results = self.transcribe_batch(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 158, in transcribe_batch
output_ids = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2566, in generate
result = decoding_method(
^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2786, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vibevoice/modular/modeling_vibevoice_asr.py", line 375, in forward
speech_features = self.encode_speech(
^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vibevoice/modular/modeling_vibevoice_asr.py", line 335, in encode_speech
combined_features = acoustic_features[speech_masks] + semantic_features[speech_masks]
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:6)
I'm not sure if this is the right way to fix it, but it runs.
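For anyone hitting the same error: the indexing at modeling_vibevoice_asr.py line 335 fails because speech_masks ends up on a different GPU than the encoder outputs once the model is sharded. One possible direction, sketched below under that assumption (not necessarily the exact change referenced above), is to move the mask, and the second operand, onto the device of the tensor being indexed:

```python
# Sketch of a device-alignment workaround inside encode_speech()
# (around line 335 of modeling_vibevoice_asr.py).
# Assumption: with device_map="auto", acoustic_features, semantic_features
# and speech_masks can live on different GPUs.
acoustic_masks = speech_masks.to(acoustic_features.device)
semantic_masks = speech_masks.to(semantic_features.device)

combined_features = (
    acoustic_features[acoustic_masks]
    + semantic_features[semantic_masks].to(acoustic_features.device)
)
```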
