
Tensor device mismatch error when using multi-GPU with device_map=auto #240

@qingying234

Description


When running the model microsoft/VibeVoice-ASR in a multi-GPU environment with the configuration parameter model_device=auto, generation fails with a tensor device mismatch error in encode_speech (log below).

Python packages:
transformers==4.57.6
accelerate==1.12.0

Log:
Model loaded successfully on cuda:0

================================================================================
Processing 1 audio(s)

============================================================
Processing batch 1/1

Processing batch of 1 audio(s)...
Input IDs shape: torch.Size([1, 15760])
Speech tensors shape: torch.Size([1, 50232329])
Attention mask shape: torch.Size([1, 15760])
Traceback (most recent call last):
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 580, in <module>
main()
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 560, in main
all_results = asr.transcribe_with_batching(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 240, in transcribe_with_batching
batch_results = self.transcribe_batch(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 158, in transcribe_batch
output_ids = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2566, in generate
result = decoding_method(
^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2786, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vibevoice/modular/modeling_vibevoice_asr.py", line 375, in forward
speech_features = self.encode_speech(
^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vibevoice/modular/modeling_vibevoice_asr.py", line 335, in encode_speech
combined_features = acoustic_features[speech_masks] + semantic_features[speech_masks]
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:6)
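The error comes from boolean-mask indexing: the mask must be on the CPU or on the same device as the tensor it indexes. Here acoustic_features is on cuda:6 while speech_masks is on some other GPU, presumably because device_map=auto placed the acoustic branch on a different GPU than the inputs (an assumption; the actual device map is not shown in the log). A minimal debugging sketch, using a hypothetical helper find_device_mismatches (not part of VibeVoice, transformers, or accelerate), reports which tensors are not on the reference tensor's device:

```python
import torch


def find_device_mismatches(reference: torch.Tensor, **others: torch.Tensor) -> dict[str, str]:
    """Return {name: device} for every tensor not on reference's device.

    Hypothetical debugging helper: with device_map="auto", tensors produced
    by different submodules can end up on different GPUs.
    """
    return {
        name: str(tensor.device)
        for name, tensor in others.items()
        if tensor.device != reference.device
    }
```

For example, calling find_device_mismatches(acoustic_features, semantic_features=semantic_features, speech_masks=speech_masks) inside encode_speech, just before line 335, would show where each operand lives.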


My solution:
[Image 1]
[Image 2]

I'm not sure if this is the right way to fix it, but it runs.
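One common way to resolve this class of error is to move every operand onto the indexed tensor's device right before indexing. The sketch below is not necessarily the fix used in this issue; combine_masked_features is a hypothetical helper, and it assumes acoustic_features and semantic_features have shape (batch, frames, hidden) with speech_masks a (batch, frames) boolean mask.

```python
import torch


def combine_masked_features(
    acoustic_features: torch.Tensor,
    semantic_features: torch.Tensor,
    speech_masks: torch.Tensor,
) -> torch.Tensor:
    """Sum acoustic and semantic features at the masked (speech) positions.

    Hypothetical helper; assumes features are (batch, frames, hidden) and
    speech_masks is a (batch, frames) boolean mask.
    """
    target = acoustic_features.device
    # Boolean indices must be on the CPU or on the indexed tensor's device.
    mask = speech_masks.to(target)
    # The semantic branch may also live on another GPU; align it so both
    # the indexing and the addition happen on one device.
    semantic = semantic_features.to(target)
    return acoustic_features[mask] + semantic[mask]


acoustic = torch.arange(12, dtype=torch.float32).reshape(2, 3, 2)
semantic = torch.ones(2, 3, 2)
masks = torch.tensor([[True, False, True], [False, True, False]])
print(combine_masked_features(acoustic, semantic, masks))
```

Tensor.to returns the tensor unchanged when it is already on the target device, so the extra calls cost nothing on a single GPU; on multiple GPUs they add a cross-device copy of the mask and the semantic features.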
