Description
When running the model microsoft/VibeVoice-ASR in a multi-GPU environment with the configuration parameter model_device=auto, generation fails with a device-mismatch RuntimeError (full log and traceback below).
Python packages:
transformers==4.57.6
accelerate==1.12.0
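For context, model_device=auto presumably resolves to loading the model sharded across all visible GPUs via accelerate's device_map="auto" (the accelerate hook in the traceback below suggests this). A minimal sketch of that setup; the class name VibeVoiceASRForConditionalGeneration is an assumption based on the module path in the traceback, not a verified export:

```python
import torch

# Class name assumed from vibevoice/modular/modeling_vibevoice_asr.py;
# substitute the actual export from the package.
from vibevoice.modular.modeling_vibevoice_asr import VibeVoiceASRForConditionalGeneration

# device_map="auto" lets accelerate place submodules on different GPUs,
# which is how parts of the model can end up on cuda:0 and cuda:6 at once.
model = VibeVoiceASRForConditionalGeneration.from_pretrained(
    "microsoft/VibeVoice-ASR",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
```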
Log:
Model loaded successfully on cuda:0
================================================================================
Processing 1 audio(s)
============================================================
Processing batch 1/1
Processing batch of 1 audio(s)...
Input IDs shape: torch.Size([1, 15760])
Speech tensors shape: torch.Size([1, 50232329])
Attention mask shape: torch.Size([1, 15760])
Traceback (most recent call last):
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 580, in
main()
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 560, in main
all_results = asr.transcribe_with_batching(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 240, in transcribe_with_batching
batch_results = self.transcribe_batch(
^^^^^^^^^^^^^^^^^^^^^^
File "/root/VibeVoice/demo/vibevoice_asr_inference_from_file.py", line 158, in transcribe_batch
output_ids = self.model.generate(
^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 120, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2566, in generate
result = decoding_method(
^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/transformers/generation/utils.py", line 2786, in _sample
outputs = self(**model_inputs, return_dict=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1775, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1786, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/accelerate/hooks.py", line 175, in new_forward
output = module._old_forward(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vibevoice/modular/modeling_vibevoice_asr.py", line 375, in forward
speech_features = self.encode_speech(
^^^^^^^^^^^^^^^^^^^
File "/opt/venv/lib/python3.12/site-packages/vibevoice/modular/modeling_vibevoice_asr.py", line 335, in encode_speech
combined_features = acoustic_features[speech_masks] + semantic_features[speech_masks]
~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cuda:6)
I'm not sure if this is the right way to fix it, but it runs.
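For anyone hitting the same error: the indexing at modeling_vibevoice_asr.py line 335 fails because speech_masks ends up on a different GPU than the encoder outputs once the model is sharded. One possible direction, sketched below under that assumption (not necessarily the exact change referenced above), is to move the mask, and the second operand, onto the device of the tensor being indexed:

```python
# Sketch of a device-alignment workaround inside encode_speech()
# (around line 335 of modeling_vibevoice_asr.py).
# Assumption: with device_map="auto", acoustic_features, semantic_features
# and speech_masks can live on different GPUs.
acoustic_masks = speech_masks.to(acoustic_features.device)
semantic_masks = speech_masks.to(semantic_features.device)

combined_features = (
    acoustic_features[acoustic_masks]
    + semantic_features[semantic_masks].to(acoustic_features.device)
)
```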
