Description
I tried to run the 1.5B multi-speaker model on a 12GB GPU with a script that produces more than 10 minutes of audio.
Expected behavior:
- The model generates audio for the entire script.
Actual behavior:
- The process crashes with a CUDA Out of Memory (OOM) error after a few minutes.
Steps to reproduce:
- Clone the VibeVoice repository.
- Install dependencies as per the instructions.
- Run inference with the 1.5B multi-speaker model on a script longer than 10 minutes.
Environment:
- OS: Ubuntu 22.04
- Python: 3.11
- CUDA: 12.1
- GPU: NVIDIA RTX 3060 12GB
- VibeVoice model: 1.5B multi-speaker
Additional notes:
- Reducing the script length allows the inference to succeed.
- Suggestion: add memory optimizations for long-form audio generation.
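
Since shorter scripts succeed, one possible stopgap is to split a long script into smaller pieces and run inference on each piece separately. The sketch below is only an illustration using the Python standard library: `split_script` and `max_words` are hypothetical names, not part of VibeVoice, and it assumes the script is plain text with one utterance per line. It does not call the model, and it does not address whether speaker voices stay consistent across separately generated pieces.

```python
def split_script(text, max_words=300):
    """Group script lines into chunks of at most max_words words.

    Hypothetical helper, not a VibeVoice API. Lines are never split,
    so a single line longer than max_words becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        n = len(line.split())
        # Start a new chunk if adding this line would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n".join(current))
            current, count = [], 0
        current.append(line)
        count += n
    if current:
        chunks.append("\n".join(current))
    return chunks
```

Each returned chunk could then be passed to inference on its own and the resulting audio files concatenated afterwards. A suitable `max_words` for a 12GB GPU would have to be found by trial.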