
Inference fails with CUDA out of memory on long scripts #157

@rachana192837

Description


I tried to run the 1.5B multi-speaker model on a 12 GB GPU with a script that produces more than 10 minutes of audio.

Expected behavior:

  • The model generates audio for the entire script.

Actual behavior:

  • The process crashes with a CUDA Out of Memory (OOM) error after a few minutes.

Steps to reproduce:

  1. Clone the VibeVoice repository.
  2. Install dependencies as per the instructions.
  3. Run inference with the 1.5B multi-speaker model on a script longer than 10 minutes.

Environment:

  • OS: Ubuntu 22.04
  • Python: 3.11
  • CUDA: 12.1
  • GPU: NVIDIA RTX 3060 12GB
  • VibeVoice model: 1.5B multi-speaker

Additional notes:

  • Reducing the script length allows the inference to succeed.
  • Suggestion: consider adding memory optimizations for long-form audio generation, e.g. generating the script in chunks so peak GPU memory stays bounded.
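As a temporary workaround, a minimal sketch of the chunking idea (the `split_script` helper is hypothetical, and the commented-out `model.generate` call is an assumption — VibeVoice's actual inference API may differ):

```python
# Workaround sketch: split a long multi-speaker script into smaller chunks,
# synthesize each chunk separately, and free cached CUDA blocks between
# chunks so peak memory stays bounded. Helper names here are hypothetical.

def split_script(script: str, max_lines: int = 20) -> list[list[str]]:
    """Split a script into chunks of at most max_lines non-empty lines,
    keeping individual speaker turns (lines) intact."""
    lines = [line for line in script.splitlines() if line.strip()]
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]

# Per-chunk inference loop (model.generate and torch are assumptions,
# shown as comments since the real VibeVoice call signature may differ):
#
# audio_parts = []
# for chunk in split_script(long_script):
#     audio_parts.append(model.generate("\n".join(chunk)))
#     torch.cuda.empty_cache()  # release cached allocations between chunks
```

The audio parts would then need to be concatenated, possibly with crossfading at chunk boundaries to avoid audible seams.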
