[BUG] TTS audio generation fails with AttributeError in _extract_speech_token #1063

@weklyy

Is there an existing issue / discussion for this?

  • I have searched the existing issues / discussions

Is there an existing answer for this in FAQ?

  • I have searched the FAQ

Current Behavior

MiniCPM-o-4_5 TTS malfunction report

Environment

  • Platform: Tencent Cloud Studio (Ubuntu)
  • GPU: NVIDIA A10 (24GB)
  • CUDA: 12.8
  • Python: 3.11.1
  • torch: 2.8.0+cu128
  • torchaudio: 2.8.0+cu128
  • transformers: 4.51.0
  • onnxruntime-gpu: 1.24.1
  • minicpmo-utils: 1.0.4
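
The version list above can be collected programmatically, which also helps fill in the Environment fields of the issue template. The following is a minimal sketch using only the standard library; the distribution names are the ones listed above, and the helper name `collect_versions` is hypothetical.

```python
import platform
from importlib import metadata

# Distribution names taken from the environment list in this report.
PACKAGES = ["torch", "torchaudio", "transformers", "onnxruntime-gpu", "minicpmo-utils"]


def collect_versions(packages=PACKAGES):
    """Return {name: version string, or 'not installed'} for each distribution."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version}")
```

Pasting the printed lines into the report keeps the reported versions tied to the interpreter that actually ran the failing code.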

Problem Description

TTS speech synthesis does not produce an audio file. Text generation works, but the audio file is never generated.

Error Message

File "cosyvoice/cli/frontend.py", line 93, in _extract_speech_token
    assert speech.shape[1] / 16000 <= 30, 'do not support extract speech token for audio longer than 30s'
AttributeError: 'NoneType' object has no attribute 'shape'
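
The traceback indicates that `speech` is `None` by the time it reaches the length check, so the attribute lookup fails before the assertion itself is evaluated. Why it is `None` is not visible from the traceback alone. The sketch below reproduces that failure mode in isolation; `FakeWaveform` and both function names are hypothetical stand-ins, not CosyVoice code, and the guard is only one possible way to surface a clearer error.

```python
class FakeWaveform:
    """Hypothetical stand-in for an audio tensor; it only exposes .shape."""

    def __init__(self, first_dim, samples):
        self.shape = (first_dim, samples)


def check_length_unguarded(speech):
    # Same expression as the assertion quoted in the traceback: .shape is read
    # first, so a None input raises AttributeError, not AssertionError.
    assert speech.shape[1] / 16000 <= 30, (
        "do not support extract speech token for audio longer than 30s"
    )
    return True


def check_length_guarded(speech):
    # Hypothetical guard: fail with an explicit message when no audio arrived.
    if speech is None:
        raise ValueError("speech is None: no waveform reached the length check")
    return check_length_unguarded(speech)
```

With the guard in place, a missing waveform is reported as such instead of as a missing `shape` attribute, which makes it easier to trace back to the call that produced `None`.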

Steps to Reproduce

  1. Load the model with init_tts=True
  2. Call model.init_tts(streaming=False)
  3. Call model.chat() with generate_audio=True
  4. No audio file is generated, and the error above is raised
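
Steps 2 and 3 above can be wrapped in a small harness that captures the full traceback, since the report quotes only its last frame. This is a sketch under assumptions: `run_tts_repro` is a hypothetical helper, the model is expected to be loaded already with init_tts=True (step 1), and passing the conversation as a `msgs` keyword is an assumption about the `chat()` signature. Only `init_tts(streaming=False)` and `generate_audio=True` come from the report.

```python
import traceback


def run_tts_repro(model, msgs, **chat_kwargs):
    """Run steps 2-3 and return (result, traceback_text).

    traceback_text is None when the call succeeds.
    """
    try:
        model.init_tts(streaming=False)  # step 2
        # Step 3; the `msgs` keyword is an assumption, see the note above.
        result = model.chat(msgs=msgs, generate_audio=True, **chat_kwargs)
        return result, None
    except Exception:
        # Keep every frame so the caller of _extract_speech_token is visible.
        return None, traceback.format_exc()
```

Any additional arguments that `chat()` needs in a given setup can be passed through `chat_kwargs`; the returned traceback text can be attached to the issue as-is.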

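Results of Other Feature Tests

  • ✅ Text-only chat - OK
  • ✅ Image understanding - OK
  • ✅ Speech recognition (ASR) - OK
  • ✅ Video understanding - OK
  • ❌ Speech synthesis (TTS) - Failed

Attachments

  • diagnostic_report.json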

Expected Behavior

No response

Steps To Reproduce

  1. Load the model with init_tts=True
  2. Call model.init_tts(streaming=False)
  3. Call model.chat() with generate_audio=True
  4. No audio file is generated, and the error above is raised

Environment

- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):

Anything else?

No response
