Description
Is there an existing issue / discussion for this?
- I have searched the existing issues / discussions
Is there an existing answer for this in FAQ?
- I have searched FAQ
Current Behavior
MiniCPM-o-4_5 TTS malfunction report
Environment information
- Platform: Tencent Cloud Studio (Ubuntu)
- GPU: NVIDIA A10 (24GB)
- CUDA: 12.8
- Python: 3.11.1
- torch: 2.8.0+cu128
- torchaudio: 2.8.0+cu128
- transformers: 4.51.0
- onnxruntime-gpu: 1.24.1
- minicpmo-utils: 1.0.4
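For anyone trying to reproduce this, the package versions listed above can be collected in one pass. This is a minimal sketch using only the standard library; `collect_versions` is a hypothetical helper name, not part of MiniCPM-o or minicpmo-utils, and the distribution names are taken from the list above.

```python
from importlib import metadata


def collect_versions(packages):
    """Return {distribution name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    wanted = ["torch", "torchaudio", "transformers", "onnxruntime-gpu", "minicpmo-utils"]
    for name, version in collect_versions(wanted).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output into the report keeps the version list consistent with what is actually installed in the failing environment.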
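Problem description
TTS speech synthesis does not produce an audio file. Text generation works normally, but no audio file is ever written.
Error message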
File "cosyvoice/cli/frontend.py", line 93, in _extract_speech_token
    assert speech.shape[1] / 16000 <= 30, 'do not support extract speech token for audio longer than 30s'
AttributeError: 'NoneType' object has no attribute 'shape'
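The traceback shows that `speech` was `None` when the length assertion ran, so the failure happens before any duration check; the report does not establish why no audio reached that point. The sketch below only illustrates the failure mode and a clearer guard. `check_prompt_speech` and `FakeWaveform` are hypothetical names for illustration and are not part of cosyvoice or MiniCPM-o; the 16000 Hz rate and 30 s limit are taken from the assertion in the traceback.

```python
SAMPLE_RATE = 16000  # Hz, as in the assertion from the traceback
MAX_SECONDS = 30     # limit stated in the assertion message


class FakeWaveform:
    """Stand-in for a (channels, samples) tensor; only .shape is used here."""

    def __init__(self, channels, samples):
        self.shape = (channels, samples)


def check_prompt_speech(speech):
    """Validate prompt audio and return its duration in seconds."""
    if speech is None:
        # Without this guard, speech.shape raises the AttributeError from the report.
        raise ValueError("prompt speech is None: no audio was passed to the check")
    duration = speech.shape[1] / SAMPLE_RATE
    if duration > MAX_SECONDS:
        raise ValueError(f"prompt speech is {duration:.1f}s, longer than {MAX_SECONDS}s")
    return duration
```

A call such as `check_prompt_speech(None)` then fails with a message that names the missing input instead of an `AttributeError` on `shape`.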
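Steps to reproduce
- Load the model with init_tts=True
- Call model.init_tts(streaming=False)
- Call model.chat() with generate_audio=True
- No audio file is generated and the error above is raised
Results for other features
- ✅ Text-only chat - working
- ✅ Image understanding - working
- ✅ Speech recognition (ASR) - working
- ✅ Video understanding - working
- ❌ Speech synthesis (TTS) - failing
Attachments
- diagnostic_report.json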
Expected Behavior
No response
Steps To Reproduce
- Load the model with init_tts=True
- Call model.init_tts(streaming=False)
- Call model.chat() with generate_audio=True
- No audio file is generated and the error above is raised
Environment
- OS:
- Python:
- Transformers:
- PyTorch:
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`):
Anything else?
No response