[BUG] TTS audio generation fails with AttributeError in _extract_speech_token

### Is there an existing issue / discussion for this?

- [x] I have searched the existing issues / discussions

### Is there an existing answer for this in FAQ?

- [x] I have searched FAQ

### Current Behavior

# MiniCPM-o-4_5 TTS Malfunction Report

## Environment
- **Platform**: Tencent Cloud Studio (Ubuntu)
- **GPU**: NVIDIA A10 (24GB)
- **CUDA**: 12.8
- **Python**: 3.11.1
- **torch**: 2.8.0+cu128
- **torchaudio**: 2.8.0+cu128
- **transformers**: 4.51.0
- **onnxruntime-gpu**: 1.24.1
- **minicpmo-utils**: 1.0.4

## Problem Description
TTS speech synthesis fails to generate an audio file. Text generation works normally, but the audio file is never produced.

## Error Message

    File "cosyvoice/cli/frontend.py", line 93, in _extract_speech_token
        assert speech.shape[1] / 16000 <= 30, 'do not support extract speech token for audio longer than 30s'
    AttributeError: 'NoneType' object has no attribute 'shape'
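
The exception is an `AttributeError` rather than an `AssertionError` because `speech` is `None` by the time the assert runs, so evaluating `speech.shape` fails before the 30-second comparison is reached. The following is a minimal sketch of that failure mode and of one possible guard. `FakeWaveform`, `check_unguarded`, and `check_guarded` are hypothetical names used only for illustration and are not part of cosyvoice or MiniCPM-o; only the assert expression is taken from the traceback. Why `speech` is `None` in the first place (for example, a prompt/reference audio that was never loaded) is an assumption I have not verified.

```python
from typing import Optional, Tuple

SAMPLE_RATE = 16000  # divisor used in the assert from the traceback
MAX_SECONDS = 30     # limit used in the assert from the traceback


class FakeWaveform:
    """Hypothetical stand-in for a tensor; it only exposes .shape."""

    def __init__(self, shape: Tuple[int, int]):
        self.shape = shape


def check_unguarded(speech) -> bool:
    # Same expression as the assert in the traceback. With speech=None the
    # attribute lookup fails first, so the assert message is never shown.
    assert speech.shape[1] / SAMPLE_RATE <= MAX_SECONDS, \
        'do not support extract speech token for audio longer than 30s'
    return True


def check_guarded(speech: Optional[FakeWaveform]) -> bool:
    # Hypothetical guard: report the missing input explicitly.
    if speech is None:
        raise ValueError("speech is None: no audio was passed to the token extractor")
    if speech.shape[1] / SAMPLE_RATE > MAX_SECONDS:
        raise ValueError("audio longer than 30s is not supported")
    return True


if __name__ == "__main__":
    try:
        check_unguarded(None)
    except AttributeError as exc:
        print("unguarded:", exc)
    try:
        check_guarded(None)
    except ValueError as exc:
        print("guarded:", exc)
```

If this reading is right, the useful thing to check is which caller passes `None` into `_extract_speech_token`, since the length check itself is not what fails.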

## Reproduction Steps
1. Load the model with `init_tts=True`
2. Call `model.init_tts(streaming=False)`
3. Call `model.chat()` with `generate_audio=True`
4. No audio file is generated, and the error above is raised

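## Test Results for Other Features
- ✅ Text-only chat - works
- ✅ Image understanding - works
- ✅ Speech recognition (ASR) - works
- ✅ Video understanding - works
- ❌ Speech synthesis (TTS) - fails

## Attachments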
- diagnostic_report.json

[diagnostic_report.json](https://github.com/user-attachments/files/25142125/diagnostic_report.json)

### Expected Behavior

_No response_

### Steps To Reproduce

## Reproduction Steps
1. Load the model with `init_tts=True`
2. Call `model.init_tts(streaming=False)`
3. Call `model.chat()` with `generate_audio=True`
4. No audio file is generated, and the error above is raised

### Environment

```Markdown
- OS: Ubuntu (Tencent Cloud Studio)
- Python: 3.11.1
- Transformers: 4.51.0
- PyTorch: 2.8.0+cu128
- CUDA (`python -c 'import torch; print(torch.version.cuda)'`): 12.8
```
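
For anyone trying to compare environments, here is a small sketch that prints the same kind of version list using only the standard library. The package names are the ones listed in this report; `collect_versions` is a hypothetical helper name. It does not report the CUDA version, which still needs the `torch.version.cuda` command from the template.

```python
import platform
import sys
from importlib import metadata

# Distribution names copied from the environment list in this report.
PACKAGES = ["torch", "torchaudio", "transformers", "onnxruntime-gpu", "minicpmo-utils"]


def collect_versions(packages=PACKAGES):
    """Return a dict of component name -> version ("not installed" if absent)."""
    info = {"OS": platform.platform(), "Python": sys.version.split()[0]}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"- {key}: {value}")
```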

### Anything else?

_No response_
