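A cross-platform real-time and offline speech transcription service with macOS and Windows clients.
The backend exposes a single ASR API (built on Whisper models); the frontend receives transcription results over WebSocket or HTTP SSE.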
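
- WebSocket real-time transcription
  - The client pushes PCM audio (16 kHz, mono, float32)
  - The server returns `partial` (provisional) and `final` (committed) events
- HTTP SSE offline transcription
  - Upload an audio file
  - Transcription results are streamed back
- Health check
  - `/health` returns "ok", for monitoring and liveness probes
- Cross-platform clients
  - macOS → native Swift app
  - Windows → Tauri (Rust + a frontend framework)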
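
The real-time endpoint expects raw float32 samples, while many capture APIs deliver 16-bit integers. The sketch below shows one way a client might convert a block of int16 samples into the byte frame it sends over the WebSocket. The function name `int16_to_float32_frame` is hypothetical, and little-endian byte order is an assumption that matches how the server reads the buffer on common x86 and ARM hosts.

```python
import struct

def int16_to_float32_frame(samples):
    """Pack 16-bit PCM samples as little-endian float32 bytes in [-1.0, 1.0)."""
    floats = [s / 32768.0 for s in samples]
    return struct.pack(f"<{len(floats)}f", *floats)

# 20 ms of silence at 16 kHz mono is 320 samples, i.e. 1280 bytes on the wire.
frame = int16_to_float32_frame([0] * 320)
```

Each frame would be sent as one binary WebSocket message; resampling to 16 kHz and downmixing to mono still have to happen before this step.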
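
Install dependencies:

    # Python 3.10+
    # python-multipart is what FastAPI uses to parse the file upload and form field of /transcribe
    pip install fastapi "uvicorn[standard]" numpy soundfile python-multipart
    # Option 1: faster-whisper (recommended)
    pip install faster-whisper
    # Option 2: whisper.cpp bindings
    pip install whispercpp

File: server_fastwhisper.py
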
    import asyncio, contextlib, json, os, tempfile
    import numpy as np
    from fastapi import FastAPI, WebSocket, UploadFile, File, Form
    from fastapi.responses import PlainTextResponse, StreamingResponse
    from faster_whisper import WhisperModel

    ASR_MODEL_NAME = os.getenv("ASR_MODEL", "small")
    # "int8_float16" needs a CUDA GPU; "auto" lets CTranslate2 pick a type the device supports.
    model = WhisperModel(ASR_MODEL_NAME, device="auto", compute_type="auto")
    app = FastAPI()

    # Health check
    @app.get("/health")
    def health():
        return PlainTextResponse("ok", status_code=200)

    # HTTP SSE offline endpoint
    @app.post("/transcribe")
    async def transcribe_file(file: UploadFile = File(...), lang: str | None = Form(default=None)):
        with tempfile.NamedTemporaryFile(delete=True) as tmp:
            tmp.write(await file.read())
            tmp.flush()
            # transcribe() loads the audio here and returns a lazy segment generator
            segs, info = model.transcribe(tmp.name, language=lang, vad_filter=True)

        # Plain (sync) generator: Starlette iterates it in a thread pool,
        # so the blocking decode does not stall the event loop.
        def stream():
            for s in segs:
                evt = {"type": "final", "start": s.start, "end": s.end, "text": s.text}
                yield f"event: final\ndata: {json.dumps(evt, ensure_ascii=False)}\n\n"
            yield "event: done\ndata: {}\n\n"

        return StreamingResponse(stream(), media_type="text/event-stream")

    # WebSocket real-time endpoint (partial + final)
    @app.websocket("/ws")
    async def ws_endpoint(ws: WebSocket):
        await ws.accept()
        await ws.send_text(json.dumps({"type": "ready"}))
        buf = np.zeros(0, dtype=np.float32)
        committed_end = 0.0
        try:
            while True:
                msg = await ws.receive()
                if msg["type"] == "websocket.disconnect":
                    break
                data = msg.get("bytes")
                if not data:
                    continue  # ignore text frames
                pcm = np.frombuffer(data, dtype=np.float32)
                buf = np.concatenate([buf, pcm])
                # Materialise the lazy segment generator off the event loop.
                segs = await asyncio.to_thread(
                    lambda: list(model.transcribe(buf, beam_size=1)[0])
                )
                if not segs:
                    continue
                # Every segment except the newest is closed: send it once as final.
                for s in segs[:-1]:
                    if s.end > committed_end + 1e-3:
                        await ws.send_text(json.dumps({
                            "type": "final", "start": s.start, "end": s.end, "text": s.text
                        }, ensure_ascii=False))
                        committed_end = s.end
                # The newest segment may still change: send it as partial.
                last = segs[-1]
                if last.end > committed_end + 1e-3:
                    await ws.send_text(json.dumps({
                        "type": "partial", "start": last.start, "end": last.end, "text": last.text
                    }, ensure_ascii=False))
        except Exception:
            # The socket may already be gone, so closing is best effort.
            with contextlib.suppress(Exception):
                await ws.close()

Run:

    uvicorn server_fastwhisper:app --host 0.0.0.0 --port 8000

File: server_whispercpp.py

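    import contextlib, io, json
    import numpy as np
    import soundfile as sf
    from fastapi import FastAPI, WebSocket, UploadFile, File, Form
    from fastapi.responses import PlainTextResponse, StreamingResponse
    from whispercpp import Whisper

    # The constructor call and the res["segments"] result shape are kept as written here.
    # Bindings published under the name `whispercpp` differ in API, so check both
    # against the package that is actually installed.
    model = Whisper("ggml-small.bin")
    app = FastAPI()

    @app.get("/health")
    def health():
        return PlainTextResponse("ok", status_code=200)

    @app.post("/transcribe")
    async def transcribe_file(file: UploadFile = File(...), lang: str | None = Form(default=None)):
        audio = await file.read()
        # Read as float32 and downmix to mono. No resampling is done here,
        # so the file is assumed to be sampled at 16 kHz already.
        data, sr = sf.read(io.BytesIO(audio), dtype="float32")
        if data.ndim > 1:
            data = data.mean(axis=1)
        res = model.transcribe(data, language=lang)

        def stream():
            for seg in res["segments"]:
                yield f"event: final\ndata: {json.dumps(seg, ensure_ascii=False)}\n\n"
            yield "event: done\ndata: {}\n\n"

        return StreamingResponse(stream(), media_type="text/event-stream")

    @app.websocket("/ws")
    async def ws_endpoint(ws: WebSocket):
        await ws.accept()
        await ws.send_text(json.dumps({"type": "ready"}))
        try:
            while True:
                msg = await ws.receive()
                if msg["type"] == "websocket.disconnect":
                    break
                data = msg.get("bytes")
                if not data:
                    continue  # ignore text frames
                audio = np.frombuffer(data, dtype=np.float32)
                res = model.transcribe(audio)
                for seg in res["segments"]:
                    await ws.send_text(json.dumps(
                        {"type": "final", "text": seg["text"]}, ensure_ascii=False))
        except Exception:
            # The socket may already be gone, so closing is best effort.
            with contextlib.suppress(Exception):
                await ws.close()

Event semantics:

- `partial`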
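  - The latest segment, not yet closed
  - Gets overwritten by later updates
  - The UI can render it in grey or italics
- `final`
  - Committed once silence lasts ≥ 0.3–0.6 s or the text ends with a pause punctuation mark
  - Never changes afterwards
  - The UI switches it to black / locked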
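
The commit rule for `final` can be sketched as a small predicate. This is an illustration under stated assumptions, not the service's actual endpointing: silence is approximated with a fixed amplitude threshold rather than a VAD model, and the helper names, the `PAUSE_MARKS` set, and the 0.45 s default (picked from inside the 0.3–0.6 s range) are all hypothetical.

```python
SAMPLE_RATE = 16_000
PAUSE_MARKS = "。!?.!?…"  # assumed set of pause punctuation

def trailing_silence_s(samples, threshold=0.01):
    """Seconds at the end of `samples` whose amplitude stays below `threshold`."""
    quiet = 0
    for x in reversed(samples):
        if abs(x) >= threshold:
            break
        quiet += 1
    return quiet / SAMPLE_RATE

def should_finalize(text, samples, min_silence_s=0.45):
    """True when the segment ends in pause punctuation or enough trailing silence."""
    text = text.rstrip()
    ends_with_pause = bool(text) and text[-1] in PAUSE_MARKS
    return ends_with_pause or trailing_silence_s(samples) >= min_silence_s
```

A segment for which the predicate is false would keep being sent as `partial` until one of the two conditions holds.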

Client logic:

- On `partial` → overwrite the entry with the same `segmentId`
- On `final` → overwrite and lock
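
The overwrite-and-lock rule above can be sketched as a small state holder. Note the assumptions: the events here carry a `segmentId` field, which the sample server code does not emit, and the `Transcript` class and its method names are hypothetical.

```python
class Transcript:
    """Client-side view of the transcript, keyed by segmentId."""

    def __init__(self):
        self.segments = {}   # segmentId -> text, in arrival order
        self.locked = set()  # segmentIds that have been finalized

    def apply(self, event):
        seg_id = event["segmentId"]
        if seg_id in self.locked:
            return  # a final segment never changes again
        self.segments[seg_id] = event["text"]  # partial and final both overwrite
        if event["type"] == "final":
            self.locked.add(seg_id)

    def text(self):
        return " ".join(self.segments.values())
```

A UI layer could style entries by checking whether their id is in `locked` (grey or italic while unlocked, black once locked).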
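
Windows client (Tauri):

    cargo install create-tauri-app
    create-tauri-app lazyaudio --template vanilla
    cd lazyaudio

Rust side:

    #![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]

    // Upload a local audio file to the backend and return the raw SSE response body.
    // Assumes reqwest is built with the `multipart` and `stream` features,
    // which the async `Form::file` helper needs.
    #[tauri::command]
    async fn transcribe_file(path: String) -> Result<String, String> {
        let client = reqwest::Client::new();
        let form = reqwest::multipart::Form::new()
            .file("file", path).await.map_err(|e| e.to_string())?;
        let resp = client.post("http://127.0.0.1:8000/transcribe")
            .multipart(form)
            .send().await.map_err(|e| e.to_string())?;
        // Surface read errors instead of silently returning an empty string.
        resp.text().await.map_err(|e| e.to_string())
    }

    fn main() {
        tauri::Builder::default()
            .invoke_handler(tauri::generate_handler![transcribe_file])
            .run(tauri::generate_context!())
            .expect("error while running tauri application");
    }

Frontend call:

    // The import path depends on the Tauri version: v1 exports invoke from
    // '@tauri-apps/api' (and '@tauri-apps/api/tauri'), v2 from '@tauri-apps/api/core'.
    import { invoke } from '@tauri-apps/api';

    async function runTranscribe() {
      const result = await invoke("transcribe_file", { path: "C:/test.wav" });
      console.log(result);
    }

Notes:

- The real-time endpoint accepts only 16 kHz mono float32 PCM, so the client has to convert audio before sending it.
- Latency versus stability trade-off: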
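  - A shorter tick improves responsiveness but raises CPU usage;
  - A longer overlap improves stability but adds latency.
- The VAD threshold has to be tuned against the actual noise level.
- Reconnection: buffer state lives with the connection and is lost when it drops, so after a disconnect the client must start a new session rather than resume the old one.
- Deployment: Docker with uvicorn/gunicorn is recommended, with Nginx in front for load balancing.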
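
To make the tick/overlap trade-off concrete, the sketch below enumerates decode windows under one possible reading of the two terms: the server decodes once per tick, and each decode covers the newest tick plus `overlap_s` seconds of earlier audio. Both definitions and the function names are assumptions for illustration; the text above does not define them.

```python
SAMPLE_RATE = 16_000

def decode_windows(total_samples, tick_s, overlap_s):
    """Yield (start, end) sample ranges, one per tick, each reaching back by the overlap."""
    tick = int(tick_s * SAMPLE_RATE)
    overlap = int(overlap_s * SAMPLE_RATE)
    if tick <= 0:
        raise ValueError("tick_s must be positive")
    end = tick
    while end <= total_samples:
        yield max(0, end - tick - overlap), end
        end += tick

def decode_load(tick_s, overlap_s):
    """Seconds of audio decoded per second of input once the stream is running."""
    return (tick_s + overlap_s) / tick_s

windows = list(decode_windows(3 * SAMPLE_RATE, tick_s=1.0, overlap_s=0.5))
```

Under this reading, halving the tick doubles the number of decodes per second, and a longer overlap means more audio is re-decoded each time, which is where the extra CPU cost and delay would come from.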
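
Embedding the backend in the macOS app:

- Build the virtual environment: during the build phase run `python3 -m venv build/venv`, then `build/venv/bin/pip install -r requirements.txt faster-whisper` (or `whispercpp`).
- Copy resources: package `build/venv/`, `transcribe_server/`, and the required model files into the Swift app's `Resources/python/`, or unpack them on first launch into `~/Library/Application Support/LazyAudioTranscribe`.
- Launch the subprocess from Swift: use `Process` to run the virtual environment's `python` with `-m transcribe_server --engine fast-whisper --port 8765`, and control configuration through the `ASR_MODEL` and `SERVER_HOST` environment variables.
- Model directories: to manage the model and VAD files manually, point `ASR_MODEL_DIR` at the cache directory and `ASR_VAD_MODEL` at `silero_vad.onnx`; the CLI flags `--model-dir` and `--vad-model` do the same.
- Logging and cleanup: capture standard output and standard error with `Pipe`; when the app exits, call `task.terminate()` and wait for the process to end so that no backend process is left behind.
- Standalone binary alternative (optional): to reduce the bundle size, precompile with `pyinstaller -F transcribe_server/cli.py` into a single executable that Swift calls directly.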
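
The launch-and-cleanup lifecycle described above is the same whatever the host language is. As a rough illustration it is sketched here in Python rather than Swift, with `subprocess.Popen` standing in for `Process`. The helper name `managed_backend` and the 5-second stop timeout are hypothetical, and the child started below is a stand-in that just sleeps, not the real server.

```python
import subprocess
import sys
from contextlib import contextmanager

@contextmanager
def managed_backend(cmd, env=None, stop_timeout_s=5.0):
    """Run `cmd` as a child process and make sure it is gone when the block exits."""
    proc = subprocess.Popen(cmd, env=env,
                            stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    try:
        yield proc
    finally:
        proc.terminate()                  # polite stop request first
        try:
            proc.wait(timeout=stop_timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()                   # force-stop if it does not exit in time
            proc.wait()
        if proc.stdout:
            proc.stdout.close()

# Stand-in child; the real command would be the venv's python with
# ["-m", "transcribe_server", "--engine", "fast-whisper", "--port", "8765"].
with managed_backend([sys.executable, "-c", "import time; time.sleep(60)"]) as proc:
    running = proc.poll() is None
```

In a real host the captured output would also need to be read continuously (for example on a background thread), otherwise a chatty child can block once the pipe buffer fills.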

Local development:

- Prepare the environment:

      python3 -m venv .venv
      source .venv/bin/activate
      pip install -r requirements.txt
      pip install faster-whisper  # or: whispercpp soundfile

- Start the service:

      .venv/bin/python -m transcribe_server --engine fast-whisper --host 127.0.0.1 --port 8765 \
        --model-dir ./models/whisper --vad-model ./models/vad/silero_vad.onnx

  For whisper.cpp, set `ASR_MODEL=/path/to/ggml-small.bin` and change `--engine` to `whisper.cpp`.
- Legacy invocation: `uvicorn server_fastwhisper:app` and `uvicorn server_whispercpp:app` still work; both point at the module-level `app`.
- Verify: open `http://127.0.0.1:8765/health` and check the `engine` field in the response, then test the `/transcribe` and `/ws` endpoints.
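
When testing `/transcribe`, the response body is an SSE stream of `final` events followed by `done`. The sketch below shows one way to parse such a stream from an iterable of text lines. The `parse_sse` helper is hypothetical and handles only the `event:` and `data:` fields that the sample server emits; the sample payload is made up for illustration.

```python
import json

def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines."""
    event, data = "message", []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":                       # a blank line ends one event
            if data:
                yield event, json.loads("\n".join(data))
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].lstrip())

sample = (
    'event: final\ndata: {"type": "final", "start": 0.0, "end": 1.2, "text": "hello"}\n\n'
    'event: done\ndata: {}\n\n'
)
events = list(parse_sse(sample.splitlines(keepends=True)))
```

Against a running server the same function could be fed the decoded lines of the HTTP response instead of the sample string.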

Summary:

- Backend: FastAPI + Whisper (faster-whisper or whisper.cpp), exposing `/ws`, `/transcribe`, and `/health` through one API
- Protocol: real-time `partial` / `final`; offline SSE `final` / `done`
- Frontend:
  - macOS → Swift
  - Windows → Tauri