
🎧 LazyAudio Transcribe Service

A cross-platform real-time/offline speech transcription service with macOS and Windows clients.
The backend exposes a unified ASR API (built on Whisper models); clients receive transcription results over WebSocket or HTTP SSE.


✨ Feature Overview

  • Real-time transcription over WebSocket
    • The client streams PCM (16 kHz mono float32)
    • The server returns partial (provisional) and final (committed) events
  • Offline transcription over HTTP SSE
    • Upload an audio file
    • Transcription results are streamed back
  • Health check
    • The /health endpoint returns "ok" for monitoring and liveness probes
  • Cross-platform clients
    • macOS → native Swift app
    • Windows → Tauri (Rust + web frontend)

🐍 Backend Implementation (Python)

Installing dependencies

# Python 3.10+
pip install fastapi "uvicorn[standard]" numpy soundfile

# Option 1: faster-whisper (recommended)
pip install faster-whisper

# Option 2: whisper.cpp bindings
pip install whispercpp

Option 1: faster-whisper

File: server_fastwhisper.py

import json, os, tempfile
import numpy as np
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, UploadFile, File, Form
from fastapi.responses import PlainTextResponse, StreamingResponse
from faster_whisper import WhisperModel

ASR_MODEL_NAME = os.getenv("ASR_MODEL", "small")
# int8_float16 requires a GPU; use compute_type="int8" on CPU-only machines
model = WhisperModel(ASR_MODEL_NAME, device="auto", compute_type="int8_float16")

app = FastAPI()

# Health check
@app.get("/health")
def health():
    return PlainTextResponse("ok", status_code=200)

# Offline HTTP SSE endpoint
@app.post("/transcribe")
async def transcribe_file(file: UploadFile = File(...), lang: str | None = Form(default=None)):
    # faster-whisper yields segments lazily, so the upload must outlive this
    # handler until the SSE stream below has been fully consumed
    tmp = tempfile.NamedTemporaryFile(delete=False)
    tmp.write(await file.read())
    tmp.close()
    segs, info = model.transcribe(tmp.name, language=lang, vad_filter=True)

    async def stream():
        try:
            for s in segs:
                evt = {"type": "final", "start": s.start, "end": s.end, "text": s.text}
                yield f"event: final\ndata: {json.dumps(evt, ensure_ascii=False)}\n\n"
            yield "event: done\ndata: {}\n\n"
        finally:
            os.unlink(tmp.name)

    return StreamingResponse(stream(), media_type="text/event-stream")

# Real-time WebSocket endpoint (partial + final)
@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    await ws.send_text(json.dumps({"type": "ready"}))
    buf = np.zeros(0, dtype=np.float32)
    committed_end = 0.0

    try:
        while True:
            msg = await ws.receive()
            if "bytes" in msg:
                pcm = np.frombuffer(msg["bytes"], dtype=np.float32)
                buf = np.concatenate([buf, pcm])
                # transcribe() returns a lazy generator; materialize it so the
                # trailing segment can be treated differently
                segs, _ = model.transcribe(buf, beam_size=1)
                segs = list(segs)
                # Everything before the trailing segment is stable: commit as final
                for s in segs[:-1]:
                    if s.end > committed_end + 1e-3:
                        await ws.send_text(json.dumps({
                            "type": "final", "start": s.start, "end": s.end, "text": s.text
                        }, ensure_ascii=False))
                        committed_end = s.end
                # The trailing segment may still change, so emit it as partial
                if segs:
                    last = segs[-1]
                    await ws.send_text(json.dumps({
                        "type": "partial", "start": last.start, "end": last.end, "text": last.text
                    }, ensure_ascii=False))
    except WebSocketDisconnect:
        pass

Run:

uvicorn server_fastwhisper:app --host 0.0.0.0 --port 8000
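
A quick way to exercise the SSE endpoint from Python (a sketch using the requests package; sample.wav and the lang value are placeholders):

import json
import requests

# Stream the SSE response; each "data:" line carries one JSON event
with open("sample.wav", "rb") as f:
    resp = requests.post(
        "http://127.0.0.1:8000/transcribe",
        files={"file": f},
        data={"lang": "en"},
        stream=True,
    )

for line in resp.iter_lines(decode_unicode=True):
    if line and line.startswith("data: "):
        evt = json.loads(line[len("data: "):])
        if "text" in evt:
            print(f"[{evt['start']:.2f}-{evt['end']:.2f}] {evt['text']}")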

Option 2: whisper.cpp

File: server_whispercpp.py

import io, json
import numpy as np
import soundfile as sf
from fastapi import FastAPI, WebSocket, WebSocketDisconnect, UploadFile, File, Form
from fastapi.responses import PlainTextResponse, StreamingResponse
from whispercpp import Whisper

# Path to a ggml model file downloaded for whisper.cpp
model = Whisper("ggml-small.bin")
app = FastAPI()

@app.get("/health")
def health():
    return PlainTextResponse("ok", status_code=200)

@app.post("/transcribe")
async def transcribe_file(file: UploadFile = File(...), lang: str | None = Form(default=None)):
    audio = await file.read()
    # Decode to float32; whisper.cpp expects 16 kHz mono, so resample upstream if needed
    data, sr = sf.read(io.BytesIO(audio), dtype="float32")
    res = model.transcribe(data, language=lang)
    async def stream():
        for seg in res["segments"]:
            yield f"event: final\ndata: {json.dumps(seg, ensure_ascii=False)}\n\n"
        yield "event: done\ndata: {}\n\n"
    return StreamingResponse(stream(), media_type="text/event-stream")

@app.websocket("/ws")
async def ws_endpoint(ws: WebSocket):
    await ws.accept()
    await ws.send_text(json.dumps({"type": "ready"}))
    try:
        while True:
            msg = await ws.receive()
            if "bytes" in msg:
                audio = np.frombuffer(msg["bytes"], dtype=np.float32)
                res = model.transcribe(audio)
                for seg in res["segments"]:
                    await ws.send_text(json.dumps(
                        {"type": "final", "text": seg["text"]}, ensure_ascii=False))
    except WebSocketDisconnect:
        pass

⚖️ Partial / Final Strategy

  • partial
    • The newest segment, not yet closed out
    • Will be overwritten by later updates
    • The UI can render it gray/italic
  • final
    • Silence ≥ 0.3–0.6 s, or a pause mark at the end of the text → commit
    • Never changes again
    • The UI switches it to black/locked
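
A minimal sketch of that finalization rule (the 0.45 s threshold and the pause-mark set are illustrative; the servers above do not implement this check verbatim):

def should_finalize(trailing_silence_s: float, text: str) -> bool:
    # Commit once there is enough trailing silence or a pause mark ends the text
    PAUSE_MARKS = ("。", ",", "!", "?", ".", ",", "!", "?")
    return trailing_silence_s >= 0.45 or text.rstrip().endswith(PAUSE_MARKS)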

Client-side logic:

  • On partial → overwrite the entry with the same segmentId
  • On final → overwrite it and lock it
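
For reference, a minimal Python client sketch (assumes the websockets package and the server above listening on port 8000; the second of silence stands in for real microphone audio):

import asyncio, json
import numpy as np
import websockets

async def run():
    async with websockets.connect("ws://127.0.0.1:8000/ws") as ws:
        print(json.loads(await ws.recv()))  # {"type": "ready"}
        # Push one second of 16 kHz mono float32 PCM
        await ws.send(np.zeros(16000, dtype=np.float32).tobytes())
        while True:
            evt = json.loads(await ws.recv())
            if evt["type"] == "partial":
                print("partial (may change):", evt["text"])
            elif evt["type"] == "final":
                print("final (locked):", evt["text"])

asyncio.run(run())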

🪟 Windows Client (Tauri)

Setup

cargo install create-tauri-app
cargo create-tauri-app lazyaudio --template vanilla
cd lazyaudio

src/main.rs

#![cfg_attr(not(debug_assertions), windows_subsystem = "windows")]

// Requires reqwest (with the "multipart" feature) and tokio in src-tauri/Cargo.toml
#[tauri::command]
async fn transcribe_file(path: String) -> Result<String, String> {
    // Read the audio file and attach it as a multipart part
    let bytes = tokio::fs::read(&path).await.map_err(|e| e.to_string())?;
    let part = reqwest::multipart::Part::bytes(bytes).file_name("audio.wav");
    let form = reqwest::multipart::Form::new().part("file", part);
    let resp = reqwest::Client::new()
        .post("http://127.0.0.1:8000/transcribe")
        .multipart(form)
        .send().await.map_err(|e| e.to_string())?;
    resp.text().await.map_err(|e| e.to_string())
}

fn main() {
    tauri::Builder::default()
        .invoke_handler(tauri::generate_handler![transcribe_file])
        .run(tauri::generate_context!())
        .expect("error while running tauri application");
}

Frontend call

// Tauri v1 import path; on Tauri v2 use '@tauri-apps/api/core'
import { invoke } from '@tauri-apps/api/tauri';

async function runTranscribe() {
  const result = await invoke("transcribe_file", { path: "C:/test.wav" });
  console.log(result);
}

🚀 Notes & Caveats

  1. The real-time endpoint only accepts 16 kHz mono float32 PCM; clients must preprocess audio (see the sketch after this list).
  2. Latency vs. stability trade-off:
    • shorter ticks → better real-time feel, higher CPU load;
    • longer overlap → more stable output, higher latency.
  3. Tune the VAD threshold against the actual noise floor.
  4. Reconnects: a dropped client must open a new session, otherwise buffered state is lost.
  5. Deployment: Docker + uvicorn/gunicorn behind Nginx for load balancing is recommended.
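
A sketch of that preprocessing step in Python (assumes soundfile; the np.interp resample is a naive stand-in for a proper resampler):

import numpy as np
import soundfile as sf

def to_pcm_16k_mono_f32(path: str) -> bytes:
    data, sr = sf.read(path, dtype="float32")
    if data.ndim > 1:                      # downmix to mono
        data = data.mean(axis=1)
    if sr != 16000:                        # naive linear resample to 16 kHz
        n = int(len(data) * 16000 / sr)
        data = np.interp(np.linspace(0, len(data) - 1, n),
                         np.arange(len(data)), data)
    return data.astype(np.float32).tobytes()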

🧩 Bundling with the Swift App

  1. Build a virtual environment: during the build step run python3 -m venv build/venv, then build/venv/bin/pip install -r requirements.txt faster-whisper (or whispercpp).
  2. Copy resources: bundle build/venv/, transcribe_server/, and any required model files into the Swift app's Resources/python/, or unpack them on first launch into ~/Library/Application Support/LazyAudioTranscribe.
  3. Launch the subprocess from Swift: use Process to point at the venv's python and run -m transcribe_server --engine fast-whisper --port 8765; control the configuration via the ASR_MODEL and SERVER_HOST environment variables.
  4. Model directories: to manage models/VAD manually, set ASR_MODEL_DIR to the cache directory and ASR_VAD_MODEL to silero_vad.onnx; the CLI equivalents are --model-dir and --vad-model.
  5. Logging and cleanup: capture stdout/stderr with Pipe, and on app exit call task.terminate() and wait for the process to end, so no backend is left running.
  6. Pure-binary alternative (optional): to shrink the bundle, precompile with pyinstaller -F transcribe_server/cli.py into a single executable and have Swift invoke it directly.

🧪 Standalone Run Guide

  1. Prepare the environment:
    python3 -m venv .venv
    source .venv/bin/activate
    pip install -r requirements.txt
    pip install faster-whisper  # or: whispercpp soundfile
  2. Start the service:
    .venv/bin/python -m transcribe_server --engine fast-whisper --host 127.0.0.1 --port 8765 \
        --model-dir ./models/whisper --vad-model ./models/vad/silero_vad.onnx
    For whisper.cpp, set ASR_MODEL=/path/to/ggml-small.bin and switch --engine to whisper.cpp.
  3. Legacy invocation: uvicorn server_fastwhisper:app or uvicorn server_whispercpp:app still work, pointing at the module-level app.
  4. Verify: hit http://127.0.0.1:8765/health and check the returned engine field, then exercise the /transcribe and /ws endpoints.
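
The transcribe_server package itself is not included in this README; a hypothetical skeleton of its entry point, matching the flags and env vars used above (wiring --model-dir/--vad-model into the engines is omitted):

# transcribe_server/__main__.py (hypothetical sketch; names follow this README)
import argparse, os
import uvicorn

def main():
    p = argparse.ArgumentParser(prog="transcribe_server")
    p.add_argument("--engine", choices=["fast-whisper", "whisper.cpp"], default="fast-whisper")
    p.add_argument("--host", default=os.getenv("SERVER_HOST", "127.0.0.1"))
    p.add_argument("--port", type=int, default=8765)
    p.add_argument("--model-dir", default=os.getenv("ASR_MODEL_DIR"))   # not wired up here
    p.add_argument("--vad-model", default=os.getenv("ASR_VAD_MODEL"))  # not wired up here
    args = p.parse_args()

    # Pick the module-level app matching the requested engine
    target = "server_fastwhisper:app" if args.engine == "fast-whisper" else "server_whispercpp:app"
    uvicorn.run(target, host=args.host, port=args.port)

if __name__ == "__main__":
    main()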

📌 Summary

  • Backend: FastAPI + Whisper (faster-whisper or whisper.cpp), exposing /ws, /transcribe, and /health
  • Protocol: real-time partial/final events; offline SSE final/done events
  • Frontends:
    • macOS → Swift
    • Windows → Tauri
