yuekaizhang/README.md

Hi, I'm Yuekai Zhang

Speech AI @ NVIDIA. Working on training and deployment for speech models.

WeChat: ykzhang2020 | Email: zhangyuekai@foxmail.com


Pre-training

| Recipe | Description | Link |
|---|---|---|
| Whisper Fine-tuning | Fine-tune Whisper large-v2 on multiple Chinese datasets using icefall | icefall/whisper |
| Speech LLM | Whisper encoder + Qwen2 LLM for Chinese ASR (Qwen-Audio style) | icefall/ASR_LLM |
| Speech-to-Speech | Qwen2.5-Omni-Like: Whisper → Adapter → Thinker LLM → Talker LLM → CosyVoice2 | mair-hub/qwen_omni_like |
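The Speech LLM and Speech-to-Speech recipes both connect a Whisper encoder to an LLM through an adapter. Whisper's encoder emits 50 frames per second, so an adapter commonly shortens the sequence before projecting it into the LLM's embedding space. The sketch below assumes one such design: stack every `k` consecutive frames, then apply a linear projection. It is not taken from the icefall code. `stack_frames`, `linear`, and `k` are hypothetical names, and a real adapter would use learned weights in a tensor library.

```python
from typing import List

Frame = List[float]


def stack_frames(frames: List[Frame], k: int) -> List[Frame]:
    """Concatenate every k consecutive frames: (T, D) -> (T // k, k * D).

    Trailing frames that do not fill a complete group are dropped.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    usable = len(frames) - len(frames) % k
    stacked: List[Frame] = []
    for start in range(0, usable, k):
        merged: Frame = []
        for frame in frames[start:start + k]:
            merged.extend(frame)
        stacked.append(merged)
    return stacked


def linear(x: Frame, weight: List[List[float]], bias: List[float]) -> Frame:
    """Hypothetical projection y = W x + b into the LLM embedding size."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weight, bias)]


# Five 2-dim encoder frames, downsampled by k = 2 (the fifth frame is dropped).
frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]]
stacked = stack_frames(frames, k=2)
# Project each 4-dim stacked frame to a toy 1-dim "embedding".
embeddings = [linear(f, weight=[[1.0, 1.0, 1.0, 1.0]], bias=[0.5]) for f in stacked]
```

Stacking trades sequence length for feature width, which keeps the LLM's context short without discarding encoder information.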

Post-training

| Direction | Method | Framework | Link |
|---|---|---|---|
| Speech Understanding | GRPO on Qwen2-Audio / Qwen2.5-Omni | WeST | west/examples/grpo |
| Speech Generation | GRPO / DAPO on CosyVoice2 LLM (veRL + SenseVoice reward) | veRL | mair-hub/cosyvoice_llm |
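GRPO drops PPO's learned value function and instead scores each sampled response against the other responses drawn for the same prompt: the advantage of response i is (r_i - mean(r)) / std(r) over the group's rewards. The sketch below shows only that normalization step. The reward values are made up, `group_relative_advantages` is a hypothetical helper, and the choice of population rather than sample standard deviation is an assumption, not something checked against the WeST or veRL recipes.

```python
import statistics
from typing import List


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages for one group of responses to the same prompt.

    A_i = (r_i - mean(r)) / (std(r) + eps). This sketch uses the population
    standard deviation; eps avoids division by zero when all rewards match.
    """
    if not rewards:
        raise ValueError("need at least one reward")
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical rewards for four utterances synthesized from one text prompt,
# e.g. derived from how well an ASR model transcribes each one.
advantages = group_relative_advantages([0.9, 0.5, 0.7, 0.3])
# A group where every sample scores the same yields zero advantage everywhere,
# so it contributes no policy-gradient signal.
flat = group_relative_advantages([0.5, 0.5, 0.5])
```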

Deployment

All solutions are served with NVIDIA Triton Inference Server and include a Docker Compose quick start.
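Triton serves models over HTTP and gRPC using the KServe v2 inference protocol, so each deployment in this section can in principle be called with a plain JSON request. A minimal standard-library sketch follows. The model name and the tensor names `WAV` and `TRANSCRIPTS` are placeholders, not the names these recipes use; check each model's `config.pbtxt` (or `GET /v2/models/<name>/config`) for the real inputs and outputs, which may include extra tensors such as sample lengths.

```python
import json
import urllib.request
from typing import List


def build_infer_request(base_url: str, model_name: str, samples: List[float]) -> urllib.request.Request:
    """Build a KServe v2 JSON inference request for a Triton server.

    Tensor names below are hypothetical placeholders.
    """
    body = {
        "inputs": [
            {
                "name": "WAV",                # hypothetical input tensor name
                "shape": [1, len(samples)],   # batch of one waveform
                "datatype": "FP32",
                "data": samples,              # flattened, row-major
            }
        ],
        "outputs": [{"name": "TRANSCRIPTS"}],  # hypothetical output tensor name
    }
    return urllib.request.Request(
        url=f"{base_url}/v2/models/{model_name}/infer",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage sketch (needs a running server; 8000 is Triton's default HTTP port):
# req = build_infer_request("http://localhost:8000", "whisper", [0.0] * 16000)
# with urllib.request.urlopen(req) as resp:
#     print(json.loads(resp.read())["outputs"])
```

Triton's `tritonclient` Python package wraps the same protocol and is usually more convenient than hand-built JSON for large audio tensors.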

ASR

| Model | Backend | Streaming | Link |
|---|---|---|---|
| Whisper | TensorRT-LLM | | sherpa/triton/whisper |
| Fun-ASR-Nano | vLLM | | Fun-ASR-vllm |
| FireRedASR-AED | TensorRT-LLM | | FireRedASR/triton_tensorrt |
| FireRedASR2-AED | TensorRT-LLM | | FireRedASR2S/triton_tensorrt |
| SenseVoice / Paraformer | ONNX | | FunASR/triton_gpu |
| Conformer (WeNet) | ONNX | Yes | wenet/runtime/gpu |
| Zipformer (Transducer) | TensorRT / ONNX | Yes | sherpa/triton |

TTS

| Model | Type | Streaming | Link |
|---|---|---|---|
| F5-TTS | Diffusion | | F5-TTS/triton_trtllm |
| Spark-TTS | LLM | Yes | Spark-TTS/triton_trtllm |
| CosyVoice 2 | LLM + Diffusion | Yes | CosyVoice/triton_trtllm |
| ZipVoice | Diffusion (distilled) | | ZipVoice/nvidia_triton |

Tools

| Tool | Description | Link |
|---|---|---|
| Triton-ASR-Client | Benchmarking client with CER/WER evaluation, streaming and concurrency support | Triton-ASR-Client |
| Triton-OpenAI-Speech | OpenAI-compatible /v1/audio/speech API for Triton TTS backends | Triton-OpenAI-Speech |
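Triton-ASR-Client reports CER and WER. Both are the Levenshtein edit distance between reference and hypothesis (substitutions + deletions + insertions) divided by the reference length, counted over characters for CER and over whitespace-separated words for WER. The sketch below illustrates that computation; it is not the client's own scoring code, and the normalization used here (only stripping spaces for CER) is an assumption, since real scoring pipelines usually also normalize punctuation and case.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance using a rolling one-row dynamic-programming table."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion of r
                curr[j - 1] + 1,         # insertion of h
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            )
        prev = curr
    return prev[-1]


def error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    if not ref:
        raise ValueError("reference must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


def cer(ref: str, hyp: str) -> float:
    # Character level; spaces are dropped, a common choice for Chinese.
    return error_rate(list(ref.replace(" ", "")), list(hyp.replace(" ", "")))


def wer(ref: str, hyp: str) -> float:
    # Word level; tokens are whitespace-separated.
    return error_rate(ref.split(), hyp.split())
```

For example, `cer("今天天气好", "今天天好")` counts one deletion over five reference characters.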

Pinned

  1. nvidia-china-sae/mair-hub

    Jupyter Notebook, 80 stars, 17 forks

  2. Triton-ASR-Client

    ASR client for Triton ASR Service

    Python, 37 stars, 8 forks

  3. Triton-OpenAI-Speech

    OpenAI-compatible frontend for NVIDIA Triton Inference Server ASR/TTS services

    Python, 22 stars, 1 fork

  4. Awesome-AudioLM-Datasets

    9 stars

  5. minutes

    Podcast summarizer built on LLMs

    Python, 30 stars, 6 forks

  6. Fun-ASR-vllm

    Forked from FunAudioLLM/Fun-ASR

    Fun-ASR is an end-to-end large speech recognition model released by Tongyi Lab.

    Python, 73 stars, 6 forks
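The Triton-OpenAI-Speech frontend pinned above is described as OpenAI-compatible, so a TTS call should follow the shape of OpenAI's `POST /v1/audio/speech` endpoint, whose JSON body carries `model`, `input`, `voice`, and an optional `response_format`. The sketch below only builds such a request with the standard library. The base URL, port, model name, and voice value are hypothetical; consult the repository for the values its TTS backends actually accept.

```python
import json
import urllib.request


def build_speech_request(
    base_url: str,
    text: str,
    model: str = "tts",          # hypothetical model name
    voice: str = "default",      # hypothetical voice identifier
    response_format: str = "wav",
) -> urllib.request.Request:
    """Build an OpenAI-style /v1/audio/speech request (not sent here)."""
    body = {
        "model": model,
        "input": text,
        "voice": voice,
        "response_format": response_format,
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Usage sketch (needs a running frontend; the URL and port are hypothetical):
# req = build_speech_request("http://localhost:8080", "Hello from Triton.")
# with urllib.request.urlopen(req) as resp, open("out.wav", "wb") as f:
#     f.write(resp.read())
```

Because the route mirrors OpenAI's API, existing OpenAI client code can in principle be pointed at the frontend by changing only its base URL.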