
[WebRTC_Demo] Duplex audio can be silent/choppy due to int16 resample path in backend #98

@chatsci

Description


Summary

In demo/web_demo/WebRTC_Demo, duplex mode can show normal subtitles and state transitions while producing no audible output, or very choppy output (a spoken sentence is split into discontinuous fragments).

Environment

  • macOS (Apple Silicon), M1 Max, 64GB RAM
  • WebRTC_Demo started via bash oneclick.sh start
  • model: openbmb/MiniCPM-o-4_5-gguf
  • LiveKit, backend, and C++ server all healthy

Symptoms

  • Frontend receives <state><audio_start> and <state><generate_end>, and subtitles are correct
  • Browser shows the remote audio track as attached, but sound is missing or highly discontinuous
  • C++ server logs show that generation succeeded and WAV chunks were sent

Example (from logs):

  • total generation time ~4.7s
  • total audio duration ~2.0s
  • RTF ~2.35x (4.7 s / 2.0 s, i.e. generation is slower than real time)
  • only a few chunks sent
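As a rough illustration of why an RTF above 1 can by itself produce gaps, the sketch below assumes audio is produced at a constant rate of 1/RTF (an idealization; real generation is bursty) and that each chunk is played as soon as it arrives. The helper names are hypothetical and not functions from the demo.

```python
def real_time_factor(generation_s: float, audio_s: float) -> float:
    """RTF = wall-clock generation time / duration of the generated audio."""
    return generation_s / audio_s


def min_preroll_s(audio_s: float, rtf: float) -> float:
    """Initial buffering needed for gap-free playback of the whole utterance,
    assuming audio becomes available at a constant rate of 1/rtf."""
    return max(0.0, (rtf - 1.0) * audio_s)


def gap_per_chunk_s(chunk_s: float, rtf: float) -> float:
    """Silence inserted after each chunk when chunks are played immediately:
    a chunk of length chunk_s takes rtf * chunk_s to generate but only
    chunk_s to play."""
    return max(0.0, (rtf - 1.0) * chunk_s)
```

With the logged numbers, `real_time_factor(4.7, 2.0)` gives 2.35 and `min_preroll_s(2.0, 2.35)` gives about 2.7 s, so under this model playback cannot be continuous unless most of the utterance is buffered first; coarser chunks make each individual gap longer.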

Root Cause (confirmed locally)

In the backend audio path (voice_chat/omni_stream.py), int16 PCM is fed directly into scipy.signal.resample_poly(...). On this path the resampled output can become near-zero or all-zero for some inputs, so LiveKit receives effectively silent frames.

Additionally, coarse chunking, queue underflow, and an aggressive generate_end/play_end transition can make the output choppy.
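To check whether the frames handed to LiveKit are effectively silent, one option is to log each frame's peak level just before publishing it. This is a sketch; the function names and the -60 dBFS threshold are assumptions, not values from the demo.

```python
import numpy as np


def peak_dbfs(frame: np.ndarray) -> float:
    """Peak level of an int16 PCM frame in dBFS (0 dBFS = 32768)."""
    if frame.size == 0:
        return float("-inf")
    # Cast before abs(): np.abs on int16 maps -32768 to itself (overflow).
    peak = float(np.max(np.abs(frame.astype(np.float64))))
    if peak == 0.0:
        return float("-inf")
    return float(20.0 * np.log10(peak / 32768.0))


def is_effectively_silent(frame: np.ndarray, threshold_dbfs: float = -60.0) -> bool:
    """True when the frame's peak is below the (assumed) silence threshold."""
    return peak_dbfs(frame) < threshold_dbfs
```

Logging `peak_dbfs` for every outgoing frame makes it easy to tell "frames are silent before LiveKit" apart from "frames are fine but never reach the browser".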

Minimal Fix

Resample in floating point instead of int16:

  • Convert the source audio to normalized float32 in [-1, 1]
  • Run resample_poly on the float32 signal
  • Convert the result back to int16 after resampling

This fixed the silent-audio issue immediately in local verification.
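A minimal sketch of the float32 round trip described above, assuming mono int16 PCM held in a NumPy array. `resample_int16` is a hypothetical helper name, not a function from omni_stream.py, and the final clip is an extra safeguard added in this sketch so that filter overshoot near full scale cannot wrap around when casting back to int16.

```python
import math

import numpy as np
from scipy.signal import resample_poly


def resample_int16(pcm: np.ndarray, src_rate: int, dst_rate: int) -> np.ndarray:
    """Resample mono int16 PCM by going through normalized float32."""
    if src_rate == dst_rate:
        return pcm.astype(np.int16, copy=True)
    # Reduce the rate ratio to the smallest up/down pair.
    g = math.gcd(src_rate, dst_rate)
    up, down = dst_rate // g, src_rate // g
    # 1. int16 -> float32 in [-1, 1)
    x = pcm.astype(np.float32) / 32768.0
    # 2. Polyphase resampling on floating-point samples
    y = resample_poly(x, up, down)
    # 3. Back to int16: rescale, round, and clip to the int16 range
    y = np.clip(np.round(y * 32768.0), -32768, 32767)
    return y.astype(np.int16)
```

If the audio arrives as raw bytes, `np.frombuffer(raw, dtype=np.int16)` produces the array this helper expects. For example, `resample_int16(pcm, 24000, 48000)` doubles the sample count; those rates are illustrative, not values taken from the demo.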

Suggested Code Locations

  • WebRTC_Demo/WebRTC_Demo/omini_backend_code/code/voice_chat/omni_stream.py
    • model wav -> WebRTC path
    • local TTS wav -> WebRTC path
    • any helper resampling functions

Optional Improvements for Choppy Playback

  • reduce output chunk size (finer granularity)
  • avoid queue starvation in output_audio
  • delay play_end decision to avoid premature turn-end during short gaps

Repro Steps

  1. cd demo/web_demo/WebRTC_Demo
  2. bash oneclick.sh start
  3. open the frontend URL and start a duplex voice conversation
  4. observe that subtitles and state updates are normal but audio is silent or choppy

Metadata


Assignees

No one assigned

    Labels

    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests
