Description
There are a couple of inconsistencies in the audio RTP node configuration that should be addressed:
1. Channel configuration mismatch
The FFmpeg pipe output channels are explicitly defined, but the _get_waveform method subsequently converts the audio to mono regardless of the channel configuration. This creates unnecessary complexity:
- If the pipeline always processes mono audio, the channels parameter should be hardcoded to 1 and the stereo-to-mono conversion logic in _get_waveform can be removed.
Suggestions
Hardcode channels=1 throughout
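A minimal sketch of what hardcoding mono could look like, assuming the node shells out to FFmpeg and reads raw PCM from its stdout. The names build_ffmpeg_output_args and get_waveform, and the 16 kHz sample rate, are illustrative assumptions rather than the node's actual code:

```python
from __future__ import annotations

import struct

SAMPLE_RATE = 16000  # assumption: 16 kHz, the rate Whisper expects
CHANNELS = 1         # fixed: the pipeline only ever consumes mono


def build_ffmpeg_output_args(sample_rate: int = SAMPLE_RATE) -> list[str]:
    """Output-side FFmpeg arguments for raw mono PCM written to stdout."""
    return [
        "-ac", str(CHANNELS),    # FFmpeg downmixes to mono for us
        "-ar", str(sample_rate),
        "-acodec", "pcm_s16le",
        "-f", "s16le",
        "pipe:1",
    ]


def get_waveform(raw: bytes) -> list[float]:
    """Decode s16le mono bytes to floats in [-1.0, 1.0).

    Because FFmpeg already emits a single channel, no stereo-to-mono
    conversion is needed here.
    """
    usable = len(raw) - (len(raw) % 2)  # drop a trailing partial sample
    samples = struct.unpack(f"<{usable // 2}h", raw[:usable])
    return [s / 32768.0 for s in samples]
```

With the downmix delegated to FFmpeg via -ac 1, the waveform helper only has to handle sample decoding and normalization.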
2. Hardcoded audio codec
The codec is currently hardcoded to pcm_s16le, which works well for Whisper. However, other models might require different audio formats (e.g., different sample formats or encodings).
Suggestions
Consider making the codec configurable to support future model integrations
- Maintain pcm_s16le as the default codec
- Ensure the channels parameter is consistently set to 1 throughout the node codebase
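One possible shape for a configurable codec, sketched under the assumption that the node builds its FFmpeg arguments from a small config object. AudioOutputConfig and the set of supported codecs are hypothetical; only the FFmpeg codec and raw format names (pcm_s16le/s16le, pcm_s32le/s32le, pcm_f32le/f32le) are real:

```python
from __future__ import annotations

from dataclasses import dataclass

CHANNELS = 1  # kept fixed at mono, independent of the codec choice

# codec name -> (FFmpeg raw output format, bytes per sample)
_SUPPORTED_CODECS = {
    "pcm_s16le": ("s16le", 2),
    "pcm_s32le": ("s32le", 4),
    "pcm_f32le": ("f32le", 4),
}


@dataclass(frozen=True)
class AudioOutputConfig:
    """Hypothetical config object; pcm_s16le stays the default."""

    codec: str = "pcm_s16le"
    sample_rate: int = 16000

    def __post_init__(self) -> None:
        # Fail early on codecs the decoding side does not know how to read.
        if self.codec not in _SUPPORTED_CODECS:
            raise ValueError(f"unsupported codec: {self.codec!r}")

    @property
    def bytes_per_sample(self) -> int:
        return _SUPPORTED_CODECS[self.codec][1]

    def ffmpeg_output_args(self) -> list[str]:
        """Output-side FFmpeg arguments for raw mono PCM on stdout."""
        raw_format = _SUPPORTED_CODECS[self.codec][0]
        return [
            "-ac", str(CHANNELS),
            "-ar", str(self.sample_rate),
            "-acodec", self.codec,
            "-f", raw_format,
            "pipe:1",
        ]
```

Existing callers that construct AudioOutputConfig() without arguments would keep the current Whisper-friendly behavior, while a future model could pass, for example, codec="pcm_f32le". The waveform decoding step would also need to branch on the chosen sample format, which this sketch does not cover.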