Inconsistencies in the audio RTP node configuration #63

@GaijinKa

Description

There are a couple of inconsistencies in the audio RTP node configuration that should be addressed:

1. Channel configuration mismatch

The FFmpeg pipe output channels are explicitly configured, but the _get_waveform method then converts the audio to mono regardless of that setting, so the configured channel count does not carry through to the waveform. This creates unnecessary complexity:

  • If the pipeline always processes mono audio, the channels parameter should be hardcoded to 1 and the stereo-to-mono conversion logic in _get_waveform can be removed.

Suggestions

Hardcode channels=1 throughout the node and remove the stereo-to-mono conversion from _get_waveform
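A minimal sketch of what this could look like, not the node's actual code. The names AUDIO_CHANNELS, build_ffmpeg_args, get_waveform, the input_url parameter, and the 16000 Hz default are assumptions made for illustration; only the FFmpeg options (-f, -acodec, -ac, -ar, pipe:1) are standard. With the channel count fixed at 1, the decoder has no downmix step left to maintain:

```python
import struct

AUDIO_CHANNELS = 1  # assumption: the pipeline only ever processes mono audio


def build_ffmpeg_args(input_url: str, sample_rate: int = 16000) -> list[str]:
    """Build an FFmpeg command that writes mono pcm_s16le to stdout.

    input_url and the default sample rate are hypothetical placeholders.
    """
    return [
        "ffmpeg",
        "-i", input_url,
        "-f", "s16le",                # raw signed 16-bit little-endian output
        "-acodec", "pcm_s16le",
        "-ac", str(AUDIO_CHANNELS),   # FFmpeg downmixes to mono itself
        "-ar", str(sample_rate),
        "pipe:1",                     # write to stdout
    ]


def get_waveform(raw: bytes) -> list[float]:
    """Decode mono pcm_s16le bytes into floats in [-1.0, 1.0)."""
    count = len(raw) // 2  # two bytes per sample; drop a trailing odd byte
    samples = struct.unpack(f"<{count}h", raw[: count * 2])
    return [s / 32768.0 for s in samples]
```

Because FFmpeg already emits a single channel, get_waveform only has to scale the samples; it never needs to inspect a channel count.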

2. Hardcoded audio codec

The codec is currently hardcoded to pcm_s16le, which works well for Whisper. However, other models might require different audio formats (e.g., different sample formats or encodings).

Suggestions

Consider making the codec configurable to support future model integrations
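One possible shape for a configurable codec, sketched under assumptions: AudioConfig, SUPPORTED_CODECS, and output_args are hypothetical names, and the set of supported codecs is illustrative rather than a list of what any model needs. The default stays pcm_s16le and the channel count stays fixed at 1, in line with the constraints listed in this issue:

```python
from dataclasses import dataclass

# Hypothetical lookup: FFmpeg codec name -> (raw output format, bytes per sample)
SUPPORTED_CODECS = {
    "pcm_s16le": ("s16le", 2),
    "pcm_s32le": ("s32le", 4),
    "pcm_f32le": ("f32le", 4),
}


@dataclass(frozen=True)
class AudioConfig:
    """Illustrative audio settings; the defaults preserve current behavior."""

    codec: str = "pcm_s16le"
    sample_rate: int = 16000  # assumption, not taken from the node's code

    def __post_init__(self) -> None:
        # Reject unknown codecs early instead of failing inside FFmpeg.
        if self.codec not in SUPPORTED_CODECS:
            raise ValueError(f"unsupported codec: {self.codec!r}")

    @property
    def bytes_per_sample(self) -> int:
        return SUPPORTED_CODECS[self.codec][1]

    def output_args(self) -> list[str]:
        """FFmpeg output options for this config; channels are always 1."""
        raw_format = SUPPORTED_CODECS[self.codec][0]
        return [
            "-f", raw_format,
            "-acodec", self.codec,
            "-ac", "1",
            "-ar", str(self.sample_rate),
        ]
```

Callers that never pass a codec keep getting pcm_s16le, so existing Whisper usage would be unaffected; any waveform decoding would still need a branch per sample format, which this sketch leaves out.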


⚠️ Important: While this task is straightforward, it has the potential to introduce breaking changes. We must:

  • Maintain pcm_s16le as the default codec
  • Ensure the channels parameter is consistently set to 1 throughout the node codebase
