
Conversation

@m08pvv
Contributor

@m08pvv m08pvv commented Oct 31, 2025

Worked a bit with AudioBridge and it was painful to hear all the crackling noises when multiple streams summed up, so I decided to implement an audio limiter. I took AGC2 from Google's WebRTC (BSD 3-clause license, which allows modification and use as long as the license file stays in the repo) and translated it to C (I haven't written C since school, so it might not be clean enough), adapted it for AudioBridge, and also added SSE 4.2 and AVX2 versions (with the appropriate one selected at runtime based on CPU support).

Also added a parameter to AudioBridge: use_limiter (boolean, default=false), which enables the limiter. If the limiter is disabled, only clamping (saturation to int16 boundaries) is applied.

Here are some results with many participants speaking (5 participants speaking; the recording is from the incoming AudioBridge track):
(image: waveform comparison of the two recordings)

Top - current Janus release with AudioBridge
Bottom - version with limiter and use_limiter = true
Full-spectrum lines - artifacts (crackling noise and clipping)

As you can see, the current Janus AB implementation (which just sums up all tracks) results in artifacts (values outside the int16 range), whereas the limiter dynamically reduces the volume to a normal level.

How it works:

  • On AudioBridge initialization we set references to the most efficient available implementation (scalar, SSE 4.2 or AVX2)
  • For each segment of audio:
    • If we mix 2 or more tracks (recording with 2+ or talking with 3+), calculate the scaling factors
    • If we record 2+ mixed tracks, scale and clamp; otherwise just clamp
    • For each participant:
      • If there are 2 or more tracks besides this participant's, scale and clamp; otherwise just clamp
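The decision logic above can be sketched roughly like this (the function names are illustrative stand-ins, not the actual patch; mix_count is assumed to count the talking participants):

```c
#include <stdbool.h>

typedef enum { ACTION_CLAMP_ONLY, ACTION_SCALE_AND_CLAMP } mix_action;

/* Full mix (used for recording): scaling factors are only worth
 * computing when 2+ tracks are actually summed, i.e. 3+ talkers,
 * or 2+ talkers while recording */
static mix_action full_mix_action(bool use_limiter, bool recording, int mix_count) {
	if(use_limiter && (mix_count > 2 || (recording && mix_count > 1)))
		return ACTION_SCALE_AND_CLAMP;
	return ACTION_CLAMP_ONLY;
}

/* Per-participant mix (their own track removed): scale only when
 * 2+ tracks besides this participant's remain in the sum */
static mix_action participant_action(bool use_limiter, int mix_count) {
	if(use_limiter && mix_count > 2)
		return ACTION_SCALE_AND_CLAMP;
	return ACTION_CLAMP_ONLY;
}
```

The point of the split is that a 1-1 conversation without recording never sums tracks, so the limiter can be skipped entirely there.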

Usage:
By default the limiter is disabled and only clamping is applied.
Set the use_limiter property to true in the create audiobridge request to enable the limiter.
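For instance, assuming the usual shape of an AudioBridge create request, enabling the limiter would look something like this (room number and sampling rate are arbitrary examples):

```json
{
	"request" : "create",
	"room" : 1234,
	"sampling_rate" : 16000,
	"use_limiter" : true
}
```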

Any suggestions are welcome!

@m08pvv
Contributor Author

m08pvv commented Nov 7, 2025

@whatisor, I'll check the demo (maybe I missed some configuration and it simply doesn't turn the limiter on).
In the picture at the top of this thread you can see exactly 5 loud persons (actually one person, me, with 5 tabs of the app open). I create the audiobridge via a create request and it works for me, but the purpose of this PR is to make it work for everyone, so I'll check what could go wrong with the demo.

@m08pvv
Contributor Author

m08pvv commented Nov 12, 2025

@whatisor I took the default janus.plugin.audiobridge.jcfg.sample, added use_limiter = true and launched the audiobridge demo.
Everything works fine. Could you provide the settings of the audiobridge room you used in the demo? It would also be great if you switched the log level to 5 and attached the Janus logs.

Member

@lminiero lminiero left a comment

I did a first review. I'm not sure I 100% understand the logic, so I added a few comments inline.

It's also not clear to me whether the limiter expects to be working at a specific sampling rate: your code doesn't seem to assume one, but I have some doubts. For RNNoise, for instance, we always upsample to 48000 if the room isn't 48000 before denoising, and then downsample to the room sampling rate after denoising, because RNNoise does expect a 48000 sampling rate: it will do something for lower sampling rates, but that something will be broken (e.g., excessive denoising), because it will use more samples than it should due to the lower sampling rate. I'm wondering if the limiter works the same way, because if it does, the same kind of upsampling would need to happen there too.

@@ -0,0 +1,60 @@
#include <opus/opus.h>
Member

I don't think we need this file here: it's not compiled and it's not used.

Contributor Author

deleted

@@ -0,0 +1,118 @@
#include <opus/opus.h>
Member

Same here.

Contributor Author

deleted

* pass a new \c rtp object with \c ip and \c port in a \c configure
* request on the participant handle, which will finalize the media
* establishment.
*
Member

I don't think this whole section is needed: the license file is already in the right folder, and it doesn't add anything in terms of description. The new parameter being listed in the create request documentation is enough.

Contributor Author

removed

#include "../utils.h"
#include "../ip-utils.h"

#include "audiobridge-deps/limiter/limiter.h"
Member

Please move this above, where our other audiobridge-deps are, after the endif. Just as for the jitter buffer, it would be helpful to have a 1-2 lines comment that explains why the include is there, e.g.

We also ship our own version of libwebrtc's audio limiter, which can be optionally enabled when mixing rooms

Or something along those lines.

Contributor Author

moved it next to the other deps and added a comment


/* Setting function pointers according to runtime vectorization support (AVX2, SSE4.2 or scalar) */
init_limiter();
JANUS_LOG(LOG_INFO, "Initialized optimized limiter implementation\n");
Member

Just per code style, this log line should be before the call to the function, and "Initialized" should be "Initializing", to say what we're about to do.

Contributor Author

fixed

JANUS_LOG(LOG_INFO, "Audio limiter enabled\n");
per_sample_scaling_factors = g_malloc0((audiobridge->spatial_audio ? OPUS_SAMPLES*2 : OPUS_SAMPLES) * sizeof(float));
envelope = g_malloc0(K_SUB_FRAMES_IN_FRAME * sizeof(float));
scaling_factors = g_malloc0((K_SUB_FRAMES_IN_FRAME + 1) * sizeof(float));
Member

Are all these just allocated and set to 0? Doesn't it matter what value they have, considering they're passed to compute_scaling_factors?

Contributor Author

Thanks for noticing! Actually only envelope needs to be zeroed (its previous values are used in the next calculations, so an initial 0 is needed); for the others I replaced g_malloc0 with g_malloc.

}
}
/* If we use limiter we should initialize it if we have more than 2 tracks mixed or we have more than 1 track mixed and recording */
if (audiobridge->use_limiter && (mix_count > 2 || (audiobridge->recording && mix_count > 1)))
Member

Why does it matter whether we're recording or not? A recorder doesn't contribute to the mix, and neither do forwarders.

Contributor Author

In the current implementation I tried to also optimize the case where there are only 2 participants in the audiobridge and no recording => no need to use the limiter (we don't sum tracks in a 1-1 conversation). But if there is a recording (summing up 2 or more tracks), we need to calculate the scaling factors because we'll use the limiter.

/* If we use limiter we should initialize it if we have more than 2 tracks mixed or we have more than 1 track mixed and recording */
if (audiobridge->use_limiter && (mix_count > 2 || (audiobridge->recording && mix_count > 1)))
compute_scaling_factors(buffer, envelope, scaling_factors, per_sample_scaling_factors,
samples_in_sub_frame, &filter_state_level, &last_scaling_factor);
Member

Code style: indentation is broken here.

Contributor Author

fixed

scale_buffer(buffer, samples, per_sample_scaling_factors, fullMixBuffer);
} else {
/* Otherwise just clamp values that are outside int16 boundaries */
clamp_buffer(buffer, samples, fullMixBuffer);
Member

I'm not sure I understand the difference and why we have this branch here?

Contributor Author

If we don't use the limiter, then we can saturate values (instead of truncating) when converting from opus_int32 to opus_int16, because truncation can cause overflow, which affects the sound much more than saturation to the int16 limits.
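As a minimal sketch of that reasoning, saturating conversion versus a plain wrapping cast might look like this (illustrative code, not the actual clamp_buffer in the patch):

```c
#include <stdint.h>
#include <stddef.h>

/* Convert 32-bit mixed samples to 16-bit, saturating at the int16
 * boundaries instead of letting the cast wrap around */
static void clamp_buffer(const int32_t *in, size_t samples, int16_t *out) {
	size_t i;
	for(i = 0; i < samples; i++) {
		int32_t s = in[i];
		if(s > INT16_MAX)
			s = INT16_MAX;	/* too loud: pin to +32767 */
		else if(s < INT16_MIN)
			s = INT16_MIN;	/* pin to -32768 */
		out[i] = (int16_t)s;
	}
}
```

A wrapped sample (e.g. 40000 becoming a large negative value) produces a harsh discontinuity, while a saturated one merely flattens the peak.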

}
mixBuffer = outBuffer;
} else if (!has_silent) {
JANUS_LOG(LOG_WARN, "[%s] No packet for participant (%s), but we have no silent participants either\n", audiobridge->room_id_str, p->user_id_str);
Member

Is this just a temporary debug line? This looks like something that could spam the logs.

Contributor Author

commented it out (I can remove it, but it might be needed later for debugging)

@whatisor

@whatisor I took a default janus.plugin.audiobridge.jcfg.sample, added use_limiter = true and launched the audiobridge demo. Everything works fine. Could you provide settings of your audiobridge room that you used in demo? It would be also great if you switch log level to 5 and attach Janus' logs.

@m08pvv Thank you for your quick response. I confirm it works with denoise: false. I got the issue because of the denoise: true feature (maybe you want to look at that in the future).
I'm looking forward to this PR being released.

@m08pvv
Contributor Author

m08pvv commented Nov 13, 2025

@whatisor thanks for the information! I'll definitely take a look, because I'll also use denoise: true in some cases.

@lminiero
Member

I'm trying to better understand the scope for this effort, and after giving it some thought, I think that this patch may be trying to do the right thing but in the wrong place.

If what we want is to prevent saturation on the mix itself, then the patch in #3601 is probably enough, but what we want a limiter to do is "limit" the contribution of each individual participant, especially noisy ones, which probably means the limiter code should be applied to the audio we decode from participants (so before the mix) rather than to the full mix itself. This would be in line with how the denoiser works, in fact: for denoising too, it's individual participants we denoise before mixing, rather than denoising a mix after the fact, so that we end up with a mix that is as clean as possible. By limiting individual participants, I assume we'd have more or less normalized contributions, and in case their sum goes too loud, #3601 kicks in to stay within the -0.9/0.9 limit. This would also allow us to selectively choose which participants should be limited, if any, with a room default that can be used in case no property is explicitly set for participants (again, which is how denoising works too).

Does this make sense? What do you think @m08pvv @whatisor ?

@m08pvv
Contributor Author

m08pvv commented Nov 13, 2025

@lminiero in the current master the resulting mix is a sum, so if we have 5 speakers (A, B, C, D, E), then:
the recording is A+B+C+D+E
participant A receives B+C+D+E
...
participant E receives A+B+C+D

We assume that none of the tracks (A, B, C, D, E) exceeds 100% volume (otherwise the audio is already broken).
The sum of 4 tracks will be in the range [0, 400%] of volume and the sum of 5 tracks in [0, 500%], so we need to reduce the resulting tracks by at least 4x and 5x respectively.
The main idea was that we can calculate one set of scale factors (for the whole mix, in this case A+B+C+D+E) and then apply it to each resulting track (A+B+C+D, ..., B+C+D+E), so that the audiobridge would not introduce the 4x-5x gain when all participants are speaking.

Normalizing every track individually would be N times more CPU-intensive, and usually the volume of individual speakers is adjusted via volume_gain (at least that was my thinking).

So, if calculating N sets of factors and applying them to each track individually is OK (it shouldn't take much CPU, but it's a hot path in the audiobridge), then I can do so, but the sum of tracks should be normalized anyway, because otherwise even if we have 5 participants that are all in the range [0, 50%], the resulting tracks would be [0, 200%] and [0, 250%] for participants and the recording respectively.
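A stripped-down sketch of the "one set of factors for the whole mix" idea, reduced to a single peak-derived gain (the actual limiter computes smoothed per-sample factors, so this is only the core arithmetic):

```c
#include <stddef.h>
#include <stdint.h>

/* Derive one gain from the full mix's loudest sample so it fits
 * within the given limit; reusing this gain for every per-participant
 * mix avoids computing N separate sets of factors */
static float mix_gain(const int32_t *full_mix, size_t samples, int32_t limit) {
	int32_t peak = 0;
	size_t i;
	for(i = 0; i < samples; i++) {
		int32_t a = full_mix[i] < 0 ? -full_mix[i] : full_mix[i];
		if(a > peak)
			peak = a;
	}
	return peak > limit ? (float)limit / (float)peak : 1.0f;
}
```

With 5 full-scale speakers the peak approaches 500% of the limit, so the derived gain approaches 1/5, matching the "reduce by at least 5x" reasoning above.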

@lminiero
Member

Yeah, I get all that, but my point is that if we have 5 participants whose sum gets to 500%, we already have that other PR that brings the mix down to 90% (instead of cutting at 100% as we did before); since we're using 32-bit samples for mixing, we're not actually losing anything. That should already be equivalent to a gain reduction on each sample. I'm personally more interested in limiting individual participants that may be too loud, as what's most bothersome in my experience is not the actual sum of things, but the relative dominance of some speakers over others due to their potentially different individual gains.

@lminiero
Member

Normalizing every track individually would be N times more CPU-intensive, and usually the volume of individual speakers is adjusted via volume_gain (at least that was my thinking).

It's true that we have the volume gain for that, but that's manual. I was thinking of something automated, which is what I assumed a limiter could bring to the table. That would pretty much be the same as audio compression (which is more or less what I assume any limiter implements).

@m08pvv
Contributor Author

m08pvv commented Nov 13, 2025

  1. In Fix overflow mixing #3601 the sum is not scaled to 0.9, it's just clamped at that level, so if we have e.g. a sine wave, we will get almost a square wave, which is not what we want, I guess. So then it also needs to scale somehow, before or after the sum.

@lminiero
Member

even if we have 5 participants that are all in the range [0, 50%], the resulting tracks would be [0, 200%] and [0, 250%] for participants and the recording respectively.

I think this is where we may have a source of confusion. It's true that when you mix many contributions you can exceed 100%, but that's exactly why we use an int32_t for the mixing buffer: to allow the sum to go beyond the limits without losing anything. When we then snap those samples back to int16_t, the more basic code in that other PR brings it back to max 90% no matter what it was, which is more or less equivalent to a gain reduction. I guess a better way to do that would be to figure out how much the loudest sample exceeds the limit, and apply the same division to all samples in those 20ms, but these are short enough snippets that I don't think it would matter much.

Anyway, I'll need to find some simple ways to test both patches, possibly using some automated participants in the AudioBridge that provide some talking. Once I do that I'll have a clearer picture of the whole thing.

@lminiero
Member

  1. In Fix overflow mixing #3601 the sum is not scaled to 0.9, it's just clamped at that level, so if we have e.g. a sine wave, we will get almost a square wave, which is not what we want, I guess. So then it also needs to scale somehow, before or after the sum.

Yep, makes sense, which is what I mentioned in a subsequent comment too. I'll need to make some tests.

@m08pvv
Contributor Author

m08pvv commented Nov 13, 2025

Just for a visual representation of clipping:
(image: a waveform with its peaks flattened at the min/max limits)
If we just limit the min and max values without actual scaling, we get clipping artifacts.

@m08pvv
Contributor Author

m08pvv commented Nov 14, 2025

@whatisor what rnnoise version do you use? I built from f6662ad41f5bf7bf244967a04e95334c81e5af4c (with the 55+ MB rnnoise_data.tar.gz) and I have zero clipping, even using the demo's default 16kHz audiobridge room.

@lminiero
Member

I still need to prepare a test with a few automated AudioBridge participants for testing purposes, but I was thinking that maybe we could achieve the same kind of result in a "simpler" way, and without the additional dependency:

  1. we mix as we do now (sum into an int32 array)
  2. once we have the sum of all participants, find the "loudest" sample, and see what negative gain needs to be applied to stay within the -0.9/0.9 range (if any: we may be within the range already)
  3. apply that gain to the mix and save it to a different array, which we'll use for all passive participants (non-talkers, recordings, forwarders)
  4. for participants that are talking, as we do now, remove their contribution and do the same thing (find the loudest sample, find the gain, apply the gain to the mix for that participant)

This way we don't need to worry about how many people are talking, or what's silence or not, since we always get a mix that is loud enough for everyone involved, whether they're active or not. Most importantly, we don't needlessly bring the whole mix down any time more than one person talks: if I understood your patch correctly, it reduces the volume of active participants depending on how many there are, so with 4 people you decrease each of their volumes to ~25%, which gets the mix for non-active participants to ~100%, but active speakers will get a much quieter mix (75%, since their own contribution is not part of it). Besides, silence detection as it's done is tricky: simple background noise will not be detected as silence, but it will be low enough not to contribute anything meaningful while still bringing the whole volume down (people talking will suddenly be quieter if someone's unmuted but not really saying anything).

Anyway, this was just me thinking of a possible alternative solution to a problem that does exist, but that's not to say it will be THE solution. I'll need to set up a testbed to make some tests (with different fake people speaking, as me alone wouldn't cut it), and once I have it I can prepare a tentative patch that implements my idea and do some comparisons.
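Steps 2-4 of the idea above could be sketched like this (assumed names and a simple malloc'd scratch buffer; not actual Janus code):

```c
#include <stdint.h>
#include <stddef.h>
#include <stdlib.h>

#define LIMIT ((int32_t)(0.9 * 32767))	/* stay within ±0.9 of int16 full scale */

/* Step 2: gain needed for the loudest sample to stay within ±LIMIT */
static float peak_gain(const int32_t *mix, size_t samples) {
	int32_t peak = 0;
	size_t i;
	for(i = 0; i < samples; i++) {
		int32_t a = mix[i] < 0 ? -mix[i] : mix[i];
		if(a > peak)
			peak = a;
	}
	return peak > LIMIT ? (float)LIMIT / (float)peak : 1.0f;
}

/* Step 3: apply the gain while converting down to int16 */
static void apply_gain(const int32_t *mix, size_t samples, float gain, int16_t *out) {
	size_t i;
	for(i = 0; i < samples; i++)
		out[i] = (int16_t)((float)mix[i] * gain);
}

/* Step 4: per talker, remove their own contribution, then redo
 * the peak/gain computation on what remains */
static void talker_mix(const int32_t *mix, const int16_t *own, size_t samples, int16_t *out) {
	int32_t *tmp = malloc(samples * sizeof(int32_t));
	size_t i;
	for(i = 0; i < samples; i++)
		tmp[i] = mix[i] - own[i];
	apply_gain(tmp, samples, peak_gain(tmp, samples), out);
	free(tmp);
}
```

Note this is reactive per-20ms peak scaling, so it lacks the attack/decay smoothing the WebRTC limiter provides; it only illustrates the proposed flow.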

@lminiero
Member

@m08pvv @whatisor I prepared a first version of my proposed idea in the PR above. I did a few tests using a bunch of fake participants sending pre-recorded talking audio (plus my ugly and noisy voice): if I look at the recording, it seems to be doing its job, but of course I may be wrong. Considering both of you have made some tests with multiple participants, could you let me know if this more or less does what you need it to? Thanks!

@m08pvv
Contributor Author

m08pvv commented Nov 19, 2025

This (adapted from WebRTC) implementation is an AGC (Automatic Gain Control): it uses piecewise linear interpolation with pre-calculated gain curves, providing precise control over different signal levels, and it includes attack/decay filtering that prevents abrupt gain changes, avoiding audible artifacts. Simple peak scaling can introduce pumping or breathing effects.
This limiter calculates per-sample scaling factors for smooth transitions, which is especially important during attack phases.
Yes, it uses sophisticated envelope detection with lookahead to anticipate signal peaks, but that allows for more graceful gain reduction compared to reactive peak detection. It also divides audio into sub-frames for more granular control, enabling better transient preservation and reduced distortion.
The gain curves are specifically designed to minimize harmonic distortion while maintaining signal integrity, based on extensive psychoacoustic research (done by the WebRTC team).

And yes, we already run this code in production and it sounds great in calls with 15 people.
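To illustrate just the per-sample smoothing mentioned above, here's a toy linear interpolation between consecutive sub-frame factors (names and layout are assumptions for illustration, not the WebRTC implementation):

```c
#include <stddef.h>

/* Interpolate K+1 per-sub-frame scaling factors into one factor per
 * sample, so the gain ramps smoothly instead of jumping at sub-frame
 * boundaries */
static void interpolate_factors(const float *sub_factors, int sub_frames,
		int samples_per_sub, float *per_sample) {
	int s, i;
	for(s = 0; s < sub_frames; s++) {
		float from = sub_factors[s], to = sub_factors[s + 1];
		for(i = 0; i < samples_per_sub; i++)
			per_sample[s * samples_per_sub + i] =
				from + (to - from) * ((float)(i + 1) / (float)samples_per_sub);
	}
}
```

The extra factor at index K is what carries the last sub-frame's target over to the next frame, which is why the scaling_factors array in the patch is sized K_SUB_FRAMES_IN_FRAME + 1.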

spscream pushed a commit to spscream/janus-gateway that referenced this pull request Nov 19, 2025
@whatisor


I am using https://github.com/xiph/rnnoise/releases/tag/v0.2

@m08pvv
Contributor Author

m08pvv commented Nov 24, 2025

@whatisor, I tried 0.2 (with the ~21 MB model) and it sounds terrible; try the latest master of rnnoise (it might fade in the beginnings of phrases, but at least it doesn't produce noise).

@m08pvv
Contributor Author

m08pvv commented Dec 17, 2025

@lminiero, did you have a chance to look at the code?

@lminiero
Member

Not yet, sorry... if not in the next few days, I'll definitely make this a priority as soon as I get back to the office after the holidays.
