-
Notifications
You must be signed in to change notification settings - Fork 2.6k
AudioBridge audio limiter #3593
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
|
@whatisor, I'll check demo (maybe I missed some configuration and it simply doesn't turn limiter on). |
|
@whatisor I took a default |
lminiero
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did a first initial review. I'm not sure I 100% understand the logic, so I added a few comments inline.
It's also not clear to me if the limiter expects to be working with a specific sampling rate: your code doesn't seem to think it does, but I have some doubts. For RNNoise, for instance, we always upsample to 48000 if the room isn't 48000 before denoising, and then downsample to the room sampling rate after denoising, because RNNoise does expect a 48000 sampling rate: it will do something for lower sampling rates, but that something will be broken (e.g., excessive denoising) because it will use more samples than it should due to the lower sampling rate. I'm wondering if the limiter works the same way, because if it does it means the same kind of upsampling would need to happen there too.
| @@ -0,0 +1,60 @@ | |||
| #include <opus/opus.h> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need this file here: it's not compiled and it's not used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted
| @@ -0,0 +1,118 @@ | |||
| #include <opus/opus.h> | |||
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleted
src/plugins/janus_audiobridge.c
Outdated
| * pass a new \c rtp object with \c ip and \c port in a \c configure | ||
| * request on the participant handle, which will finalize the media | ||
| * establishment. | ||
| * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this whole section is needed: the license file is already in the right folder, and it doesn't add anything in terms of description. The new parameter being listed in the create request documentation is enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed
src/plugins/janus_audiobridge.c
Outdated
| #include "../utils.h" | ||
| #include "../ip-utils.h" | ||
|
|
||
| #include "audiobridge-deps/limiter/limiter.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please move this above, where our other audiobridge-deps are, after the endif. Just as for the jitter buffer, it would be helpful to have a 1-2 lines comment that explains why the include is there, e.g.
We also ship our own version of libwebrtc's audio limiter, which can be optionally enabled when mixing rooms
Or something along those lines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
moved next to other deps and added comment
src/plugins/janus_audiobridge.c
Outdated
|
|
||
| /* Setting function pointers according to runtime vectorization support (AVX2, SSE4.2 or scalar) */ | ||
| init_limiter(); | ||
| JANUS_LOG(LOG_INFO, "Initialized optimized limiter implementation\n"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just per code style, this log line should be before the call to the function, and "Initialized" should be "Initializing", to say what we're about to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/plugins/janus_audiobridge.c
Outdated
| JANUS_LOG(LOG_INFO, "Audio limiter enabled\n"); | ||
| per_sample_scaling_factors = g_malloc0((audiobridge->spatial_audio ? OPUS_SAMPLES*2 : OPUS_SAMPLES) * sizeof(float)); | ||
| envelope = g_malloc0(K_SUB_FRAMES_IN_FRAME * sizeof(float)); | ||
| scaling_factors = g_malloc0((K_SUB_FRAMES_IN_FRAME + 1) * sizeof(float)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all these just allocated and set to 0? Doesn't it matter what value they have, considering they're passed to compute_scaling_factors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for noticing, actually we need only envelope to be set to 0 (as previous values used in next calculations and initial 0 is needed), for others replaced with g_malloc
src/plugins/janus_audiobridge.c
Outdated
| } | ||
| } | ||
| /* If we use limiter we should initialize it if we have more than 2 tracks mixed or we have more than 1 track mixed and recording */ | ||
| if (audiobridge->use_limiter && (mix_count > 2 || (audiobridge->recording && mix_count > 1))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why does it matter whether we're recording or not? A recorder doesn't contribute to the mix, and neither do forwarders.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In current implementation I tried to also optimize the case when there are only 2 participants in audiobridge and there is no record => no need to use limiter (we don't sum tracks if it's 1-1 conversation). But if there is a record (summing up 2 or more tracks), we need to calculate scaling factors because we'll use limiter.
src/plugins/janus_audiobridge.c
Outdated
| /* If we use limiter we should initialize it if we have more than 2 tracks mixed or we have more than 1 track mixed and recording */ | ||
| if (audiobridge->use_limiter && (mix_count > 2 || (audiobridge->recording && mix_count > 1))) | ||
| compute_scaling_factors(buffer, envelope, scaling_factors, per_sample_scaling_factors, | ||
| samples_in_sub_frame, &filter_state_level, &last_scaling_factor); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code style: indentation is broken here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fixed
src/plugins/janus_audiobridge.c
Outdated
| scale_buffer(buffer, samples, per_sample_scaling_factors, fullMixBuffer); | ||
| } else { | ||
| /* Otherwise just clamp values that are outside int16 boundaries */ | ||
| clamp_buffer(buffer, samples, fullMixBuffer); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I understand the difference and why we have this branch here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we don't use limiter then we can saturate values (instead of truncation) when converting from opus_int32 to opus_int16, because truncation can cause overflow and it affects sound much more than saturation to int16 limits.
src/plugins/janus_audiobridge.c
Outdated
| } | ||
| mixBuffer = outBuffer; | ||
| } else if (!has_silent) { | ||
| JANUS_LOG(LOG_WARN, "[%s] No packet for participant (%s), but we have no silent participants either\n", audiobridge->room_id_str, p->user_id_str); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this just a temporary debug line? This looks like something that could spam the logs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
commented out (can remove, but it might be needed later for debugging)
@m08pvv Thank you for your quick response. I confirm it works with denoise : false. I got issue because of denoise: true feature(maybe you wanna look at that in the future) |
|
@whatisor thanks for the information! I'll definitely take a look because I'll also use |
|
I'm trying to better understand the scope for this effort, and after giving it some thought, I think that this patch may be trying to do the right thing but in the wrong place. If what we want is to prevent saturation on the mix itself, then the patch in #3601 is probably enough, but what we want a limiter to do is "limit" the contribution of each individual participant, especially noisy ones, which probably means the limiter code should be applied to the audio we decode from participants (so before the mix) rather than on the full mix itself. This would be in line with how the denoiser works, in fact: for denoising too, it's individual participants we denoise before mixing, rather than denoising a mix after the fact, so that we end up with a mix that is as clean as possible. By limiting individual participants, I assume we'd have more or less normalized contrbutions, and in case their sum goes too loud, #3601 kicks in to stay within the -0.9/0.9 limit. This would also allow us to selectively choose which participants should be limited, if any, with a room default that can be used in case no property is explicitly set for participants (again, which is how denoising works too). |
|
@lminiero in current master the resulting mix is a sum, so if we have 5 speakers (A, B, C, D, E) then: We assume that all tracks (A, B, C, D, E) do not exceed 100% volume (otherwise the audio is already broken). Normalizing every track individually would be N times more CPU-intensive and usually the volume of individual speakers adjusted by volume_gain (at least I thought so). So, if calculating N set of factors and applying them to each track individually is ok (shouldn't take much CPU, but it's a hot path in audiobridge) - then I can do so, but the sum of tracks should be normalized anyway because otherwise even if we have 5 participants that are in range [0, 50%] - resulting tracks would be [0, 200%] and [0, 250%] for participants and recording respectively. |
|
Yeah I get all that, but my point is that if we have 5 participants whose sum gets to 500%, we already have that other PR that brings the mix down to 90% (instead of cutting to 100% as we did before), since we're using 32-bit samples for mixing so we're not actually losing anything. That should already be equivalent to a gain reduction in each sample. I'm personally more interested in limiting individual participants that may be too loud, as what's most boresome in my experience is not the actual sum of things, but the relative dominance of some speakers over others due to their potentially different individual gains. |
It's true that we have the volume gain for that, but that's manual. I was thinking of something automated, which is what I assumed a limiter could bring to the table. That would pretty much be the same as audio compression (which is what I assume any limiter implements, more or less). |
|
I think this is where we may have a source of confusion. It's true that when you mix many contributions you can exceed the 100%, but that's exactly why we use an Anyway, I'll need to find some simple ways to test both patches, possibly using some automated participants in the AudioBridge that provide some talking. Once I do that I'll have a clearer picture of the whole thing. |
Yep, makes sense, which is what I mentioned in a subsequent comment too. I'll need to make some tests. |
|
@whatisor what rnnoise version do you use? I built from f6662ad41f5bf7bf244967a04e95334c81e5af4c with 55+MB rnnoise_data.tar.gz and I have zero clipping even using default demo's 16kHz audiobridge room. |
|
I still need to prepare a test with a few automated AudioBridge participants for testing purposes, but I was thinking that maybe we could achieve the same kind of result in a "simpler" way, and without the additional dependency:
This way we don't need to worry about how many people are talking, what's silence or not, since we always get a mix that is loud enough for everyone involved, whether they're active or not. Most importantly, we don't needlessly bring the whole mix down any time more than one person talks, since (if I understood our patch correctly) as your patch reduces the volume of active participants depending on how many there are, with 4 people you decrease each of their volumes to ~25%, which will get the mix for non-active participants to ~100%, but active speakers will get a much quieter mix (75%, since their contribution will not be a part of that). Besides, silence detection as it's done is tricky, as simple background noise will not be detected as silence but will be low enough not to contribute anything meaningful and still bring the whole volume down (people talking will suddenly be quieter if someone's unmuted but not really saying anything). Anyway, this was just me thinking of a possible alternative solution to a problem that does exist, but that's not to say that it will be THE solution. I'll need to setup some testbed to make some tests (with different fake people speaking, as me alone wouldn't cut it), and once I have it I can prepare a tentative patch that implements my idea and I can do some comparisons. |
|
@m08pvv @whatisor I prepared a first version of my proposed idea in the PR above. I did a few tests using a bunch of fake participants sending pre-recorded talking audio (plus my ugly and noisy voice) and it seems to be doing something: if I look at the recording, it seems to be doing its job, but of course I may be wrong. Considering both of you have made some tests with multiple participants, could you let me know if this more or less does what you need it to? Thanks! |
|
This (adapted from WebRTC) implementation is an AGC (Adaptive Gain Control), so it uses piecewise linear interpolation with pre-calculated gain curves, providing precise control over different signal levels and it includes attack/decay filtering that prevents abrupt gain changes, avoiding audible artifacts. Simple peak scaling can introduce pumping or breathing effects. And yes, this code we already run in production and it sounds great in calls with 15 people. |
I am using https://github.com/xiph/rnnoise/releases/tag/v0.2 |
|
@whatisor, I tried 0.2 (with ~21 MB model) and it sounds terrible, try latest master of rnnoise (it might fade-in beginnings of phrases but at least it doesn't produce noise) |
|
@lminiero, did you have a chance to look at the code? |
|
Not yet, sorry... if not in the next few days, I'll definitely make this a priority as soon as I get back to the office after the holidays. |

Worked a bit with AudioBridge and it was painful to hear all crackling noises when multiple streams summed up, so I decided to implement an audio limiter. I took AGC2 from Google's WebRTC (BSD 3-clause license that allows modifications and usage as long as license file stays in repo) and translated it to C (didn't write in C since school, so it might be not clean enough), adapted it for AudioBridge and also added SSE 4.2 and AVX2 versions (with proper initialization from available at runtime).
Also added a parameter to AudioBridge: use_limiter (boolean, default=false), which enables limiter. If limiter is disabled - only clamping (saturation to int16 boundaries) applied.
Here are some results of many participants speaking (5 participants speaking and the record is from incoming ab track):

Top - current Janus release with AudioBridge
Bottom - version with limiter and use_limiter = true
Full-spectrum lines - artifacts (crackling noise and clipping)
As you can see, current Janus AB implementation (just sum up all tracks) results in artifacts (values outside int16 range) whereas limiter reduces volume dynamically to normal level.
How it works:
Usage:
By default limiter is disabled and only clamping is applied.
Set
use_limiterproperty increateaudiobridge request totrueto enable limiter.Any suggestions are welcome!