
Getting Started

Coloride edited this page Apr 22, 2024 · 35 revisions

Introduction

A very oversimplified explanation of how voice chat works

OpenVoiceSharp takes care of the annoying parts (encoding and the basic voice chat plumbing) so you can focus on making your game or app. That said:

I will not annoy you with the boring details. There are only a few things you need to know:

  • PCM - Think of this as raw audio: it is not encoded in any special way and is what your computer works with at a lower level. This is the data your microphone returns.
  • Opus - A lossy audio codec: an encoding applied to PCM that makes your packets much smaller using compression algorithms. It is used pretty much everywhere in VoIP, including Discord.
  • Resampling - A technique you might need if the engine/app you use does not work in the correct format, either when reading incoming audio from OpenVoiceSharp or when writing to your VoiceChatInterface.
  • RNNoise - The noise suppression algorithm, embedded in OpenVoiceSharp and toggleable. It runs on the CPU, so it is up to you to decide whether you want it. It is enabled by default.
  • WebRTC VAD - The voice activity detector, embedded in OpenVoiceSharp using WebRTC's VAD. It detects whether someone is talking depending on the OperatingMode you have specified.
  • Stereo or Mono - Whether your audio uses one channel (mono) or two channels (left and right). This changes the size of the buffers you get: a stereo buffer is double the size of a mono one, since it carries two channels.
  • 16 bit PCM format & float32 PCM format - Two ways the raw PCM data can be stored. You will hear these mentioned a lot. Both store the same waveform, but float32 samples (4 bytes each) have more precision than 16 bit samples (2 bytes each, stored as shorts). This is also why, when converting between 16 bit PCM and float32, your arrays double or halve in size.

You do not have to fully understand how 16 bit PCM and float32 work, but know that they do essentially the same thing; some engines/apps simply prefer one format over the other for precision or other reasons.
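To make the 16 bit/float32 relationship concrete, here is a minimal, language-agnostic sketch in Python of the two conversions (in OpenVoiceSharp, VoiceUtilities does the equivalent for you in C#; the helper names below are just for illustration). Note how 4 samples take 8 bytes as 16 bit PCM but would take 16 bytes as float32:

```python
import struct

def pcm16_to_float32(pcm16: bytes) -> list[float]:
    """Convert little-endian 16 bit PCM bytes to float32 samples in [-1.0, 1.0]."""
    count = len(pcm16) // 2  # 2 bytes per 16 bit sample
    samples = struct.unpack(f"<{count}h", pcm16)
    return [s / 32768.0 for s in samples]

def float32_to_pcm16(samples: list[float]) -> bytes:
    """Convert float32 samples back to little-endian 16 bit PCM bytes."""
    clamped = [max(-1.0, min(1.0, s)) for s in samples]
    return struct.pack(f"<{len(clamped)}h", *(int(s * 32767) for s in clamped))

raw = struct.pack("<4h", 0, 16384, -16384, 32767)  # 4 samples = 8 bytes of 16 bit PCM
floats = pcm16_to_float32(raw)                     # 4 floats (16 bytes as float32)
back = float32_to_pcm16(floats)                    # 8 bytes again
```

The round trip is not bit-exact (integer truncation loses at most one step), which is fine for voice audio.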

Warning

This package is not very beginner friendly. It is not your typical batteries-included asset store plugin or extension that does the whole job for you in one click, sorry. Even though I have tried to cover examples for the broadest of cases, it assumes you know a bit about manipulating float and byte arrays, optimization, multiplayer/networking, threading, a tiny bit of procedural audio, and that you know how to use your game engine or app. But again, do not hesitate to ask for help (see below) if you need it, and read your engine's or app's documentation according to your needs. Make sure to check the examples if you are confused.

Format

OpenVoiceSharp uses the RNNoise/Opus/WebRTC VAD format, meaning your input must be:

  • A 16 bit PCM byte[] array (if needed, VoiceUtilities can help you convert to or from float32)
  • At 48000Hz
  • Stereo or Mono (if needed, the VoiceChatInterface can force Mono or Stereo according to your needs)
  • A frame length of 20 ms
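Those constraints pin down your buffer sizes exactly. A quick sketch of the arithmetic (Python used only as a calculator here):

```python
SAMPLE_RATE = 48000    # Hz, as required by the format
FRAME_MS = 20          # frame length in milliseconds
BYTES_PER_SAMPLE = 2   # 16 bit PCM = 2 bytes per sample

# Samples per channel in one frame: 48000 samples/s * 0.020 s = 960
samples_per_frame = SAMPLE_RATE * FRAME_MS // 1000

mono_bytes = samples_per_frame * BYTES_PER_SAMPLE * 1    # 1920 bytes per frame
stereo_bytes = samples_per_frame * BYTES_PER_SAMPLE * 2  # 3840 bytes per frame
```

So if your byte buffers are not 1920 (mono) or 3840 (stereo) bytes per 20 ms, something upstream is using the wrong format.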

For recording, if you do not feel like handling it yourself, the BasicMicrophoneRecorder handles pretty much everything for you (using NAudio).

For playback, NAudio provides a BufferedWaveProvider class, but if you use a game engine, it should have a native way of playing raw PCM samples in real time.

If everything matches, you should be good to go. 👍

Do I need a certain library or engine to use OpenVoiceSharp?

No, but OpenVoiceSharp does not provide:

  • Anything to modify audio (effects, panning, soft clipping, etc)
  • A way to send binary packets over network
  • A super complex activity detection algorithm or noise suppression
  • Features such as groups, teams or muting
  • A native way to playback the audio samples (that's up to the engine/app you use)

Everything I said previously is up to you to figure out, which shouldn't be too hard depending on the engine or app you use/make. OpenVoiceSharp is meant to be as basic as possible.

How do I install OpenVoiceSharp?

Via Visual Studio

Right click on the solution you wish to install OpenVoiceSharp on and select "Manage NuGet Packages...".

Right Click on Solution

When the NuGet Package Manager page shows up, search for OpenVoiceSharp and click install.

NuGet Package Manager

Apply and accept the licenses (more info there).

Apply

Via the dotnet CLI

Open up a terminal in your project's folder and type the following:

dotnet add package openvoicesharp

Tip

By default, on .NET projects, the DLL files should be linked and automatically copied to the project's output folder. If not, drag the necessary files into your project's output folder yourself. Depending on the engine you use, instructions are provided in the guides.

How do I use OpenVoiceSharp?

For a very basic example of recording, encoding and decoding, check out Example.cs. Otherwise...

If you know what you're doing

To send/encode

  1. Record 20 ms of the microphone at 48000Hz (mono or stereo)

Note that this is taken care of easily using BasicMicrophoneRecorder.

  2. Encode the 16 bit PCM data (convert with VoiceUtilities first if your input is float32)
  3. Send the encoded data and its length over the network (that part is obviously up to you to figure out)
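The send path can be sketched like this. This is Python pseudostructure, not the C# API: `opus_encode` and `send_to_server` are placeholders for what OpenVoiceSharp's encoder and your own networking actually do.

```python
# Stand-ins for what the library and your netcode provide (hypothetical):
def opus_encode(pcm16: bytes) -> bytes:
    # Placeholder: real Opus compresses the frame; here we just pretend 10:1.
    return pcm16[: len(pcm16) // 10]

sent_packets = []
def send_to_server(payload: bytes, length: int):
    # Placeholder for your networking layer.
    sent_packets.append((payload, length))

def on_microphone_frame(pcm16_frame: bytes):
    """Called every 20 ms with a 16 bit PCM frame (e.g. by a microphone recorder)."""
    encoded = opus_encode(pcm16_frame)
    send_to_server(encoded, len(encoded))  # ship the data AND its length

on_microphone_frame(bytes(1920))  # one 20 ms mono frame of silence
```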

To receive/decode/playback

  1. Decode the Opus data; this gives you a 16 bit PCM byte[] array.
  2. Make sure the format matches your target engine/app. Most game engines take float32, but some take 16 bit PCM natively, so check your engine's documentation, and convert using VoiceUtilities if your output needs float32.
  3. Play it back... depending on your engine/app.
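And the receive path, again as a hedged Python sketch rather than the real C# API: `opus_decode` stands in for the library's decoder, and `playback_buffer` stands in for whatever real-time audio buffer your engine exposes (here assuming an engine that wants float32).

```python
import struct

def opus_decode(packet: bytes) -> bytes:
    # Placeholder: the real decoder returns a 16 bit PCM frame.
    return bytes(1920)  # one 20 ms mono frame of silence

def to_float32(pcm16: bytes) -> list[float]:
    count = len(pcm16) // 2
    return [s / 32768.0 for s in struct.unpack(f"<{count}h", pcm16)]

playback_buffer: list[float] = []  # stand-in for your engine's audio buffer

def on_voice_packet(packet: bytes):
    pcm16 = opus_decode(packet)                # 1) decode to 16 bit PCM
    playback_buffer.extend(to_float32(pcm16))  # 2) convert if your engine wants float32
    # 3) the engine drains the buffer in real time, however it does that

on_voice_packet(b"\x00")  # pretend a packet arrived from the network
```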

If you don't know what you're doing

I've made a basic implementation for most game engines out there along with an agnostic console example. Check them out in the sidebar.

However, I recommend you actually understand what I am telling you to do up there.

What?

The premise of voice chat is to send small bits of voice data over the network. However, raw voice data has a problem: it is uncompressed, which means it can be pretty fat on the network and impact performance.

This is why we use Opus, to basically "compress/decompress" the packets then send or receive them. Think of it like a very fast real time 7zip or WinRAR, but for voice data.
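The arithmetic makes the point. Raw 48 kHz stereo 16 bit PCM is 192,000 bytes per second per speaker; a typical VoIP Opus bitrate is a few kilobytes per second (the 32 kbps figure below is a common choice, not necessarily OpenVoiceSharp's exact setting):

```python
sample_rate = 48000
bytes_per_sample = 2  # 16 bit
channels = 2          # stereo

raw_bytes_per_second = sample_rate * bytes_per_sample * channels  # 192000, ~187.5 KiB/s

opus_bitrate = 32000  # bits/s; an assumed typical VoIP bitrate
opus_bytes_per_second = opus_bitrate // 8                          # 4000, ~4 KB/s

ratio = raw_bytes_per_second / opus_bytes_per_second               # ~48x smaller
```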

OpenVoiceSharp doesn't just compress your packets; it also runs them through noise suppression. Here's how it goes:

  • Convert the 16 bit PCM to float32 PCM
  • Process it through RNNoise
  • Convert it back to 16 bit PCM

Only then do we actually encode the packet through Opus, and you send the resulting binary data over the network.
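The ordering of that round trip can be sketched as a single function. This is a Python illustration of the step order only; `rnnoise_denoise` is a stub standing in for the real RNNoise pass the library performs internally.

```python
import struct

def rnnoise_denoise(floats: list[float]) -> list[float]:
    # Placeholder: real RNNoise would attenuate non-speech noise here.
    return floats

def suppress_noise(pcm16: bytes) -> bytes:
    count = len(pcm16) // 2
    # Step 1: 16 bit PCM -> float32 (RNNoise operates on floats)
    floats = [s / 32768.0 for s in struct.unpack(f"<{count}h", pcm16)]
    # Step 2: run the noise suppression pass
    floats = rnnoise_denoise(floats)
    # Step 3: float32 -> 16 bit PCM, ready for Opus encoding
    return struct.pack(
        f"<{count}h",
        *(int(max(-1.0, min(1.0, f)) * 32767) for f in floats),
    )

cleaned = suppress_noise(bytes(1920))  # one 20 ms mono frame in, same size out
```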

Great, now we're sending voice data... How do we even play that back?

This ultimately depends on the engine/app you use. Like I said earlier, I've tried to make examples for the most popular ones.

But here's how you find how to do it yourself, it comes down to 2 steps:

  • Find what format your game engine or app takes in here
  • Find a way to play that back in real time or "feed" the incoming data

And that's it.

Tip

11/10 times you should be able to find the format via the documentation yourself. So look for the right tags, BUT I've done most of that work for you here.

Need help?

Try contacting me, opening an issue, or joining the OpenVoiceSharp Discord server for help. Good luck!
