
VoiceClassifier

A real-time voice classification tool built in Python.

Overview

VoiceClassifier uses audio input from your microphone to extract speaker embeddings (via SpeechBrain) and classify/recognize voices in real time. It is useful for speaker identification and for voice-gated noise suppression.
Key features:

  • Live audio capture using sounddevice
  • Embedding extraction using torch, torchaudio transforms, and SpeechBrain models
  • Cosine-similarity-based classification with a simple interface (see the sketch after this list)
  • CLI arguments for ease of use
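
The classification step is a plain cosine-similarity comparison between a live embedding and a stored voiceprint. A minimal sketch of that scoring (tensor shapes are illustrative, not the repo's exact code):

import torch
import torch.nn.functional as F

def similarity_score(live_emb: torch.Tensor, ref_emb: torch.Tensor) -> float:
    # Cosine similarity between two 1-D embedding tensors: values near 1.0
    # mean "same voice", values near 0 mean unrelated voices.
    return F.cosine_similarity(live_emb, ref_emb, dim=0).item()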

Libraries Used

This project leverages the following libraries:

  • DeepFilterNet: For real-time noise suppression.
  • SpeechBrain: For speaker embedding extraction and other speech processing tasks.
  • Torchaudio: For audio transformations and preprocessing.
  • Sounddevice: For capturing and playing audio in real time.
  • PyTorch: As the core deep learning framework.
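
As a rough picture of how these pieces fit together, here is a sketch of embedding extraction with a pretrained SpeechBrain speaker model. The model name and the 16 kHz resampling step are assumptions; the repo may use a different model or preprocessing.

import torchaudio
# On older SpeechBrain releases the import path is speechbrain.pretrained.
from speechbrain.inference.speaker import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",  # assumed model
    savedir="pretrained_models/spkrec-ecapa-voxceleb",
)

waveform, sr = torchaudio.load("sample.wav")          # (channels, time)
waveform = torchaudio.functional.resample(waveform, sr, 16000)
embedding = encoder.encode_batch(waveform).squeeze()  # (emb_dim,)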

Installation

1. Clone the repository:

git clone https://github.com/F1xxs/VoiceClassifier.git
cd VoiceClassifier

2. Install dependencies:

pip install -r requirements.txt
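
The requirements file isn't reproduced here, but based on the libraries listed above it plausibly contains something like this (package names are assumptions):

torch
torchaudio
speechbrain
sounddevice
deepfilternet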

Usage

Before running the live voice classification, you first need to create a speaker embedding - a numerical "voiceprint" that represents the unique features of a person’s voice.
This is done using the embedding.py script.

Run the script on one or more voice samples (replace audio_files with your own file paths):

python embedding.py audio_files

Or run it on a folder:

python embedding.py ./my_voice_samples/ -o embedding.pt

You can provide multiple audio files to improve accuracy: the script averages their embeddings into one voiceprint (see the sketch after the notes below).
Note: all input files must be:

  • Mono (1 channel)
  • 48 kHz sample rate
  • In .wav or .ogg format
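
A sketch of that averaging step, assuming the SpeechBrain encoder from the earlier sketch (paths, the mono/48 kHz check, and the save format are illustrative; embedding.py's internals may differ):

import torch
import torchaudio
from pathlib import Path
from speechbrain.inference.speaker import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",  # assumed model
)

embs = []
for path in sorted(Path("./my_voice_samples/").glob("*.wav")):
    waveform, sr = torchaudio.load(str(path))
    assert waveform.shape[0] == 1 and sr == 48000, f"{path}: need mono 48 kHz"
    waveform = torchaudio.functional.resample(waveform, sr, 16000)  # assumed
    embs.append(encoder.encode_batch(waveform).squeeze())

voiceprint = torch.stack(embs).mean(dim=0)  # average into one voiceprint
torch.save(voiceprint, "embedding.pt")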

Then run the main script:

python VoiceClassifier.py [options]

Common options:

  • --list : List available audio devices and exit
  • --embedding : Path to the speaker embedding file
  • --low-threshold : Lower hysteresis threshold; the gate closes when the score drops below it
  • --high-threshold : Upper hysteresis threshold; the gate opens when the score rises above it (see the sketch after this list)
  • --chunk-duration : Duration of each processed audio chunk, in seconds
  • --show-score : Print the similarity score of each chunk
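
The two thresholds form a hysteresis gate: it opens only when the score rises above the high threshold and closes only when it falls below the low one, so the output doesn't flicker when the score hovers near a single cutoff. A minimal sketch of that logic (assumed behavior, not the repo's exact code):

class HysteresisGate:
    def __init__(self, low: float, high: float):
        self.low, self.high = low, high
        self.open = False

    def update(self, score: float) -> bool:
        if score >= self.high:
            self.open = True      # confident match: open the gate
        elif score <= self.low:
            self.open = False     # confident mismatch: close the gate
        return self.open          # in between: hold the previous state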

Example:

python VoiceClassifier.py --chunk-duration 0.8 --show-score --embedding embedding.pt

Once running, the tool will ask you to select the input and output devices. After selection, it listens to the input device, denoises the audio, computes embeddings, compares them against known speaker embeddings, and sends the processed audio to the output device if the score is above the threshold.
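
Put together, one iteration of that loop plausibly looks like the sketch below. DeepFilterNet's init_df/enhance and SpeechBrain's encoder are real APIs, but the thresholds, chunk handling, and blocking record/play cycle are simplifying assumptions about the repo's stream-based loop:

import sounddevice as sd
import torch
import torchaudio
from df.enhance import enhance, init_df
from speechbrain.inference.speaker import EncoderClassifier

model, df_state, _ = init_df()                   # DeepFilterNet, 48 kHz
encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb")  # assumed model
voiceprint = torch.load("embedding.pt")

SR, CHUNK_S = 48000, 0.8
LOW, HIGH = 0.25, 0.40                           # illustrative thresholds
gate_open = False

while True:
    # Record one chunk from the default input device (blocking).
    chunk = sd.rec(int(CHUNK_S * SR), samplerate=SR, channels=1,
                   dtype="float32")
    sd.wait()

    denoised = enhance(model, df_state, torch.from_numpy(chunk.T))
    emb = encoder.encode_batch(
        torchaudio.functional.resample(denoised, SR, 16000)).squeeze()
    score = torch.nn.functional.cosine_similarity(emb, voiceprint, dim=0)

    # Hysteresis: open above HIGH, close below LOW, hold in between.
    gate_open = bool(score >= HIGH or (gate_open and score > LOW))
    if gate_open:
        sd.play(denoised.cpu().numpy().T, samplerate=SR)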

Prerequisites & Notes

  • Python 3.8+ recommended
  • Requires a working microphone
  • GPU acceleration is optional; CPU is supported (see the note below)
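
If a GPU is available, the SpeechBrain encoder can be placed on it via the run_opts argument; a short sketch (the repo may handle devices differently):

import torch
from speechbrain.inference.speaker import EncoderClassifier

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",  # assumed model
    run_opts={"device": "cuda" if torch.cuda.is_available() else "cpu"},
)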

License

This project is licensed under the MIT License; see the LICENSE file or http://opensource.org/licenses/MIT.
