
Respaired/Higgs_Codec_Extended


License: MIT
Tags: codec, audio_tokenizer, audio_codec



Hugging Face 🤗

This is an ongoing project. It is a modified version of the Higgs-Boson audio tokenizer that you can fully train, and all scripts have been tested. A few notes, however:

  • This is not backward compatible with the original checkpoint. (You can probably tweak it to be, but if you do, you must adhere to the Boson community license.)

  • I highly recommend pretraining the model without the mel and adversarial setup first. This saves a significant amount of compute and time, and it speeds up convergence. Raise the batch size as much as you can before the adversarial phase.

  • For the semantic teacher, I am using utter-project/mHuBERT-147, which has good multilingual support. If you want the original setup, you can change it in the config.

  • The loss weights and hyperparameters may not be ideal; feel free to experiment with different values.
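The two-phase schedule recommended above can be expressed as a weighted loss in which the mel and adversarial terms are gated by a step threshold. This is a minimal illustration, not the repository's training code. The loss names, the weight values, and the `adversarial_start_step` parameter are all hypothetical. The real weights live in config.json under keys not shown here.

```python
# Hypothetical loss names and weights, for illustration only.
# The repository's actual values are set in config.json.
WEIGHTS = {
    "waveform": 1.0,
    "commitment": 0.25,
    "semantic": 1.0,
    "mel": 15.0,
    "adversarial": 1.0,
    "feature_matching": 2.0,
}

# Terms that stay switched off during the pretraining phase.
DELAYED_TERMS = {"mel", "adversarial", "feature_matching"}


def weighted_loss(losses, step, adversarial_start_step, weights=WEIGHTS):
    """Sum the weighted losses that are active at `step`.

    Before `adversarial_start_step`, any term listed in DELAYED_TERMS is
    skipped. Only keys present in `losses` are visited, so the caller
    does not need to compute the delayed terms during pretraining.
    """
    in_adversarial_phase = step >= adversarial_start_step
    total = 0.0
    for name, value in losses.items():
        if name in DELAYED_TERMS and not in_adversarial_phase:
            continue  # pretraining phase: ignore mel / GAN terms
        total += weights[name] * value
    return total
```

The loop only visits terms present in `losses`, so phase-one code can skip computing the mel and discriminator terms entirely. That is where the compute savings come from. The discriminator should likewise not be updated until the switch step. The function also works unchanged with PyTorch scalar tensors, because `0.0 + tensor` yields a tensor.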

I will train a checkpoint on a large enough dataset one of these days, after figuring out a few things first, but the setup is solid.

NOTE: The non-DDP version seems to be more stable.

Training

python train_boson_mixed_precision.py --data_csv "yourdata.csv" \
                                      --config config.json --batch_size 42  \
                                      --use_mixed_precision \
                                      --use_discriminator

Simple Inference

Take a look at the notebook.

Batch inference

Take a look at boson_codeit.py.

Happy using / training (inshallah).
