From 1e264be66e41aff464b40b39a97167512d734cb7 Mon Sep 17 00:00:00 2001
From: Khushal Jethava
Date: Fri, 24 Jan 2025 17:53:08 +0530
Subject: [PATCH 1/2] feat(Updated Google colab file):

---
 README copy.md | 154 +++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 154 insertions(+)
 create mode 100644 README copy.md

diff --git a/README copy.md b/README copy.md
new file mode 100644
index 0000000..12d3ee0
--- /dev/null
+++ b/README copy.md
@@ -0,0 +1,154 @@
+# bark.cpp
+
+![bark.cpp](./assets/banner.png)
+
+[![Actions Status](https://github.com/PABannier/bark.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/PABannier/bark.cpp/actions)
+[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
+
+[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)
+
+Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.
+
+## Description
+
+With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.
+
+- [x] Plain C/C++ implementation without dependencies
+- [x] AVX, AVX2 and AVX512 for x86 architectures
+- [x] CPU and GPU compatible backends
+- [x] Mixed F16 / F32 precision
+- [x] 4-bit, 5-bit and 8-bit integer quantization
+- [x] Metal and CUDA backends
+
+**Models supported**
+
+- [x] [Bark Small](https://huggingface.co/suno/bark-small)
+- [x] [Bark Large](https://huggingface.co/suno/bark)
+
+**Models we want to implement! Please open a PR :)**
+
+- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
+- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
+- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))
+
+Demo on [Google Colab](https://colab.research.google.com/drive/1j8osRVX4J_DAXMUDked7AgR9_cYGdPOz?usp=sharing)
+
+---
+
+Here is a typical run using `bark.cpp`:
+
+```java
+./main -p "This is an audio generated by bark.cpp"
+
+ __ __
+ / /_ ____ ______/ /__ _________ ____
+ / __ \/ __ `/ ___/ //_/ / ___/ __ \/ __ \
+ / /_/ / /_/ / / / ,< _ / /__/ /_/ / /_/ /
+/_.___/\__,_/_/ /_/|_| (_) \___/ .___/ .___/
+ /_/ /_/
+
+bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
+bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222
+
+Generating semantic tokens: 17%
+
+bark_print_statistics: sample time = 10.98 ms / 138 tokens
+bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
+bark_print_statistics: total time = 633.54 ms
+
+Generating coarse tokens: 100%
+
+bark_print_statistics: sample time = 3.75 ms / 410 tokens
+bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
+bark_print_statistics: total time = 3274.00 ms
+
+Generating fine tokens: 100%
+
+bark_print_statistics: sample time = 38.82 ms / 6144 tokens
+bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
+bark_print_statistics: total time = 4772.92 ms
+
+write_wav_on_disk: Number of frames written = 65600.
+
+main: load time = 324.14 ms
+main: eval time = 8806.57 ms
+main: total time = 9131.68 ms
+```
+
+Here is a video of Bark running on the iPhone:
+
+https://github.com/PABannier/bark.cpp/assets/12958149/bc807c0b-adfa-4c47-a05b-a2d8ba157dd8
+
+
+## Usage
+
+Here are the steps to use Bark.cpp
+
+### Get the code
+
+```bash
+git clone --recursive https://github.com/PABannier/bark.cpp.git
+cd bark.cpp
+git submodule update --init --recursive
+```
+
+### Build
+
+In order to build bark.cpp you must use `CMake`:
+
+```bash
+mkdir build
+cd build
+# To enable nvidia gpu, use the following option
+# cmake -DGGML_CUBLAS=ON ..
+cmake ..
+cmake --build . --config Release
+```
+
+### Prepare data & Run
+
+```bash
+# Install Python dependencies
+python3 -m pip install -r requirements.txt
+
+# Download the Bark checkpoints and vocabulary
+python3 download_weights.py --out-dir ./models --models bark-small bark
+
+# Convert the model to ggml format
+python3 convert.py --dir-model ./models/bark-small --use-f16
+
+# run the inference
+./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
+```
+
+### (Optional) Quantize weights
+
+Weights can be quantized using the following strategy: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.
+
+Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.
+
+```bash
+./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
+```
+
+### Seminal papers
+
+- Bark
+  - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
+- Encodec
+  - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
+- GPT-3
+  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
+
+### Contributing
+
+`bark.cpp` is a continuous endeavour that relies on the community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be
+
+- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it on the issue section.
+- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
+- pull request: you may have fixed a bug, added a features, or even fixed a small typo in the documentation, ... you can submit a pull request and a reviewer will reach out to you.
+
+### Coding guidelines
+
+- Avoid adding third-party dependencies, extra files, extra headers, etc.
+- Always consider cross-compatibility with other operating systems and architectures

From 51f9f05208d64cb26b55d3d03a211c0a09a62534 Mon Sep 17 00:00:00 2001
From: Khushal Jethava
Date: Fri, 24 Jan 2025 17:55:00 +0530
Subject: [PATCH 2/2] feat(Updated Google colab file):

---
 README copy.md | 154 -------------------------------------------------
 README.md | 2 +-
 2 files changed, 1 insertion(+), 155 deletions(-)
 delete mode 100644 README copy.md

diff --git a/README copy.md b/README copy.md
deleted file mode 100644
index 12d3ee0..0000000
--- a/README copy.md
+++ /dev/null
@@ -1,154 +0,0 @@
-# bark.cpp
-
-![bark.cpp](./assets/banner.png)
-
-[![Actions Status](https://github.com/PABannier/bark.cpp/actions/workflows/build.yml/badge.svg)](https://github.com/PABannier/bark.cpp/actions)
-[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
-
-[Roadmap](https://github.com/users/PABannier/projects/1) / [encodec.cpp](https://github.com/PABannier/encodec.cpp) / [ggml](https://github.com/ggerganov/ggml)
-
-Inference of [SunoAI's bark model](https://github.com/suno-ai/bark) in pure C/C++.
-
-## Description
-
-With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-to-speech generation to the community.
-
-- [x] Plain C/C++ implementation without dependencies
-- [x] AVX, AVX2 and AVX512 for x86 architectures
-- [x] CPU and GPU compatible backends
-- [x] Mixed F16 / F32 precision
-- [x] 4-bit, 5-bit and 8-bit integer quantization
-- [x] Metal and CUDA backends
-
-**Models supported**
-
-- [x] [Bark Small](https://huggingface.co/suno/bark-small)
-- [x] [Bark Large](https://huggingface.co/suno/bark)
-
-**Models we want to implement! Please open a PR :)**
-
-- [ ] [AudioCraft](https://audiocraft.metademolab.com/) ([#62](https://github.com/PABannier/bark.cpp/issues/62))
-- [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
-- [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))
-
-Demo on [Google Colab](https://colab.research.google.com/drive/1j8osRVX4J_DAXMUDked7AgR9_cYGdPOz?usp=sharing)
-
----
-
-Here is a typical run using `bark.cpp`:
-
-```java
-./main -p "This is an audio generated by bark.cpp"
-
- __ __
- / /_ ____ ______/ /__ _________ ____
- / __ \/ __ `/ ___/ //_/ / ___/ __ \/ __ \
- / /_/ / /_/ / / / ,< _ / /__/ /_/ / /_/ /
-/_.___/\__,_/_/ /_/|_| (_) \___/ .___/ .___/
- /_/ /_/
-
-bark_tokenize_input: prompt: 'This is an audio generated by bark.cpp'
-bark_tokenize_input: number of tokens in prompt = 513, first 8 tokens: 20795 20172 20199 33733 58966 20203 28169 20222
-
-Generating semantic tokens: 17%
-
-bark_print_statistics: sample time = 10.98 ms / 138 tokens
-bark_print_statistics: predict time = 614.96 ms / 4.46 ms per token
-bark_print_statistics: total time = 633.54 ms
-
-Generating coarse tokens: 100%
-
-bark_print_statistics: sample time = 3.75 ms / 410 tokens
-bark_print_statistics: predict time = 3263.17 ms / 7.96 ms per token
-bark_print_statistics: total time = 3274.00 ms
-
-Generating fine tokens: 100%
-
-bark_print_statistics: sample time = 38.82 ms / 6144 tokens
-bark_print_statistics: predict time = 4729.86 ms / 0.77 ms per token
-bark_print_statistics: total time = 4772.92 ms
-
-write_wav_on_disk: Number of frames written = 65600.
-
-main: load time = 324.14 ms
-main: eval time = 8806.57 ms
-main: total time = 9131.68 ms
-```
-
-Here is a video of Bark running on the iPhone:
-
-https://github.com/PABannier/bark.cpp/assets/12958149/bc807c0b-adfa-4c47-a05b-a2d8ba157dd8
-
-
-## Usage
-
-Here are the steps to use Bark.cpp
-
-### Get the code
-
-```bash
-git clone --recursive https://github.com/PABannier/bark.cpp.git
-cd bark.cpp
-git submodule update --init --recursive
-```
-
-### Build
-
-In order to build bark.cpp you must use `CMake`:
-
-```bash
-mkdir build
-cd build
-# To enable nvidia gpu, use the following option
-# cmake -DGGML_CUBLAS=ON ..
-cmake ..
-cmake --build . --config Release
-```
-
-### Prepare data & Run
-
-```bash
-# Install Python dependencies
-python3 -m pip install -r requirements.txt
-
-# Download the Bark checkpoints and vocabulary
-python3 download_weights.py --out-dir ./models --models bark-small bark
-
-# Convert the model to ggml format
-python3 convert.py --dir-model ./models/bark-small --use-f16
-
-# run the inference
-./build/examples/main/main -m ./models/bark-small/ggml_weights.bin -p "this is an audio generated by bark.cpp" -t 4
-```
-
-### (Optional) Quantize weights
-
-Weights can be quantized using the following strategy: `q4_0`, `q4_1`, `q5_0`, `q5_1`, `q8_0`.
-
-Note that to preserve audio quality, we do not quantize the codec model. The bulk of the computation is in the forward pass of the GPT models.
-
-```bash
-./build/examples/quantize/quantize ./ggml_weights.bin ./ggml_weights_q4.bin q4_0
-```
-
-### Seminal papers
-
-- Bark
-  - [Text Prompted Generative Audio](https://github.com/suno-ai/bark)
-- Encodec
-  - [High Fidelity Neural Audio Compression](https://arxiv.org/abs/2210.13438)
-- GPT-3
-  - [Language Models are Few-Shot Learners](https://arxiv.org/abs/2005.14165)
-
-### Contributing
-
-`bark.cpp` is a continuous endeavour that relies on the community efforts to last and evolve. Your contribution is welcome and highly valuable. It can be
-
-- bug report: you may encounter a bug while using `bark.cpp`. Don't hesitate to report it on the issue section.
-- feature request: you want to add a new model or support a new platform. You can use the issue section to make suggestions.
-- pull request: you may have fixed a bug, added a features, or even fixed a small typo in the documentation, ... you can submit a pull request and a reviewer will reach out to you.
-
-### Coding guidelines
-
-- Avoid adding third-party dependencies, extra files, extra headers, etc.
-- Always consider cross-compatibility with other operating systems and architectures
diff --git a/README.md b/README.md
index 968af1e..12d3ee0 100644
--- a/README.md
+++ b/README.md
@@ -31,7 +31,7 @@ With `bark.cpp`, our goal is to bring **real-time realistic multilingual** text-
 - [ ] [AudioLDM2](https://audioldm.github.io/audioldm2/) ([#82](https://github.com/PABannier/bark.cpp/issues/82))
 - [ ] [Piper](https://github.com/rhasspy/piper) ([#135](https://github.com/PABannier/bark.cpp/issues/135))
 
-Demo on [Google Colab](https://colab.research.google.com/drive/1JVtJ6CDwxtKfFmEd8J4FGY2lzdL0d0jT?usp=sharing) ([#95](https://github.com/PABannier/bark.cpp/issues/95))
+Demo on [Google Colab](https://colab.research.google.com/drive/1j8osRVX4J_DAXMUDked7AgR9_cYGdPOz?usp=sharing)
 
 ---