# 🐶 Bark Bark is a multi-lingual TTS model created by [Suno-AI](https://www.suno.ai/). It can generate conversational speech as well as music and sound effects. It is architecturally very similar to Google's [AudioLM](https://arxiv.org/abs/2209.03143). For more information, please refer to the [Suno-AI's repo](https://github.com/suno-ai/bark). ## Acknowledgements - 👑[Suno-AI](https://www.suno.ai/) for training and open-sourcing this model. - 👑[gitmylo](https://github.com/gitmylo) for finding [the solution](https://github.com/gitmylo/bark-voice-cloning-HuBERT-quantizer/) to the semantic token generation for voice clones and finetunes. - 👑[serp-ai](https://github.com/serp-ai/bark-with-voice-clone) for controlled voice cloning. ## Example use ```{seealso} [Voice cloning](../cloning.md) ``` ```python text = "Hello, my name is Manmay , how are you?" from TTS.tts.configs.bark_config import BarkConfig from TTS.tts.models.bark import Bark config = BarkConfig() model = Bark.init_from_config(config) model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True) # Random speaker output_dict = model.synthesize(text) # Cloning a speaker. output_dict = model.synthesize(text, speaker_wav="path/to/speaker.wav") ``` Using 🐸TTS API: ```python from TTS.api import TTS # Load the model to GPU # Bark is really slow on CPU, so we recommend using GPU. tts = TTS("tts_models/multilingual/multi-dataset/bark").to("cuda") # Clone voice and cache it with the custom ID `ljspeech`. tts.tts_to_file(text="Hello, my name is Manmay , how are you?", file_path="output.wav", speaker_wav=["tests/data/ljspeech/wavs/LJ001-0001.wav"], speaker="ljspeech") # When you run it again it uses the stored values to generate the voice. tts.tts_to_file(text="Hello, my name is Manmay , how are you?", file_path="output.wav", speaker="ljspeech") # random speaker tts = TTS("tts_models/multilingual/multi-dataset/bark").to("cuda") tts.tts_to_file("hello world", file_path="out.wav") ``` Using 🐸TTS Command line: ```console # Clone the `ljspeech` voice and cache it under that ID for later reuse without reference audio. tts --model_name tts_models/multilingual/multi-dataset/bark \ --text "This is an example." \ --out_path "output.wav" \ --speaker_wav tests/data/ljspeech/wavs/*.wav --speaker_idx "ljspeech" # Random voice generation tts --model_name tts_models/multilingual/multi-dataset/bark \ --text "This is an example." \ --out_path "output.wav" ``` ```{note} The authors of the Bark model provide a range of [preset voices](https://suno-ai.notion.site/8b8e8749ed514b0cbf3f699013548683?v=bc67cff786b04b50b3ceb756fd05f68c) in `.npz` format that you can place into the `voice_dir` and then use in the `speaker` argument. ``` ## Important resources & papers - Original Repo: https://github.com/suno-ai/bark - Cloning implementation: https://github.com/serp-ai/bark-with-voice-clone - AudioLM: https://arxiv.org/abs/2209.03143 ## BarkConfig ```{eval-rst} .. autoclass:: TTS.tts.configs.bark_config.BarkConfig :members: ``` ## Bark Model ```{eval-rst} .. autoclass:: TTS.tts.models.bark.Bark :members: ```