🐶 Bark
Bark is a multilingual TTS model created by Suno-AI. It can generate conversational speech as well as music and sound effects. It is architecturally very similar to Google’s AudioLM. For more information, please refer to Suno-AI’s repo.
Acknowledgements
👑Suno-AI for training and open-sourcing this model.
👑gitmylo for finding the solution to the semantic token generation for voice clones and finetunes.
👑serp-ai for controlled voice cloning.
Example use
from TTS.tts.configs.bark_config import BarkConfig
from TTS.tts.models.bark import Bark

text = "Hello, my name is Manmay, how are you?"

# Build the model from the default config and load the checkpoint.
config = BarkConfig()
model = Bark.init_from_config(config)
model.load_checkpoint(config, checkpoint_dir="path/to/model/dir/", eval=True)

# Random speaker
output_dict = model.synthesize(text)

# Clone a speaker from a reference recording.
output_dict = model.synthesize(text, speaker_wav="path/to/speaker.wav")
Using 🐸TTS API:
from TTS.api import TTS
# Load the model to GPU
# Bark is really slow on CPU, so we recommend using GPU.
tts = TTS("tts_models/multilingual/multi-dataset/bark").to("cuda")
# Clone voice and cache it with the custom ID `ljspeech`.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                speaker_wav=["tests/data/ljspeech/wavs/LJ001-0001.wav"],
                speaker="ljspeech")
# When you run it again, it uses the stored values to generate the voice.
tts.tts_to_file(text="Hello, my name is Manmay, how are you?",
                file_path="output.wav",
                speaker="ljspeech")
# Random speaker
tts = TTS("tts_models/multilingual/multi-dataset/bark").to("cuda")
tts.tts_to_file("hello world", file_path="out.wav")
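The `speaker_wav` argument in the cloning example above takes a list of reference recordings. The following is a minimal sketch for building such a list from a directory, using only the standard library. The helper name `collect_reference_wavs` is hypothetical and not part of 🐸TTS.

```python
from pathlib import Path
from typing import List


def collect_reference_wavs(wav_dir: str) -> List[str]:
    """Return the sorted paths of all .wav files directly inside wav_dir.

    Hypothetical helper for illustration; it is not a 🐸TTS function.
    """
    directory = Path(wav_dir)
    if not directory.is_dir():
        raise FileNotFoundError(f"Reference directory not found: {wav_dir}")
    # Sort so that repeated runs pass the files in a stable order.
    wavs = sorted(str(p) for p in directory.glob("*.wav"))
    if not wavs:
        raise ValueError(f"No .wav files found in {wav_dir}")
    return wavs
```

The returned list could then be passed as `speaker_wav=collect_reference_wavs("tests/data/ljspeech/wavs")` in the `tts_to_file` call shown above. Failing early on an empty directory avoids starting a slow synthesis run with no reference audio.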
Using 🐸TTS Command line:
# Clone the `ljspeech` voice and cache it under that ID for later reuse without reference audio.
tts --model_name tts_models/multilingual/multi-dataset/bark \
    --text "This is an example." \
    --out_path "output.wav" \
    --speaker_wav tests/data/ljspeech/wavs/*.wav \
    --speaker_idx "ljspeech"
# Random voice generation
tts --model_name tts_models/multilingual/multi-dataset/bark \
    --text "This is an example." \
    --out_path "output.wav"
Note
The authors of the Bark model provide a range of preset voices in .npz format that you can place into the voice_dir and then use in the speaker argument.
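To see which preset voices are currently present, a small script can scan the directory for `.npz` files. This is only a sketch: the helper name `list_preset_voices` is hypothetical, and the directory layout (flat files or one subfolder per voice) is an assumption, which is why the search is recursive.

```python
from pathlib import Path
from typing import Dict


def list_preset_voices(voice_dir: str) -> Dict[str, Path]:
    """Map each .npz file stem found under voice_dir to its path.

    Hypothetical helper for illustration; it is not a 🐸TTS function.
    """
    # rglob searches subfolders too, since the exact layout is an assumption.
    return {p.stem: p for p in sorted(Path(voice_dir).rglob("*.npz"))}
```

Whether the file stem alone is the right value for the speaker argument depends on how 🐸TTS resolves names inside the voice_dir, so treat the keys as candidates to try rather than guaranteed IDs.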
Important resources & papers
Original Repo: https://github.com/suno-ai/bark
Cloning implementation: https://github.com/serp-ai/bark-with-voice-clone
AudioLM: https://arxiv.org/abs/2209.03143