Bark是由Suno创建的一个基于转换器的文本到音频模型。Bark可以生成高度逼真的多语言语音以及其他音频,包括音乐、背景噪音和简单的音效。
该模型还可以产生非语言交流,如大笑、叹息和哭泣。为了支持研究社区,我们正在提供对预先训练的模型检查点的访问,以便进行推理。
使用示例:
from bark import SAMPLE_RATE, generate_audio, preload_models from scipy.io.wavfile import write as write_wav from IPython.display import Audio # download and load all models preload_models() # generate audio from text text_prompt = """ Hello, my name is Suno. And, uh — and I like pizza. [laughs] But I also have other interests such as playing tic tac toe. """ audio_array = generate_audio(text_prompt) # save audio to disk write_wav("bark_generation.wav", SAMPLE_RATE, audio_array) # play text in notebook Audio(audio_array, rate=SAMPLE_RATE)