Vaibhav Srivastav committed on
Commit
2f66b4a
1 Parent(s): 43dea64

add model card

  1. README.md +41 -0
README.md ADDED
@@ -0,0 +1,41 @@
+ # Model Card: Bark
+
+ This is the official codebase for running the text-to-audio model from Suno.ai.
+
+ The following is additional information about the models released here.
+
+ ## Model Details
+
+ Bark is a series of three transformer models that turn text into audio.
+
+ ### Text to semantic tokens
+ - Input: text, tokenized with the [BERT tokenizer from Hugging Face](https://huggingface.co/docs/transformers/model_doc/bert#transformers.BertTokenizer)
+ - Output: semantic tokens that encode the audio to be generated
+
+ ### Semantic to coarse tokens
+ - Input: semantic tokens
+ - Output: tokens from the first two codebooks of the [EnCodec codec](https://github.com/facebookresearch/encodec) from Facebook
+
+ ### Coarse to fine tokens
+ - Input: the first two codebooks from EnCodec
+ - Output: 8 codebooks from EnCodec (the first two plus six more)
+
+ ### Architecture
+ | Model | Parameters | Attention | Output Vocab size |
+ |:-------------------------:|:----------:|------------|:-----------------:|
+ | Text to semantic tokens | 80 M | Causal | 10,000 |
+ | Semantic to coarse tokens | 80 M | Causal | 2x 1,024 |
+ | Coarse to fine tokens | 80 M | Non-causal | 6x 1,024 |
+
+
+ ### Release date
+ April 2023
+
+ ## Broader Implications
+ We anticipate that this model's text-to-audio capabilities can be used to improve accessibility tools in a variety of languages.
+ Straightforward improvements will allow models to run faster than real time, rendering them useful for applications such as virtual assistants.
+
+ While we hope that this release will enable users to express their creativity and build applications that are a force
+ for good, we acknowledge that any text-to-audio model has the potential for dual use. While it is not straightforward
+ to clone the voices of known people with Bark, it can still be used for nefarious purposes. To further reduce the chances of unintended use of Bark,
+ we also release a simple classifier to detect Bark-generated audio with high accuracy (see the notebooks section of the main repository).
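
As a reading aid for the Model Details section of the added README, the three-stage token flow can be sketched roughly as follows. This is a minimal sketch, not Bark's implementation: the function names are hypothetical, the "models" are random stand-ins, and the sequence lengths are an arbitrary simplification. Only the vocabulary and codebook sizes are taken from the architecture table.

```python
import random

# Sizes taken from the model card's architecture table.
SEMANTIC_VOCAB = 10_000  # output vocabulary of the text-to-semantic model
CODEBOOK_SIZE = 1_024    # entries per EnCodec codebook
N_COARSE = 2             # codebooks produced by the semantic-to-coarse model
N_TOTAL = 8              # codebooks available after the coarse-to-fine model


def text_to_semantic(text_tokens, rng):
    """Hypothetical stand-in: emit one random semantic token per text token."""
    return [rng.randrange(SEMANTIC_VOCAB) for _ in text_tokens]


def semantic_to_coarse(semantic, rng):
    """Hypothetical stand-in: emit N_COARSE rows of codebook indices."""
    return [[rng.randrange(CODEBOOK_SIZE) for _ in semantic]
            for _ in range(N_COARSE)]


def coarse_to_fine(coarse, rng):
    """Hypothetical stand-in: keep the coarse rows, append the remaining rows."""
    n_frames = len(coarse[0])
    extra = [[rng.randrange(CODEBOOK_SIZE) for _ in range(n_frames)]
             for _ in range(N_TOTAL - len(coarse))]
    return coarse + extra


def run_pipeline(text_tokens, seed=0):
    """Chain the three stages in the order the model card lists them."""
    rng = random.Random(seed)
    semantic = text_to_semantic(text_tokens, rng)
    coarse = semantic_to_coarse(semantic, rng)
    fine = coarse_to_fine(coarse, rng)
    return semantic, coarse, fine


if __name__ == "__main__":
    # Placeholder token ids; a real run would use the BERT tokenizer's output.
    semantic, coarse, fine = run_pipeline([1, 2, 3, 4])
    print(len(semantic), len(coarse), len(fine))  # prints "4 2 8"
```

The point of the sketch is the interface between stages: the last stage receives two codebook rows and returns eight, which matches the "6x 1,024" output vocabulary listed for the coarse-to-fine model.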