- gpt4-o
- tokenizer
- codec-representation
---
# WavTokenizer
SOTA Discrete Codec Models With Forty Tokens Per Second for Audio Language Modeling

[![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://github.com/jishengpeng/wavtokenizer)
[![demo](https://img.shields.io/badge/WavTokenizer-Demo-red)](https://wavtokenizer.github.io/)
[![model](https://img.shields.io/badge/%F0%9F%A4%97%20WavTokenizer-Models-blue)](https://github.com/jishengpeng/wavtokenizer)

### 🎉🎉 With WavTokenizer, you can represent speech, music, and audio with only 40 tokens per second!
### 🎉🎉 With WavTokenizer, you can achieve strong reconstruction quality.
### 🎉🎉 WavTokenizer carries rich semantic information and is built for audio language models such as GPT-4o.
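
At 40 tokens per second, token budgets for downstream audio language models are easy to estimate. As a quick back-of-the-envelope helper (plain arithmetic, not part of the WavTokenizer API):

```python
def num_tokens(duration_s: float, tokens_per_second: int = 40) -> int:
    """Discrete tokens produced for a clip at a given token rate."""
    return round(duration_s * tokens_per_second)

# A 10-second clip costs 400 tokens at 40 tokens/s,
# versus 750 tokens for the 75-token/s model variants.
print(num_tokens(10))                        # 400
print(num_tokens(10, tokens_per_second=75))  # 750
```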

# 🔥 News

- *2024.08*: We released WavTokenizer on arXiv.

![result](result.png)

## Installation

To use WavTokenizer, create a conda environment and install the dependencies:

```bash
conda create -n wavtokenizer python=3.9
conda activate wavtokenizer
pip install -r requirements.txt
```

## Inference

### Part 1: Reconstruct audio from a raw waveform

```python
from encoder.utils import convert_audio
import torchaudio
import torch
from decoder.pretrained import WavTokenizer

device = torch.device('cpu')

config_path = "./configs/xxx.yaml"
model_path = "./xxx.ckpt"
audio_path = "xxx"       # input waveform
audio_outpath = "xxx"    # where the reconstruction is saved

wavtokenizer = WavTokenizer.from_pretrained0802(config_path, model_path)
wavtokenizer = wavtokenizer.to(device)

# Load the input and resample it to 24 kHz mono
wav, sr = torchaudio.load(audio_path)
wav = convert_audio(wav, sr, 24000, 1)
bandwidth_id = torch.tensor([0])
wav = wav.to(device)

# Encode to continuous features and discrete codes, then decode back to audio
features, discrete_code = wavtokenizer.encode_infer(wav, bandwidth_id=bandwidth_id)
audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
torchaudio.save(audio_outpath, audio_out, sample_rate=24000, encoding='PCM_S', bits_per_sample=16)
```

### Part 2: Generating discrete codes

```python
from encoder.utils import convert_audio
import torchaudio
import torch
from decoder.pretrained import WavTokenizer

device = torch.device('cpu')

config_path = "./configs/xxx.yaml"
model_path = "./xxx.ckpt"
audio_path = "xxx"  # input waveform

wavtokenizer = WavTokenizer.from_pretrained0802(config_path, model_path)
wavtokenizer = wavtokenizer.to(device)

wav, sr = torchaudio.load(audio_path)
wav = convert_audio(wav, sr, 24000, 1)
bandwidth_id = torch.tensor([0])
wav = wav.to(device)

# Keep only the discrete codes; the continuous features are not needed here
_, discrete_code = wavtokenizer.encode_infer(wav, bandwidth_id=bandwidth_id)
print(discrete_code)
```

### Part 3: Audio reconstruction from codes

```python
# audio_tokens has shape [n_q, 1, t] or [n_q, t],
# where n_q is the number of quantizers and t the number of frames
features = wavtokenizer.codes_to_features(audio_tokens)
bandwidth_id = torch.tensor([0])
audio_out = wavtokenizer.decode(features, bandwidth_id=bandwidth_id)
```

## Available models

🤗 links point to the Hugging Face model hub.

| Model name | HuggingFace | Corpus | Token/s | Domain | Open-Source |
|:---|:---:|:---:|:---:|:---:|:---:|
| WavTokenizer-small-600-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | LibriTTS | 40 | Speech | √ |
| WavTokenizer-small-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | LibriTTS | 75 | Speech | √ |
| WavTokenizer-medium-600-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 10000 Hours | 40 | Speech, Audio, Music | Coming Soon |
| WavTokenizer-medium-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 10000 Hours | 75 | Speech, Audio, Music | Coming Soon |
| WavTokenizer-large-600-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | LibriTTS | 40 | Speech, Audio, Music | Coming Soon |
| WavTokenizer-large-320-24k-4096 | [🤗](https://github.com/jishengpeng/wavtokenizer) | 80000 Hours | 75 | Speech, Audio, Music | Coming Soon |
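
The model names appear to follow the pattern `<variant>-<downsampling factor>-<sample rate>-<codebook size>`, and the tokens-per-second figures above follow from the first two numbers (24000/600 = 40, 24000/320 = 75). A small sketch of that arithmetic; the parser below is illustrative, not part of the release:

```python
def tokens_per_second(model_name: str) -> float:
    """Derive the token rate from a name like 'WavTokenizer-small-600-24k-4096'."""
    _, _, downsample, rate, _ = model_name.split("-")
    sample_rate = int(rate.rstrip("k")) * 1000  # "24k" -> 24000
    return sample_rate / int(downsample)

print(tokens_per_second("WavTokenizer-small-600-24k-4096"))  # 40.0
print(tokens_per_second("WavTokenizer-small-320-24k-4096"))  # 75.0
```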

## Training

### Step 1: Prepare the training dataset

```python
# Process the data into a form similar to ./data/demo.txt
```
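
One way to build such a filelist is to walk a directory of audio files and write one path per line. The helper below is a sketch under the assumption that `demo.txt` uses a one-path-per-line layout; check `./data/demo.txt` for the exact format expected by the repo:

```python
from pathlib import Path

def write_filelist(audio_dir: str, out_path: str) -> int:
    """Write one absolute .wav path per line; returns the number of files found."""
    paths = sorted(Path(audio_dir).rglob("*.wav"))
    with open(out_path, "w") as f:
        for p in paths:
            f.write(f"{p.resolve()}\n")
    return len(paths)

# e.g. write_filelist("./data/train_wavs", "./data/train.txt")
```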

### Step 2: Modify the configuration file

```python
# ./configs/xxx.yaml
# Modify the values of parameters such as batch_size, filelist_path, save_dir, and device
```
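
For orientation only, a hypothetical sketch of what such a config fragment might look like; apart from the parameter names listed above, every key here is an assumption, so defer to the actual `./configs/xxx.yaml` shipped with the repository:

```yaml
# Hypothetical fragment; the real file's structure may differ.
data:
  filelist_path: ./data/demo.txt  # the filelist produced in Step 1
  batch_size: 20
trainer:
  devices: [0]                    # training device(s)
  default_root_dir: ./result      # save_dir for checkpoints and logs
```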

### Step 3: Start the training process

Refer to the [PyTorch Lightning documentation](https://lightning.ai/docs/pytorch/stable/) for details on customizing the training pipeline.

```bash
cd ./WavTokenizer
python train.py fit --config ./configs/xxx.yaml
```

## Citation

If this code contributes to your research, please cite our work, Language-Codec and WavTokenizer:

```
@misc{ji2024languagecodec,
      title={Language-Codec: Reducing the Gaps Between Discrete Codec Representation and Speech Language Models},
      author={Shengpeng Ji and Minghui Fang and Ziyue Jiang and Rongjie Huang and Jialong Zuo and Shulei Wang and Zhou Zhao},
      year={2024},
      eprint={2402.12208},
      archivePrefix={arXiv},
      primaryClass={eess.AS}
}
```