ddd committed
Commit dbb6dab
1 parent: b93970c

Add application file

Files changed (1)
  1. README.md +9 -83
README.md CHANGED
@@ -1,83 +1,9 @@
- # DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
- [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2105.02446)
- [![GitHub Stars](https://img.shields.io/github/stars/MoonInTheRiver/DiffSinger?style=social)](https://github.com/MoonInTheRiver/DiffSinger)
- [![downloads](https://img.shields.io/github/downloads/MoonInTheRiver/DiffSinger/total.svg)](https://github.com/MoonInTheRiver/DiffSinger/releases)
- | [![Hugging Face](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-blue)](https://huggingface.co/spaces/NATSpeech/DiffSpeech)
-
- This repository is the official PyTorch implementation of our AAAI-2022 [paper](https://arxiv.org/abs/2105.02446), in which we propose DiffSinger (for Singing-Voice-Synthesis) and DiffSpeech (for Text-to-Speech).
-
- <table style="width:100%">
- <tr>
- <th>DiffSinger/DiffSpeech at training</th>
- <th>DiffSinger/DiffSpeech at inference</th>
- </tr>
- <tr>
- <td><img src="resources/model_a.png" alt="Training" height="300"></td>
- <td><img src="resources/model_b.png" alt="Inference" height="300"></td>
- </tr>
- </table>
-
- :tada: :tada: :tada: **Updates**:
- - Mar.2, 2022: [MIDI-new-version](docs/README-SVS-opencpop-e2e.md): A substantial improvement :sparkles:
- - Mar.1, 2022: [NeuralSVB](https://github.com/MoonInTheRiver/NeuralSVB), for singing voice beautifying, has been released :sparkles: :sparkles: :sparkles: .
- - Feb.13, 2022: [NATSpeech](https://github.com/NATSpeech/NATSpeech), the improved code framework, which contains the implementations of DiffSpeech and our NeurIPS-2021 work [PortaSpeech](https://openreview.net/forum?id=xmJsuh8xlq) has been released :sparkles: :sparkles: :sparkles:.
- - Jan.29, 2022: support [MIDI-old-version](docs/README-SVS-opencpop-cascade.md) SVS. :construction: :pick: :hammer_and_wrench:
- - Jan.13, 2022: support SVS, release PopCS dataset.
- - Dec.19, 2021: support TTS. [HuggingFace🤗 Demo](https://huggingface.co/spaces/NATSpeech/DiffSpeech)
-
- :rocket: **News**:
- - Feb.24, 2022: Our new work, NeuralSVB was accepted by ACL-2022 [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2202.13277). [Demo Page](https://neuralsvb.github.io).
- - Dec.01, 2021: DiffSinger was accepted by AAAI-2022.
- - Sep.29, 2021: Our recent work `PortaSpeech: Portable and High-Quality Generative Text-to-Speech` was accepted by NeurIPS-2021 [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2109.15166) .
- - May.06, 2021: We submitted DiffSinger to Arxiv [![arXiv](https://img.shields.io/badge/arXiv-Paper-<COLOR>.svg)](https://arxiv.org/abs/2105.02446).
-
- ## Environments
- ```sh
- conda create -n your_env_name python=3.8
- source activate your_env_name
- pip install -r requirements_2080.txt (GPU 2080Ti, CUDA 10.2)
- or pip install -r requirements_3090.txt (GPU 3090, CUDA 11.4)
- ```
-
- ## Documents
- - [Run DiffSpeech (TTS version)](docs/README-TTS.md).
- - [Run DiffSinger (SVS version)](docs/README-SVS.md).
-
- ## Tensorboard
- ```sh
- tensorboard --logdir_spec exp_name
- ```
- <table style="width:100%">
- <tr>
- <td><img src="resources/tfb.png" alt="Tensorboard" height="250"></td>
- </tr>
- </table>
-
- ## Audio Demos
- Old audio samples can be found in our [demo page](https://diffsinger.github.io/). Audio samples generated by this repository are listed here:
-
- ### TTS audio samples
- Speech samples (test set of LJSpeech) can be found in [resources/demos_1213](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/demos_1213).
-
- ### SVS audio samples
- Singing samples (test set of PopCS) can be found in [resources/demos_0112](https://github.com/MoonInTheRiver/DiffSinger/blob/master/resources/demos_0112).
-
- ## Citation
- @article{liu2021diffsinger,
- title={Diffsinger: Singing voice synthesis via shallow diffusion mechanism},
- author={Liu, Jinglin and Li, Chengxi and Ren, Yi and Chen, Feiyang and Liu, Peng and Zhao, Zhou},
- journal={arXiv preprint arXiv:2105.02446},
- volume={2},
- year={2021}}
-
-
- ## Acknowledgements
- Our codes are based on the following repos:
- * [denoising-diffusion-pytorch](https://github.com/lucidrains/denoising-diffusion-pytorch)
- * [PyTorch Lightning](https://github.com/PyTorchLightning/pytorch-lightning)
- * [ParallelWaveGAN](https://github.com/kan-bayashi/ParallelWaveGAN)
- * [HifiGAN](https://github.com/jik876/hifi-gan)
- * [espnet](https://github.com/espnet/espnet)
- * [DiffWave](https://github.com/lmnt-com/diffwave)
-
- Also thanks [Keon Lee](https://github.com/keonlee9420/DiffSinger) for fast implementation of our work.

+ ---
+ title: DiffSinger
+ emoji: 🤗
+ colorFrom: yellow
+ colorTo: orange
+ sdk: gradio
+ app_file: "inference/svs/gradio/infer.py"
+ pinned: false
+ ---
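
The new README now holds only Hugging Face Spaces front matter: it declares a Gradio Space whose entry point is `inference/svs/gradio/infer.py`. For orientation only, the sketch below shows the general shape such an `app_file` usually takes; the function name, inputs, outputs, and return value are placeholders and not the actual DiffSinger inference code in `infer.py`.

```python
# Hypothetical sketch of a Gradio app_file, NOT the project's real entry point.
# The actual inference/svs/gradio/infer.py loads the DiffSinger/DiffSpeech models
# and returns synthesized audio rather than this placeholder text.
import gradio as gr


def run_inference(text: str) -> str:
    # Placeholder: a real app would run the acoustic model + vocoder here.
    return f"(placeholder) would synthesize: {text}"


demo = gr.Interface(
    fn=run_inference,
    inputs="text",    # shorthand for a text input component
    outputs="text",   # the real Space would use an audio output component
    title="DiffSinger",
)

if __name__ == "__main__":
    # Running the script (locally or on the Space) starts the Gradio server.
    demo.launch()
```

Consult the actual `inference/svs/gradio/infer.py` in the repository for the real inputs, model loading, and output handling used by this Space.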