---
license: apache-2.0
metrics:
- cer
---

## Welcome
If you find this model helpful, please *like* it and star us on https://github.com/LianjiaTech/BELLE and https://github.com/shuaijiang/Whisper-Finetune.

# Belle-whisper-large-v3-turbo-zh
Belle-whisper-large-v3-turbo-zh is whisper-large-v3-turbo fine-tuned to enhance its Chinese speech recognition capabilities. It demonstrates an x-y% relative improvement over whisper-large-v3-turbo on Chinese ASR benchmarks, including AISHELL1, AISHELL2, WENETSPEECH, and HKUST.

As with Belle-whisper-large-v3-zh-punct, the punctuation marks come from the model [punc_ct-transformer_cn-en-common-vocab471067-large](https://www.modelscope.cn/models/iic/punc_ct-transformer_cn-en-common-vocab471067-large/) and were added to the training datasets.

## Usage
```python
from transformers import pipeline

transcriber = pipeline(
    "automatic-speech-recognition",
    model="BELLE-2/Belle-whisper-large-v3-turbo-zh"
)

transcriber.model.config.forced_decoder_ids = (
    transcriber.tokenizer.get_decoder_prompt_ids(
        language="zh",
        task="transcribe"
    )
)

transcription = transcriber("my_audio.wav")
```
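
Whisper models operate on 16 kHz audio; file paths are resampled by the pipeline automatically, but if you load a waveform yourself you must bring it to 16 kHz first. A minimal numpy-only sketch of the idea (naive linear interpolation, no anti-aliasing — in practice prefer `librosa.resample` or `torchaudio.transforms.Resample`; the helper name is hypothetical):

```python
import numpy as np

def resample_linear(audio: np.ndarray, orig_sr: int, target_sr: int = 16000) -> np.ndarray:
    """Naive linear-interpolation resampling (illustration only; no anti-aliasing)."""
    n_out = int(round(len(audio) * target_sr / orig_sr))
    x_old = np.linspace(0.0, 1.0, num=len(audio), endpoint=False)
    x_new = np.linspace(0.0, 1.0, num=n_out, endpoint=False)
    return np.interp(x_new, x_old, audio)

# one second of a 440 Hz tone at 44.1 kHz, brought down to 16 kHz
wave_44k = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
wave_16k = resample_linear(wave_44k, 44100)
```

The resampled array can then be passed to the pipeline as `{"raw": wave_16k, "sampling_rate": 16000}`.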

## Fine-tuning
| Model | (Re)Sample Rate | Train Datasets | Fine-tuning (full or PEFT) |
|:----------------:|:-------:|:----------------------------------------------------------:|:-----------:|
| Belle-whisper-large-v3-turbo-zh | 16 kHz | [AISHELL-1](https://openslr.magicdatatech.com/resources/33/) [AISHELL-2](https://www.aishelltech.com/aishell_2) [WenetSpeech](https://wenet.org.cn/WenetSpeech/) [HKUST](https://catalog.ldc.upenn.edu/LDC2005S15) | [LoRA fine-tuning](https://github.com/shuaijiang/Whisper-Finetune) |

To incorporate punctuation marks without compromising performance, LoRA fine-tuning was employed.
If you want to fine-tune the model on your own datasets, please refer to the [github repo](https://github.com/shuaijiang/Whisper-Finetune).
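
LoRA freezes the pretrained weight matrix W and trains only a low-rank update BA, so the effective weight is W + (α/r)·BA and the trainable parameter count drops from d·d to 2·d·r per adapted matrix. A minimal numpy illustration of that decomposition (all dimensions are hypothetical, not the actual Whisper-Finetune configuration):

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, alpha = 64, 8, 16           # hypothetical model dim, LoRA rank, scaling factor

W = rng.normal(size=(d, d))       # frozen pretrained weight (never updated)
A = rng.normal(size=(r, d)) * 0.01
B = np.zeros((d, r))              # B initialized to zero: adapter starts as a no-op

W_eff = W + (alpha / r) * (B @ A)
# before training, the adapted model behaves exactly like the base model,
# and only 2*d*r parameters (A and B) are trainable instead of d*d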

## CER(%) ↓
| Model | Language Tag | aishell_1_test(↓) | aishell_2_test(↓) | wenetspeech_net(↓) | wenetspeech_meeting(↓) | HKUST_dev(↓) |
|:----------------:|:-------:|:-----------:|:-----------:|:--------:|:-----------:|:-------:|
| whisper-large-v3 | Chinese | 8.085 | 5.475 | 11.72 | 20.15 | 28.597 |
| whisper-large-v3-turbo | Chinese | 8.639 | 6.014 | 13.507 | 20.313 | 37.324 |
| Belle-whisper-large-v3-turbo-zh | Chinese | 2.x | 3.x | 8.x | 11.x | 16.x |

It is worth mentioning that, compared to Belle-whisper-large-v3-zh, Belle-whisper-large-v3-zh-punct even shows a slight improvement in complex acoustic scenes (such as wenetspeech_meeting).
The punctuation marks in the model output are removed before computing the CER.
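
CER is the character-level edit distance between hypothesis and reference divided by the reference length, with punctuation stripped first. A self-contained sketch of that computation (the punctuation set and helper are illustrative, not the exact evaluation script used for the table above):

```python
import re

_PUNCT = re.compile(r"[，。！？、；：,.!?;:\s]")

def cer(ref: str, hyp: str) -> float:
    """Character error rate: Levenshtein distance / len(ref), punctuation removed."""
    ref, hyp = _PUNCT.sub("", ref), _PUNCT.sub("", hyp)
    # standard dynamic-programming edit distance over characters
    prev = list(range(len(hyp) + 1))
    for i, rc in enumerate(ref, 1):
        cur = [i]
        for j, hc in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (rc != hc)))   # substitution
        prev = cur
    return prev[-1] / len(ref)

print(cer("今天天气不错。", "今天天气不错"))  # 0.0 — punctuation is ignored
print(cer("今天天气不错", "今天气不错"))      # one deleted character out of six
```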

## Citation

Please cite our paper and GitHub repository when using our code, data, or model.

```
@misc{BELLE,
  author = {BELLEGroup},
  title = {BELLE: Be Everyone's Large Language model Engine},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/LianjiaTech/BELLE}},
}
```