dalmeow committed on
Commit
e010778
1 Parent(s): 5c80035

Update README.md

Files changed (1)
  1. README.md +0 -72
README.md CHANGED
@@ -71,78 +71,6 @@ where
  `N` is the number of words in the reference (`N=S+D+C`).
 
 
- ## How to use
-
- The metric takes two inputs: references (a list of references for each speech input) and predictions (a list of transcriptions to score).
-
-
- ```python
- from evaluate import load
- wer = load("wer")
- wer_score = wer.compute(predictions=predictions, references=references)
- ```
- ## Output values
-
- This metric outputs a float representing the word error rate.
-
- ```
- print(wer_score)
- 0.5
- ```
-
- This value indicates the average number of errors per reference word.
-
- The **lower** the value, the **better** the performance of the ASR system, with a WER of 0 being a perfect score.
-
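- As a quick sanity check of the formula, here is a minimal arithmetic sketch; the counts are hypothetical, taken from one possible alignment of the partial-match example further below:
-
- ```python
- # WER = (S + D + I) / N, where N = S + D + C is the reference word count.
- S, D, I, C = 3, 0, 1, 5  # substitutions, deletions, insertions, correct words (hypothetical counts)
- N = S + D + C            # 8 reference words in total
- print((S + D + I) / N)   # 0.5
- ```
-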
- ### Values from popular papers
-
- This metric is highly dependent on the content and quality of the dataset, and therefore users can expect very different values for the same model evaluated on different datasets.
-
- For example, models evaluated on [LibriSpeech](https://huggingface.co/datasets/librispeech_asr) report a WER in the 1.8-3.3 range, whereas models evaluated on [Timit](https://huggingface.co/datasets/timit_asr) report a WER in the 8.3-20.4 range.
- See the leaderboards for [LibriSpeech](https://paperswithcode.com/sota/speech-recognition-on-librispeech-test-clean) and [Timit](https://paperswithcode.com/sota/speech-recognition-on-timit) for the most recent values.
-
- ## Examples
-
- Perfect match between prediction and reference:
-
- ```python
- from evaluate import load
- wer = load("wer")
- predictions = ["hello world", "good night moon"]
- references = ["hello world", "good night moon"]
- wer_score = wer.compute(predictions=predictions, references=references)
- print(wer_score)
- 0.0
- ```
-
- Partial match between prediction and reference:
-
- ```python
- from evaluate import load
- wer = load("wer")
- predictions = ["this is the prediction", "there is an other sample"]
- references = ["this is the reference", "there is another one"]
- wer_score = wer.compute(predictions=predictions, references=references)
- print(wer_score)
- 0.5
- ```
-
- No match between prediction and reference:
-
- ```python
- from evaluate import load
- wer = load("wer")
- predictions = ["hello world", "good night moon"]
- references = ["hi everyone", "have a great day"]
- wer_score = wer.compute(predictions=predictions, references=references)
- print(wer_score)
- 1.0
- ```
-
- ## Limitations and bias
-
- WER is a valuable tool for comparing different systems as well as for evaluating improvements within one system. This kind of measurement, however, provides no details on the nature of transcription errors, and further work is therefore required to identify the main source(s) of error and to focus any research effort.
-
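- The aggregate score can, however, be decomposed into its error types. The sketch below is illustrative only (a hypothetical `error_counts` helper, not part of this metric's API); it classifies substitutions, deletions, and insertions with a standard Levenshtein alignment:
-
- ```python
- # Illustrative helper (hypothetical, not part of the `wer` metric's API):
- # classify word-level edits between one reference and one prediction.
- def error_counts(reference, prediction):
-     ref, hyp = reference.split(), prediction.split()
-     # d[i][j] = minimum edits turning ref[:i] into hyp[:j]
-     d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
-     for i in range(len(ref) + 1):
-         d[i][0] = i
-     for j in range(len(hyp) + 1):
-         d[0][j] = j
-     for i in range(1, len(ref) + 1):
-         for j in range(1, len(hyp) + 1):
-             sub = d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
-             d[i][j] = min(sub, d[i - 1][j] + 1, d[i][j - 1] + 1)
-     # Backtrack through the table to label each edit operation.
-     i, j, counts = len(ref), len(hyp), {"S": 0, "D": 0, "I": 0}
-     while i > 0 or j > 0:
-         if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
-             counts["S"] += ref[i - 1] != hyp[j - 1]
-             i, j = i - 1, j - 1
-         elif i > 0 and d[i][j] == d[i - 1][j] + 1:
-             counts["D"] += 1
-             i -= 1
-         else:
-             counts["I"] += 1
-             j -= 1
-     return counts
-
- print(error_counts("there is another one", "there is an other sample"))
- # {'S': 2, 'D': 0, 'I': 1} under one optimal alignment
- ```
-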
  ## Citation
 
  ```bibtex
 