|
--- |
|
title: WER |
|
emoji: 🤗 |
|
colorFrom: blue |
|
colorTo: red |
|
sdk: gradio |
|
sdk_version: 3.19.1 |
|
app_file: app.py |
|
pinned: false |
|
tags: |
|
- evaluate |
|
- metric |
|
description: >- |
|
Word error rate (WER) is a common metric of the performance of an automatic |
|
speech recognition system. |
|
|
|
The general difficulty of measuring performance lies in the fact that the |
|
recognized word sequence can have a different length from the reference word |
|
sequence (supposedly the correct one). The WER is derived from the Levenshtein |
|
distance, working at the word level instead of the phoneme level. The WER is a |
|
valuable tool for comparing different systems as well as for evaluating |
|
improvements within one system. This kind of measurement, however, provides no |
|
details on the nature of recognition errors, and further work is therefore
|
required to identify the main source(s) of error and to focus any research |
|
effort. |
|
|
|
This problem is solved by first aligning the recognized word sequence with the |
|
reference (spoken) word sequence using dynamic string alignment. This issue has
also been examined through a theory called the power law, which describes the
correlation between perplexity and word error rate.
|
|
|
Word error rate can then be computed as: |
|
|
|
WER = (S + D + I) / N = (S + D + I) / (S + D + C) |
|
|
|
where |
|
|
|
S is the number of substitutions, D is the number of deletions, I is the |
|
number of insertions, C is the number of correct words, N is the number of |
|
words in the reference (N=S+D+C). |
|
|
|
This value indicates the average number of errors per reference word. The |
|
lower the value, the better the performance of the ASR system, with a WER of 0
|
being a perfect score. |
|
duplicated_from: evaluate-metric/wer |
|
--- |
|
|
|
# Metric Card for WER |
|
|
|
## Metric description |
|
Word error rate (WER) is a common metric of the performance of an automatic speech recognition (ASR) system. |
|
|
|
The general difficulty of measuring the performance of ASR systems lies in the fact that the recognized word sequence can have a different length from the reference word sequence (supposedly the correct one). The WER is derived from the [Levenshtein distance](https://en.wikipedia.org/wiki/Levenshtein_distance), working at the word level. |
|
|
|
This problem is solved by first aligning the recognized word sequence with the reference (spoken) word sequence using dynamic string alignment. This issue has also been examined through a theory called the power law, which describes the correlation between [perplexity](https://huggingface.co/metrics/perplexity) and word error rate (see [this article](https://www.cs.cmu.edu/~roni/papers/eval-metrics-bntuw-9802.pdf) for further information).
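The alignment step can be illustrated with a minimal sketch of word-level Levenshtein alignment. It is not necessarily how this metric is implemented: the function name `align_counts`, the whitespace tokenization, and the case-sensitive comparison are all assumptions made for illustration. When several alignments share the same minimum distance, the sketch reports the counts for one of them.

```python
def align_counts(reference, hypothesis):
    """Count substitutions (S), deletions (D), insertions (I) and correct
    words (C) for one minimum-edit-distance alignment of two sentences.

    Assumption: words are obtained by splitting on whitespace.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    n, m = len(ref), len(hyp)

    # dist[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(1, m + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j - 1] + cost,  # match or substitution
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
            )

    # Walk back from the bottom-right corner to recover one optimal alignment.
    counts = {"S": 0, "D": 0, "I": 0, "C": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and ref[i - 1] == hyp[j - 1]
                and dist[i][j] == dist[i - 1][j - 1]):
            counts["C"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            counts["S"] += 1
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            counts["D"] += 1
            i -= 1
        else:
            counts["I"] += 1
            j -= 1
    return counts


print(align_counts("the cat sat on the mat", "the cat sit on mat"))
# {'S': 1, 'D': 1, 'I': 0, 'C': 4}
```

In this example "sat" is recognized as "sit" (one substitution) and the second "the" is dropped (one deletion), while the remaining four words are correct.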
|
|
|
Word error rate can then be computed as: |
|
|
|
`WER = (S + D + I) / N = (S + D + I) / (S + D + C)` |
|
|
|
where |
|
|
|
`S` is the number of substitutions, |
|
|
|
`D` is the number of deletions, |
|
|
|
`I` is the number of insertions, |
|
|
|
`C` is the number of correct words, |
|
|
|
`N` is the number of words in the reference (`N=S+D+C`). |
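As a worked example of the formula, here is a small sketch that computes WER from the four counts. The helper name `wer_from_counts` and the choice to raise an error for an empty reference are assumptions for illustration, not part of the metric's API.

```python
def wer_from_counts(substitutions, deletions, insertions, correct):
    """Apply WER = (S + D + I) / (S + D + C) to pre-computed counts."""
    n_reference = substitutions + deletions + correct  # N = S + D + C
    if n_reference == 0:
        # Assumption: treat an empty reference as an error rather than return 0.
        raise ValueError("reference must contain at least one word")
    return (substitutions + deletions + insertions) / n_reference


# Reference "the cat sat on the mat" vs. hypothesis "the cat sit on mat":
# S = 1, D = 1, I = 0, C = 4, so WER = 2 / 6.
print(round(wer_from_counts(1, 1, 0, 4), 3))  # 0.333

# Insertions are not bounded by the reference length, so WER can exceed 1.
print(wer_from_counts(0, 0, 3, 2))  # 1.5
```

The second call shows that the value is not a percentage capped at 100%: a hypothesis with many inserted words can produce more errors than there are reference words.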
|
|
|
|
|
## Citation |
|
|
|
```bibtex |
|
@inproceedings{woodard1982, |
|
author = {Woodard, J.P. and Nelson, J.T.},
|
year = {1982}, |
|
booktitle = {Workshop on standardisation for speech I/O technology, Naval Air Development Center, Warminster, PA},
|
title = {An information theoretic measure of speech recognition performance} |
|
} |
|
``` |
|
|
|
```bibtex |
|
@inproceedings{morris2004, |
|
author = {Morris, Andrew and Maier, Viktoria and Green, Phil}, |
|
year = {2004}, |
|
month = {01}, |
|
pages = {}, |
|
title = {From WER and RIL to MER and WIL: improved evaluation measures for connected speech recognition.} |
|
} |
|
``` |
|
|
|
## Further References |
|
|
|
- [Word Error Rate -- Wikipedia](https://en.wikipedia.org/wiki/Word_error_rate) |
|
- [Hugging Face Tasks -- Automatic Speech Recognition](https://huggingface.co/tasks/automatic-speech-recognition) |
|
|