Toward Zero Oracle Word Error Rate on the Switchboard Benchmark
- URL: http://arxiv.org/abs/2206.06192v1
- Date: Mon, 13 Jun 2022 14:26:40 GMT
- Title: Toward Zero Oracle Word Error Rate on the Switchboard Benchmark
- Authors: Arlo Faria, Adam Janin, Korbinian Riedhammer, Sidhi Adkoli
- Abstract summary: The "Switchboard benchmark" is a very well-known test set in automatic speech recognition (ASR) research.
This work highlights lesser-known practical considerations of this evaluation, demonstrating major improvements in word error rate (WER).
Even commercial ASR systems can score below 5% WER and the established record for a research system is lowered to 2.3%.
- Score: 0.3297645391680979
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The "Switchboard benchmark" is a very well-known test set in automatic speech
recognition (ASR) research, establishing record-setting performance for systems
that claim human-level transcription accuracy. This work highlights
lesser-known practical considerations of this evaluation, demonstrating major
improvements in word error rate (WER) by correcting the reference
transcriptions and deviating from the official scoring methodology. In this
more detailed and reproducible scheme, even commercial ASR systems can score
below 5% WER and the established record for a research system is lowered to
2.3%. An alternative metric of transcript precision is proposed, which does not
penalize deletions and appears to be more discriminating for human vs. machine
performance. While commercial ASR systems are still below this threshold, a
research system is shown to clearly surpass the accuracy of commercial human
speech recognition. This work also explores using standardized scoring tools to
compute oracle WER by selecting the best among a list of alternatives. A phrase
alternatives representation is compared to utterance-level N-best lists and
word-level data structures; using dense lattices and adding out-of-vocabulary
words, this achieves an oracle WER of 0.18%.
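To make the scoring concrete, here is a minimal sketch of word-level alignment, WER, a deletion-free precision score, and oracle WER over an N-best list. It is illustrative only: the paper's pipeline uses standardized tools (e.g. sclite) with transcript normalization and alternation handling that this plain Levenshtein alignment does not model, and the precision definition below is one plausible reading of "does not penalize deletions", not necessarily the paper's exact formula.
```python
def align_counts(ref, hyp):
    """Word-level Levenshtein alignment of reference vs. hypothesis.
    Returns (correct, substitutions, deletions, insertions)."""
    m, n = len(ref), len(hyp)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i
    for j in range(1, n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = min(
                dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match / substitution
                dp[i - 1][j] + 1,                               # deletion
                dp[i][j - 1] + 1,                               # insertion
            )
    c = s = d = ins = 0
    i, j = m, n
    while i > 0 or j > 0:  # backtrace, counting each error type
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            if ref[i - 1] == hyp[j - 1]:
                c += 1
            else:
                s += 1
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            d += 1
            i -= 1
        else:
            ins += 1
            j -= 1
    return c, s, d, ins

def wer(ref, hyp):
    _, s, d, ins = align_counts(ref, hyp)
    return (s + d + ins) / max(len(ref), 1)

def precision(ref, hyp):
    # Judges only the words the system emitted, so deletions never appear.
    c, s, _, ins = align_counts(ref, hyp)
    return c / max(c + s + ins, 1)

def oracle_wer(ref, hyps):
    # Oracle WER: score of the best hypothesis in an N-best list.
    return min(wer(ref, h) for h in hyps)

ref = "the cat sat".split()
nbest = [h.split() for h in ["the cat sad", "a cat sat", "the cat sat"]]
print(oracle_wer(ref, nbest))  # 0.0 -- the third hypothesis is exact
```
Extending the oracle selection from utterance-level N-best lists to word-level structures such as dense lattices or phrase alternatives enlarges the search space, which is how the paper reaches the 0.18% figure.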
Related papers
- HyPoradise: An Open Baseline for Generative Speech Recognition with Large Language Models [81.56455625624041]
We introduce the first open-source benchmark to utilize external large language models (LLMs) for ASR error correction.
The proposed benchmark contains a novel dataset, HyPoradise (HP), encompassing more than 334,000 pairs of N-best hypotheses and corresponding accurate transcriptions.
With a reasonable prompt, an LLM's generative capability can even correct tokens that are missing from the N-best list.
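As a minimal sketch of this setup (the prompt wording and the llm() stub below are illustrative assumptions, not HyPoradise's actual prompts):
```python
def build_correction_prompt(nbest):
    # Hypothetical prompt template; the benchmark's actual prompts differ.
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return (
        "Below are the N-best hypotheses from a speech recognizer for one "
        "utterance. Output the most plausible true transcription; you may "
        "use words that appear in none of the hypotheses.\n"
        f"{hyps}\nTranscription:"
    )

nbest = ["i saw the brake lights", "i saw the break lights"]
prompt = build_correction_prompt(nbest)
# corrected = llm(prompt)  # llm() stands in for any instruction-tuned LLM
```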
arXiv Detail & Related papers (2023-09-27T14:44:10Z)
- SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings [76.87664008338317]
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition.
We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings.
Experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
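A rough sketch of n-gram-based candidate retrieval follows. It uses a plain shared-character-n-gram lookup; SpellMapper's learned mappings between misspelled and correct n-grams are more powerful because they also capture systematic ASR confusions, and the helper names here are mine:
```python
from collections import defaultdict

def char_ngrams(word, n=3):
    word = f"#{word}#"  # boundary markers
    return {word[i:i + n] for i in range(len(word) - n + 1)}

def build_index(custom_vocab, n=3):
    # Inverted index: character n-gram -> custom-vocabulary words containing it.
    index = defaultdict(set)
    for w in custom_vocab:
        for g in char_ngrams(w, n):
            index[g].add(w)
    return index

def retrieve_candidates(asr_word, index, n=3, min_shared=2):
    # Score candidates by shared n-gram count; SpellMapper instead maps
    # *misspelled* n-grams to correct ones, so e.g. "f" vs "ph" variants match.
    scores = defaultdict(int)
    for g in char_ngrams(asr_word, n):
        for w in index.get(g, ()):
            scores[w] += 1
    ranked = sorted(scores.items(), key=lambda kv: -kv[1])
    return [w for w, c in ranked if c >= min_shared]

index = build_index({"gigabyte", "kubernetes", "grafana"})
print(retrieve_candidates("cooper netties", index))  # crude, but finds "kubernetes"
```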
arXiv Detail & Related papers (2023-06-04T10:00:12Z)
- End-to-End Page-Level Assessment of Handwritten Text Recognition [69.55992406968495]
HTR systems increasingly face the end-to-end page-level transcription of a document.
Standard metrics do not take into account the inconsistencies that might appear.
We propose a two-fold evaluation, where the transcription accuracy and the reading order (RO) goodness are considered separately.
arXiv Detail & Related papers (2023-01-14T15:43:07Z)
- H_eval: A new hybrid evaluation metric for automatic speech recognition tasks [0.3277163122167433]
We propose H_eval, a new hybrid evaluation metric for ASR systems.
It considers both semantic correctness and error rate, and performs well in scenarios where WER and semantic distance (SD) perform poorly.
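The paper's exact formula is not given in this summary; as an illustration of the general shape of such a hybrid metric, one might blend a semantic-similarity term with a WER-based term. The weighting and names below are assumptions, and wer() is the helper from the earlier sketch:
```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm or 1.0)

def hybrid_score(ref, hyp, embed, alpha=0.5):
    # embed: any sentence-embedding function mapping a string to a vector
    # (e.g. a sentence-transformers model's .encode). alpha trades off
    # semantic correctness against literal accuracy; H_eval's actual
    # combination is defined in the paper, not here.
    semantic = cosine(embed(ref), embed(hyp))
    accuracy = 1.0 - wer(ref.split(), hyp.split())  # wer() from the sketch above
    return alpha * semantic + (1 - alpha) * accuracy
```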
arXiv Detail & Related papers (2022-11-03T11:23:36Z)
- Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation [0.0]
Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing.
This paper proposes a scheme to improve existing LM-based ASR error detection systems.
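The title's "feature space interpolation" suggests a mixup-style augmentation; the sketch below shows that generic technique (interpolating two training examples' features and labels), without claiming it is the paper's exact formulation:
```python
import random

def mixup(feat_a, label_a, feat_b, label_b, alpha=0.2):
    # Mixup-style feature-space interpolation: blend two examples'
    # hidden features and binary labels with a Beta-sampled weight.
    # Whether the paper applies it exactly like this is an assumption.
    lam = random.betavariate(alpha, alpha)
    feat = [lam * a + (1 - lam) * b for a, b in zip(feat_a, feat_b)]
    label = lam * label_a + (1 - lam) * label_b  # soft label in [0, 1]
    return feat, label
```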
arXiv Detail & Related papers (2021-08-04T02:11:37Z)
- Semantic-WER: A Unified Metric for the Evaluation of ASR Transcript for End Usability [1.599072005190786]
State-of-the-art systems have achieved a word error rate (WER) of less than 5%.
Semantic-WER (SWER) is a metric to evaluate the ASR transcripts for downstream applications in general.
arXiv Detail & Related papers (2021-06-03T17:35:14Z)
- Dynamic Acoustic Unit Augmentation With BPE-Dropout for Low-Resource End-to-End Speech Recognition [62.94773371761236]
We consider building an effective end-to-end ASR system in low-resource setups with a high OOV rate.
We propose a method of dynamic acoustic unit augmentation based on the BPE-dropout technique.
Our monolingual Turkish Conformer established a competitive result with a 22.2% character error rate (CER) and a 38.9% word error rate (WER).
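BPE-dropout itself is easy to reproduce with SentencePiece's sampling interface. The sketch below assumes a trained BPE model file (the path is a placeholder) and shows only the segmentation step, not the paper's full training recipe:
```python
import sentencepiece as spm

# Assumes a BPE-type SentencePiece model trained on target-language text.
sp = spm.SentencePieceProcessor(model_file="bpe_turkish.model")

text = "merhaba dünya"
# Deterministic segmentation (what inference would use):
print(sp.encode(text, out_type=str))
# BPE-dropout: each merge is skipped with probability alpha, so every
# training epoch sees different subword segmentations of the same word.
for _ in range(3):
    print(sp.encode(text, out_type=str, enable_sampling=True,
                    alpha=0.1, nbest_size=-1))
```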
arXiv Detail & Related papers (2021-03-12T10:10:13Z)
- NUVA: A Naming Utterance Verifier for Aphasia Treatment [49.114436579008476]
Assessment of speech performance using picture-naming tasks is a key method for both diagnosis and monitoring of responses to treatment interventions by people with aphasia (PWA).
Here we present NUVA, an utterance verification system incorporating a deep learning element that classifies 'correct' versus 'incorrect' naming attempts from aphasic stroke patients.
When tested on eight native British English-speaking PWA, the system's accuracy ranged from 83.6% to 93.6%, with a 10-fold cross-validation mean of 89.5%.
arXiv Detail & Related papers (2021-01-14T07:26:28Z)
- WER-BERT: Automatic WER Estimation with BERT in a Balanced Ordinal Classification Paradigm [0.0]
We propose a new balanced paradigm for WER estimation (e-WER) in a classification setting.
Within this paradigm, we also propose WER-BERT, a BERT based architecture with speech features for e-WER.
The results and experiments demonstrate that WER-BERT establishes a new state-of-the-art in automatic WER estimation.
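A hedged sketch of the ordinal-classification framing: WER values are bucketed into classes and encoded as per-threshold binary targets. The bucket edges below are invented for illustration; the paper's balanced binning is derived from its data:
```python
# Illustrative WER buckets (5 classes); not the paper's actual edges.
BIN_EDGES = [0.0, 0.05, 0.10, 0.20, 0.35, 1.0]

def wer_to_class(w):
    """Map a WER value to an ordinal class index (0 = best bucket)."""
    for k in range(len(BIN_EDGES) - 1):
        if w < BIN_EDGES[k + 1]:
            return k
    return len(BIN_EDGES) - 2  # WER >= 1.0 falls in the last bucket

def ordinal_targets(cls, num_thresholds=len(BIN_EDGES) - 2):
    # One binary target per threshold ("is the WER above edge k?"), so a
    # prediction off by one bucket costs less than one off by several.
    return [1.0 if k < cls else 0.0 for k in range(num_thresholds)]

print(ordinal_targets(wer_to_class(0.27)))  # [1.0, 1.0, 1.0, 0.0]
```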
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)