Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
- URL: http://arxiv.org/abs/2503.15124v1
- Date: Wed, 19 Mar 2025 11:33:40 GMT
- Title: Evaluating ASR Confidence Scores for Automated Error Detection in User-Assisted Correction Interfaces
- Authors: Korbinian Kuhn, Verena Kersken, Gottfried Zimmermann,
- Abstract summary: This study evaluates the reliability of confidence scores for error detection through a comprehensive analysis of end-to-end ASR models.<n>Results show that while confidence scores correlate with transcription accuracy, their error detection performance is limited.<n>These findings highlight the limitations of confidence scores and the need for more sophisticated approaches to improve user interaction and explainability of ASR results.
- Score: 5.266869303483375
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Despite advances in Automatic Speech Recognition (ASR), transcription errors persist and require manual correction. Confidence scores, which indicate the certainty of ASR results, could assist users in identifying and correcting errors. This study evaluates the reliability of confidence scores for error detection through a comprehensive analysis of end-to-end ASR models and a user study with 36 participants. The results show that while confidence scores correlate with transcription accuracy, their error detection performance is limited. Classifiers frequently miss errors or generate many false positives, undermining their practical utility. Confidence-based error detection neither improved correction efficiency nor was perceived as helpful by participants. These findings highlight the limitations of confidence scores and the need for more sophisticated approaches to improve user interaction and explainability of ASR results.
Related papers
- Not All Errors Are Equal: Investigation of Speech Recognition Errors in Alzheimer's Disease Detection [62.942077348224046]
Speech recognition plays an important role in automatic detection of Alzheimer's disease (AD)<n>Recent studies have revealed a non-linear relationship between word error rates (WER) and AD detection performance.<n>This work presents a series of analyses to explore the effect of ASR transcription errors in BERT-based AD detection systems.
arXiv Detail & Related papers (2024-12-09T09:32:20Z) - Fact-Level Confidence Calibration and Self-Correction [64.40105513819272]
We propose a Fact-Level framework that calibrates confidence to relevance-weighted correctness at the fact level.
We also develop Confidence-Guided Fact-level Self-Correction ($textbfConFix$), which uses high-confidence facts within a response as additional knowledge to improve low-confidence ones.
arXiv Detail & Related papers (2024-11-20T14:15:18Z) - Confidence-Aware Document OCR Error Detection [1.003485566379789]
We analyze the correlation between confidence scores and error rates across different OCR systems.
We develop ConfBERT, a BERT-based model that incorporates OCR confidence scores into token embeddings.
arXiv Detail & Related papers (2024-09-06T08:35:28Z) - Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z) - Toward Practical Automatic Speech Recognition and Post-Processing: a
Call for Explainable Error Benchmark Guideline [12.197453599489963]
We propose the development of an Error Explainable Benchmark (EEB) dataset.
This dataset, while considering both speech- and text-level, enables a granular understanding of the model's shortcomings.
Our proposition provides a structured pathway for a more real-world-centric' evaluation, allowing for the detection and rectification of nuanced system weaknesses.
arXiv Detail & Related papers (2024-01-26T03:42:45Z) - TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train Confidence Estimation Model (CEM)
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z) - BLSTM-Based Confidence Estimation for End-to-End Speech Recognition [41.423717224691046]
Confidence estimation is an important function for developing automatic speech recognition (ASR) applications.
Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks.
We employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler.
arXiv Detail & Related papers (2023-12-22T11:12:45Z) - Binary Classification with Confidence Difference [100.08818204756093]
This paper delves into a novel weakly supervised binary classification problem called confidence-difference (ConfDiff) classification.
We propose a risk-consistent approach to tackle this problem and show that the estimation error bound the optimal convergence rate.
We also introduce a risk correction approach to mitigate overfitting problems, whose consistency and convergence rate are also proven.
arXiv Detail & Related papers (2023-10-09T11:44:50Z) - Conservative Prediction via Data-Driven Confidence Minimization [70.93946578046003]
In safety-critical applications of machine learning, it is often desirable for a model to be conservative.
We propose the Data-Driven Confidence Minimization framework, which minimizes confidence on an uncertainty dataset.
arXiv Detail & Related papers (2023-06-08T07:05:36Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR)
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.