Accurate and Reliable Confidence Estimation Based on Non-Autoregressive
End-to-End Speech Recognition System
- URL: http://arxiv.org/abs/2305.10680v2
- Date: Thu, 25 May 2023 02:26:35 GMT
- Title: Accurate and Reliable Confidence Estimation Based on Non-Autoregressive
End-to-End Speech Recognition System
- Authors: Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan
- Abstract summary: Previous end-to-end(E2E) based confidence estimation models (CEM) predict score sequences of equal length with input transcriptions, leading to unreliable estimation when deletion and insertion errors occur.
We propose CIF-Aligned confidence estimation model (CA-CEM) to achieve accurate and reliable confidence estimation based on novel non-autoregressive E2E ASR model - Paraformer.
- Score: 42.569506907182706
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Estimating confidence scores for recognition results is a classic task in ASR
field and of vital importance for kinds of downstream tasks and training
strategies. Previous end-to-end~(E2E) based confidence estimation models (CEM)
predict score sequences of equal length with input transcriptions, leading to
unreliable estimation when deletion and insertion errors occur. In this paper
we proposed CIF-Aligned confidence estimation model (CA-CEM) to achieve
accurate and reliable confidence estimation based on novel non-autoregressive
E2E ASR model - Paraformer. CA-CEM utilizes the modeling character of
continuous integrate-and-fire (CIF) mechanism to generate token-synchronous
acoustic embedding, which solves the estimation failure issue above. We measure
the quality of estimation with AUC and RMSE in token level and ECE-U - a
proposed metrics in utterance level. CA-CEM gains 24% and 19% relative
reduction on ECE-U and also better AUC and RMSE on two test sets. Furthermore,
we conduct analysis to explore the potential of CEM for different ASR related
usage.
Related papers
- Addressing Uncertainty in LLMs to Enhance Reliability in Generative AI [47.64301863399763]
We present a dynamic semantic clustering approach inspired by the Chinese Restaurant Process.
We quantify uncertainty of Large Language Models (LLMs) on a given query by calculating entropy of the generated semantic clusters.
We propose leveraging the (negative) likelihood of these clusters as the (non)conformity score within Conformal Prediction framework.
arXiv Detail & Related papers (2024-11-04T18:49:46Z) - Confidence Estimation for LLM-Based Dialogue State Tracking [9.305763502526833]
Estimation of a model's confidence on its outputs is critical for Conversational AI systems based on large language models (LLMs)
We provide an exhaustive exploration of methods, including approaches proposed for open- and closed-weight LLMs.
Our findings suggest that fine-tuning open-weight LLMs can result in enhanced AUC performance, indicating better confidence score calibration.
arXiv Detail & Related papers (2024-09-15T06:44:26Z) - TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train Confidence Estimation Model (CEM)
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z) - BLSTM-Based Confidence Estimation for End-to-End Speech Recognition [41.423717224691046]
Confidence estimation is an important function for developing automatic speech recognition (ASR) applications.
Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks.
We employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler.
arXiv Detail & Related papers (2023-12-22T11:12:45Z) - Cal-SFDA: Source-Free Domain-adaptive Semantic Segmentation with
Differentiable Expected Calibration Error [50.86671887712424]
The prevalence of domain adaptive semantic segmentation has prompted concerns regarding source domain data leakage.
To circumvent the requirement for source data, source-free domain adaptation has emerged as a viable solution.
We propose a novel calibration-guided source-free domain adaptive semantic segmentation framework.
arXiv Detail & Related papers (2023-08-06T03:28:34Z) - Fast Entropy-Based Methods of Word-Level Confidence Estimation for
End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z) - Uncertainty-Driven Action Quality Assessment [67.20617610820857]
We propose a novel probabilistic model, named Uncertainty-Driven AQA (UD-AQA), to capture the diversity among multiple judge scores.
We generate the estimation of uncertainty for each prediction, which is employed to re-weight AQA regression loss.
Our proposed method achieves competitive results on three benchmarks including the Olympic events MTL-AQA and FineDiving, and the surgical skill JIGSAWS datasets.
arXiv Detail & Related papers (2022-07-29T07:21:15Z) - Improving Confidence Estimation on Out-of-Domain Data for End-to-End
Speech Recognition [25.595147432155642]
This paper proposes two approaches to improve the model-based confidence estimators on out-of-domain data.
Experiments show that the proposed methods can significantly improve the confidence metrics on TED-LIUM and Switchboard datasets.
arXiv Detail & Related papers (2021-10-07T10:44:27Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR)
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.