TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR
- URL: http://arxiv.org/abs/2401.03251v1
- Date: Sat, 6 Jan 2024 16:29:13 GMT
- Title: TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR
- Authors: Nagarathna Ravi, Thishyan Raj T and Vipul Arora
- Abstract summary: Class-probability-based confidence scores do not accurately represent quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train Confidence Estimation Model (CEM)
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
- Score: 1.8477401359673709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Confidence estimation of predictions from an End-to-End (E2E) Automatic
Speech Recognition (ASR) model benefits ASR's downstream and upstream tasks.
Class-probability-based confidence scores do not accurately represent the
quality of overconfident ASR predictions. An ancillary Confidence Estimation
Model (CEM) calibrates the predictions. State-of-the-art (SOTA) solutions use
binary target scores for CEM training. However, the binary labels do not reveal
the granular information of predicted words, such as temporal alignment between
reference and hypothesis and whether the predicted word is entirely incorrect
or contains spelling errors. Addressing this issue, we propose a novel
Temporal-Lexeme Similarity (TeLeS) confidence score to train CEM. To address
the data imbalance of target scores while training CEM, we use shrinkage loss
to focus on hard-to-learn data points and minimise the impact of easily learned
data points. We conduct experiments with ASR models trained in three languages,
namely Hindi, Tamil, and Kannada, with varying training data sizes. Experiments
show that TeLeS generalises well across domains. To demonstrate the
applicability of the proposed method, we formulate a TeLeS-based Acquisition
(TeLeS-A) function for sampling uncertainty in active learning. We observe a
significant reduction in the Word Error Rate (WER) as compared to SOTA methods.
Related papers
- Calibrated and Efficient Sampling-Free Confidence Estimation for LiDAR Scene Semantic Segmentation [1.8861801513235323]
We introduce a sampling-free approach for estimating well-calibrated confidence values for classification tasks.
Our approach maintains well-calibrated confidence values while achieving increased processing speed.
Our method produces underconfidence rather than overconfident predictions, an advantage for safety-critical applications.
arXiv Detail & Related papers (2024-11-18T15:13:20Z) - Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG PR to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in unbearable optimization costs to many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely textbfSelf-textbfReinforcing textbfErrors textbfMitigation (SREM)
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - BLSTM-Based Confidence Estimation for End-to-End Speech Recognition [41.423717224691046]
Confidence estimation is an important function for developing automatic speech recognition (ASR) applications.
Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks.
We employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler.
arXiv Detail & Related papers (2023-12-22T11:12:45Z) - Accurate and Reliable Confidence Estimation Based on Non-Autoregressive
End-to-End Speech Recognition System [42.569506907182706]
Previous end-to-end(E2E) based confidence estimation models (CEM) predict score sequences of equal length with input transcriptions, leading to unreliable estimation when deletion and insertion errors occur.
We propose CIF-Aligned confidence estimation model (CA-CEM) to achieve accurate and reliable confidence estimation based on novel non-autoregressive E2E ASR model - Paraformer.
arXiv Detail & Related papers (2023-05-18T03:34:50Z) - Estimating Model Performance under Domain Shifts with Class-Specific
Confidence Scores [25.162667593654206]
We introduce class-wise calibration within the framework of performance estimation for imbalanced datasets.
We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets.
arXiv Detail & Related papers (2022-07-20T15:04:32Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence, predicting accuracy as the fraction of unlabeled examples.
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Learning from Similarity-Confidence Data [94.94650350944377]
We investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data.
We propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate.
arXiv Detail & Related papers (2021-02-13T07:31:16Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR)
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as a source domain and TED-LIUM 3 as well as SWITCHBOARD show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.