TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR
- URL: http://arxiv.org/abs/2401.03251v1
- Date: Sat, 6 Jan 2024 16:29:13 GMT
- Title: TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR
- Authors: Nagarathna Ravi, Thishyan Raj T and Vipul Arora
- Abstract summary: Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train a Confidence Estimation Model (CEM).
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
- Score: 1.8477401359673709
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Confidence estimation of predictions from an End-to-End (E2E) Automatic
Speech Recognition (ASR) model benefits ASR's downstream and upstream tasks.
Class-probability-based confidence scores do not accurately represent the
quality of overconfident ASR predictions. An ancillary Confidence Estimation
Model (CEM) calibrates the predictions. State-of-the-art (SOTA) solutions use
binary target scores for CEM training. However, the binary labels do not reveal
the granular information of predicted words, such as temporal alignment between
reference and hypothesis and whether the predicted word is entirely incorrect
or contains spelling errors. Addressing this issue, we propose a novel
Temporal-Lexeme Similarity (TeLeS) confidence score to train CEM. To address
the data imbalance of target scores while training CEM, we use shrinkage loss
to focus on hard-to-learn data points and minimise the impact of easily learned
data points. We conduct experiments with ASR models trained in three languages,
namely Hindi, Tamil, and Kannada, with varying training data sizes. Experiments
show that TeLeS generalises well across domains. To demonstrate the
applicability of the proposed method, we formulate a TeLeS-based Acquisition
(TeLeS-A) function for sampling uncertainty in active learning. We observe a
significant reduction in the Word Error Rate (WER) as compared to SOTA methods.
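The abstract names the main ingredients of the method: a word-level target score that blends temporal alignment with lexeme (spelling-level) similarity, a shrinkage loss that keeps the CEM focused on hard targets, and an acquisition rule for active learning. The sketch below is only a rough illustration of those ingredients; the function names, the way the two terms are blended (`alpha`), the shrinkage hyper-parameters (`a`, `c`), and the acquisition rule are illustrative assumptions, not the paper's exact definitions.

```python
import math


def lexeme_similarity(ref_word: str, hyp_word: str) -> float:
    """Character-level similarity in [0, 1]: 1 minus the normalised Levenshtein
    distance, so a spelling error scores higher than an entirely wrong word."""
    m, n = len(ref_word), len(hyp_word)
    if max(m, n) == 0:
        return 1.0
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        cur = [i] + [0] * n
        for j in range(1, n + 1):
            cost = 0 if ref_word[i - 1] == hyp_word[j - 1] else 1
            cur[j] = min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost)
        prev = cur
    return 1.0 - prev[n] / max(m, n)


def temporal_overlap(ref_span, hyp_span) -> float:
    """Intersection-over-union of the reference and hypothesis word time spans
    (seconds), capturing how well the predicted word is temporally aligned."""
    (rs, re), (hs, he) = ref_span, hyp_span
    inter = max(0.0, min(re, he) - max(rs, hs))
    union = max(re, he) - min(rs, hs)
    return inter / union if union > 0 else 0.0


def teles_like_target(ref_word, hyp_word, ref_span, hyp_span, alpha=0.5) -> float:
    """Hypothetical blend of the two terms; the paper defines its own combination."""
    return alpha * temporal_overlap(ref_span, hyp_span) + \
        (1 - alpha) * lexeme_similarity(ref_word, hyp_word)


def shrinkage_loss(pred: float, target: float, a: float = 10.0, c: float = 0.2) -> float:
    """One common form of shrinkage loss: squared error modulated by a sigmoid of
    the residual, so easy (small-residual) points contribute little."""
    l = abs(pred - target)
    return (l ** 2) / (1.0 + math.exp(a * (c - l)))


def acquire_least_confident(utterance_scores, k: int):
    """TeLeS-A-style idea, sketched: given (utterance_id, per-word confidence list)
    pairs, pick the k least confident utterances for labelling."""
    ranked = sorted(utterance_scores,
                    key=lambda kv: sum(kv[1]) / max(len(kv[1]), 1))
    return [utt_id for utt_id, _ in ranked[:k]]
```

For example, `teles_like_target("speech", "speach", (0.40, 0.85), (0.42, 0.88))` evaluates to roughly 0.86 (a well-aligned spelling error), whereas a hallucinated word with no temporal or character overlap scores near 0.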
Related papers
- Towards Robust and Interpretable EMG-based Hand Gesture Recognition using Deep Metric Meta Learning [37.21211404608413]
We propose a shift to deep metric-based meta-learning in EMG pattern recognition (PR) to supervise the creation of meaningful and interpretable representations.
We derive a robust class proximity-based confidence estimator that leads to a better rejection of incorrect decisions.
arXiv Detail & Related papers (2024-04-17T23:37:50Z) - Uncertainty-Calibrated Test-Time Model Adaptation without Forgetting [55.17761802332469]
Test-time adaptation (TTA) seeks to tackle potential distribution shifts between training and test data by adapting a given model w.r.t. any test sample.
Prior methods perform backpropagation for each test sample, resulting in prohibitive optimization costs for many applications.
We propose an Efficient Anti-Forgetting Test-Time Adaptation (EATA) method which develops an active sample selection criterion to identify reliable and non-redundant samples.
arXiv Detail & Related papers (2024-03-18T05:49:45Z) - Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious to curate in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z) - BLSTM-Based Confidence Estimation for End-to-End Speech Recognition [41.423717224691046]
Confidence estimation is an important function for developing automatic speech recognition (ASR) applications.
Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks.
We employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler.
arXiv Detail & Related papers (2023-12-22T11:12:45Z) - Accurate and Reliable Confidence Estimation Based on Non-Autoregressive
End-to-End Speech Recognition System [42.569506907182706]
Previous end-to-end (E2E) confidence estimation models (CEMs) predict score sequences of the same length as the input transcriptions, leading to unreliable estimates when deletion and insertion errors occur.
We propose the CIF-Aligned confidence estimation model (CA-CEM) to achieve accurate and reliable confidence estimation on top of the novel non-autoregressive E2E ASR model Paraformer.
arXiv Detail & Related papers (2023-05-18T03:34:50Z) - EvCenterNet: Uncertainty Estimation for Object Detection using
Evidential Learning [26.535329379980094]
EvCenterNet is a novel uncertainty-aware 2D object detection framework.
We employ evidential learning to estimate both classification and regression uncertainties.
We train our model on the KITTI dataset and evaluate it on challenging out-of-distribution datasets.
arXiv Detail & Related papers (2023-03-06T11:07:11Z) - Estimating Model Performance under Domain Shifts with Class-Specific
Confidence Scores [25.162667593654206]
We introduce class-wise calibration within the framework of performance estimation for imbalanced datasets.
We conduct experiments on four tasks and find the proposed modifications consistently improve the estimation accuracy for imbalanced datasets.
arXiv Detail & Related papers (2022-07-20T15:04:32Z) - Leveraging Unlabeled Data to Predict Out-of-Distribution Performance [63.740181251997306]
Real-world machine learning deployments are characterized by mismatches between the source (training) and target (test) distributions.
In this work, we investigate methods for predicting the target domain accuracy using only labeled source data and unlabeled target data.
We propose Average Thresholded Confidence (ATC), a practical method that learns a threshold on the model's confidence and predicts accuracy as the fraction of unlabeled examples whose confidence exceeds that threshold (a rough sketch appears after this list).
arXiv Detail & Related papers (2022-01-11T23:01:12Z) - Learning from Similarity-Confidence Data [94.94650350944377]
We investigate a novel weakly supervised learning problem of learning from similarity-confidence (Sconf) data.
We propose an unbiased estimator of the classification risk that can be calculated from only Sconf data and show that the estimation error bound achieves the optimal convergence rate.
arXiv Detail & Related papers (2021-02-13T07:31:16Z) - An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z) - Unsupervised Domain Adaptation for Speech Recognition via Uncertainty
Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 as well as SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)