Word-level confidence estimation for RNN transducers
- URL: http://arxiv.org/abs/2110.15222v1
- Date: Tue, 28 Sep 2021 18:38:00 GMT
- Title: Word-level confidence estimation for RNN transducers
- Authors: Mingqiu Wang, Hagen Soltau, Laurent El Shafey, Izhak Shafran
- Abstract summary: We present a lightweight neural confidence model tailored for Automatic Speech Recognition (ASR) systems with Recurrent Neural Network Transducers (RNN-T).
Compared to existing approaches, our model utilizes: (a) the time information associated with recognized words, which reduces the computational complexity, and (b) a simple and elegant trick for mapping between sub-word and word sequences.
- Score: 7.12355127219356
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A confidence estimate is an often-requested feature in applications such as
medical transcription, where errors can impact patient care and the confidence
estimate can be used to alert medical professionals to verify potential
recognition errors.
In this paper, we present a lightweight neural confidence model tailored for
Automatic Speech Recognition (ASR) systems with Recurrent Neural Network
Transducers (RNN-T). Compared to existing approaches, our model utilizes:
(a) the time information associated with recognized words, which reduces the
computational complexity, and (b) a simple and elegant trick for mapping
between sub-word and word sequences. The mapping addresses the non-unique
tokenization and token deletion problems while amplifying differences between
confusable words. Through extensive empirical evaluations on two different
long-form test sets, we demonstrate that the model achieves a performance of
0.4 Normalized Cross Entropy (NCE) and 0.05 Expected Calibration Error (ECE).
It is robust across different ASR configurations, including target types
(graphemes vs. morphemes), traffic conditions (streaming vs. non-streaming),
and encoder types. We further discuss the importance of evaluation metrics to
reflect practical applications and highlight the need for further work in
improving Area Under the Curve (AUC) for Negative Predictive Value (NPV) and True
Negative Rate (TNR).
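To make the two reported metrics concrete, Normalized Cross Entropy (NCE) and Expected Calibration Error (ECE) can both be computed from per-word confidence scores paired with binary correctness labels. The sketch below is a minimal illustration of the standard definitions, not the paper's evaluation code; the function names and the 10-bin choice for ECE are assumptions:

```python
import numpy as np

def nce(conf, correct):
    # Normalized Cross Entropy: 1.0 means perfectly informative
    # confidences, 0.0 means no better than the base correctness rate.
    conf = np.clip(conf, 1e-7, 1 - 1e-7)
    p = correct.mean()  # base rate of correct words
    h_base = -(p * np.log(p) + (1 - p) * np.log(1 - p))
    h_model = -np.mean(correct * np.log(conf)
                       + (1 - correct) * np.log(1 - conf))
    return (h_base - h_model) / h_base

def ece(conf, correct, n_bins=10):
    # Expected Calibration Error: bin predictions by confidence and
    # average |accuracy - mean confidence|, weighted by bin occupancy.
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    total = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            total += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return total
```

Under this reading, the reported 0.4 NCE and 0.05 ECE say the confidences are substantially more informative than the base rate and well calibrated on average.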
Related papers
- PseudoNeg-MAE: Self-Supervised Point Cloud Learning using Conditional Pseudo-Negative Embeddings [55.55445978692678]
PseudoNeg-MAE is a self-supervised learning framework that enhances global feature representation of point cloud mask autoencoders.
We show that PseudoNeg-MAE achieves state-of-the-art performance on the ModelNet40 and ScanObjectNN datasets.
arXiv Detail & Related papers (2024-09-24T07:57:21Z)
- Noisy Correspondence Learning with Self-Reinforcing Errors Mitigation [63.180725016463974]
Cross-modal retrieval relies on well-matched large-scale datasets that are laborious in practice.
We introduce a novel noisy correspondence learning framework, namely Self-Reinforcing Errors Mitigation (SREM).
arXiv Detail & Related papers (2023-12-27T09:03:43Z)
- Deep Neural Networks Tend To Extrapolate Predictably [51.303814412294514]
Conventional wisdom holds that neural network predictions tend to be unpredictable and overconfident when faced with out-of-distribution (OOD) inputs.
We observe that neural network predictions often tend towards a constant value as input data becomes increasingly OOD.
We show how one can leverage our insights in practice to enable risk-sensitive decision-making in the presence of OOD inputs.
arXiv Detail & Related papers (2023-10-02T03:25:32Z)
- Interpretable Anomaly Detection in Cellular Networks by Learning Concepts in Variational Autoencoders [8.612111588129167]
This paper addresses the challenges of detecting anomalies in cellular networks in an interpretable way.
We propose a new approach using variational autoencoders (VAEs) that learn interpretable representations of the latent space for each Key Performance Indicator (KPI) in the dataset.
arXiv Detail & Related papers (2023-06-28T05:50:17Z)
- Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
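As a rough illustration of the entropy-based idea, per-frame entropies can be mapped to a [0, 1] confidence and then aggregated over each word's frame span. This is a hypothetical sketch, not the paper's code; the normalization and the conservative min-aggregation are just one plausible choice:

```python
import numpy as np

def frame_confidence(probs):
    # probs: (T, V) per-frame posterior distributions over V units.
    # Normalized entropy mapped to confidence: uniform -> 0, one-hot -> 1.
    p = np.clip(probs, 1e-12, 1.0)
    h = -(p * np.log(p)).sum(axis=-1)
    return 1.0 - h / np.log(probs.shape[-1])

def word_confidence(frame_conf, word_spans):
    # Aggregate frame confidences over each word's [start, end) span;
    # taking the minimum penalizes a word for its least certain frame.
    return [frame_conf[a:b].min() for a, b in word_spans]
```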
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
- BayesNetCNN: incorporating uncertainty in neural networks for image-based classification tasks [0.29005223064604074]
We propose a method to convert a standard neural network into a Bayesian neural network.
We estimate the variability of predictions by sampling different networks similar to the original one at each forward pass.
We test our model in a large cohort of brain images from Alzheimer's Disease patients.
arXiv Detail & Related papers (2022-09-27T01:07:19Z)
- Hybrid Predictive Coding: Inferring, Fast and Slow [62.997667081978825]
We propose a hybrid predictive coding network that combines both iterative and amortized inference in a principled manner.
We demonstrate that our model is inherently sensitive to its uncertainty and adaptively balances the two inference modes to obtain accurate beliefs using minimum computational expense.
arXiv Detail & Related papers (2022-04-05T12:52:45Z)
- Interpretable Additive Recurrent Neural Networks For Multivariate Clinical Time Series [4.125698836261585]
We present the Interpretable-RNN (I-RNN) that balances model complexity and accuracy by forcing the relationship between variables in the model to be additive.
I-RNN specifically captures the unique characteristics of clinical time series, which are unevenly sampled in time, asynchronously acquired, and have missing data.
We evaluate the I-RNN model on the Physionet 2012 Challenge dataset to predict in-hospital mortality, and on a real-world clinical decision support task: predicting hemodynamic interventions in the intensive care unit.
arXiv Detail & Related papers (2021-09-15T22:30:19Z)
- Detecting Misclassification Errors in Neural Networks with a Gaussian Process Model [20.948038514886377]
This paper presents a new framework that produces a quantitative metric for detecting misclassification errors.
The framework, RED, builds an error detector on top of the base classifier and estimates uncertainty of the detection scores using Gaussian Processes.
arXiv Detail & Related papers (2020-10-05T15:01:30Z)
- Collaborative Boundary-aware Context Encoding Networks for Error Map Prediction [65.44752447868626]
We propose collaborative boundary-aware context encoding networks, called AEP-Net, for the error map prediction task.
Specifically, we propose a collaborative feature transformation branch for better feature fusion between images and masks, and precise localization of error regions.
The AEP-Net achieves average DSCs of 0.8358 and 0.8164 on the error prediction task, and shows a high Pearson correlation coefficient of 0.9873.
arXiv Detail & Related papers (2020-06-25T12:42:01Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.