Improving Confidence Estimation on Out-of-Domain Data for End-to-End
Speech Recognition
- URL: http://arxiv.org/abs/2110.03327v1
- Date: Thu, 7 Oct 2021 10:44:27 GMT
- Title: Improving Confidence Estimation on Out-of-Domain Data for End-to-End
Speech Recognition
- Authors: Qiujia Li, Yu Zhang, David Qiu, Yanzhang He, Liangliang Cao, Philip C.
Woodland
- Abstract summary: This paper proposes two approaches to improve the model-based confidence estimators on out-of-domain data.
Experiments show that the proposed methods can significantly improve the confidence metrics on TED-LIUM and Switchboard datasets.
- Score: 25.595147432155642
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: As end-to-end automatic speech recognition (ASR) models reach promising
performance, various downstream tasks rely on good confidence estimators for
these systems. Recent research has shown that model-based confidence estimators
have a significant advantage over using the output softmax probabilities. If
the input data to the speech recogniser is from mismatched acoustic and
linguistic conditions, the ASR performance and the corresponding confidence
estimators may exhibit severe degradation. Since confidence models are often
trained on the same in-domain data as the ASR, generalising to out-of-domain
(OOD) scenarios is challenging. By keeping the ASR model untouched, this paper
proposes two approaches to improve the model-based confidence estimators on OOD
data: using pseudo transcriptions and an additional OOD language model. With an
ASR model trained on LibriSpeech, experiments show that the proposed methods
can significantly improve the confidence metrics on TED-LIUM and Switchboard
datasets while preserving in-domain performance. Furthermore, the improved
confidence estimators are better calibrated on OOD data and can provide a much
more reliable criterion for data selection.
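The abstract's claims about calibration and data selection can be made concrete with a small sketch. It is an illustration only, not the paper's evaluation code: it computes the expected calibration error (ECE), one common calibration measure that the paper may or may not report, and filters utterances by a confidence threshold. The function names, bin count, and threshold are hypothetical.

```python
from typing import Dict, List, Sequence


def expected_calibration_error(
    confidences: Sequence[float],
    correct: Sequence[bool],
    num_bins: int = 10,
) -> float:
    """Weighted mean gap between average confidence and accuracy per bin.

    Assumes each confidence lies in [0, 1]; the equal-width binning is an
    illustrative choice, not taken from the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    total = len(confidences)
    if total == 0:
        return 0.0

    # Group (confidence, correctness) pairs into equal-width bins.
    bins: List[List[tuple]] = [[] for _ in range(num_bins)]
    for conf, ok in zip(confidences, correct):
        index = min(int(conf * num_bins), num_bins - 1)
        bins[index].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / total) * abs(avg_conf - accuracy)
    return ece


def select_utterances(utt_confidence: Dict[str, float], threshold: float) -> List[str]:
    """Keep utterance IDs whose confidence is at least the threshold."""
    return [utt for utt, conf in utt_confidence.items() if conf >= threshold]


if __name__ == "__main__":
    word_conf = [0.9, 0.9, 0.1, 0.1]
    word_ok = [True, True, False, False]
    print(expected_calibration_error(word_conf, word_ok))
    print(select_utterances({"utt1": 0.95, "utt2": 0.4, "utt3": 0.8}, 0.8))
```

A lower ECE means the confidence scores track the actual accuracy more closely, which is what makes a fixed threshold meaningful when selecting automatically transcribed data.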
Related papers
- TeLeS: Temporal Lexeme Similarity Score to Estimate Confidence in
End-to-End ASR [1.8477401359673709]
Class-probability-based confidence scores do not accurately represent the quality of overconfident ASR predictions.
We propose a novel Temporal-Lexeme Similarity (TeLeS) confidence score to train a Confidence Estimation Model (CEM).
We conduct experiments with ASR models trained in three languages, namely Hindi, Tamil, and Kannada, with varying training data sizes.
arXiv Detail & Related papers (2024-01-06T16:29:13Z)
- BLSTM-Based Confidence Estimation for End-to-End Speech Recognition [41.423717224691046]
Confidence estimation is an important function for developing automatic speech recognition (ASR) applications.
Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks.
We employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler.
arXiv Detail & Related papers (2023-12-22T11:12:45Z)
- Reliability in Semantic Segmentation: Can We Use Synthetic Data? [69.28268603137546]
We show for the first time how synthetic data can be specifically generated to assess comprehensively the real-world reliability of semantic segmentation models.
This synthetic data is employed to evaluate the robustness of pretrained segmenters.
We demonstrate how our approach can be utilized to enhance the calibration and OOD detection capabilities of segmenters.
arXiv Detail & Related papers (2023-12-14T18:56:07Z)
- A Confidence-based Partial Label Learning Model for Crowd-Annotated
Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z)
- Fast Entropy-Based Methods of Word-Level Confidence Estimation for
End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
- Listen, Adapt, Better WER: Source-free Single-utterance Test-time
Adaptation for Automatic Speech Recognition [65.84978547406753]
Test-time Adaptation aims to adapt the model trained on source domains to yield better predictions for test samples.
To the best of our knowledge, Single-Utterance Test-time Adaptation (SUTA) is the first TTA study in the speech area.
arXiv Detail & Related papers (2022-03-27T06:38:39Z)
- Learning Word-Level Confidence For Subword End-to-End ASR [48.09713798451474]
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.
arXiv Detail & Related papers (2021-03-11T15:03:33Z)
- An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- Uncertainty-sensitive Activity Recognition: a Reliability Benchmark and
the CARING Models [37.60817779613977]
We present the first study of how well the confidence values of modern action recognition architectures reflect the probability of the correct outcome.
We introduce a new approach which learns to transform the model output into realistic confidence estimates through an additional calibration network.
arXiv Detail & Related papers (2021-01-02T15:41:21Z)
- Confidence Estimation for Attention-based Sequence-to-sequence Models
for Speech Recognition [31.25931550876392]
Confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
We propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model.
arXiv Detail & Related papers (2020-10-22T04:02:27Z)