Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
- URL: http://arxiv.org/abs/2010.11428v2
- Date: Fri, 23 Oct 2020 18:49:07 GMT
- Title: Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition
- Authors: Qiujia Li, David Qiu, Yu Zhang, Bo Li, Yanzhang He, Philip C. Woodland, Liangliang Cao, Trevor Strohman
- Abstract summary: Confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
We propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model.
- Score: 31.25931550876392
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: For various speech-related tasks, confidence scores from a speech recogniser
are a useful measure to assess the quality of transcriptions. In traditional
hidden Markov model-based automatic speech recognition (ASR) systems,
confidence scores can be reliably obtained from word posteriors in decoding
lattices. However, for an ASR system with an auto-regressive decoder, such as
an attention-based sequence-to-sequence model, computing word posteriors is
difficult. An obvious alternative is to use the decoder softmax probability as
the model confidence. In this paper, we first examine how some commonly used
regularisation methods influence the softmax-based confidence scores and study
the overconfident behaviour of end-to-end models. Then we propose a lightweight
and effective approach named confidence estimation module (CEM) on top of an
existing end-to-end ASR model. Experiments on LibriSpeech show that CEM can
mitigate the overconfidence problem and can produce more reliable confidence
scores with and without shallow fusion of a language model. Further analysis
shows that CEM generalises well to speech from a moderately mismatched domain
and can potentially improve downstream tasks such as semi-supervised learning.
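The abstract does not spell out how the CEM is built, so the following is only a minimal sketch of one plausible realization in PyTorch: a small feed-forward head that reads per-token decoder features and is trained with binary cross-entropy against correct/incorrect token labels. The feature choices (decoder state, attention context, emitted-token embedding) and all dimensions are assumptions for illustration, not details confirmed by this summary.

```python
import torch
import torch.nn as nn

class ConfidenceEstimationModule(nn.Module):
    """Hypothetical lightweight confidence head on top of a frozen end-to-end ASR decoder."""

    def __init__(self, decoder_dim, context_dim, embed_dim, hidden_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(decoder_dim + context_dim + embed_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, 1),
        )

    def forward(self, decoder_state, attention_context, token_embedding):
        # Each feature tensor: (batch, num_tokens, dim); output: (batch, num_tokens).
        features = torch.cat([decoder_state, attention_context, token_embedding], dim=-1)
        return torch.sigmoid(self.net(features)).squeeze(-1)

# Assumed training setup: binary targets mark whether each hypothesis token
# matches the reference after edit-distance alignment; the ASR model is frozen.
cem = ConfidenceEstimationModule(decoder_dim=640, context_dim=512, embed_dim=128)
loss_fn = nn.BCELoss()
```

Because only the small head is trained, the approach stays lightweight and leaves the underlying ASR model's transcriptions unchanged.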
Related papers
- Cycles of Thought: Measuring LLM Confidence through Stable Explanations [53.15438489398938]
Large language models (LLMs) can reach and even surpass human-level accuracy on a variety of benchmarks, but their overconfidence in incorrect responses is still a well-documented failure mode.
We propose a framework for measuring an LLM's uncertainty with respect to the distribution of generated explanations for an answer.
arXiv Detail & Related papers (2024-06-05T16:35:30Z)
- A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z)
- Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word (see the sketch after this list).
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
- Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition [25.595147432155642]
This paper proposes two approaches to improving model-based confidence estimators on out-of-domain data.
Experiments show that the proposed methods can significantly improve the confidence metrics on TED-LIUM and Switchboard datasets.
arXiv Detail & Related papers (2021-10-07T10:44:27Z)
- Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction [20.00640459241358]
Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.
Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR.
This paper proposes to jointly learn word confidence, word deletion, and utterance confidence.
arXiv Detail & Related papers (2021-04-26T20:38:42Z)
- Learning Word-Level Confidence For Subword End-to-End ASR [48.09713798451474]
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.
arXiv Detail & Related papers (2021-03-11T15:03:33Z)
- Fast End-to-End Speech Recognition via a Non-Autoregressive Model and Cross-Modal Knowledge Transferring from BERT [72.93855288283059]
We propose a non-autoregressive speech recognition model called LASO (Listen Attentively, and Spell Once).
The model consists of an encoder, a decoder, and a position-dependent summarizer (PDS).
arXiv Detail & Related papers (2021-02-15T15:18:59Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature (see the sketch after this list).
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
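The entropy-based entry above lends itself to a short worked example. This is a generic sketch of normalized-entropy confidence, not necessarily that paper's exact normalization or aggregation; taking the per-word minimum is one assumed choice among several plausible ones.

```python
import numpy as np

def entropy_confidence(probs, eps=1e-10):
    """Per-frame confidence from normalized entropy.

    probs: (num_frames, vocab_size) softmax outputs.
    Returns values in [0, 1]; 1 corresponds to a fully peaked distribution.
    """
    entropy = -np.sum(probs * np.log(probs + eps), axis=-1)
    max_entropy = np.log(probs.shape[-1])  # entropy of the uniform distribution
    return 1.0 - entropy / max_entropy

def word_confidence(frame_conf, word_spans, reduce=np.min):
    """Aggregate frame confidences into word scores over (start, end) frame spans."""
    return [float(reduce(frame_conf[s:e])) for s, e in word_spans]
```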
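For the learnt-temperature baseline in the last entry, a standard calibration recipe is to fit a single scalar T on held-out data by minimizing the negative log-likelihood of the scaled softmax. The sketch below assumes that recipe; the paper's exact objective may differ.

```python
import torch
import torch.nn.functional as F

def fit_temperature(logits, targets, steps=200, lr=0.01):
    """Fit a scalar temperature T by NLL on held-out (logits, reference-token) pairs.

    logits: (num_tokens, vocab_size); targets: (num_tokens,) reference token ids.
    """
    log_t = torch.zeros(1, requires_grad=True)  # optimize log T so T stays positive
    optimizer = torch.optim.Adam([log_t], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        loss = F.cross_entropy(logits / log_t.exp(), targets)
        loss.backward()
        optimizer.step()
    return log_t.exp().item()

# Token confidence is then the maximum of softmax(logits / T) at each position.
```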