Learning Word-Level Confidence For Subword End-to-End ASR
- URL: http://arxiv.org/abs/2103.06716v1
- Date: Thu, 11 Mar 2021 15:03:33 GMT
- Title: Learning Word-Level Confidence For Subword End-to-End ASR
- Authors: David Qiu, Qiujia Li, Yanzhang He, Yu Zhang, Bo Li, Liangliang Cao,
Rohit Prabhavalkar, Deepti Bhatia, Wei Li, Ke Hu, Tara N. Sainath, Ian McGraw
- Abstract summary: We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.
- Score: 48.09713798451474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We study the problem of word-level confidence estimation in subword-based
end-to-end (E2E) models for automatic speech recognition (ASR). Although prior
works have proposed training auxiliary confidence models for ASR systems, they
do not extend naturally to systems that operate on word-pieces (WP) as their
vocabulary. In particular, ground truth WP correctness labels are needed for
training confidence models, but the non-unique tokenization from word to WP
causes inaccurate labels to be generated. This paper proposes and studies two
confidence models of increasing complexity to solve this problem. The final
model uses self-attention to directly learn word-level confidence without
needing subword tokenization, and exploits full context features from multiple
hypotheses to improve confidence accuracy. Experiments on Voice Search and
long-tail test sets show standard metrics (e.g., NCE, AUC, RMSE) improving
substantially. The proposed confidence module also enables a model selection
approach to combine an on-device E2E model with a hybrid model on the server to
address the rare word recognition problem for the E2E model.
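The tokenization problem described above can be made concrete with a simple baseline (this is not the paper's learned self-attention module): derive word confidences by aggregating word-piece confidences along word boundaries. The SentencePiece-style `▁` word-start convention, the `agg` choice, and the example scores below are illustrative assumptions.

```python
def word_confidences(word_pieces, wp_confidences, agg=min):
    """Aggregate word-piece confidences into word-level scores.

    word_pieces: subword tokens where a leading '▁' marks the start
                 of a new word (SentencePiece-style convention).
    wp_confidences: per-token confidences in [0, 1].
    agg: aggregation over a word's pieces (min, product, mean, ...).
    """
    words, current = [], []
    for tok, conf in zip(word_pieces, wp_confidences):
        # A word-start token closes the previous word, if any.
        if tok.startswith("▁") and current:
            words.append(agg(current))
            current = []
        current.append(conf)
    if current:
        words.append(agg(current))
    return words


# "play taylor swift" tokenized as 4 word-pieces spanning 3 words:
scores = word_confidences(["▁play", "▁tay", "lor", "▁swift"],
                          [0.9, 0.6, 0.8, 0.95])
```

Because word-to-WP tokenization is non-unique, the WP correctness labels feeding such an aggregation can be wrong, which is precisely the label-noise problem the paper's word-level model avoids by skipping subword tokenization altogether.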
Related papers
- Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z)
- BLSTM-Based Confidence Estimation for End-to-End Speech Recognition [41.423717224691046]
Confidence estimation is an important function for developing automatic speech recognition (ASR) applications.
Recent E2E ASR systems show high performance (e.g., around 5% token error rates) for various ASR tasks.
We employ a bidirectional long short-term memory (BLSTM)-based model as a strong binary-class (correct/incorrect) sequence labeler.
arXiv Detail & Related papers (2023-12-22T11:12:45Z)
- A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z)
- Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
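A minimal sketch of one entropy-derived confidence measure of the kind described above, assuming Gibbs entropy normalized by log vocabulary size and min-aggregation over a word's frames (the paper explores several normalizations and aggregation schemes; these particular choices are illustrative):

```python
import math

def frame_confidence(probs):
    """Confidence of one frame's output distribution via normalized
    entropy: 1 - H(p) / log(V), so a uniform distribution gives 0
    and a one-hot distribution gives 1."""
    v = len(probs)
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - h / math.log(v)

def word_confidence(frame_distributions, agg=min):
    """Aggregate per-frame confidences over the frames of a word."""
    return agg(frame_confidence(p) for p in frame_distributions)
```

A peaked frame distribution such as `[0.97, 0.01, 0.01, 0.01]` scores close to 1, while a uniform one scores 0; min-aggregation makes a word only as confident as its least certain frame.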
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
- Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition [25.595147432155642]
This paper proposes two approaches to improve the model-based confidence estimators on out-of-domain data.
Experiments show that the proposed methods can significantly improve the confidence metrics on TED-LIUM and Switchboard datasets.
arXiv Detail & Related papers (2021-10-07T10:44:27Z)
- Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction [20.00640459241358]
Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.
Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR.
This paper proposes to jointly learn word confidence, word deletion, and utterance confidence.
arXiv Detail & Related papers (2021-04-26T20:38:42Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
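The temperature-scaling baseline mentioned above can be sketched as follows. In practice the temperature `T` is learned by minimizing negative log-likelihood on held-out data; the logits and temperature values here are illustrative.

```python
import math

def temperature_scaled_confidence(logits, T):
    """Confidence = top softmax probability after dividing the
    logits by a temperature T (numerically stable via max-shift)."""
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    return max(exps) / sum(exps)


# Higher temperature flattens the distribution, lowering confidence:
sharp = temperature_scaled_confidence([2.0, 1.0, 0.0], 1.0)
flat = temperature_scaled_confidence([2.0, 1.0, 0.0], 100.0)
```

With `T > 1` the softmax is flattened toward uniform, which counteracts the overconfidence typical of E2E model posteriors.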
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition [31.25931550876392]
Confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
We propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model.
arXiv Detail & Related papers (2020-10-22T04:02:27Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it provides and is not responsible for any consequences.