Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction
- URL: http://arxiv.org/abs/2104.12870v1
- Date: Mon, 26 Apr 2021 20:38:42 GMT
- Title: Multi-Task Learning for End-to-End ASR Word and Utterance Confidence with Deletion Prediction
- Authors: David Qiu, Yanzhang He, Qiujia Li, Yu Zhang, Liangliang Cao, Ian McGraw
- Abstract summary: Confidence scores are very useful for downstream applications of automatic speech recognition (ASR) systems.
Recent works have proposed using neural networks to learn word or utterance confidence scores for end-to-end ASR.
This paper proposes to jointly learn word confidence, word deletion, and utterance confidence.
- Score: 20.00640459241358
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Confidence scores are very useful for downstream applications of automatic
speech recognition (ASR) systems. Recent works have proposed using neural
networks to learn word or utterance confidence scores for end-to-end ASR. In
those studies, word confidence by itself does not model deletions, and
utterance confidence does not take advantage of word-level training signals.
This paper proposes to jointly learn word confidence, word deletion, and
utterance confidence. Empirical results show that multi-task learning with all
three objectives improves confidence metrics (NCE, AUC, RMSE) without the need
for increasing the model size of the confidence estimation module. Using the
utterance-level confidence for rescoring also decreases the word error rates on
Google's Voice Search and Long-tail Maps datasets by 3-5% relative, without
needing a dedicated neural rescorer.
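As a rough illustration of the joint training idea, here is a minimal sketch (in PyTorch, with illustrative names, feature shapes, and loss weighting; it is not the authors' implementation) of a confidence estimation module with three heads, one per objective, trained by summing three binary cross-entropy losses:

```python
# Sketch only: assumes per-word features have already been extracted from the
# ASR decoder; labels are float tensors in {0., 1.} with the same shapes as the logits.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiTaskConfidence(nn.Module):
    def __init__(self, feat_dim: int, hidden_dim: int = 128):
        super().__init__()
        self.encoder = nn.GRU(feat_dim, hidden_dim, batch_first=True)
        self.word_head = nn.Linear(hidden_dim, 1)      # word is correct?
        self.deletion_head = nn.Linear(hidden_dim, 1)  # deletion adjacent to this word?
        self.utt_head = nn.Linear(hidden_dim, 1)       # utterance fully correct?

    def forward(self, word_feats):                     # (batch, num_words, feat_dim)
        enc, _ = self.encoder(word_feats)              # (batch, num_words, hidden_dim)
        word_logits = self.word_head(enc).squeeze(-1)
        del_logits = self.deletion_head(enc).squeeze(-1)
        utt_logits = self.utt_head(enc.mean(dim=1)).squeeze(-1)  # pool over words
        return word_logits, del_logits, utt_logits

def multitask_loss(word_logits, del_logits, utt_logits,
                   word_labels, del_labels, utt_labels):
    """Equal-weight sum of the three binary cross-entropy objectives."""
    return (F.binary_cross_entropy_with_logits(word_logits, word_labels)
            + F.binary_cross_entropy_with_logits(del_logits, del_labels)
            + F.binary_cross_entropy_with_logits(utt_logits, utt_labels))
```

The point of the multi-task setup is that the deletion and utterance targets add training signal to the shared features without growing the confidence module, and the utterance-level output can then serve as a rescoring score over N-best hypotheses, as in the result quoted above. The equal loss weighting is the simplest choice and may differ from what the paper uses.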
Related papers
- Learning to Route with Confidence Tokens [43.63392143501436]
We study the extent to which large language models can reliably indicate confidence in their answers.
We propose Self-REF, a lightweight training strategy to teach LLMs to express confidence in a reliable manner.
Compared to conventional approaches such as verbalizing confidence and examining token probabilities, we demonstrate empirically that confidence tokens show significant improvements in downstream routing and rejection learning tasks.
arXiv Detail & Related papers (2024-10-17T07:28:18Z)
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- A Confidence-based Partial Label Learning Model for Crowd-Annotated Named Entity Recognition [74.79785063365289]
Existing models for named entity recognition (NER) are mainly based on large-scale labeled datasets.
We propose a Confidence-based Partial Label Learning (CPLL) method to integrate the prior confidence (given by annotators) and posterior confidences (learned by models) for crowd-annotated NER.
arXiv Detail & Related papers (2023-05-21T15:31:23Z)
- Fast Entropy-Based Methods of Word-Level Confidence Estimation for End-To-End Automatic Speech Recognition [86.21889574126878]
We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word (a sketch of this aggregation appears after this list).
We evaluate the proposed confidence measures on LibriSpeech test sets, and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability.
arXiv Detail & Related papers (2022-12-16T20:27:40Z)
- Improving Confidence Estimation on Out-of-Domain Data for End-to-End Speech Recognition [25.595147432155642]
This paper proposes two approaches to improve the model-based confidence estimators on out-of-domain data.
Experiments show that the proposed methods can significantly improve the confidence metrics on TED-LIUM and Switchboard datasets.
arXiv Detail & Related papers (2021-10-07T10:44:27Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time stamping method achieves an average word timing difference of less than 50 ms.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- Learning Word-Level Confidence For Subword End-to-End ASR [48.09713798451474]
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.
arXiv Detail & Related papers (2021-03-11T15:03:33Z)
- An evaluation of word-level confidence estimation for end-to-end automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- Confidence Estimation for Attention-based Sequence-to-sequence Models for Speech Recognition [31.25931550876392]
Confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
We propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model.
arXiv Detail & Related papers (2020-10-22T04:02:27Z)
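As referenced in the entry on fast entropy-based confidence estimation above, per-frame entropy can be turned into a word-level confidence by normalizing it and aggregating it over the frames of a word. The sketch below only illustrates that idea; the normalization (dividing by the entropy of a uniform distribution) and the aggregation (minimum over frames) are assumptions, not necessarily the exact variants evaluated in that paper.

```python
# Illustrative entropy-based word confidence; frame_probs are assumed to be the
# per-frame posterior distributions aligned to a single word.
import numpy as np

def frame_confidences(frame_probs: np.ndarray) -> np.ndarray:
    """frame_probs: (num_frames, vocab_size). Returns 1 - normalized entropy per frame."""
    eps = 1e-12
    entropy = -np.sum(frame_probs * np.log(frame_probs + eps), axis=-1)
    max_entropy = np.log(frame_probs.shape[-1])  # entropy of a uniform distribution
    return 1.0 - entropy / max_entropy

def word_confidence(frame_probs: np.ndarray) -> float:
    """Aggregate frame confidences for one word; min is one simple choice
    (mean or product over frames are common alternatives)."""
    return float(np.min(frame_confidences(frame_probs)))

# A peaked posterior scores high, a near-uniform one scores near zero.
peaked = np.full((3, 10), 0.01); peaked[:, 0] = 0.91
flat = np.full((3, 10), 0.1)
print(word_confidence(peaked))  # ~0.78
print(word_confidence(flat))    # ~0.0
```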
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this information and is not responsible for any consequences of its use.