Fast Entropy-Based Methods of Word-Level Confidence Estimation for
End-To-End Automatic Speech Recognition
- URL: http://arxiv.org/abs/2212.08703v1
- Date: Fri, 16 Dec 2022 20:27:40 GMT
- Title: Fast Entropy-Based Methods of Word-Level Confidence Estimation for
End-To-End Automatic Speech Recognition
- Authors: Aleksandr Laptev and Boris Ginsburg
- Abstract summary: We show how per-frame entropy values can be normalized and aggregated to obtain a confidence measure per unit and per word.
We evaluate the proposed confidence measures on LibriSpeech test sets and show that they are up to 2 and 4 times better than confidence estimation based on the maximum per-frame probability at detecting incorrect words for Conformer-CTC and Conformer-RNN-T models, respectively.
- Score: 86.21889574126878
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper presents a class of new fast non-trainable entropy-based
confidence estimation methods for automatic speech recognition. We show how
per-frame entropy values can be normalized and aggregated to obtain a
confidence measure per unit and per word for Connectionist Temporal
Classification (CTC) and Recurrent Neural Network Transducer (RNN-T) models.
Proposed methods have similar computational complexity to the traditional
method based on the maximum per-frame probability, but they are more
adjustable, have a wider effective threshold range, and better push apart the
confidence distributions of correct and incorrect words. We evaluate the
proposed confidence measures on LibriSpeech test sets, and show that they are
up to 2 and 4 times better than confidence estimation based on the maximum
per-frame probability at detecting incorrect words for Conformer-CTC and
Conformer-RNN-T models, respectively.
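The idea in the abstract can be illustrated with a minimal sketch. It assumes Shannon entropy with a simple linear normalization, 1 - H(p) / ln(V) for vocabulary size V, and min/mean/product aggregation over the frames of a word; the paper's exact entropy variants and normalizations may differ. The function names, the `aggregate` helper, and the toy logits are hypothetical and chosen only for illustration.

```python
import math


def softmax(logits):
    """Convert one frame of logits into a probability distribution."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def max_prob_confidence(probs):
    """Baseline: the maximum per-frame probability."""
    return max(probs)


def entropy_confidence(probs):
    """Assumed linear normalization: 1 - H(p) / ln(V), which lies in [0, 1]."""
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return 1.0 - entropy / math.log(len(probs))


def aggregate(frame_confidences, how="min"):
    """Combine per-frame confidences into one score (hypothetical helper)."""
    if how == "min":
        return min(frame_confidences)
    if how == "mean":
        return sum(frame_confidences) / len(frame_confidences)
    if how == "prod":
        return math.prod(frame_confidences)
    raise ValueError(f"unknown aggregation: {how}")


# Toy example: three frames of logits over a 4-symbol vocabulary for one word.
word_frames = [
    [4.0, 0.5, 0.2, 0.1],  # confident frame
    [1.0, 0.9, 0.8, 0.7],  # nearly uniform frame
    [3.0, 2.8, 0.1, 0.0],  # two competing symbols
]
frame_probs = [softmax(f) for f in word_frames]
word_conf_entropy = aggregate([entropy_confidence(p) for p in frame_probs], "min")
word_conf_maxprob = aggregate([max_prob_confidence(p) for p in frame_probs], "min")
print(f"entropy-based: {word_conf_entropy:.3f}, max-prob: {word_conf_maxprob:.3f}")
```

In this sketch the maximum-probability score can never drop below 1/V, whereas the entropy-based score reaches 0 for a uniform frame. This is one way to read the abstract's claim of a wider effective threshold range, though the paper's own analysis should be consulted for the actual argument.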
Related papers
- Fast Context-Biasing for CTC and Transducer ASR models with CTC-based Word Spotter [57.64003871384959]
This work presents a new approach to fast context-biasing with CTC-based Word Spotter.
The proposed method matches CTC log-probabilities against a compact context graph to detect potential context-biasing candidates.
The results demonstrate a significant acceleration of the context-biasing recognition with a simultaneous improvement in F-score and WER.
arXiv Detail & Related papers (2024-06-11T09:37:52Z)
- High Confidence Level Inference is Almost Free using Parallel Stochastic
Optimization [16.38026811561888]
This paper introduces a novel inference method focused on constructing confidence intervals with efficient computation and fast convergence to the nominal level.
Our method requires minimal additional computation and memory beyond the standard updating of estimates, making the inference process almost cost-free.
arXiv Detail & Related papers (2024-01-17T17:11:45Z)
- On Addressing Practical Challenges for RNN-Transducer [72.72132048437751]
We adapt a well-trained RNN-T model to a new domain without collecting the audio data.
We obtain word-level confidence scores by utilizing several types of features calculated during decoding.
The proposed time stamping method achieves an average word timing difference of less than 50 ms.
arXiv Detail & Related papers (2021-04-27T23:31:43Z)
- Learning Word-Level Confidence For Subword End-to-End ASR [48.09713798451474]
We study the problem of word-level confidence estimation in subword-based end-to-end (E2E) models for automatic speech recognition (ASR).
The proposed confidence module also enables a model selection approach to combine an on-device E2E model with a hybrid model on the server to address the rare word recognition problem for the E2E model.
arXiv Detail & Related papers (2021-03-11T15:03:33Z)
- An evaluation of word-level confidence estimation for end-to-end
automatic speech recognition [70.61280174637913]
We investigate confidence estimation for end-to-end automatic speech recognition (ASR).
We provide an extensive benchmark of popular confidence methods on four well-known speech datasets.
Our results suggest a strong baseline can be obtained by scaling the logits by a learnt temperature.
arXiv Detail & Related papers (2021-01-14T09:51:59Z)
- Confidence Estimation via Auxiliary Models [47.08749569008467]
We introduce a novel target criterion for model confidence, namely the true class probability (TCP).
We show that TCP offers better properties for confidence estimation than the standard maximum class probability (MCP).
arXiv Detail & Related papers (2020-12-11T17:21:12Z)
- Confidence Estimation for Attention-based Sequence-to-sequence Models
for Speech Recognition [31.25931550876392]
Confidence scores from a speech recogniser are a useful measure to assess the quality of transcriptions.
We propose a lightweight and effective approach named confidence estimation module (CEM) on top of an existing end-to-end ASR model.
arXiv Detail & Related papers (2020-10-22T04:02:27Z)
- Binary Classification from Positive Data with Skewed Confidence [85.18941440826309]
Positive-confidence (Pconf) classification is a promising weakly-supervised learning method.
In practice, the confidence may be skewed by bias arising in an annotation process.
We introduce a parameterized model of the skewed confidence and propose a method for selecting its hyperparameter.
arXiv Detail & Related papers (2020-01-29T00:04:36Z)