ASR Error Correction with Constrained Decoding on Operation Prediction
- URL: http://arxiv.org/abs/2208.04641v1
- Date: Tue, 9 Aug 2022 09:59:30 GMT
- Title: ASR Error Correction with Constrained Decoding on Operation Prediction
- Authors: Jingyuan Yang, Rongjun Li, Wei Peng
- Abstract summary: We propose an ASR error correction method utilizing the predictions of correction operations.
Experiments on three public datasets demonstrate the effectiveness of the proposed approach in reducing the latency of the decoding process.
- Score: 8.701142327932484
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Error correction techniques remain effective for refining the
outputs of automatic speech recognition (ASR) models. Existing end-to-end
error correction methods
based on an encoder-decoder architecture process all tokens in the decoding
phase, creating undesirable latency. In this paper, we propose an ASR error
correction method utilizing the predictions of correction operations. More
specifically, we construct a predictor between the encoder and the decoder to
learn if a token should be kept ("K"), deleted ("D"), or changed ("C") to
restrict decoding to only part of the input sequence embeddings (the "C"
tokens) for fast inference. Experiments on three public datasets demonstrate
the effectiveness of the proposed approach in reducing the latency of the
decoding process in ASR correction. Our two proposed models speed up inference
by at least three times (3.4 and 5.7 times, respectively) while maintaining the
same level of accuracy (WER reductions of 0.53% and 1.69%, respectively)
compared to a solid encoder-decoder baseline. In addition, we produce and
release a benchmark dataset for the ASR error correction community to foster
research along this line.
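To make the operation-prediction idea concrete, here is a minimal sketch in Python. Every name in it (predict_ops, decode_span, the toy tagger and decoder) is an assumption for illustration, not the paper's released code; it only shows how K/D/C predictions route just the "C" spans through the decoder.

```python
# Hypothetical sketch of constrained decoding on K/D/C operation prediction.
# All names (predict_ops, decode_span) are illustrative, not the paper's API.

KEEP, DELETE, CHANGE = "K", "D", "C"

def correct(hypothesis, predict_ops, decode_span):
    """Apply predicted per-token operations to an ASR hypothesis.

    predict_ops: tokens -> one of "K"/"D"/"C" per token (the predictor
                 between encoder and decoder).
    decode_span: tokens to rewrite -> corrected tokens; stands in for the
                 decoder, which now only sees the "C" spans.
    """
    ops = predict_ops(hypothesis)
    output, span = [], []
    for token, op in zip(hypothesis, ops):
        if op == CHANGE:
            span.append(token)           # defer to the decoder
            continue
        if span:                         # flush a pending "C" span
            output.extend(decode_span(span))
            span = []
        if op == KEEP:
            output.append(token)         # "D" tokens are simply dropped
    if span:
        output.extend(decode_span(span))
    return output

# Toy usage: only the one mis-recognized token ever reaches the decoder.
hyp = ["i", "scream", "for", "ice", "cream"]
ops = lambda toks: ["K", "C", "K", "K", "K"]
fix = lambda span: ["scheme"]            # stand-in for the real decoder
print(correct(hyp, ops, fix))            # ['i', 'scheme', 'for', 'ice', 'cream']
```

Because "K" and "D" tokens bypass the decoder entirely, decoding cost scales with the number of "C" tokens rather than with the full sequence length, which is consistent with the reported speedups.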
Related papers
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It combines a neural sequence tagger that learns to correct an ASR (Automatic Speech Recognition) hypothesis word by word with a corrector module that applies the corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
arXiv Detail & Related papers (2024-06-11T09:52:33Z)
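As a rough illustration of the tag-then-correct idea in the entry above, the sketch below applies word-level edit tags to a hypothesis. The tag set and interfaces are assumptions for illustration, not the paper's actual tagger output format.

```python
# Assumed sketch of a tag-then-correct pipeline: a tagger labels each
# hypothesis word with an edit, and a lightweight corrector applies it.

def apply_tags(words, tags):
    """tags[i] is one of: ("KEEP",), ("DELETE",), ("REPLACE", new_word)."""
    out = []
    for word, tag in zip(words, tags):
        if tag[0] == "KEEP":
            out.append(word)
        elif tag[0] == "REPLACE":
            out.append(tag[1])
        # DELETE: skip the word entirely
    return out

words = ["the", "whether", "is", "nice"]
tags = [("KEEP",), ("REPLACE", "weather"), ("KEEP",), ("KEEP",)]
print(" ".join(apply_tags(words, tags)))   # "the weather is nice"
```

High-precision control falls out naturally from this design: a deployment can apply only tags whose confidence clears a threshold and leave every other word unchanged.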
- An Effective Mixture-Of-Experts Approach For Code-Switching Speech Recognition Leveraging Encoder Disentanglement [9.28943772676672]
The code-switching phenomenon remains a major obstacle to automatic speech recognition.
We introduce a novel disentanglement loss to enable the lower layers of the encoder to capture inter-lingual acoustic information.
We verify that our proposed method outperforms the prior-art methods using pretrained dual-encoders.
arXiv Detail & Related papers (2024-02-27T04:08:59Z)
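The abstract above does not give the form of the disentanglement loss. As one plausible, clearly assumed reading, such a loss can penalize overlap between representations from two language-specific branches; the sketch below implements that generic variant with a squared cosine penalty.

```python
# Generic disentanglement-style loss (an assumption, not the paper's formula):
# push pooled representations of two expert branches toward orthogonality.
import numpy as np

def cosine_disentanglement_loss(h_lang_a, h_lang_b):
    """h_lang_*: (time, dim) frame representations from two expert branches."""
    a, b = h_lang_a.mean(axis=0), h_lang_b.mean(axis=0)   # mean-pool over time
    cos = a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8)
    return cos ** 2            # zero when the branches are orthogonal

h_a = np.random.randn(50, 256)
h_b = np.random.randn(50, 256)
print(cosine_disentanglement_loss(h_a, h_b))
```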
- SoftCorrect: Error Correction with Soft Detection for Automatic Speech Recognition [116.31926128970585]
We propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.
Compared with implicit error detection with CTC loss, SoftCorrect provides an explicit signal about which words are incorrect.
Experiments on AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively.
arXiv Detail & Related papers (2022-12-02T09:11:32Z)
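A minimal sketch of soft (thresholded) error detection, assuming the detector exposes a per-token error probability; the interface is illustrative rather than SoftCorrect's actual code.

```python
# Assumed interface: a detector scores each token's probability of being
# wrong; only tokens above a threshold reach the corrector, the rest are
# copied through unchanged.

def soft_correct(tokens, error_probs, correct_token, threshold=0.5):
    return [
        correct_token(tok) if p > threshold else tok
        for tok, p in zip(tokens, error_probs)
    ]

tokens = ["I", "sea", "the", "ship"]
probs = [0.02, 0.91, 0.05, 0.10]            # detector output (illustrative)
print(soft_correct(tokens, probs, lambda t: "see"))   # ['I', 'see', 'the', 'ship']
```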
- FastCorrect 2: Fast Error Correction on Multiple Candidates for Automatic Speech Recognition [92.12910821300034]
We propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy.
FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline.
arXiv Detail & Related papers (2021-09-29T13:48:03Z)
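To see why multiple candidates help, here is a heavily simplified stand-in: position-wise voting over candidates that are assumed to be pre-aligned. FastCorrect 2's real alignment and fusion are learned and pronunciation-aware; this only conveys the intuition that candidates can correct each other.

```python
# Simplified multi-candidate fusion: position-wise majority voting over
# pre-aligned candidates ('' marks an alignment gap). Assumed illustration.
from collections import Counter

def vote(aligned_candidates):
    """aligned_candidates: equal-length token lists from several ASR runs."""
    out = []
    for column in zip(*aligned_candidates):
        tokens = [t for t in column if t]   # drop gaps
        if tokens:
            out.append(Counter(tokens).most_common(1)[0][0])
    return out

cands = [
    ["i", "scream", "for", "ice", "cream"],
    ["i", "scheme", "for", "ice", "cream"],
    ["i", "scheme", "four", "ice", "cream"],
]
print(vote(cands))   # ['i', 'scheme', 'for', 'ice', 'cream']
```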
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model.
In accuracy, it outperforms popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
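Edit alignment can be grounded with a standard Levenshtein backtrace, sketched below under the assumption that it resembles the alignment FastCorrect builds on: each source token receives a count of target tokens it maps to, the kind of length signal a NAR corrector can use to decide how many tokens to emit per input position.

```python
# Assumed sketch of edit alignment: a Levenshtein DP plus a backtrace that
# assigns each source token the number of target tokens it aligns to.

def edit_alignment_counts(src, tgt):
    n, m = len(src), len(tgt)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i
    for j in range(m + 1):
        d[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,          # deletion
                          d[i][j - 1] + 1,          # insertion
                          d[i - 1][j - 1] + sub)    # keep / substitute
    counts, i, j = [0] * n, n, m                    # backtrace
    while i > 0 or j > 0:
        sub = 1 if (i == 0 or j == 0 or src[i - 1] != tgt[j - 1]) else 0
        if i > 0 and j > 0 and d[i][j] == d[i - 1][j - 1] + sub:
            counts[i - 1] += 1                      # keep or substitute
            i, j = i - 1, j - 1
        elif i > 0 and d[i][j] == d[i - 1][j] + 1:
            i -= 1                                  # deleted: maps to nothing
        else:
            counts[max(i - 1, 0)] += 1              # inserted target token
            j -= 1
    return counts

# One valid alignment: 'a' -> ('a', 'x'), 'b' -> ('y',), 'c' -> ('c',)
print(edit_alignment_counts(["a", "b", "c"], ["a", "x", "y", "c"]))  # [2, 1, 1]
```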
- Advanced Long-context End-to-end Speech Recognition Using Context-expanded Transformers [56.56220390953412]
We extend our prior work by introducing the Conformer architecture to further improve the accuracy.
We demonstrate that the extended Transformer provides state-of-the-art end-to-end ASR performance.
arXiv Detail & Related papers (2021-04-19T16:18:00Z)
- Non-Autoregressive Transformer ASR with CTC-Enhanced Decoder Input [54.82369261350497]
We propose a CTC-enhanced NAR transformer, which generates target sequence by refining predictions of the CTC module.
Experimental results show that our method outperforms all previous NAR counterparts and achieves 50x faster decoding speed than a strong AR baseline with only 0.0-0.3 absolute CER degradation on the Aishell-1 and Aishell-2 datasets.
arXiv Detail & Related papers (2020-10-28T15:00:09Z)
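The core of CTC-enhanced decoder input can be sketched as follows (an illustration of the general idea, not the paper's exact pipeline): greedy frame-level CTC labels are collapsed into a draft transcript, which then seeds the non-autoregressive decoder for a single refinement pass.

```python
# Standard CTC greedy collapse: merge repeated frame labels and drop blanks.
# The result serves as the decoder's input draft in this illustration.

BLANK = "<blank>"

def ctc_greedy_collapse(frame_labels):
    out, prev = [], None
    for lab in frame_labels:
        if lab != prev and lab != BLANK:
            out.append(lab)
        prev = lab
    return out

frames = ["<blank>", "h", "h", "<blank>", "e", "l", "l", "<blank>", "l", "o"]
draft = ctc_greedy_collapse(frames)
print(draft)   # ['h', 'e', 'l', 'l', 'o'] -- the decoder refines this draft
```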
- FastLR: Non-Autoregressive Lipreading Model with Integrate-and-Fire [74.04394069262108]
We propose FastLR, a non-autoregressive (NAR) lipreading model which generates all target tokens simultaneously.
FastLR achieves a speedup of up to 10.97 times compared with the state-of-the-art lipreading model.
arXiv Detail & Related papers (2020-08-06T08:28:56Z)
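Integrate-and-fire is commonly described as accumulating per-frame weights and emitting one token-level vector each time the accumulator crosses a threshold, which locates token boundaries without autoregressive decoding. The sketch below follows that CIF-style description; FastLR's exact variant may differ.

```python
# CIF-style integrate-and-fire (assumed sketch): accumulate per-frame weights
# and fire a weighted-sum token vector whenever the accumulator reaches 1.0.
# A sub-threshold tail at the end is simply discarded here.
import numpy as np

def integrate_and_fire(frames, weights, threshold=1.0):
    """frames: (T, dim) encoder states; weights: (T,) non-negative scores."""
    tokens, acc, state = [], 0.0, np.zeros(frames.shape[1])
    for h, w in zip(frames, weights):
        if acc + w < threshold:
            acc += w
            state += w * h
        else:
            left = threshold - acc        # portion completing the token
            tokens.append(state + left * h)
            acc = w - left                # remainder starts the next token
            state = acc * h
    return np.stack(tokens) if tokens else np.empty((0, frames.shape[1]))

T, dim = 12, 8
out = integrate_and_fire(np.random.randn(T, dim), np.full(T, 0.34))
print(out.shape)   # (4, 8): four fired token vectors from twelve frames
```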
This list is automatically generated from the titles and abstracts of the papers on this site.