FastCorrect 2: Fast Error Correction on Multiple Candidates for
Automatic Speech Recognition
- URL: http://arxiv.org/abs/2109.14420v2
- Date: Fri, 1 Oct 2021 06:57:39 GMT
- Title: FastCorrect 2: Fast Error Correction on Multiple Candidates for
Automatic Speech Recognition
- Authors: Yichong Leng, Xu Tan, Rui Wang, Linchen Zhu, Jin Xu, Linquan Liu, Tao
Qin, Xiang-Yang Li, Edward Lin, Tie-Yan Liu
- Abstract summary: We propose FastCorrect 2, an error correction model that takes multiple ASR candidates as input for better correction accuracy.
FastCorrect 2 achieves better performance than the cascaded re-scoring and correction pipeline.
- Score: 92.12910821300034
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Error correction is widely used in automatic speech recognition (ASR) to
post-process the generated sentence, and can further reduce the word error rate
(WER). Although multiple candidates are generated by an ASR system through beam
search, current error correction approaches can only correct one sentence at a
time, failing to leverage the voting effect from multiple candidates to better
detect and correct error tokens. In this work, we propose FastCorrect 2, an
error correction model that takes multiple ASR candidates as input for better
correction accuracy. FastCorrect 2 adopts non-autoregressive generation for
fast inference: it consists of an encoder that processes multiple source
sentences and a decoder that generates the target sentence in parallel from an
adjusted source sentence, where the adjustment is based on the predicted
duration of each source token. However, there are some issues when handling
multiple source sentences. First, it is non-trivial to leverage the voting
effect from multiple source sentences since they usually vary in length. Thus,
we propose a novel alignment algorithm to maximize the degree of token
alignment among multiple sentences in terms of token and pronunciation
similarity. Second, the decoder can only take one adjusted source sentence as
input, while there are multiple source sentences. Thus, we develop a candidate
predictor to detect the most suitable candidate for the decoder. Experiments on
our in-house dataset and AISHELL-1 show that FastCorrect 2 can further reduce
the WER over the previous single-candidate correction model by 3.2% and 2.6%
respectively, demonstrating the effectiveness of leveraging multiple candidates in ASR
error correction. FastCorrect 2 achieves better performance than the cascaded
re-scoring and correction pipeline and can serve as a unified post-processing
module for ASR.
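The alignment step described in the abstract has to reconcile beam-search candidates of different lengths before any token-level voting is possible. The following is a minimal sketch of that idea, assuming a simple dynamic-programming alignment whose match score combines exact token identity with a crude pronunciation-similarity placeholder; the scoring weights, the PAD handling, and the pronunciation_similarity() helper are illustrative assumptions, not the paper's actual algorithm.

```python
# Hypothetical sketch: align several ASR candidates of different lengths so
# that token-level "voting" becomes possible.  Weights and the
# pronunciation_similarity() stand-in are assumptions, not the paper's method.
from typing import List, Optional, Tuple

PAD = "<pad>"


def pronunciation_similarity(a: str, b: str) -> float:
    """Placeholder: similarity in [0, 1] between two tokens' pronunciations.
    A real system would compare phoneme or pinyin sequences; here we use
    character overlap as a stand-in."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / max(len(sa | sb), 1)


def align_pair(anchor: List[str], hyp: List[str]) -> List[str]:
    """Align hyp to anchor with dynamic programming: exact matches score
    highest, pronunciation-similar tokens score partially, and PAD fills
    anchor positions that have no counterpart in hyp."""
    n, m = len(anchor), len(hyp)
    NEG = float("-inf")
    dp = [[NEG] * (m + 1) for _ in range(n + 1)]
    back: List[List[Optional[Tuple[str, int, int]]]] = [
        [None] * (m + 1) for _ in range(n + 1)
    ]
    dp[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if dp[i][j] == NEG:
                continue
            if i < n and j < m:  # pair anchor[i] with hyp[j]
                s = 2.0 if anchor[i] == hyp[j] else pronunciation_similarity(anchor[i], hyp[j])
                if dp[i][j] + s > dp[i + 1][j + 1]:
                    dp[i + 1][j + 1] = dp[i][j] + s
                    back[i + 1][j + 1] = ("pair", i, j)
            if i < n and dp[i][j] - 0.5 > dp[i + 1][j]:  # no counterpart: PAD
                dp[i + 1][j] = dp[i][j] - 0.5
                back[i + 1][j] = ("pad", i, j)
            if j < m and dp[i][j] - 0.5 > dp[i][j + 1]:  # extra hyp token: drop
                dp[i][j + 1] = dp[i][j] - 0.5
                back[i][j + 1] = ("drop", i, j)
    # Trace back from (n, m) to lay hyp out on anchor's positions.
    aligned = [PAD] * n
    i, j = n, m
    while back[i][j] is not None:
        op, pi, pj = back[i][j]
        if op == "pair":
            aligned[pi] = hyp[pj]
        i, j = pi, pj
    return aligned


def align_candidates(candidates: List[List[str]]) -> List[List[str]]:
    """Use the first beam candidate as the anchor and align the others to it,
    so all rows share one length and each column can be voted on."""
    anchor = candidates[0]
    return [anchor] + [align_pair(anchor, c) for c in candidates[1:]]


if __name__ == "__main__":
    beams = [
        ["i", "want", "to", "recognize", "speech"],
        ["i", "want", "to", "wreck", "a", "nice", "speech"],
        ["i", "wants", "to", "recognise", "speech"],
    ]
    for row in align_candidates(beams):
        print(row)
```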
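The other two components mentioned in the abstract, the duration-based adjustment of a source sentence for the non-autoregressive decoder and the candidate predictor that selects one adjusted candidate, can be illustrated in the same hedged spirit. The hard-coded durations and scores below are stand-ins for what the paper obtains from trained modules.

```python
# Hypothetical sketch of (1) adjusting one source sentence by predicted
# per-token durations so a non-autoregressive decoder can fill in the target
# in one parallel pass, and (2) picking the most suitable adjusted candidate.
# Durations and scores are illustrative, not the paper's trained predictors.
from typing import List, Sequence


def adjust_by_duration(tokens: Sequence[str], durations: Sequence[int]) -> List[str]:
    """Repeat each source token durations[i] times: 0 removes a token,
    1 keeps it, and values > 1 make room for extra target tokens."""
    assert len(tokens) == len(durations)
    adjusted: List[str] = []
    for tok, d in zip(tokens, durations):
        adjusted.extend([tok] * d)
    return adjusted


def pick_candidate(adjusted: List[List[str]], scores: Sequence[float]) -> List[str]:
    """Return the adjusted candidate with the highest predictor score;
    in the paper this score would come from a learned candidate predictor."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return adjusted[best]


if __name__ == "__main__":
    candidate = ["i", "want", "to", "wreck", "a", "nice", "beach"]
    # Hypothetical predictions: the target keeps one token for "wreck",
    # drops "a" and "nice", and keeps one token for "beach".
    durations = [1, 1, 1, 1, 0, 0, 1]
    adjusted = adjust_by_duration(candidate, durations)
    print(adjusted)  # ['i', 'want', 'to', 'wreck', 'beach']
    print(pick_candidate([adjusted, candidate], scores=[0.9, 0.4]))
```

Because the adjusted sentence already provides one source token per target position, the decoder can rewrite every position in a single parallel pass rather than token by token, which is where the non-autoregressive speedup described in the abstract comes from.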
Related papers
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It consists of a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word, and a corrector module that applies corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
arXiv Detail & Related papers (2024-06-11T09:52:33Z)
- DiaCorrect: Error Correction Back-end For Speaker Diarization [9.311650017389262]
We propose an error correction framework, named DiaCorrect, to refine the output of a diarization system.
Our model consists of two parallel convolutional encoders and a transformer-based decoder.
arXiv Detail & Related papers (2023-09-15T13:08:12Z)
- SpellMapper: A non-autoregressive neural spellchecker for ASR
customization with candidate retrieval based on n-gram mappings [76.87664008338317]
Contextual spelling correction models are an alternative to shallow fusion to improve automatic speech recognition.
We propose a novel algorithm for candidate retrieval based on misspelled n-gram mappings.
Experiments on Spoken Wikipedia show 21.4% word error rate improvement compared to a baseline ASR system.
arXiv Detail & Related papers (2023-06-04T10:00:12Z)
- SoftCorrect: Error Correction with Soft Detection for Automatic Speech
Recognition [116.31926128970585]
We propose SoftCorrect with a soft error detection mechanism to avoid the limitations of both explicit and implicit error detection.
Compared with implicit error detection with CTC loss, SoftCorrect provides explicit signal about which words are incorrect.
Experiments on AISHELL-1 and Aidatatang datasets show that SoftCorrect achieves 26.1% and 9.4% CER reduction respectively.
arXiv Detail & Related papers (2022-12-02T09:11:32Z)
- ASR Error Correction with Constrained Decoding on Operation Prediction [8.701142327932484]
We propose an ASR error correction method utilizing the predictions of correction operations.
Experiments on three public datasets demonstrate the effectiveness of the proposed approach in reducing the latency of the decoding process.
arXiv Detail & Related papers (2022-08-09T09:59:30Z)
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic
Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model.
It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
- Improving the Efficiency of Grammatical Error Correction with Erroneous
Span Detection and Correction [106.63733511672721]
We propose a novel language-independent approach to improve the efficiency of Grammatical Error Correction (GEC) by dividing the task into two subtasks: Erroneous Span Detection (ESD) and Erroneous Span Correction (ESC).
ESD identifies grammatically incorrect text spans with an efficient sequence tagging model. ESC leverages a seq2seq model to take the sentence with annotated erroneous spans as input and only outputs the corrected text for these spans.
Experiments show our approach performs comparably to conventional seq2seq approaches in both English and Chinese GEC benchmarks with less than 50% time cost for inference.
arXiv Detail & Related papers (2020-10-07T08:29:11Z)