UCorrect: An Unsupervised Framework for Automatic Speech Recognition
Error Correction
- URL: http://arxiv.org/abs/2401.05689v1
- Date: Thu, 11 Jan 2024 06:30:07 GMT
- Title: UCorrect: An Unsupervised Framework for Automatic Speech Recognition
Error Correction
- Authors: Jiaxin Guo, Minghan Wang, Xiaosong Qiao, Daimeng Wei, Hengchao Shang,
Zongyao Li, Zhengzhe Yu, Yinglu Li, Chang Su, Min Zhang, Shimin Tao, Hao Yang
- Abstract summary: We propose UCorrect, an unsupervised Detector-Generator-Selector framework for ASR Error Correction.
Experiments on the public AISHELL-1 dataset and WenetSpeech dataset show the effectiveness of UCorrect.
- Score: 18.97378605403447
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Error correction techniques have been used to refine the output
sentences from automatic speech recognition (ASR) models and achieve a lower
word error rate (WER). Previous works usually adopt end-to-end models and have
a strong dependency on Pseudo Paired Data and Original Paired Data. When
pre-trained only on Pseudo Paired Data, previous models even have a negative
effect on correction, while fine-tuning on Original Paired Data requires the
source-side data to be transcribed by a well-trained ASR model, which is
time-consuming and not universal. In this paper, we propose UCorrect, an
unsupervised Detector-Generator-Selector framework for ASR error correction
that has no dependency on the training data mentioned above. The procedure is
first to detect whether a character is erroneous, then to generate candidate
characters, and finally to select the most confident candidate to replace the
erroneous character. Experiments on the public AISHELL-1 and WenetSpeech
datasets show the effectiveness of UCorrect for ASR error correction: 1) it
achieves significant WER reduction, 6.83% even without fine-tuning and 14.29%
after fine-tuning; 2) it outperforms popular NAR correction models by a large
margin with competitive low latency; and 3) it is a universal method, as it
reduces the WER of the ASR model under different decoding strategies and the
WER of ASR models trained on datasets of different scales.
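The Detector-Generator-Selector procedure described in the abstract can be illustrated with a minimal sketch. Here `masked_lm_topk` stands in for any pretrained masked language model (a BERT-style scorer); its name, signature, and the detection threshold are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a Detector-Generator-Selector loop for character-level ASR
# error correction. `masked_lm_topk` stands in for any pretrained masked
# language model; the threshold value is an illustrative assumption.
from typing import Callable, List, Tuple

Candidates = List[Tuple[str, float]]  # (candidate character, probability)


def correct_sentence(
    chars: List[str],
    masked_lm_topk: Callable[[List[str]], Candidates],
    detect_threshold: float = 1e-3,
) -> List[str]:
    corrected = list(chars)
    for i in range(len(corrected)):
        masked = corrected[:i] + ["[MASK]"] + corrected[i + 1:]
        candidates = masked_lm_topk(masked)  # top-k fills for the masked slot
        prob_original = dict(candidates).get(corrected[i], 0.0)

        # Detector: the character is suspicious if the LM finds it very unlikely.
        if prob_original >= detect_threshold:
            continue

        # Generator + Selector: take the most confident candidate and substitute.
        best_char, best_prob = max(candidates, key=lambda c: c[1])
        if best_prob > prob_original:
            corrected[i] = best_char
    return corrected
```

Because every decision comes from a pretrained language model rather than from paired ASR transcripts, a loop like this needs none of the Pseudo Paired Data or Original Paired Data discussed above.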
Related papers
- Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z)
- Training Language Models to Self-Correct via Reinforcement Learning [98.35197671595343]
Self-correction has been found to be largely ineffective in modern large language models (LLMs).
We develop a multi-turn online reinforcement learning approach, SCoRe, that significantly improves an LLM's self-correction ability using entirely self-generated data.
We find that SCoRe achieves state-of-the-art self-correction performance, improving the base models' self-correction by 15.6% and 9.1% respectively on MATH and HumanEval.
arXiv Detail & Related papers (2024-09-19T17:16:21Z)
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It consists of a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word, and a corrector module that applies the corrections returned by the tagger.
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
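A minimal sketch of this tag-then-apply pattern is shown below, assuming a simplified KEEP / DELETE / REPLACE_<word> tag set; the paper's actual tag inventory may differ.

```python
# Illustrative corrector module that applies per-word edit tags produced by a
# sequence tagger. The KEEP / DELETE / REPLACE_<word> tag set is an assumed,
# simplified schema, not the tag inventory used in the paper.
from typing import List


def apply_tags(hypothesis: List[str], tags: List[str]) -> List[str]:
    corrected: List[str] = []
    for word, tag in zip(hypothesis, tags):
        if tag == "KEEP":
            corrected.append(word)                    # leave the word untouched
        elif tag == "DELETE":
            continue                                  # drop an inserted word
        elif tag.startswith("REPLACE_"):
            corrected.append(tag[len("REPLACE_"):])   # substitute the tagged word
    return corrected


# Only explicitly tagged words change, which is why this style of post-editing
# gives high-precision control over the corrections being made.
print(apply_tags(["the", "whether", "is", "nice"],
                 ["KEEP", "REPLACE_weather", "KEEP", "KEEP"]))
# -> ['the', 'weather', 'is', 'nice']
```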
arXiv Detail & Related papers (2024-06-11T09:52:33Z)
- Parameter-tuning-free data entry error unlearning with adaptive selective synaptic dampening [51.34904967046097]
We introduce an extension to the selective synaptic dampening unlearning method that removes the need for parameter tuning.
We demonstrate the performance of this extension, adaptive selective synaptic dampening (ASSD), on various ResNet18 and Vision Transformer unlearning tasks.
The application of this approach is particularly compelling in industrial settings, such as supply chain management.
arXiv Detail & Related papers (2024-02-06T14:04:31Z)
- Thutmose Tagger: Single-pass neural model for Inverse Text Normalization [76.87664008338317]
Inverse text normalization (ITN) is an essential post-processing step in automatic speech recognition.
We present a dataset preparation method based on the granular alignment of ITN examples.
One-to-one correspondence between tags and input words improves the interpretability of the model's predictions.
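As a toy illustration of that one-to-one correspondence, the sketch below applies per-word ITN tags to a spoken-form sentence; the tag names are assumptions for illustration, not the Thutmose Tagger vocabulary.

```python
# Toy illustration of single-pass ITN tagging: each spoken-form input word gets
# exactly one tag, which either copies the word, deletes it, or replaces it with
# a written-form fragment. The tag names are illustrative assumptions.
def apply_itn_tags(spoken_words, tags):
    written = []
    for word, tag in zip(spoken_words, tags):
        if tag == "<SELF>":
            written.append(word)   # copy the word unchanged
        elif tag == "<DELETE>":
            continue               # word is absorbed into a neighbour's fragment
        else:
            written.append(tag)    # tag carries the written-form fragment
    return " ".join(written)


# "on may twenty third" -> "on may 23rd"; the one-to-one word/tag pairing makes
# it easy to see which input word produced each output fragment.
print(apply_itn_tags(["on", "may", "twenty", "third"],
                     ["<SELF>", "<SELF>", "<DELETE>", "23rd"]))
```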
arXiv Detail & Related papers (2022-07-29T20:39:02Z)
- Multiple-hypothesis RNN-T Loss for Unsupervised Fine-tuning and Self-training of Neural Transducer [20.8850874806462]
This paper proposes a new approach to perform unsupervised fine-tuning and self-training using unlabeled speech data.
For both the fine-tuning and self-training tasks, ASR models are trained using supervised data from Wall Street Journal (WSJ) and Aurora-4, with CHiME-4 real noisy data used as the unlabeled data.
arXiv Detail & Related papers (2022-07-29T15:14:03Z)
- READ: Aggregating Reconstruction Error into Out-of-distribution Detection [5.069442437365223]
Deep neural networks are known to be overconfident for abnormal data.
We propose READ (Reconstruction Error Aggregated Detector) to unify inconsistencies from classifier and autoencoder.
Our method reduces the average FPR@95TPR by up to 9.8% compared with previous state-of-the-art OOD detection algorithms.
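A minimal sketch of the underlying idea, combining a classifier-confidence signal with an autoencoder's reconstruction error, is shown below; the plain weighted sum is an assumed simplification rather than READ's actual aggregation rule.

```python
# Illustrative aggregation of two OOD signals: low classifier confidence and
# high autoencoder reconstruction error. The weighted sum is an assumed,
# simplified stand-in for READ's actual aggregation rule.
import numpy as np


def ood_score(logits: np.ndarray, x: np.ndarray, x_recon: np.ndarray,
              weight: float = 0.5) -> float:
    # Classifier signal: 1 - max softmax probability (higher = more suspicious).
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    classifier_signal = 1.0 - float(probs.max())

    # Autoencoder signal: per-sample mean squared reconstruction error.
    recon_error = float(np.mean((x - x_recon) ** 2))

    # Combining the two catches inputs that either detector alone would miss.
    return weight * classifier_signal + (1.0 - weight) * recon_error
```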
arXiv Detail & Related papers (2022-06-15T11:30:41Z)
- Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system.
We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model.
Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
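The BART-as-denoiser idea can be sketched with the Hugging Face transformers API; the checkpoint below is a generic placeholder, and the paper's model would first be fine-tuned on (ASR hypothesis, reference) pairs, so an off-the-shelf checkpoint will mostly just echo its input.

```python
# Sketch of using a pre-trained seq2seq model (BART) to rewrite an ASR
# hypothesis. The checkpoint name is a generic placeholder; without fine-tuning
# on ASR-error data the model is not expected to fix anything.
from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

asr_hypothesis = "the whether in new york is nice today"
inputs = tokenizer(asr_hypothesis, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32, num_beams=4)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```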
arXiv Detail & Related papers (2022-02-02T17:32:59Z)
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model.
It outperforms the accuracy of popular NAR models adopted in neural machine translation by a large margin.
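The edit alignment FastCorrect builds on can be illustrated with a standard Levenshtein alignment between an ASR hypothesis and its reference; the dynamic-programming sketch below is generic and not the paper's code.

```python
# Generic Levenshtein (edit-distance) alignment between an ASR hypothesis and a
# reference, recovering per-token edit operations. FastCorrect derives its
# training targets from such alignments; this sketch shows only the alignment.
def edit_alignment(hyp, ref):
    n, m = len(hyp), len(ref)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if hyp[i - 1] == ref[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete hypothesis token
                             dist[i][j - 1] + 1,         # insert reference token
                             dist[i - 1][j - 1] + cost)  # keep or substitute
    # Backtrace to recover the operation sequence.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + (hyp[i - 1] != ref[j - 1]):
            ops.append(("keep" if hyp[i - 1] == ref[j - 1] else "sub",
                        hyp[i - 1], ref[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            ops.append(("del", hyp[i - 1], None))
            i -= 1
        else:
            ops.append(("ins", None, ref[j - 1]))
            j -= 1
    return list(reversed(ops))


print(edit_alignment("the whether is nice".split(), "the weather is nice".split()))
# -> [('keep', 'the', 'the'), ('sub', 'whether', 'weather'),
#     ('keep', 'is', 'is'), ('keep', 'nice', 'nice')]
```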
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and subsequent LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers on this site.