Unsupervised domain adaptation for speech recognition with unsupervised
error correction
- URL: http://arxiv.org/abs/2209.12043v1
- Date: Sat, 24 Sep 2022 16:05:23 GMT
- Title: Unsupervised domain adaptation for speech recognition with unsupervised
error correction
- Authors: Long Mai, Julie Carson-Berndsen
- Abstract summary: We propose an unsupervised error correction method for unsupervised ASR domain adaptation.
Our approach requires only unlabeled data from the target domains, to which a pseudo-labeling technique is applied to generate correction training samples.
Experimental results show that our method obtains a significant word error rate (WER) reduction over non-adapted ASR systems.
- Score: 20.465220855548292
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The transcription quality of automatic speech recognition (ASR) systems
degrades significantly when transcribing audio from unseen domains. We
propose an unsupervised error correction method for unsupervised ASR domain
adaptation, aiming to recover transcription errors caused by domain mismatch.
Unlike existing correction methods that rely on transcribed audio for
training, our approach requires only unlabeled data from the target domains,
to which a pseudo-labeling technique is applied to generate correction
training samples. To reduce over-fitting to the pseudo data, we also propose
an encoder-decoder correction model that can take into account additional
information such as dialogue context and acoustic features. Experimental
results show that our method obtains a significant word error rate (WER)
reduction over non-adapted ASR systems. The correction model can also be
applied on top of other adaptation approaches to bring an additional relative
improvement of 10%.
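As a rough illustration of the pseudo-labeling step described above, the sketch below pairs a non-adapted ASR hypothesis with a pseudo label for each unlabeled target-domain utterance; such pairs then serve as training data for the correction model. The helper names (`asr_transcribe`, `pseudo_label`) are assumptions for illustration, not the paper's actual interface.

```python
# Hypothetical sketch: building correction training pairs from unlabeled
# target-domain audio via pseudo-labeling. The helpers asr_transcribe and
# pseudo_label are assumptions, not the paper's actual API.

from dataclasses import dataclass
from typing import Callable, Iterable, List

@dataclass
class CorrectionPair:
    source: str   # noisy ASR hypothesis (correction-model input)
    target: str   # pseudo label standing in for the reference transcript

def build_correction_pairs(
    audio_files: Iterable[str],
    asr_transcribe: Callable[[str], str],   # non-adapted source-domain ASR
    pseudo_label: Callable[[str], str],     # stronger labeler, e.g. an LM-rescored pass
) -> List[CorrectionPair]:
    """For each unlabeled target-domain utterance, pair the plain ASR
    hypothesis with a pseudo label; the correction model is then trained
    to map source -> target."""
    pairs = []
    for path in audio_files:
        hypothesis = asr_transcribe(path)
        label = pseudo_label(path)
        if hypothesis != label:  # keep only pairs that contain an error to correct
            pairs.append(CorrectionPair(source=hypothesis, target=label))
    return pairs
```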
Related papers
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses (a toy stand-in is sketched after this entry).
The proposed system reduces the error rate by 21% compared with the ASR model.
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
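The Confidence Module itself is not described in the summary above; as a toy stand-in, the sketch below scores each word of the 1-best hypothesis by its agreement across the N-best list. This is an illustrative assumption, not the authors' design.

```python
# Toy stand-in for a word-level confidence signal: score each word of the
# 1-best hypothesis by how often it appears at the same position across the
# N-best ASR hypotheses. An illustrative assumption, not the paper's actual
# Confidence Module.

from typing import List

def nbest_agreement_confidence(nbest: List[List[str]]) -> List[float]:
    """nbest[0] is the 1-best hypothesis; the rest are alternatives.
    Returns one confidence score in [0, 1] per word of the 1-best."""
    best = nbest[0]
    scores = []
    for i, word in enumerate(best):
        agree = sum(1 for hyp in nbest if i < len(hyp) and hyp[i] == word)
        scores.append(round(agree / len(nbest), 2))
    return scores

# Low-confidence positions are candidates for correction.
hyps = [["i", "scream", "for", "ice", "cream"],
        ["ice", "cream", "for", "ice", "cream"],
        ["i", "scream", "four", "ice", "cream"]]
print(nbest_agreement_confidence(hyps))  # [0.67, 0.67, 0.67, 1.0, 1.0]
```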
- Tag and correct: high precision post-editing approach to correction of speech recognition errors [0.0]
It consists of a neural sequence tagger that learns how to correct an ASR (Automatic Speech Recognition) hypothesis word by word, and a corrector module that applies the corrections returned by the tagger (a toy illustration follows this entry).
The proposed solution is applicable to any ASR system, regardless of its architecture, and provides high-precision control over errors being corrected.
arXiv Detail & Related papers (2024-06-11T09:52:33Z)
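A minimal sketch of the corrector-module step described in the entry above, assuming a simple KEEP/DELETE/REPLACE tag inventory (the actual tag set is not specified in the summary):

```python
# Hedged sketch of the tag-and-correct idea: a tagger assigns an edit
# operation to each hypothesis word, and a corrector applies them. The tag
# inventory (KEEP / DELETE / REPLACE_*) is an illustrative assumption, not
# the paper's actual tag set.

from typing import List

def apply_corrections(words: List[str], tags: List[str]) -> List[str]:
    """Apply one edit tag per word: 'KEEP', 'DELETE', or 'REPLACE_<word>'."""
    corrected = []
    for word, tag in zip(words, tags):
        if tag == "KEEP":
            corrected.append(word)
        elif tag == "DELETE":
            continue
        elif tag.startswith("REPLACE_"):
            corrected.append(tag[len("REPLACE_"):])
    return corrected

# Example: fix a homophone error flagged by the (not shown) neural tagger.
print(apply_corrections(["their", "going", "home"],
                        ["REPLACE_they're", "KEEP", "KEEP"]))
# ["they're", 'going', 'home']
```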
- Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency, requiring less than one hour of unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z)
- Uncertainty-Aware Source-Free Adaptive Image Super-Resolution with Wavelet Augmentation Transformer [60.31021888394358]
Unsupervised Domain Adaptation (UDA) can effectively address domain gap issues in real-world image Super-Resolution (SR).
We propose a SOurce-free Domain Adaptation framework for image SR (SODA-SR) to address this issue, i.e., adapting a source-trained model to a target domain with only unlabeled target data.
arXiv Detail & Related papers (2023-03-31T03:14:44Z)
- ASR Error Detection via Audio-Transcript entailment [1.3750624267664155]
We propose an end-to-end approach for ASR error detection using audio-transcript entailment.
The proposed model utilizes an acoustic encoder and a linguistic encoder to model the speech and the transcript, respectively (see the sketch after this entry).
Our proposed model achieves classification error rates (CER) of 26.2% on all transcription errors and 23% on medical errors specifically, leading to improvements upon a strong baseline by 12% and 15.4%, respectively.
arXiv Detail & Related papers (2022-07-22T02:47:15Z)
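A minimal sketch of the dual-encoder entailment idea from the entry above, assuming pre-computed utterance embeddings; the layer sizes and fusion-by-concatenation are illustrative choices, not the paper's reported architecture.

```python
# Hedged sketch of a dual-encoder entailment classifier for ASR error
# detection: acoustic and linguistic embeddings are projected into a shared
# space, fused, and classified as entailed (transcript matches the audio) or
# not. All dimensions here are illustrative assumptions.

import torch
import torch.nn as nn

class AudioTranscriptEntailment(nn.Module):
    def __init__(self, audio_dim=512, text_dim=768, hidden=256):
        super().__init__()
        self.audio_proj = nn.Linear(audio_dim, hidden)  # acoustic encoder output -> shared space
        self.text_proj = nn.Linear(text_dim, hidden)    # linguistic encoder output -> shared space
        self.classifier = nn.Sequential(
            nn.Linear(hidden * 2, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),                       # {entailed, not entailed}
        )

    def forward(self, audio_emb, text_emb):
        a = self.audio_proj(audio_emb)
        t = self.text_proj(text_emb)
        return self.classifier(torch.cat([a, t], dim=-1))

# Usage with pre-computed utterance embeddings (shapes are assumptions):
model = AudioTranscriptEntailment()
logits = model(torch.randn(4, 512), torch.randn(4, 768))  # batch of 4 pairs
```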
- Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system.
We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model (a usage sketch follows this entry).
Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
arXiv Detail & Related papers (2022-02-02T17:32:59Z)
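A hedged usage sketch of the BART-as-denoiser idea above, using the Hugging Face `transformers` API. Note that `facebook/bart-base` will not correct ASR errors out of the box; fine-tuning on (hypothesis, reference) pairs, not shown here, is assumed.

```python
# Hedged sketch: BART as a denoising corrector for ASR output. The API calls
# are the standard Hugging Face transformers interface; the fine-tuning step
# that would make this actually correct ASR errors is assumed, not shown.

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")
# In practice, first fine-tune on (ASR hypothesis, reference transcript)
# pairs so the model learns to denoise ASR-style errors.

hypothesis = "i scream you scream we all scream for i cream"
inputs = tokenizer(hypothesis, return_tensors="pt")
ids = model.generate(inputs["input_ids"], max_length=64, num_beams=4)
print(tokenizer.decode(ids[0], skip_special_tokens=True))
```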
- Hybrid phonetic-neural model for correction in speech recognition systems [0.0]
We explore using a deep neural network to refine the results of a phonetic correction algorithm applied to a telesales audio database.
The results show the viability of deep learning models, together with post-processing correction strategies, for reducing errors made by closed (black-box) ASR systems in specific language domains.
arXiv Detail & Related papers (2021-02-12T19:57:16Z)
- Selective Pseudo-Labeling with Reinforcement Learning for Semi-Supervised Domain Adaptation [116.48885692054724]
We propose a reinforcement learning based selective pseudo-labeling method for semi-supervised domain adaptation.
We develop a deep Q-learning model to select both accurate and representative pseudo-labeled instances (sketched after this entry).
Our proposed method is evaluated on several benchmark datasets for SSDA, and demonstrates superior performance to all the comparison methods.
arXiv Detail & Related papers (2020-12-07T03:37:38Z)
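A minimal sketch of Q-value-based instance selection as summarized above; the embedding size, network shape, and threshold are assumptions, and the reward-driven training of the Q-network is omitted.

```python
# Hedged sketch of RL-based pseudo-label selection: a Q-network scores each
# candidate pseudo-labeled instance, and only high-value ones are kept for
# training. The q_network shape and threshold are illustrative assumptions,
# not the paper's actual design.

import torch
import torch.nn as nn

q_network = nn.Sequential(          # maps an instance embedding to Q(select)
    nn.Linear(256, 128), nn.ReLU(),
    nn.Linear(128, 1),
)

def select_pseudo_labeled(embeddings: torch.Tensor, threshold: float = 0.0):
    """Keep instances whose predicted Q-value for 'select' exceeds a threshold.
    In the full method the Q-network is trained with rewards reflecting how
    much each selection improves target-domain performance."""
    with torch.no_grad():
        q_values = q_network(embeddings).squeeze(-1)
    return (q_values > threshold).nonzero(as_tuple=True)[0]

kept = select_pseudo_labeled(torch.randn(100, 256))  # indices of kept instances
```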
- Unsupervised Domain Adaptation for Speech Recognition via Uncertainty Driven Self-Training [55.824641135682725]
Domain adaptation experiments using WSJ as the source domain and TED-LIUM 3 and SWITCHBOARD as target domains show that up to 80% of the performance of a system trained on ground-truth data can be recovered.
arXiv Detail & Related papers (2020-11-26T18:51:26Z)
- A Self-Refinement Strategy for Noise Reduction in Grammatical Error Correction [54.569707226277735]
Existing approaches for grammatical error correction (GEC) rely on supervised learning with manually created GEC datasets.
There is a non-negligible amount of "noise" where errors were inappropriately edited or left uncorrected.
We propose a self-refinement method where the key idea is to denoise these datasets by leveraging the prediction consistency of existing models (a minimal sketch follows this entry).
arXiv Detail & Related papers (2020-10-07T04:45:09Z)
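A minimal sketch of consistency-based denoising as summarized above, assuming a set of already-trained GEC models and a simple vote threshold (both illustrative, not the paper's exact procedure):

```python
# Hedged sketch of denoising a GEC dataset via prediction consistency: keep
# a (source, target) pair only when enough existing GEC models reproduce the
# annotated correction. The models list and agreement rule are illustrative
# assumptions.

from typing import Callable, List, Tuple

def denoise_by_consistency(
    pairs: List[Tuple[str, str]],        # (noisy sentence, annotated correction)
    models: List[Callable[[str], str]],  # existing trained GEC models
    min_agreement: int = 2,
) -> List[Tuple[str, str]]:
    kept = []
    for source, target in pairs:
        votes = sum(1 for correct in models if correct(source) == target)
        if votes >= min_agreement:       # enough models agree with the annotation
            kept.append((source, target))
    return kept
```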
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.