ASR Error Correction and Domain Adaptation Using Machine Translation
- URL: http://arxiv.org/abs/2003.07692v1
- Date: Fri, 13 Mar 2020 20:05:38 GMT
- Title: ASR Error Correction and Domain Adaptation Using Machine Translation
- Authors: Anirudh Mani, Shruti Palaskar, Nimshi Venkat Meripo, Sandeep Konam,
Florian Metze
- Abstract summary: We propose a technique to perform domain adaptation for ASR error correction via machine translation.
We observe a 7% absolute improvement in word error rate and a 4-point absolute improvement in BLEU score on Google ASR output.
- Score: 32.27379508770736
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Off-the-shelf pre-trained Automatic Speech Recognition (ASR) systems are an
increasingly viable service for companies of any size building speech-based
products. While these ASR systems are trained on large amounts of data, domain
mismatch remains an issue for many parties that want to use the service as-is,
leading to suboptimal results for their task. We propose a simple
technique to perform domain adaptation for ASR error correction via machine
translation. The machine translation model is a strong candidate to learn a
mapping from out-of-domain ASR errors to in-domain terms in the corresponding
reference files. We use two off-the-shelf ASR systems in this work: Google ASR
(commercial) and the ASPIRE model (open-source). We observe 7% absolute
improvement in word error rate and 4 point absolute improvement in BLEU score
in Google ASR output via our proposed method. We also evaluate ASR error
correction on the downstream task of Speaker Diarization, which captures the
speaker-style, syntactic, structural, and semantic improvements we obtain via
ASR correction.
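As a rough sketch of the data preparation this method implies (file paths, the lower-casing choice, and the pairing format are assumptions, not details from the paper), the correction model would be trained on parallel hypothesis/reference lines:

```python
# Sketch: build a parallel corpus for MT-based ASR error correction.
# File names are hypothetical; the MT model itself (any standard
# seq2seq toolkit) would then be trained on the src/tgt line pairs.

def load_lines(path):
    with open(path, encoding="utf-8") as f:
        return [line.strip().lower() for line in f]

def make_parallel_corpus(hyp_path, ref_path, src_out, tgt_out):
    """Pair each out-of-domain ASR hypothesis (source) with its
    in-domain reference transcript (target)."""
    hyps, refs = load_lines(hyp_path), load_lines(ref_path)
    assert len(hyps) == len(refs), "files must be line-aligned"
    with open(src_out, "w", encoding="utf-8") as src, \
         open(tgt_out, "w", encoding="utf-8") as tgt:
        for hyp, ref in zip(hyps, refs):
            src.write(hyp + "\n")  # noisy ASR output
            tgt.write(ref + "\n")  # clean in-domain reference
```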
Related papers
- ASR Error Correction using Large Language Models [4.75940708384553]
Error correction (EC) models play a crucial role in refining Automatic Speech Recognition (ASR) transcriptions.
This work investigates the use of large language models (LLMs) for error correction across diverse scenarios.
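A minimal illustrative sketch, assuming a generic chat-style LLM fills in the actual model call (the prompt wording is invented, not taken from the paper):

```python
# Hypothetical prompt builder for LLM-based error correction over an
# ASR N-best list; the LLM call itself is deliberately left abstract.

def build_correction_prompt(nbest):
    hyps = "\n".join(f"{i + 1}. {h}" for i, h in enumerate(nbest))
    return (
        "Below are candidate transcriptions of one utterance, "
        "best-first. Output the single corrected transcription.\n"
        f"{hyps}\nCorrected:"
    )

print(build_correction_prompt([
    "the cat sad on the mat",
    "the cat sat on the mat",
]))
```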
arXiv Detail & Related papers (2024-09-14T23:33:38Z)
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
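A small sketch of the confidence-gating idea (the threshold value and function names are assumptions):

```python
# Confidence gating: only low-confidence transcripts are sent to the
# LLM corrector, so likely accurate transcripts stay untouched.
# The 0.9 threshold is an assumed value, not taken from the paper.

def correct_if_uncertain(transcript, confidence, corrector, threshold=0.9):
    if confidence >= threshold:
        return transcript          # trust the ASR output as-is
    return corrector(transcript)   # e.g., an LLM rewriting function
```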
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Error Correction by Paying Attention to Both Acoustic and Confidence References for Automatic Speech Recognition [52.624909026294105]
We propose a non-autoregressive speech error correction method.
A Confidence Module measures the uncertainty of each word of the N-best ASR hypotheses.
The proposed system reduces the error rate by 21% compared with the ASR model.
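A toy proxy for such a module, assuming position-aligned, equal-length N-best hypotheses (this disagreement heuristic is illustrative, not the paper's learned Confidence Module):

```python
# Toy uncertainty proxy: per-position disagreement across N-best
# hypotheses (assumed equal-length and position-aligned).

from collections import Counter

def word_uncertainty(nbest_tokens):
    scores = []
    for position in zip(*nbest_tokens):
        majority = Counter(position).most_common(1)[0][1]
        scores.append(1.0 - majority / len(position))  # 0 = full agreement
    return scores

print(word_uncertainty([
    "the cat sat".split(),
    "the cat sad".split(),
    "the cat sat".split(),
]))  # -> [0.0, 0.0, 0.333...]
```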
arXiv Detail & Related papers (2024-06-29T17:56:28Z)
- Crossmodal ASR Error Correction with Discrete Speech Units [16.58209270191005]
We propose a post-ASR processing approach for ASR Error Correction (AEC).
We explore pre-training and fine-tuning strategies and uncover an ASR domain discrepancy phenomenon.
We propose the incorporation of discrete speech units to align with and enhance the word embeddings for improving AEC quality.
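A hedged sketch of one plausible fusion, assuming speech units are already aligned one-to-one with hypothesis tokens (the dimensions and the concatenate-then-project design are invented for illustration):

```python
# Illustrative fusion: concatenate word and discrete-unit embeddings,
# then project back down. Assumes unit_ids are pre-aligned to word_ids.

import torch
import torch.nn as nn

class CrossmodalFusion(nn.Module):
    def __init__(self, vocab=10000, n_units=1000, dim=256):
        super().__init__()
        self.word_emb = nn.Embedding(vocab, dim)    # hypothesis tokens
        self.unit_emb = nn.Embedding(n_units, dim)  # discrete speech units
        self.proj = nn.Linear(2 * dim, dim)

    def forward(self, word_ids, unit_ids):
        both = torch.cat([self.word_emb(word_ids),
                          self.unit_emb(unit_ids)], dim=-1)
        return self.proj(both)  # fused features for the AEC encoder
```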
arXiv Detail & Related papers (2024-05-26T19:58:38Z)
- Error Correction in ASR using Sequence-to-Sequence Models [32.41875780785648]
Post-editing in Automatic Speech Recognition entails automatically correcting common and systematic errors produced by the ASR system.
We propose to use a powerful pre-trained sequence-to-sequence model, BART, to serve as a denoising model.
Experimental results on accented speech data demonstrate that our strategy effectively rectifies a significant number of ASR errors.
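A minimal sketch using the Hugging Face Transformers API (the checkpoint name and generation settings are illustrative; the paper's fine-tuning data and configuration are not reproduced here):

```python
# Shape-check sketch with Hugging Face Transformers; in practice the
# model is first fine-tuned on (ASR hypothesis, reference) pairs.

from transformers import BartForConditionalGeneration, BartTokenizer

tokenizer = BartTokenizer.from_pretrained("facebook/bart-base")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-base")

hypothesis = "the whether tomorrow will be raining"  # noisy ASR output
inputs = tokenizer(hypothesis, return_tensors="pt")
output_ids = model.generate(**inputs, max_length=32)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```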
arXiv Detail & Related papers (2022-02-02T17:32:59Z)
- Attention-based Multi-hypothesis Fusion for Speech Summarization [83.04957603852571]
Speech summarization can be achieved by combining automatic speech recognition (ASR) and text summarization (TS).
ASR errors directly affect the quality of the output summary in the cascade approach.
We propose a cascade speech summarization model that is robust to ASR errors and that exploits multiple hypotheses generated by ASR to attenuate the effect of ASR errors on the summary.
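A toy sketch of hypothesis fusion via attention (sentence-level vectors and a dot-product score are simplifications of the paper's architecture):

```python
# Toy attention fusion over sentence-level hypothesis encodings.

import torch

def fuse_hypotheses(hyp_encodings, query):
    """hyp_encodings: (n_hyps, dim); query: (dim,). Returns one
    fused (dim,) vector, weighted by attention over hypotheses."""
    weights = torch.softmax(hyp_encodings @ query, dim=0)
    return weights @ hyp_encodings
```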
arXiv Detail & Related papers (2021-11-16T03:00:29Z)
- FastCorrect: Fast Error Correction with Edit Alignment for Automatic Speech Recognition [90.34177266618143]
We propose FastCorrect, a novel NAR error correction model based on edit alignment.
FastCorrect speeds up the inference by 6-9 times and maintains the accuracy (8-14% WER reduction) compared with the autoregressive correction model.
In accuracy, it outperforms popular NAR models adopted in neural machine translation by a large margin.
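A rough stand-in for the edit-alignment idea using difflib (not FastCorrect's actual algorithm): count how many target tokens each source token should emit, which is what lets a non-autoregressive decoder predict all corrections in parallel:

```python
# difflib stand-in for edit alignment: counts[i] is how many target
# tokens source token i emits (0 = delete, >1 = insert after it).
# Insertions before the first token are ignored in this toy version.

from difflib import SequenceMatcher

def edit_alignment(src, tgt):
    counts = [0] * len(src)
    for op, i1, i2, j1, j2 in SequenceMatcher(a=src, b=tgt).get_opcodes():
        if op == "equal":
            counts[i1:i2] = [1] * (i2 - i1)
        elif op == "replace":
            n_src, n_tgt = i2 - i1, j2 - j1
            for k in range(n_src):  # spread targets over the span
                counts[i1 + k] = n_tgt // n_src + (1 if k < n_tgt % n_src else 0)
        elif op == "insert" and i1 > 0:
            counts[i1 - 1] += j2 - j1  # attach to the previous token
    return counts

print(edit_alignment("the cat sad on mat".split(),
                     "the cat sat on the mat".split()))  # [1, 1, 1, 2, 1]
```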
arXiv Detail & Related papers (2021-05-09T05:35:36Z)
- An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules.
The errors of the ASR system can seriously degrade the performance of the NLP modules.
Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
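A toy version of such noise injection (the error probabilities and the uniform substitution vocabulary are made-up illustration values):

```python
# Toy ASR-noise injector for data augmentation: randomly delete,
# substitute, and insert tokens to mimic recognition errors.

import random

def inject_asr_noise(tokens, vocab, p_sub=0.05, p_del=0.03, p_ins=0.02):
    noisy = []
    for tok in tokens:
        r = random.random()
        if r < p_del:
            continue                             # deletion error
        noisy.append(random.choice(vocab) if r < p_del + p_sub else tok)
        if random.random() < p_ins:
            noisy.append(random.choice(vocab))   # insertion error
    return noisy
```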
arXiv Detail & Related papers (2021-03-25T05:15:43Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
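The paper fine-tunes pre-trained models; the rule-based pass below is only a stand-in that illustrates the APR input/output contract (the filler list is an assumption):

```python
# Rule-based stand-in for APR: strip common fillers, collapse spaces,
# restore sentence casing and a final period.

import re

FILLERS = re.compile(r"\b(?:um+|uh+|you know|i mean)\b[, ]*", re.IGNORECASE)

def naive_apr(asr_output):
    text = FILLERS.sub("", asr_output).strip()
    text = re.sub(r"\s{2,}", " ", text)
    return text[:1].upper() + text[1:] + "."

print(naive_apr("um so the meeting uh starts at nine you know"))
# -> "So the meeting starts at nine."
```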
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
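A schematic sketch of the joint setup (the dimensions, LSTM encoder, and head names are invented): one shared encoder feeds both a correction head and an LU intent head:

```python
# Schematic joint model: a shared encoder with a token-level
# correction head and an utterance-level intent (LU) head.

import torch.nn as nn

class JointCorrectionLU(nn.Module):
    def __init__(self, vocab=10000, dim=256, n_intents=20):
        super().__init__()
        self.emb = nn.Embedding(vocab, dim)
        self.encoder = nn.LSTM(dim, dim, batch_first=True)
        self.correction_head = nn.Linear(dim, vocab)   # rewrite tokens
        self.intent_head = nn.Linear(dim, n_intents)   # classify intent

    def forward(self, token_ids):
        states, (h, _) = self.encoder(self.emb(token_ids))
        return self.correction_head(states), self.intent_head(h[-1])
```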
arXiv Detail & Related papers (2020-01-28T22:09:25Z)