An Approach to Improve Robustness of NLP Systems against ASR Errors
- URL: http://arxiv.org/abs/2103.13610v1
- Date: Thu, 25 Mar 2021 05:15:43 GMT
- Authors: Tong Cui, Jinghui Xiao, Liangyou Li, Xin Jiang, Qun Liu
- Abstract summary: Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules.
Errors from the ASR system can seriously degrade the performance of the NLP modules.
Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Speech-enabled systems typically first convert audio to text through an automatic speech recognition (ASR) model and then feed the text to downstream natural language processing (NLP) modules. Errors from the ASR system can seriously degrade the performance of the NLP modules, so it is essential to make them robust to ASR errors. Previous work has shown that data augmentation, injecting ASR noise into the training data, is an effective way to address this problem. In this paper, we use a prevalent pre-trained language model to generate training samples with ASR-plausible noise. Compared to previous methods, our approach generates ASR noise that better fits the real-world error distribution. Experimental results on spoken language translation (SLT) and spoken language understanding (SLU) show that our approach effectively improves system robustness against ASR errors and achieves state-of-the-art results on both tasks.
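The abstract does not spell out the generation procedure, but the core idea, masking words in clean training text and letting a pre-trained masked language model propose ASR-plausible substitutions, can be sketched as follows. This is an illustrative sketch only, not the authors' implementation: the model choice, the masking rate, and the use of string similarity as a cheap stand-in for acoustic confusability are all assumptions.

```python
# Illustrative sketch of LM-based ASR-noise injection (not the authors'
# released code). Requires the Hugging Face `transformers` package.
import random
from difflib import SequenceMatcher

from transformers import pipeline

fill_mask = pipeline("fill-mask", model="bert-base-uncased")  # model choice is an assumption

def inject_asr_noise(sentence: str, mask_prob: float = 0.15) -> str:
    """Replace random words with masked-LM predictions that are
    string-similar to the original word (a crude proxy for the
    acoustic confusability a real ASR system exhibits)."""
    words = sentence.split()
    for i, word in enumerate(words):
        if random.random() > mask_prob:
            continue
        masked = " ".join(words[:i] + [fill_mask.tokenizer.mask_token] + words[i + 1:])
        for cand in fill_mask(masked, top_k=10):
            token = cand["token_str"].strip()
            # Keep only substitutions that look/sound like the original word.
            if token != word and SequenceMatcher(None, token, word).ratio() > 0.5:
                words[i] = token
                break
    return " ".join(words)

print(inject_asr_noise("please book a flight to boston tomorrow"))
```

Training the downstream NLP model on a mix of clean and noised sentences is the usual way such augmented samples are consumed.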
Related papers
- Towards interfacing large language models with ASR systems using confidence measures and prompting [54.39667883394458]
This work investigates post-hoc correction of ASR transcripts with large language models (LLMs).
To avoid introducing errors into likely accurate transcripts, we propose a range of confidence-based filtering methods.
Our results indicate that this can improve the performance of less competitive ASR systems.
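A minimal sketch of the gating idea, assuming the ASR system exposes a per-utterance confidence score; `correct_with_llm` and the threshold value below are hypothetical placeholders, not the paper's actual setup:

```python
# Confidence-gated LLM post-correction (illustrative; the paper evaluates
# a range of such filters). `correct_with_llm` stands in for a real LLM call.
from typing import Callable

def postprocess(hypotheses: list[tuple[str, float]],
                correct_with_llm: Callable[[str], str],
                threshold: float = 0.9) -> list[str]:
    """Send only low-confidence ASR hypotheses to the LLM, leaving
    likely-accurate transcripts untouched to avoid introducing new errors."""
    return [text if conf >= threshold else correct_with_llm(text)
            for text, conf in hypotheses]

# Usage with a dummy "LLM" that returns its input unchanged:
hyps = [("turn on the lights", 0.97), ("play some jass", 0.42)]
print(postprocess(hyps, correct_with_llm=lambda t: t))
```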
arXiv Detail & Related papers (2024-07-31T08:00:41Z)
- Improving Robustness of Neural Inverse Text Normalization via Data-Augmentation, Semi-Supervised Learning, and Post-Aligning Method [4.343606621506086]
Inverse text normalization (ITN) is crucial for converting spoken-form text into written form, especially in the context of automatic speech recognition (ASR).
We propose a direct training approach that utilizes ASR-generated written or spoken text, with pairs augmented through ASR linguistic context emulation and a semi-supervised learning method enhanced by a large language model.
Our proposed methods remarkably improved ITN performance in various ASR scenarios.
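For readers unfamiliar with the task, the toy rule-based snippet below only shows what ITN maps between; the paper trains a neural model on ASR-style data rather than using rules like these:

```python
# Toy inverse text normalization: spoken form -> written form.
UNITS = {"one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
         "six": 6, "seven": 7, "eight": 8, "nine": 9}
TENS = {"twenty": 20, "thirty": 30, "forty": 40, "fifty": 50}

def itn(spoken: str) -> str:
    out, num = [], None
    for tok in spoken.split():
        if tok in TENS:
            num = (num or 0) + TENS[tok]
        elif tok in UNITS:
            num = (num or 0) + UNITS[tok]
        elif tok == "dollars" and num is not None:
            out.append(f"${num}")     # currency: "twenty five dollars" -> "$25"
            num = None
        else:
            if num is not None:       # flush any pending number
                out.append(str(num))
                num = None
            out.append(tok)
    if num is not None:
        out.append(str(num))
    return " ".join(out)

print(itn("send twenty five dollars on march three"))  # -> send $25 on march 3
```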
arXiv Detail & Related papers (2023-09-12T06:05:57Z)
- Modality Confidence Aware Training for Robust End-to-End Spoken Language Understanding [18.616202196061966]
End-to-end (E2E) spoken language understanding (SLU) systems that generate a semantic parse directly from speech have recently become more promising.
This approach uses a single model that draws on audio and text representations from pre-trained automatic speech recognition (ASR) models.
We propose a novel E2E SLU system that enhances robustness to ASR errors by fusing audio and text representations based on the estimated modality confidence of ASR hypotheses.
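A toy sketch of the general fusion idea, not the paper's architecture: the embedding size, the plain linear interpolation, and the scalar confidence are illustrative assumptions.

```python
# Confidence-weighted fusion of audio and text representations (toy sketch).
import numpy as np

def fuse(audio_repr: np.ndarray, text_repr: np.ndarray,
         asr_confidence: float) -> np.ndarray:
    """When the ASR hypothesis is likely wrong (low confidence), lean on
    the audio representation; when it is likely right, trust the text."""
    w = float(np.clip(asr_confidence, 0.0, 1.0))
    return w * text_repr + (1.0 - w) * audio_repr

rng = np.random.default_rng(0)
audio, text = rng.normal(size=256), rng.normal(size=256)
fused = fuse(audio, text, asr_confidence=0.3)  # noisy ASR: audio dominates
print(fused.shape)
```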
arXiv Detail & Related papers (2023-07-22T17:47:31Z)
- Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU).
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z)
- Improving Noise Robustness of Contrastive Speech Representation Learning with Speech Reconstruction [109.44933866397123]
Noise robustness is essential for deploying automatic speech recognition systems in real-world environments.
We employ a noise-robust representation learned by a refined self-supervised framework for noisy speech recognition.
We achieve comparable performance to the best supervised approach reported with only 16% of labeled data.
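As a rough illustration of contrastive learning over clean and noise-augmented views of the same utterances (a simplified stand-in for the paper's refined self-supervised framework, which also involves speech reconstruction), an InfoNCE-style objective looks like this:

```python
# InfoNCE between clean and noisy views; row i of `noisy` is the
# positive example for row i of `clean`.
import numpy as np

def info_nce(clean: np.ndarray, noisy: np.ndarray, temperature: float = 0.1) -> float:
    """clean, noisy: (batch, dim) L2-normalized embeddings."""
    logits = clean @ noisy.T / temperature           # pairwise similarities
    logits -= logits.max(axis=1, keepdims=True)      # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))       # matched pairs are positives

rng = np.random.default_rng(0)
z = rng.normal(size=(8, 64))
clean = z / np.linalg.norm(z, axis=1, keepdims=True)
noisy = clean + 0.05 * rng.normal(size=clean.shape)  # simulated noise view
noisy /= np.linalg.norm(noisy, axis=1, keepdims=True)
print(info_nce(clean, noisy))
```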
arXiv Detail & Related papers (2021-10-28T20:39:02Z)
- Improving Distinction between ASR Errors and Speech Disfluencies with Feature Space Interpolation [0.0]
Fine-tuning pretrained language models (LMs) is a popular approach to automatic speech recognition (ASR) error detection during post-processing.
This paper proposes a scheme to improve existing LM-based ASR error detection systems.
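The title suggests a mixup-style scheme; under that assumption (not the authors' exact method), interpolating training examples and their labels in feature space looks like this:

```python
# Mixup-style interpolation in feature space (illustrative assumption).
import numpy as np

def mixup(features: np.ndarray, labels: np.ndarray, alpha: float = 0.4):
    """Interpolate random pairs of examples and their one-hot labels to
    densify the region between classes (e.g. ASR error vs. disfluency)."""
    lam = np.random.beta(alpha, alpha)
    perm = np.random.permutation(len(features))
    return (lam * features + (1 - lam) * features[perm],
            lam * labels + (1 - lam) * labels[perm])

feats = np.random.normal(size=(4, 768))   # e.g. LM hidden states
labs = np.eye(2)[[0, 1, 0, 1]]            # ASR-error vs. disfluency labels
print(mixup(feats, labs)[0].shape)
```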
arXiv Detail & Related papers (2021-08-04T02:11:37Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-source, adapted pre-trained models against the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches that perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and downstream LU systems can be reduced significantly (by 14% relative) with joint models trained on small amounts of in-domain data.
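A standard way to formulate such joint training is a weighted sum of the two task losses over shared parameters; the sketch below shows only that formulation, with placeholder values rather than the paper's losses or weight:

```python
# Weighted multi-task objective (formulation only; values are placeholders).
def joint_loss(correction_loss: float, lu_loss: float,
               lambda_lu: float = 0.5) -> float:
    """Total loss = ASR-correction loss + lambda * language-understanding
    loss, so gradients from both tasks shape the shared parameters."""
    return correction_loss + lambda_lu * lu_loss

print(joint_loss(correction_loss=2.31, lu_loss=0.87))
```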
arXiv Detail & Related papers (2020-01-28T22:09:25Z)