ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language
Understanding
- URL: http://arxiv.org/abs/2108.13048v1
- Date: Mon, 30 Aug 2021 08:11:39 GMT
- Title: ASR-GLUE: A New Multi-task Benchmark for ASR-Robust Natural Language
Understanding
- Authors: Lingyun Feng, Jianwei Yu, Deng Cai, Songxiang Liu, Haitao Zheng, Yan
Wang
- Abstract summary: The robustness of natural language understanding systems to errors introduced by automatic speech recognition (ASR) is under-examined.
We propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks for evaluating the performance of models under ASR error.
- Score: 42.80343041535763
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Language understanding in speech-based systems has attracted much attention
in recent years with the growing demand for voice interface applications.
However, the robustness of natural language understanding (NLU) systems to
errors introduced by automatic speech recognition (ASR) is under-examined. To
facilitate research on ASR-robust general language understanding, in this
paper we propose the ASR-GLUE benchmark, a new collection of 6 different NLU tasks
for evaluating the performance of models under ASR error across 3 different
levels of background noise and 6 speakers with various voice characteristics.
Based on the proposed benchmark, we systematically investigate the effect of
ASR error on NLU tasks in terms of noise intensity, error type and speaker
variants. We further propose two approaches, a correction-based method and a data
augmentation-based method, to improve the robustness of NLU systems. Extensive
experimental results and analyses show that the proposed methods are
effective to some extent, but still far from human performance, demonstrating
that NLU under ASR error is still very challenging and requires further
research.
Related papers
- Interventional Speech Noise Injection for ASR Generalizable Spoken Language Understanding [26.98755758066905]
We train SLU models to withstand ASR errors by exposing them to noises commonly observed in ASR systems.
We propose a novel and less biased augmentation method of introducing the noises that are plausible to any ASR system.
arXiv Detail & Related papers (2024-10-21T03:13:22Z)
- Towards ASR Robust Spoken Language Understanding Through In-Context
Learning With Word Confusion Networks [68.79880423713597]
We introduce a method that utilizes the ASR system's lattice output instead of relying solely on the top hypothesis.
Our in-context learning experiments, covering spoken question answering and intent classification, underline the LLM's resilience to noisy speech transcripts.
arXiv Detail & Related papers (2024-01-05T17:58:10Z)
- Multimodal Audio-textual Architecture for Robust Spoken Language
Understanding [18.702076738332867]
A multimodal language understanding (MLU) module is proposed to mitigate SLU performance degradation caused by errors in the ASR transcript.
Our model is evaluated on five tasks from three SLU datasets and robustness is tested using ASR transcripts from three ASR engines.
Results show that the proposed approach effectively mitigates the ASR error propagation problem, surpassing the PLM models' performance across all datasets for the academic ASR engine.
arXiv Detail & Related papers (2023-06-12T01:55:53Z)
- Deliberation Model for On-Device Spoken Language Understanding [69.5587671262691]
We propose a novel deliberation-based approach to end-to-end (E2E) spoken language understanding (SLU).
We show that our approach can significantly reduce the degradation when moving from natural speech to synthetic speech training.
arXiv Detail & Related papers (2022-04-04T23:48:01Z)
- Cross-sentence Neural Language Models for Conversational Speech
Recognition [17.317583079824423]
We propose an effective cross-sentence neural LM approach that reranks the ASR N-best hypotheses of an upcoming sentence.
We also explore extracting task-specific global topical information from the cross-sentence history.
arXiv Detail & Related papers (2021-06-13T05:30:16Z)
- An Approach to Improve Robustness of NLP Systems against ASR Errors [39.57253455717825]
Speech-enabled systems typically first convert audio to text through an automatic speech recognition model and then feed the text to downstream natural language processing modules.
The errors of the ASR system can seriously downgrade the performance of the NLP modules.
Previous work has shown it is effective to employ data augmentation methods to solve this problem by injecting ASR noise during the training process.
arXiv Detail & Related papers (2021-03-25T05:15:43Z)
- Directional ASR: A New Paradigm for E2E Multi-Speaker Speech Recognition
with Source Localization [73.62550438861942]
This paper proposes a new paradigm for handling far-field multi-speaker data in an end-to-end neural network manner, called directional automatic speech recognition (D-ASR).
In D-ASR, the azimuth angle of the sources with respect to the microphone array is defined as a latent variable. This angle controls the quality of separation, which in turn determines the ASR performance.
arXiv Detail & Related papers (2020-10-30T20:26:28Z)
- Improving Readability for Automatic Speech Recognition Transcription [50.86019112545596]
We propose a novel NLP task called ASR post-processing for readability (APR).
APR aims to transform the noisy ASR output into a readable text for humans and downstream tasks while maintaining the semantic meaning of the speaker.
We compare fine-tuned models based on several open-sourced and adapted pre-trained models with the traditional pipeline method.
arXiv Detail & Related papers (2020-04-09T09:26:42Z)
- Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU).
We show that the error rates of off-the-shelf ASR and the following LU systems can be reduced significantly, by 14% relative, with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.