Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust
Intent Detection
- URL: http://arxiv.org/abs/2205.11008v1
- Date: Mon, 23 May 2022 02:54:11 GMT
- Title: Calibrate and Refine! A Novel and Agile Framework for ASR-error Robust
Intent Detection
- Authors: Peilin Zhou, Dading Chong, Helin Wang, Qingcheng Zeng
- Abstract summary: We propose a novel framework called CR-ID for ASR error robust intent detection with two plug-and-play modules.
Experimental results on SNIPS dataset show that, our proposed CR-ID framework achieves competitive performance.
- Score: 8.842878491315124
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: The past ten years have witnessed the rapid development of text-based intent
detection, whose benchmark performances have already been taken to a remarkable
level by deep learning techniques. However, automatic speech recognition (ASR)
errors are inevitable in real-world applications due to the environment noise,
unique speech patterns and etc, leading to sharp performance drop in
state-of-the-art text-based intent detection models. Essentially, this
phenomenon is caused by the semantic drift brought by ASR errors and most
existing works tend to focus on designing new model structures to reduce its
impact, which is at the expense of versatility and flexibility. Different from
previous one-piece model, in this paper, we propose a novel and agile framework
called CR-ID for ASR error robust intent detection with two plug-and-play
modules, namely semantic drift calibration module (SDCM) and phonemic
refinement module (PRM), which are both model-agnostic and thus could be easily
integrated to any existing intent detection models without modifying their
structures. Experimental results on SNIPS dataset show that, our proposed CR-ID
framework achieves competitive performance and outperform all the baseline
methods on ASR outputs, which verifies that CR-ID can effectively alleviate the
semantic drift caused by ASR errors.
Related papers
- Transferable Adversarial Attacks against ASR [43.766547483367795]
We study the vulnerability of practical black-box attacks in cutting-edge automatic speech recognition models.
We propose a speech-aware gradient optimization approach (SAGO) for ASR, which forces mistranscription with minimal impact on human imperceptibility.
Our comprehensive experimental results reveal performance enhancements compared to baseline approaches across five models on two databases.
arXiv Detail & Related papers (2024-11-14T06:32:31Z) - Failing Forward: Improving Generative Error Correction for ASR with Synthetic Data and Retrieval Augmentation [73.9145653659403]
We show that Generative Error Correction models struggle to generalize beyond the specific types of errors encountered during training.
We propose DARAG, a novel approach designed to improve GEC for ASR in in-domain (ID) and OOD scenarios.
Our approach is simple, scalable, and both domain- and language-agnostic.
arXiv Detail & Related papers (2024-10-17T04:00:29Z) - Quantifying the Role of Textual Predictability in Automatic Speech Recognition [13.306122574236232]
A long-standing question in automatic speech recognition research is how to attribute errors to the ability of a model to model the acoustics.
We validate a novel approach which models error rates as a function of relative textual predictability.
We show how this approach can be used straightforwardly in diagnosing and improving ASR.
arXiv Detail & Related papers (2024-07-23T14:47:25Z) - Self-Taught Recognizer: Toward Unsupervised Adaptation for Speech Foundation Models [84.8919069953397]
Self-TAught Recognizer (STAR) is an unsupervised adaptation framework for speech recognition systems.
We show that STAR achieves an average of 13.5% relative reduction in word error rate across 14 target domains.
STAR exhibits high data efficiency that only requires less than one-hour unlabeled data.
arXiv Detail & Related papers (2024-05-23T04:27:11Z) - Boosting Chinese ASR Error Correction with Dynamic Error Scaling
Mechanism [27.09416337926635]
Current mainstream models often struggle with effectively utilizing word-level features and phonetic information.
This paper introduces a novel approach that incorporates a dynamic error scaling mechanism to detect and correct phonetically erroneous text.
arXiv Detail & Related papers (2023-08-07T09:19:59Z) - Benchmarking the Robustness of LiDAR Semantic Segmentation Models [78.6597530416523]
In this paper, we aim to comprehensively analyze the robustness of LiDAR semantic segmentation models under various corruptions.
We propose a new benchmark called SemanticKITTI-C, which features 16 out-of-domain LiDAR corruptions in three groups, namely adverse weather, measurement noise and cross-device discrepancy.
We design a robust LiDAR segmentation model (RLSeg) which greatly boosts the robustness with simple but effective modifications.
arXiv Detail & Related papers (2023-01-03T06:47:31Z) - Toward Certified Robustness Against Real-World Distribution Shifts [65.66374339500025]
We train a generative model to learn perturbations from data and define specifications with respect to the output of the learned model.
A unique challenge arising from this setting is that existing verifiers cannot tightly approximate sigmoid activations.
We propose a general meta-algorithm for handling sigmoid activations which leverages classical notions of counter-example-guided abstraction refinement.
arXiv Detail & Related papers (2022-06-08T04:09:13Z) - Enhancing the Generalization for Intent Classification and Out-of-Domain
Detection in SLU [70.44344060176952]
Intent classification is a major task in spoken language understanding (SLU)
Recent works have shown that using extra data and labels can improve the OOD detection performance.
This paper proposes to train a model with only IND data while supporting both IND intent classification and OOD detection.
arXiv Detail & Related papers (2021-06-28T08:27:38Z) - Fine-tuning of Pre-trained End-to-end Speech Recognition with Generative
Adversarial Networks [10.723935272906461]
Adversarial training of end-to-end (E2E) ASR systems using generative adversarial networks (GAN) has recently been explored.
We introduce a novel framework for fine-tuning a pre-trained ASR model using the GAN objective.
Our proposed approach outperforms baselines and conventional GAN-based adversarial models.
arXiv Detail & Related papers (2021-03-10T17:40:48Z) - Joint Contextual Modeling for ASR Correction and Language Understanding [60.230013453699975]
We propose multi-task neural approaches to perform contextual language correction on ASR outputs jointly with language understanding (LU)
We show that the error rates of off the shelf ASR and following LU systems can be reduced significantly by 14% relative with joint models trained using small amounts of in-domain data.
arXiv Detail & Related papers (2020-01-28T22:09:25Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.