Bridging the Gap Between Clean Data Training and Real-World Inference
for Spoken Language Understanding
- URL: http://arxiv.org/abs/2104.06393v1
- Date: Tue, 13 Apr 2021 17:54:33 GMT
- Title: Bridging the Gap Between Clean Data Training and Real-World Inference
for Spoken Language Understanding
- Authors: Di Wu, Yiren Chen, Liang Ding, Dacheng Tao
- Abstract summary: Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on real-world (noisy) corpora but also enhances robustness, i.e., it produces high-quality results in a noisy environment.
- Score: 76.89426311082927
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A spoken language understanding (SLU) system usually consists of several
pipeline components, where each component heavily relies on the results of its
upstream ones. For example, intent detection (ID) and slot filling (SF)
require their upstream automatic speech recognition (ASR) to transform voice
into text. In this case, upstream perturbations, e.g., ASR errors,
environmental noise, and careless user speech, propagate to the ID and SF
models, thus deteriorating system performance. Therefore, well-performing
SF and ID models are expected to be noise resistant to some extent.
However, existing models are trained on clean data, which causes a
gap between clean data training and real-world inference. To bridge
the gap, we propose a method from the perspective of domain adaptation, by
which both high- and low-quality samples are embedded into a similar vector
space. Meanwhile, we design a denoising generation model to reduce the impact
of the low-quality samples. Experiments on the widely used Snips dataset
and a large-scale in-house dataset (10 million training examples) demonstrate
that this method not only outperforms the baseline models on real-world (noisy)
corpora but also enhances robustness, i.e., it produces high-quality
results in a noisy environment. The source code will be released.
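The domain-adaptation idea in the abstract can be sketched in a few lines: embed a clean utterance and its noisy (ASR-corrupted) counterpart, then penalize the distance between the two embeddings so that high- and low-quality samples land in a similar vector space. The names below (`embed`, `corrupt`, `alignment_loss`) are hypothetical, and the toy character-trigram encoder stands in for the real trainable encoder; this is an illustration of the objective, not the authors' implementation.

```python
# Minimal sketch (hypothetical code, not the paper's implementation) of
# aligning clean and noisy samples in embedding space.

import random
import zlib

def embed(text, dim=16):
    # Tiny deterministic character-trigram embedding; a stand-in for a
    # real trainable encoder (e.g. BERT) in the actual system.
    vec = [0.0] * dim
    for i in range(len(text) - 2):
        vec[zlib.crc32(text[i:i + 3].encode()) % dim] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def corrupt(text, p=0.15, seed=0):
    # Simulate upstream ASR perturbations by randomly dropping characters.
    rng = random.Random(seed)
    return "".join(c for c in text if rng.random() > p)

def alignment_loss(clean_vec, noisy_vec):
    # Squared L2 distance between the two embeddings; the domain
    # adaptation objective drives this term toward zero during training.
    return sum((a - b) ** 2 for a, b in zip(clean_vec, noisy_vec))

clean = "play the latest jazz playlist"
noisy = corrupt(clean)
print(f"noisy input:    {noisy!r}")
print(f"alignment loss: {alignment_loss(embed(clean), embed(noisy)):.4f}")
```

In the full method this alignment term would be added to the supervised ID/SF task loss, so the encoder learns representations that are useful for the task and simultaneously insensitive to upstream noise.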
Related papers
- DiffATR: Diffusion-based Generative Modeling for Audio-Text Retrieval [49.076590578101985]
We present a diffusion-based ATR framework (DiffATR) that generates the joint distribution from noise.
Experiments on the AudioCaps and Clotho datasets verify the effectiveness of our approach with superior performance.
arXiv Detail & Related papers (2024-09-16T06:33:26Z)
- Describe Where You Are: Improving Noise-Robustness for Speech Emotion Recognition with Text Description of the Environment
Speech emotion recognition (SER) systems often struggle in real-world environments, where ambient noise severely degrades their performance.
This paper explores a novel approach that exploits prior knowledge of testing environments to maximize SER performance under noisy conditions.
arXiv Detail & Related papers (2024-07-25T02:30:40Z)
- Language Rectified Flow: Advancing Diffusion Language Generation with Probabilistic Flows [53.31856123113228]
This paper proposes Language Rectified Flow.
Our method is based on the reformulation of the standard probabilistic flow models.
Experiments and ablation studies demonstrate that our method can be general, effective, and beneficial for many NLP tasks.
arXiv Detail & Related papers (2024-03-25T17:58:22Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Influence Scores at Scale for Efficient Language Data Sampling [3.072340427031969]
"Influence scores" are used to identify important subsets of data.
In this paper, we explore the applicability of influence scores in language classification tasks.
arXiv Detail & Related papers (2023-11-27T20:19:22Z)
- DiffSED: Sound Event Detection with Denoising Diffusion [70.18051526555512]
We reformulate the SED problem by taking a generative learning perspective.
Specifically, we aim to generate sound temporal boundaries from noisy proposals in a denoising diffusion process.
During training, our model learns to reverse the noising process by converting noisy latent queries to their ground-truth versions.
arXiv Detail & Related papers (2023-08-14T17:29:41Z)
- An Investigation of Noise in Morphological Inflection [21.411766936034]
We investigate the types of noise encountered within a pipeline for truly unsupervised morphological paradigm completion.
We compare the effect of different types of noise on multiple state-of-the-art inflection models.
We propose a novel character-level masked language modeling (CMLM) pretraining objective and explore its impact on the models' resistance to noise.
arXiv Detail & Related papers (2023-05-26T02:14:34Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a lightweight method for detecting and removing such noise from the input during model inference, without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
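Inference-time noise removal of this kind can be illustrated with a simple heuristic: flag tokens whose character makeup looks unlike natural text and drop them before the input reaches the model. The `looks_noisy`/`denoise` functions and the 0.6 alphabetic-ratio threshold below are illustrative assumptions, not the paper's actual detector.

```python
# Hypothetical sketch of training-free input denoising at inference time:
# drop tokens that are mostly non-alphabetic (e.g. markup or OCR junk).

def looks_noisy(token):
    # Heuristic: a token is "noisy" if fewer than 60% of its characters
    # are alphabetic. Empty tokens are treated as noise.
    if not token:
        return True
    alpha = sum(c.isalpha() for c in token)
    return alpha / len(token) < 0.6

def denoise(text):
    # Remove flagged tokens, leaving cleaner text for the downstream model.
    return " ".join(t for t in text.split() if not looks_noisy(t))

print(denoise("the quick ##@@!! brown a1b2c3d4 fox"))
# prints "the quick brown fox"
```

A real system would likely tune the detector per noise type; the point here is only that detection and removal can happen purely at inference, with no extra training or auxiliary models.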
arXiv Detail & Related papers (2022-12-20T00:33:11Z)