Towards Robust and Generalizable Training: An Empirical Study of Noisy
Slot Filling for Input Perturbations
- URL: http://arxiv.org/abs/2310.03518v1
- Date: Thu, 5 Oct 2023 12:59:57 GMT
- Title: Towards Robust and Generalizable Training: An Empirical Study of Noisy
Slot Filling for Input Perturbations
- Authors: Jiachi Liu, Liwen Wang, Guanting Dong, Xiaoshuai Song, Zechen Wang,
Zhengyang Wang, Shanglin Lei, Jinzheng Zhao, Keqing He, Bo Xiao, Weiran Xu
- Abstract summary: We introduce a noise robustness evaluation dataset named Noise-SF for the slot filling task.
The proposed dataset contains five types of human-annotated noise.
We find that baseline models have poor performance in robustness evaluation.
- Score: 38.766702041991046
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In real dialogue scenarios, as there are unknown input noises in the
utterances, existing supervised slot filling models often perform poorly in
practical applications. Even though there are some studies on noise-robust
models, these works are only evaluated on rule-based synthetic datasets, which
is limiting, making it difficult to promote research on noise-robust methods.
In this paper, we introduce a noise robustness evaluation dataset named
Noise-SF for the slot filling task. The proposed dataset contains five types
of human-annotated noise, all of which occur in real dialogue scenarios. We
further incorporate extensive robust-training methods of slot filling into the
proposed framework.
By conducting exhaustive empirical evaluation experiments on Noise-SF, we find
that baseline models have poor performance in robustness evaluation, and the
proposed framework can effectively improve the robustness of models. Based on
the empirical experimental results, we make some forward-looking suggestions to
fuel the research in this direction. Our dataset Noise-SF will be released at
https://github.com/dongguanting/Noise-SF.
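The abstract does not spell out the evaluation protocol, but the robustness gap it describes is commonly measured by scoring the same slot filling model on clean and perturbed copies of each utterance and comparing span-level F1. The following is a minimal sketch of such a comparison; the BIO data format, the helper names (bio_to_spans, span_f1, robustness_gap), and the toy typo perturbation are illustrative assumptions and are not taken from the Noise-SF repository.

    from typing import Callable, List, Tuple

    def bio_to_spans(tags: List[str]) -> set:
        """Convert a BIO tag sequence into a set of (slot_type, start, end) spans."""
        spans, start, label = set(), None, None
        for i, tag in enumerate(tags + ["O"]):  # trailing "O" sentinel flushes the last span
            if label is not None and (tag == "O" or tag.startswith("B-") or tag[2:] != label):
                spans.add((label, start, i))
                start, label = None, None
            if tag.startswith("B-") or (tag.startswith("I-") and label is None):
                start, label = i, tag[2:]
        return spans

    def span_f1(gold: List[List[str]], pred: List[List[str]]) -> float:
        """Micro-averaged span-level F1 over a corpus of tag sequences."""
        tp = fp = fn = 0
        for g, p in zip(gold, pred):
            g_spans, p_spans = bio_to_spans(g), bio_to_spans(p)
            tp += len(g_spans & p_spans)
            fp += len(p_spans - g_spans)
            fn += len(g_spans - p_spans)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        return 2 * prec * rec / (prec + rec) if prec + rec else 0.0

    def robustness_gap(model: Callable[[List[str]], List[str]],
                       clean: List[Tuple[List[str], List[str]]],
                       noisy: List[Tuple[List[str], List[str]]]) -> Tuple[float, float]:
        """Score the same model on clean and perturbed (tokens, gold_tags) pairs."""
        f1_clean = span_f1([g for _, g in clean], [model(t) for t, _ in clean])
        f1_noisy = span_f1([g for _, g in noisy], [model(t) for t, _ in noisy])
        return f1_clean, f1_noisy

    if __name__ == "__main__":
        # Toy model that only recognises the exact surface form "boston",
        # so a typo-style perturbation makes it miss the slot entirely.
        def toy_model(tokens):
            return ["B-city" if t == "boston" else "O" for t in tokens]

        clean = [(["flights", "to", "boston"], ["O", "O", "B-city"])]
        noisy = [(["flights", "to", "bostan"], ["O", "O", "B-city"])]  # typo perturbation
        print(robustness_gap(toy_model, clean, noisy))  # (1.0, 0.0): a large robustness gap

The drop from the clean score to the noisy score is the quantity that a noise-robust training method would be expected to shrink.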
Related papers
- Dataset Distillers Are Good Label Denoisers In the Wild [16.626153947696743]
We propose a novel approach that leverages dataset distillation for noise removal.
This method avoids the feedback loop common in existing techniques and enhances training efficiency.
We rigorously evaluate three representative dataset distillation methods (DATM, DANCE, and RCIG) under various noise conditions.
arXiv Detail & Related papers (2024-11-18T06:26:41Z)
- Towards Robust Transcription: Exploring Noise Injection Strategies for Training Data Augmentation [55.752737615873464]
This study investigates the impact of white noise at various Signal-to-Noise Ratio (SNR) levels on state-of-the-art APT models.
We hope this research provides valuable insights as preliminary work toward developing transcription models that maintain consistent performance across a range of acoustic conditions.
arXiv Detail & Related papers (2024-10-18T02:31:36Z)
- Effective Noise-aware Data Simulation for Domain-adaptive Speech Enhancement Leveraging Dynamic Stochastic Perturbation [25.410770364140856]
Cross-domain speech enhancement (SE) is often faced with severe challenges due to the scarcity of noise and background information in an unseen target domain.
This study puts forward a novel data simulation method to address this issue, leveraging noise-extractive techniques and generative adversarial networks (GANs).
We introduce the notion of dynamic perturbation, which can inject controlled perturbations into the noise embeddings during inference.
arXiv Detail & Related papers (2024-09-03T02:29:01Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Noisy Pair Corrector for Dense Retrieval [59.312376423104055]
We propose a novel approach called Noisy Pair Corrector (NPC).
NPC consists of a detection module and a correction module.
We conduct experiments on the text-retrieval benchmarks Natural Questions and TriviaQA and the code-search benchmarks StaQC and SO-DS.
arXiv Detail & Related papers (2023-11-07T08:27:14Z)
- Realistic Noise Synthesis with Diffusion Models [68.48859665320828]
Deep image denoising models often rely on a large amount of training data for high-quality performance.
We propose a novel method that synthesizes realistic noise using diffusion models, namely Realistic Noise Synthesize Diffusor (RNSD).
RNSD can incorporate guided multiscale content, such that more realistic noise with spatial correlations can be generated at multiple frequencies.
arXiv Detail & Related papers (2023-05-23T12:56:01Z)
- Confidence-based Reliable Learning under Dual Noises [46.45663546457154]
Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks.
Yet, the data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models.
Various attempts have been made to reliably train DNNs under data noise, but they separately account for either the noise existing in the labels or that existing in the images.
This work provides a first, unified framework for reliable learning under the joint (image, label)-noise.
arXiv Detail & Related papers (2023-02-10T07:50:34Z)
- Improving the Robustness of Summarization Models by Detecting and Removing Input Noise [50.27105057899601]
We present a large empirical study quantifying the sometimes severe loss in performance from different types of input noise for a range of datasets and model sizes.
We propose a light-weight method for detecting and removing such noise in the input during model inference without requiring any training, auxiliary models, or even prior knowledge of the type of noise.
arXiv Detail & Related papers (2022-12-20T00:33:11Z)
- Analysing the Noise Model Error for Realistic Noisy Label Data [14.766574408868806]
We study the quality of estimated noise models from the theoretical side by deriving the expected error of the noise model.
We also publish NoisyNER, a new noisy label dataset from the NLP domain.
arXiv Detail & Related papers (2021-01-24T17:45:15Z)