A Unified Plug-and-Play Framework for Effective Data Denoising and
Robust Abstention
- URL: http://arxiv.org/abs/2009.12027v1
- Date: Fri, 25 Sep 2020 04:18:08 GMT
- Title: A Unified Plug-and-Play Framework for Effective Data Denoising and
Robust Abstention
- Authors: Krishanu Sarker, Xiulong Yang, Yang Li, Saeid Belkasim and Shihao Ji
- Abstract summary: We propose a unified filtering framework leveraging the underlying data density.
Our framework can effectively denoise training data and avoid predicting on uncertain test data points.
- Score: 4.200576272300216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of Deep Neural Networks (DNNs) depends heavily on data quality.
Moreover, predictive uncertainty makes high-performing DNNs risky to deploy in the
real world. In this paper, we address both issues by proposing a unified filtering
framework that leverages the underlying data density to effectively denoise training
data and to avoid predicting on uncertain test data points. Our framework uses the
underlying data distribution to differentiate between noisy and clean data samples
without requiring any modification to existing DNN architectures or loss functions.
Extensive experiments on multiple image classification datasets and multiple CNN
architectures demonstrate that our simple yet effective framework outperforms
state-of-the-art techniques at denoising training data and abstaining on uncertain
test data.
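The abstract does not specify the density estimator, so as a purely illustrative sketch, the core mechanism (score samples by local density in the network's feature space, drop low-density training points, and abstain on low-density test points) can be approximated with a k-nearest-neighbor density proxy. The function names, thresholds, and choice of estimator below are assumptions, not the paper's exact method:

```python
# Illustrative sketch only: a k-NN proxy for the "underlying data density"
# described in the abstract. The paper's actual estimator may differ.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_density_scores(features, k=10):
    """Score samples by inverse mean distance to their k nearest neighbors;
    higher scores mean denser regions, i.e. more likely clean samples."""
    nn = NearestNeighbors(n_neighbors=k + 1).fit(features)
    dists, _ = nn.kneighbors(features)            # column 0 is the self-match
    return 1.0 / (dists[:, 1:].mean(axis=1) + 1e-12)

def filter_training_data(features, labels, drop_quantile=0.1):
    """Denoising: drop the lowest-density fraction of samples per class."""
    keep = np.zeros(len(labels), dtype=bool)
    for c in np.unique(labels):
        idx = np.where(labels == c)[0]
        scores = knn_density_scores(features[idx])
        keep[idx] = scores >= np.quantile(scores, drop_quantile)
    return keep                                   # boolean mask of kept samples

def abstention_mask(train_features, test_features, k=10, abstain_quantile=0.05):
    """Abstention: flag test points lying in low-density regions relative
    to the training data."""
    nn = NearestNeighbors(n_neighbors=k).fit(train_features)
    dists, _ = nn.kneighbors(test_features)
    scores = 1.0 / (dists.mean(axis=1) + 1e-12)
    return scores < np.quantile(scores, abstain_quantile)   # True = abstain
```

In practice the features would come from a trained DNN's penultimate layer and the quantile thresholds would be calibrated on held-out data; the point of the sketch is only that such a filter needs no change to the architecture or loss, as the abstract claims.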
Related papers
- A Real Benchmark Swell Noise Dataset for Performing Seismic Data Denoising via Deep Learning [34.163242023030016]
This article presents a benchmark dataset composed of synthetic seismic data corrupted with noise extracted from real data via a filtering process.
It is proposed as a benchmark for accelerating the development of new solutions for seismic data denoising.
The results show that deep learning (DL) models are effective at denoising seismic data, but some issues remain to be solved.
arXiv Detail & Related papers (2024-10-02T13:06:18Z)
- An Embedding is Worth a Thousand Noisy Labels [0.11999555634662634]
We propose WANN, a Weighted Adaptive Nearest Neighbor approach, to address label noise.
We show that WANN outperforms reference methods on diverse datasets of varying size and under various noise types and severities.
Our approach, emphasizing efficiency and explainability, emerges as a simple, robust solution to the inherent limitations of deep neural network training; a hedged sketch of the weighted-neighbor idea follows this entry.
arXiv Detail & Related papers (2024-08-26T15:32:31Z)
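The summary above gives only the high-level idea. As a purely hypothetical sketch (WANN's actual adaptive neighborhood size and weighting scheme are not described here), a reliability-weighted k-NN vote on embeddings might look like:

```python
# Hypothetical reliability-weighted k-NN vote on embeddings; not WANN's
# published algorithm, just an illustration of the weighted-neighbor idea.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def weighted_knn_relabel(embeddings, noisy_labels, k=15):
    n, n_classes = len(noisy_labels), int(noisy_labels.max()) + 1
    nn = NearestNeighbors(n_neighbors=k + 1).fit(embeddings)
    _, idx = nn.kneighbors(embeddings)
    neighbors = idx[:, 1:]                 # drop the self-match in column 0
    # Reliability of a sample: fraction of its neighbors sharing its label.
    agree = (noisy_labels[neighbors] == noisy_labels[:, None]).mean(axis=1)
    # Each neighbor votes for its own label, weighted by its reliability.
    votes = np.zeros((n, n_classes))
    for j in range(k):
        nb = neighbors[:, j]
        votes[np.arange(n), noisy_labels[nb]] += agree[nb]
    return votes.argmax(axis=1)            # cleaned label estimates
```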
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) that applies an affine transformation to the feature space to mitigate the malignant effects of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Confidence-based Reliable Learning under Dual Noises [46.45663546457154]
Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks.
Yet, the data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models.
Various attempts have been made to reliably train DNNs under data noise, but they account separately for either the noise in the labels or the noise in the images.
This work provides a first, unified framework for reliable learning under joint (image, label) noise.
arXiv Detail & Related papers (2023-02-10T07:50:34Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improves performance on both text classification and question-answering tasks; a hedged sketch of the soft-label objective follows this entry.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
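As a hedged sketch of the soft-label objective referenced above (the mixing weight, temperature, and function name are illustrative assumptions; the paper's exact loss is not given in this summary):

```python
# Illustrative only: a student trained on augmented data learns from soft
# labels produced by a teacher trained on the cleaner original data.
import torch.nn.functional as F

def soft_denoising_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    hard = F.cross_entropy(student_logits, labels)          # augmented hard labels
    soft_targets = F.softmax(teacher_logits.detach() / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    soft = F.kl_div(log_probs, soft_targets, reduction="batchmean") * (T * T)
    return alpha * hard + (1.0 - alpha) * soft              # denoised objective
```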
- Adaptive Adversarial Training to Improve Adversarial Robustness of DNNs for Medical Image Segmentation and Detection [2.2977141788872366]
It is known that Deep Neural Networks (DNNs) are vulnerable to adversarial attacks.
The standard adversarial training (SAT) method has a severe issue that limits its practical use.
We propose an adaptive-margin adversarial training (AMAT) method and show that it outperforms SAT in adversarial robustness on noisy data and in prediction accuracy on clean data.
arXiv Detail & Related papers (2022-06-02T20:17:53Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean-data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in noisy environments.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- DeepRepair: Style-Guided Repairing for DNNs in the Real-world Operational Environment [27.316150020006916]
We propose a style-guided data augmentation approach for repairing Deep Neural Networks (DNNs) in the operational environment.
A style transfer method learns the unknown failure patterns within the failure data and introduces them into the training data via data augmentation.
arXiv Detail & Related papers (2020-11-19T15:09:44Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale, well-annotated datasets.
However, labeling large-scale data can be very costly and error-prone, making it difficult to guarantee annotation quality.
We propose Temporal Calibrated Regularization (TCR), which uses the original labels together with the model's predictions from the previous epoch; a hedged sketch follows this entry.
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
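A minimal sketch of the idea as summarized above; the interpolation coefficient and exact loss form are assumptions rather than TCR's published formulation:

```python
# Illustrative only: soft target mixing the (possibly noisy) one-hot label
# with the model's softmax prediction from the previous epoch.
import torch.nn.functional as F

def temporal_calibrated_loss(logits, labels, prev_epoch_probs, beta=0.7):
    one_hot = F.one_hot(labels, num_classes=logits.size(1)).float()
    target = beta * one_hot + (1.0 - beta) * prev_epoch_probs
    return -(target * F.log_softmax(logits, dim=1)).sum(dim=1).mean()
```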
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)