A Unified Plug-and-Play Framework for Effective Data Denoising and
Robust Abstention
- URL: http://arxiv.org/abs/2009.12027v1
- Date: Fri, 25 Sep 2020 04:18:08 GMT
- Title: A Unified Plug-and-Play Framework for Effective Data Denoising and
Robust Abstention
- Authors: Krishanu Sarker, Xiulong Yang, Yang Li, Saeid Belkasim and Shihao Ji
- Abstract summary: We propose a unified filtering framework leveraging underlying data density.
Our framework can effectively denoise training data and avoid predicting on uncertain test data points.
- Score: 4.200576272300216
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of Deep Neural Networks (DNNs) highly depends on data quality.
Moreover, predictive uncertainty makes high-performing DNNs risky for
real-world deployment. In this paper, we aim to address these two issues by
proposing a unified filtering framework leveraging underlying data density,
that can effectively denoise training data as well as avoid predicting
uncertain test data points. Our proposed framework leverages underlying data
distribution to differentiate between noise and clean data samples without
requiring any modification to existing DNN architectures or loss functions.
Extensive experiments on multiple image classification datasets and multiple
CNN architectures demonstrate that our simple yet effective framework can
outperform the state-of-the-art techniques in denoising training data and
abstaining on uncertain test data.
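The density-based filtering idea in the abstract can be illustrated with a minimal sketch. This is a hypothetical illustration, not the paper's actual method: the function names (`knn_density_scores`, `filter_noisy`) and the k-NN inverse-distance density proxy are assumptions; the paper's exact density estimator and thresholding rule may differ.

```python
# Hypothetical sketch of density-based filtering: score each sample by a
# simple k-NN density proxy in feature space, then drop low-density samples
# (denoising). The paper's actual estimator and threshold may differ.
import numpy as np

def knn_density_scores(features, k=5):
    """Inverse of the mean distance to each sample's k nearest neighbors."""
    d = np.linalg.norm(features[:, None, :] - features[None, :, :], axis=-1)
    np.fill_diagonal(d, np.inf)            # ignore self-distance
    knn = np.sort(d, axis=1)[:, :k]        # k smallest distances per sample
    return 1.0 / (knn.mean(axis=1) + 1e-12)

def filter_noisy(features, quantile=0.1, k=5):
    """Keep samples whose density score exceeds the given quantile."""
    scores = knn_density_scores(features, k)
    thresh = np.quantile(scores, quantile)
    return scores >= thresh, thresh
```

Under this sketch, the same threshold could be reused at test time: a test point whose density score falls below it would be abstained on rather than classified.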
Related papers
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- SEMRes-DDPM: Residual Network Based Diffusion Modelling Applied to Imbalanced Data [9.969882349165745]
In the field of data mining and machine learning, commonly used classification models cannot learn effectively from imbalanced data.
Most classical oversampling methods are based on the SMOTE technique, which focuses only on the local information of the data.
We propose a novel oversampling method SEMRes-DDPM.
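For context, the SMOTE-style interpolation this entry contrasts against can be sketched in a few lines. This is a generic illustration of the classical technique, not code from the cited paper; the function name `smote_sample` and its parameters are assumptions.

```python
# Minimal SMOTE-style oversampling sketch, illustrating the "local information"
# limitation mentioned above: new minority samples are drawn on line segments
# between a minority point and one of its k nearest minority neighbors.
import numpy as np

def smote_sample(minority, k=3, n_new=10, rng=None):
    rng = np.random.default_rng(rng)
    d = np.linalg.norm(minority[:, None] - minority[None, :], axis=-1)
    np.fill_diagonal(d, np.inf)
    nn = np.argsort(d, axis=1)[:, :k]                  # k nearest minority neighbors
    idx = rng.integers(0, len(minority), n_new)        # random base points
    nbr = nn[idx, rng.integers(0, k, n_new)]           # random neighbor per base point
    lam = rng.random((n_new, 1))                       # interpolation weights in [0, 1)
    return minority[idx] + lam * (minority[nbr] - minority[idx])
```

Because each synthetic point lies on a segment between two nearby minority samples, the method only ever uses local neighborhood structure, which is the limitation the diffusion-based approach above aims to address.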
arXiv Detail & Related papers (2024-03-09T14:01:04Z)
- Confidence-based Reliable Learning under Dual Noises [46.45663546457154]
Deep neural networks (DNNs) have achieved remarkable success in a variety of computer vision tasks.
Yet, the data collected from the open world are unavoidably polluted by noise, which may significantly undermine the efficacy of the learned models.
Various attempts have been made to reliably train DNNs under data noise, but they separately account for either the noise existing in the labels or that existing in the images.
This work provides a first, unified framework for reliable learning under the joint (image, label)-noise.
arXiv Detail & Related papers (2023-02-10T07:50:34Z)
- On-the-fly Denoising for Data Augmentation in Natural Language Understanding [101.46848743193358]
We propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data.
Our method can be applied to general augmentation techniques and consistently improve the performance on both text classification and question-answering tasks.
arXiv Detail & Related papers (2022-12-20T18:58:33Z)
- Adaptive Adversarial Training to Improve Adversarial Robustness of DNNs for Medical Image Segmentation and Detection [2.2977141788872366]
It is known that Deep Neural Networks (DNNs) are vulnerable to adversarial attacks.
The standard adversarial training (SAT) method has a severe issue that limits its practical use.
We show that our AMAT method outperforms the SAT method in adversarial robustness on noisy data and prediction accuracy on clean data.
arXiv Detail & Related papers (2022-06-02T20:17:53Z)
- Bridging the Gap Between Clean Data Training and Real-World Inference for Spoken Language Understanding [76.89426311082927]
Existing models are trained on clean data, which causes a gap between clean data training and real-world inference.
We propose a method from the perspective of domain adaptation, by which both high- and low-quality samples are embedded into a similar vector space.
Experiments on the widely used Snips dataset and a large-scale in-house dataset (10 million training examples) demonstrate that this method not only outperforms the baseline models on a real-world (noisy) corpus but also enhances robustness, producing high-quality results in a noisy environment.
arXiv Detail & Related papers (2021-04-13T17:54:33Z)
- DeepRepair: Style-Guided Repairing for DNNs in the Real-world Operational Environment [27.316150020006916]
We propose a style-guided data augmentation for repairing Deep Neural Networks (DNNs) in the operational environment.
We propose a style transfer method to learn and introduce the unknown failure patterns within the failure data into the training data via data augmentation.
arXiv Detail & Related papers (2020-11-19T15:09:44Z)
- Prediction of Object Geometry from Acoustic Scattering Using Convolutional Neural Networks [8.067201256886733]
The present work proposes to infer object geometry from scattering features by training convolutional neural networks.
The robustness of our approach in response to data degradation is evaluated by comparing the performance of networks trained using the datasets.
arXiv Detail & Related papers (2020-10-21T00:51:14Z)
- Temporal Calibrated Regularization for Robust Noisy Label Learning [60.90967240168525]
Deep neural networks (DNNs) exhibit great success on many tasks with the help of large-scale well annotated datasets.
However, labeling large-scale data can be very costly and error-prone, making it difficult to guarantee annotation quality.
We propose a Temporal Calibrated Regularization (TCR) in which we utilize the original labels and the predictions in the previous epoch together.
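The idea of combining original labels with previous-epoch predictions can be sketched as a simple blended training target. This is a hedged illustration only: the function name `calibrated_target`, the mixing parameter `beta`, and the linear blend are assumptions, and TCR's exact formulation may differ.

```python
# Hypothetical sketch of a temporally calibrated training target: blend the
# (possibly noisy) one-hot label with the previous epoch's softmax prediction.
# The actual TCR formulation in the paper may differ.
import numpy as np

def calibrated_target(one_hot_label, prev_epoch_pred, beta=0.7):
    """Convex combination of the original label and last epoch's prediction."""
    return beta * one_hot_label + (1.0 - beta) * prev_epoch_pred
```

Intuitively, when a label is wrong, the network's own earlier prediction can pull the target back toward the plausible class, damping the effect of annotation noise.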
arXiv Detail & Related papers (2020-07-01T04:48:49Z)
- Diversity inducing Information Bottleneck in Model Ensembles [73.80615604822435]
In this paper, we target the problem of generating effective ensembles of neural networks by encouraging diversity in prediction.
We explicitly optimize a diversity inducing adversarial loss for learning latent variables and thereby obtain diversity in the output predictions necessary for modeling multi-modal data.
Compared to the most competitive baselines, we show significant improvements in classification accuracy, under a shift in the data distribution.
arXiv Detail & Related papers (2020-03-10T03:10:41Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.