Generation and Analysis of Feature-Dependent Pseudo Noise for Training
Deep Neural Networks
- URL: http://arxiv.org/abs/2105.10796v1
- Date: Sat, 22 May 2021 19:15:26 GMT
- Title: Generation and Analysis of Feature-Dependent Pseudo Noise for Training
Deep Neural Networks
- Authors: Sree Ram Kamabattula, Kumudha Musini, Babak Namazi, Ganesh
Sankaranarayanan, Venkat Devarajan
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks (DNNs) on noisily labeled datasets is a
challenging problem, because learning from mislabeled examples degrades network
performance. Since ground truth is rarely available for real-world noisy
datasets, previous papers created synthetic noisy datasets by randomly
modifying the labels of training examples in clean datasets. However, no final
conclusions can be drawn from this random noise alone, since it excludes
feature-dependent noise. It is therefore imperative to generate
feature-dependent noisy datasets that additionally provide ground truth.
We propose an intuitive approach to creating such feature-dependent noisy
datasets by utilizing the training predictions of DNNs on clean datasets, while
retaining the true label information. We refer to these datasets as "Pseudo
Noisy datasets". We conduct several experiments to establish that Pseudo noisy
datasets resemble feature-dependent noisy datasets across different conditions.
We further generate random synthetic noisy datasets with the same noise
distribution as that of the Pseudo noise (referred to as "Randomized Noise") to
show empirically that i) learning is easier with feature-dependent label noise
than with random noise, ii) irrespective of the noise distribution, Pseudo
noisy datasets mimic feature-dependent label noise, and iii) current training
methods do not generalize to feature-dependent label noise. We therefore
believe that Pseudo noisy datasets will be quite helpful for studying and
developing robust training methods.
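The two dataset constructions described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the function names are hypothetical, the use of argmax training predictions as noisy labels is an assumption, and "same noise distribution" is approximated here by matching per-class mislabeling rates.

```python
import numpy as np

def make_pseudo_noisy_labels(true_labels, predictions):
    """Pseudo Noise (sketch): use the argmax predictions of a DNN trained on
    the clean dataset as noisy labels, keeping the true labels alongside.
    The exact epoch/rule for taking predictions is an assumption here."""
    noisy = predictions.copy()
    return noisy, true_labels  # noisy labels plus retained ground truth

def make_randomized_noise(true_labels, pseudo_labels, num_classes, rng):
    """Randomized Noise (sketch): flip labels uniformly at random so that
    each class's mislabeling rate matches the Pseudo Noise labels."""
    randomized = true_labels.copy()
    for c in range(num_classes):
        idx = np.where(true_labels == c)[0]
        # fraction of class-c examples mislabeled in the pseudo-noisy set
        noise_rate = np.mean(pseudo_labels[idx] != c)
        n_flip = int(round(noise_rate * len(idx)))
        flip = rng.choice(idx, size=n_flip, replace=False)
        others = [k for k in range(num_classes) if k != c]
        for i in flip:
            randomized[i] = rng.choice(others)
    return randomized

# Toy demo: 16 examples, 2 classes; `pseudo` stands in for the training
# predictions of a DNN on the clean set (hypothetical values).
rng = np.random.default_rng(0)
true = np.array([0] * 8 + [1] * 8)
pseudo = true.copy()
pseudo[[0, 1, 8, 9]] = [1, 1, 0, 0]  # the model confuses a few examples
noisy, gt = make_pseudo_noisy_labels(true, pseudo)
randomized = make_randomized_noise(true, pseudo, num_classes=2, rng=rng)
```

Because the randomized labels match the pseudo labels' per-class noise rates but pick mislabeled examples at random, any performance gap between training on `noisy` and on `randomized` isolates the effect of the noise being feature-dependent rather than merely its amount.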
Related papers
- NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Generating the Ground Truth: Synthetic Data for Soft Label and Label Noise Research [0.0]
We introduce SYNLABEL, a framework designed to create noiseless datasets informed by real-world data.
We demonstrate its ability to precisely quantify label noise and its improvement over existing methodologies.
arXiv Detail & Related papers (2023-09-08T13:31:06Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- The potential of self-supervised networks for random noise suppression in seismic data [0.0]
Blind-spot networks are shown to be an efficient suppressor of random noise in seismic data.
Results are compared with two commonly used random denoising techniques: FX-deconvolution and Curvelet transform.
We believe this is just the beginning of utilising self-supervised learning in seismic applications.
arXiv Detail & Related papers (2021-09-15T14:57:43Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Adaptive noise imitation for image denoising [58.21456707617451]
We develop a new adaptive noise imitation (ADANI) algorithm that can synthesize noisy data from naturally noisy images.
To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation.
Coupling the noisy data output from ADANI with the corresponding ground truth, a denoising CNN is then trained in a fully supervised manner.
arXiv Detail & Related papers (2020-11-30T02:49:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.