Generation and Analysis of Feature-Dependent Pseudo Noise for Training
Deep Neural Networks
- URL: http://arxiv.org/abs/2105.10796v1
- Date: Sat, 22 May 2021 19:15:26 GMT
- Title: Generation and Analysis of Feature-Dependent Pseudo Noise for Training
Deep Neural Networks
- Authors: Sree Ram Kamabattula, Kumudha Musini, Babak Namazi, Ganesh
Sankaranarayanan, Venkat Devarajan
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Training deep neural networks (DNNs) on noisily labeled datasets is a
challenging problem, because learning from mislabeled examples degrades network
performance. Since ground truth is rarely available for real-world noisy
datasets, previous papers created synthetic noisy datasets by randomly
modifying the labels of training examples in clean datasets. However, no final
conclusions can be drawn from this random noise alone, since it excludes
feature-dependent noise. It is therefore imperative to generate
feature-dependent noisy datasets that additionally provide ground truth.
We propose an intuitive approach to creating such feature-dependent noisy
datasets by utilizing the training predictions of DNNs on clean datasets, while
retaining the true label information. We refer to these datasets as "Pseudo
Noisy datasets". We conduct several experiments to establish that Pseudo noisy
datasets resemble feature-dependent noisy datasets across different conditions.
We further generate random synthetic noisy datasets with the same noise
distribution as that of the Pseudo noise (referred to as "Randomized Noise") to
show empirically that i) learning is easier with feature-dependent label noise
than with random noise, ii) irrespective of the noise distribution, Pseudo
noisy datasets mimic feature-dependent label noise, and iii) current training
methods do not generalize to feature-dependent label noise. We therefore
believe that Pseudo noisy datasets will be quite helpful for studying and
developing robust training methods.
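The two dataset constructions described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustrative sketch, not the paper's implementation: the function names are hypothetical, the use of argmax training predictions as noisy labels is an assumption, and "same noise distribution" is approximated here by matching per-class mislabeling rates.

```python
import numpy as np

def make_pseudo_noisy_labels(true_labels, predictions):
    """Pseudo Noise (sketch): use the argmax predictions of a DNN trained on
    the clean dataset as noisy labels, keeping the true labels alongside.
    The exact epoch/rule for taking predictions is an assumption here."""
    noisy = predictions.copy()
    return noisy, true_labels  # noisy labels plus retained ground truth

def make_randomized_noise(true_labels, pseudo_labels, num_classes, rng):
    """Randomized Noise (sketch): flip labels uniformly at random so that
    each class's mislabeling rate matches the Pseudo Noise labels."""
    randomized = true_labels.copy()
    for c in range(num_classes):
        idx = np.where(true_labels == c)[0]
        # fraction of class-c examples mislabeled in the pseudo-noisy set
        noise_rate = np.mean(pseudo_labels[idx] != c)
        n_flip = int(round(noise_rate * len(idx)))
        flip = rng.choice(idx, size=n_flip, replace=False)
        others = [k for k in range(num_classes) if k != c]
        for i in flip:
            randomized[i] = rng.choice(others)
    return randomized

# Toy demo: 16 examples, 2 classes; `pseudo` stands in for the training
# predictions of a DNN on the clean set (hypothetical values).
rng = np.random.default_rng(0)
true = np.array([0] * 8 + [1] * 8)
pseudo = true.copy()
pseudo[[0, 1, 8, 9]] = [1, 1, 0, 0]  # the model confuses a few examples
noisy, gt = make_pseudo_noisy_labels(true, pseudo)
randomized = make_randomized_noise(true, pseudo, num_classes=2, rng=rng)
```

Because the randomized labels match the pseudo labels' per-class noise rates but pick mislabeled examples at random, any performance gap between training on `noisy` and on `randomized` isolates the effect of the noise being feature-dependent rather than merely its amount.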
Related papers
- NoiseBench: Benchmarking the Impact of Real Label Noise on Named Entity Recognition [3.726602636064681]
We present an analysis that shows that real noise is significantly more challenging than simulated noise.
We show that current state-of-the-art models for noise-robust learning fall far short of their theoretically achievable upper bound.
arXiv Detail & Related papers (2024-05-13T10:20:31Z)
- Learning with Noisy Foundation Models [95.50968225050012]
This paper is the first work to comprehensively understand and analyze the nature of noise in pre-training datasets.
We propose a tuning method (NMTune) to affine the feature space to mitigate the malignant effect of noise and improve generalization.
arXiv Detail & Related papers (2024-03-11T16:22:41Z)
- Generating the Ground Truth: Synthetic Data for Soft Label and Label Noise Research [0.0]
We introduce SYNLABEL, a framework designed to create noiseless datasets informed by real-world data.
We demonstrate its ability to precisely quantify label noise and its improvement over existing methodologies.
arXiv Detail & Related papers (2023-09-08T13:31:06Z)
- Optimizing the Noise in Self-Supervised Learning: from Importance Sampling to Noise-Contrastive Estimation [80.07065346699005]
It is widely assumed that the optimal noise distribution should be made equal to the data distribution, as in Generative Adversarial Networks (GANs).
We turn to Noise-Contrastive Estimation, which grounds this self-supervised task as an estimation problem of an energy-based model of the data.
We soberly conclude that the optimal noise may be hard to sample from, and the gain in efficiency can be modest compared to choosing the noise distribution equal to the data's.
arXiv Detail & Related papers (2023-01-23T19:57:58Z)
- Towards Harnessing Feature Embedding for Robust Learning with Noisy Labels [44.133307197696446]
The memorization effect of deep neural networks (DNNs) plays a pivotal role in recent label noise learning methods.
We propose a novel feature embedding-based method for deep learning with label noise, termed LabEl NoiseDilution (LEND).
arXiv Detail & Related papers (2022-06-27T02:45:09Z)
- The Optimal Noise in Noise-Contrastive Learning Is Not What You Think [80.07065346699005]
We show that deviating from this assumption can actually lead to better statistical estimators.
In particular, the optimal noise distribution is different from the data's and even from a different family.
arXiv Detail & Related papers (2022-03-02T13:59:20Z)
- Learning with Noisy Labels Revisited: A Study Using Real-World Human Annotations [54.400167806154535]
Existing research on learning with noisy labels mainly focuses on synthetic label noise.
This work presents two new benchmark datasets (CIFAR-10N, CIFAR-100N).
We show that real-world noisy labels follow an instance-dependent pattern rather than the classically adopted class-dependent ones.
arXiv Detail & Related papers (2021-10-22T22:42:11Z)
- The potential of self-supervised networks for random noise suppression in seismic data [0.0]
Blind-spot networks are shown to be an efficient suppressor of random noise in seismic data.
Results are compared with two commonly used random denoising techniques: FX-deconvolution and Curvelet transform.
We believe this is just the beginning of utilising self-supervised learning in seismic applications.
arXiv Detail & Related papers (2021-09-15T14:57:43Z)
- Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model [80.91927573604438]
This paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances.
Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements on robustness.
arXiv Detail & Related papers (2021-01-14T05:43:51Z)
- Adaptive noise imitation for image denoising [58.21456707617451]
We develop a new adaptive noise imitation (ADANI) algorithm that can synthesize noisy data from naturally noisy images.
To produce realistic noise, a noise generator takes unpaired noisy/clean images as input, where the noisy image is a guide for noise generation.
Coupling the noisy data output from ADANI with the corresponding ground truth, a denoising CNN is then trained in a fully supervised manner.
arXiv Detail & Related papers (2020-11-30T02:49:36Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.