Related papers: How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?

URL: http://arxiv.org/abs/2510.17526v1
Date: Mon, 20 Oct 2025 13:28:13 GMT
Title: How Does Label Noise Gradient Descent Improve Generalization in the Low SNR Regime?
Authors: Wei Huang, Andi Han, Yujin Song, Yilan Chen, Denny Wu, Difan Zou, Taiji Suzuki,
Abstract summary: We investigate whether introducing label noise to the gradient updates can enhance the test performance of neural network (NN)<n>We prove that adding label noise during training suppresses noise memorization, preventing it from dominating the learning process.<n>In contrast, we show that NN trained with standard GD tends to overfit to noise in the same low SNR setting.
Score: 78.0226274470175
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: The capacity of deep learning models is often large enough to both learn the underlying statistical signal and overfit to noise in the training set. This noise memorization can be harmful especially for data with a low signal-to-noise ratio (SNR), leading to poor generalization. Inspired by prior observations that label noise provides implicit regularization that improves generalization, in this work, we investigate whether introducing label noise to the gradient updates can enhance the test performance of neural network (NN) in the low SNR regime. Specifically, we consider training a two-layer NN with a simple label noise gradient descent (GD) algorithm, in an idealized signal-noise data setting. We prove that adding label noise during training suppresses noise memorization, preventing it from dominating the learning process; consequently, label noise GD enjoys rapid signal growth while the overfitting remains controlled, thereby achieving good generalization despite the low SNR. In contrast, we also show that NN trained with standard GD tends to overfit to noise in the same low SNR setting and establish a non-vanishing lower bound on its test error, thus demonstrating the benefit of introducing label noise in gradient-based training.

Related papers

Pre-train to Gain: Robust Learning Without Clean Labels [1.1582652820340928]
Training deep networks with noisy labels leads to poor generalization and degraded accuracy.<n>By pre-training a feature extractor backbone without labels, we can train a more noise robust model without requiring a subset with clean labels.<n>Our approach achieves comparable results to ImageNet pre-trained models at low noise levels, while substantially outperforming them under high noise conditions.
arXiv Detail & Related papers (2025-11-25T20:48:07Z)
Mitigating the Noise Shift for Denoising Generative Models via Noise Awareness Guidance [54.88271057438763]
Noise Awareness Guidance (NAG) is a correction method that explicitly steers sampling trajectories to remain consistent with the pre-defined noise schedule.<n>NAG consistently mitigates noise shift and substantially improves the generation quality of mainstream diffusion models.
arXiv Detail & Related papers (2025-10-14T13:31:34Z)
Noise Augmented Fine Tuning for Mitigating Hallucinations in Large Language Models [1.0579965347526206]
Large language models (LLMs) often produce inaccurate or misleading content-hallucinations.<n>Noise-Augmented Fine-Tuning (NoiseFiT) is a novel framework that leverages adaptive noise injection to enhance model robustness.<n>NoiseFiT selectively perturbs layers identified as either high-SNR (more robust) or low-SNR (potentially under-regularized) using a dynamically scaled Gaussian noise.
arXiv Detail & Related papers (2025-04-04T09:27:19Z)
Feature Noise Boosts DNN Generalization under Label Noise [65.36889005555669]
The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs) In this study, we introduce and theoretically demonstrate a simple feature noise method, which directly adds noise to the features of training data.
arXiv Detail & Related papers (2023-08-03T08:31:31Z)
Per-Example Gradient Regularization Improves Learning Signals from Noisy Data [25.646054298195434]
Empirical evidence suggests that gradient regularization technique can significantly enhance the robustness of deep learning models against noisy perturbations. We present a theoretical analysis that demonstrates its effectiveness in improving both test error and robustness against noise perturbations. Our analysis reveals that PEGR penalizes the variance of pattern learning, thus effectively suppressing the memorization of noises from the training data.
arXiv Detail & Related papers (2023-03-31T10:08:23Z)
Latent Class-Conditional Noise Model [54.56899309997246]
We introduce a Latent Class-Conditional Noise model (LCCN) to parameterize the noise transition under a Bayesian framework. We then deduce a dynamic label regression method for LCCN, whose Gibbs sampler allows us efficiently infer the latent true labels. Our approach safeguards the stable update of the noise transition, which avoids previous arbitrarily tuning from a mini-batch of samples.
arXiv Detail & Related papers (2023-02-19T15:24:37Z)
Zero-shot Blind Image Denoising via Implicit Neural Representations [77.79032012459243]
We propose an alternative denoising strategy that leverages the architectural inductive bias of implicit neural representations (INRs) We show that our method outperforms existing zero-shot denoising methods under an extensive set of low-noise or real-noise scenarios.
arXiv Detail & Related papers (2022-04-05T12:46:36Z)
Open-set Label Noise Can Improve Robustness Against Inherent Label Noise [27.885927200376386]
We show that open-set noisy labels can be non-toxic and even benefit the robustness against inherent noisy labels. We propose a simple yet effective regularization by introducing Open-set samples with Dynamic Noisy Labels (ODNL) into training.
arXiv Detail & Related papers (2021-06-21T07:15:50Z)
Training Classifiers that are Universally Robust to All Label Noise Levels [91.13870793906968]
Deep neural networks are prone to overfitting in the presence of label noise. We propose a distillation-based framework that incorporates a new subcategory of Positive-Unlabeled learning. Our framework generally outperforms at medium to high noise levels.
arXiv Detail & Related papers (2021-05-27T13:49:31Z)

This list is automatically generated from the titles and abstracts of the papers in this site.

This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.