Anticorrelated Noise Injection for Improved Generalization
- URL: http://arxiv.org/abs/2202.02831v3
- Date: Fri, 19 May 2023 17:26:04 GMT
- Title: Anticorrelated Noise Injection for Improved Generalization
- Authors: Antonio Orvieto, Hans Kersting, Frank Proske, Francis Bach, Aurelien Lucchi
- Abstract summary: Injecting artificial noise into gradient descent (GD) is commonly employed to improve the performance of machine learning models.
It is, however, not known if this is optimal or whether other types of noise could provide better generalization performance.
We consider a variety of objective functions for which we find that GD with anticorrelated perturbations ("Anti-PGD") generalizes significantly better than GD and standard (uncorrelated) PGD.
- Score: 6.970991851511823
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Injecting artificial noise into gradient descent (GD) is commonly employed to
improve the performance of machine learning models. Usually, uncorrelated noise
is used in such perturbed gradient descent (PGD) methods. It is, however, not
known if this is optimal or whether other types of noise could provide better
generalization performance. In this paper, we zoom in on the problem of
correlating the perturbations of consecutive PGD steps. We consider a variety
of objective functions for which we find that GD with anticorrelated
perturbations ("Anti-PGD") generalizes significantly better than GD and
standard (uncorrelated) PGD. To support these experimental findings, we also
derive a theoretical analysis that demonstrates that Anti-PGD moves to wider
minima, while GD and PGD remain stuck in suboptimal regions or even diverge.
This new connection between anticorrelated noise and generalization opens the
field to novel ways to exploit noise for training machine learning models.
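Concretely, the anticorrelation in Anti-PGD can be realized by perturbing each GD step with the increment ξ_{n+1} − ξ_n of i.i.d. Gaussian draws, which gives consecutive perturbations a correlation of −1/2. The NumPy sketch below contrasts the two methods; the toy quadratic objective, step size, and noise scale are illustrative choices, not the paper's experimental setup.

```python
import numpy as np

def grad(w):
    """Gradient of the toy quadratic f(w) = ||w||^2 (illustrative objective)."""
    return 2.0 * w

def pgd(w0, lr=0.1, sigma=0.1, steps=1000, anti=False, seed=0):
    """Perturbed GD. With anti=True, inject xi_{n+1} - xi_n, so that
    consecutive perturbations are anticorrelated (correlation -1/2)."""
    rng = np.random.default_rng(seed)
    w = np.array(w0, dtype=float)
    xi_prev = rng.normal(0.0, sigma, size=w.shape)
    for _ in range(steps):
        xi = rng.normal(0.0, sigma, size=w.shape)
        noise = (xi - xi_prev) if anti else xi
        w = w - lr * grad(w) + noise
        xi_prev = xi
    return w

print("PGD:     ", pgd([1.0, 1.0]))
print("Anti-PGD:", pgd([1.0, 1.0], anti=True))
```

Because the injected increments telescope, the cumulative noise in Anti-PGD stays bounded; the paper's analysis connects this property to the drift toward wider minima.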
Related papers
- Butterfly Effects of SGD Noise: Error Amplification in Behavior Cloning and Autoregression [70.78523583702209]
We study training instabilities of behavior cloning with deep neural networks.
We observe that minibatch SGD updates to the policy network during training result in sharp oscillations in long-horizon rewards.
arXiv Detail & Related papers (2023-10-17T17:39:40Z)
- Per-Example Gradient Regularization Improves Learning Signals from Noisy Data [25.646054298195434]
Empirical evidence suggests that gradient regularization can significantly enhance the robustness of deep learning models against noisy perturbations.
We present a theoretical analysis demonstrating the effectiveness of per-example gradient regularization (PEGR) in improving both test error and robustness against noise perturbations.
Our analysis reveals that PEGR penalizes the variance of pattern learning, effectively suppressing the memorization of noise in the training data; a minimal sketch follows this entry.
arXiv Detail & Related papers (2023-03-31T10:08:23Z)
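For intuition, here is a minimal sketch of a per-example gradient penalty on logistic regression, where per-example gradients have a closed form. The penalty used (mean squared per-example gradient norm), the model, and all hyperparameters are illustrative assumptions, not the paper's exact PEGR formulation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def pegr_step(w, X, y, lr=0.1, lam=0.01):
    """One GD step on the mean logistic loss plus a penalty on the mean
    squared per-example gradient norm (illustrative stand-in for PEGR)."""
    p = sigmoid(X @ w)            # predicted probabilities
    r = p - y                     # per-example residuals
    g_loss = X.T @ r / len(y)     # gradient of the mean logistic loss
    # Per-example gradient g_i = r_i * x_i, so ||g_i||^2 = r_i^2 ||x_i||^2;
    # its gradient w.r.t. w uses sigma'(z) = p * (1 - p).
    sq_norms = np.sum(X * X, axis=1)
    g_pen = X.T @ (2.0 * r * sq_norms * p * (1.0 - p)) / len(y)
    return w - lr * (g_loss + lam * g_pen)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 5))
y = (rng.random(32) < 0.5).astype(float)
w = np.zeros(5)
for _ in range(100):
    w = pegr_step(w, X, y)
print("weights:", w)
```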
- Noise Injection Node Regularization for Robust Learning [0.0]
Noise Injection Node Regularization (NINR) is a method of injecting structured noise into deep neural networks (DNNs) during the training stage, resulting in an emergent regularizing effect.
We present theoretical and empirical evidence of substantial improvements in robustness against various test-data perturbations for feed-forward DNNs trained under NINR; a rough sketch follows this entry.
arXiv Detail & Related papers (2022-10-27T20:51:15Z)
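As a rough illustration of noise injection during training, the sketch below adds Gaussian noise to the hidden activations of a small network in training mode and disables it at test time. The two-layer architecture and the plain additive Gaussian noise are simplifying assumptions; NINR's structured-noise scheme may differ in detail.

```python
import numpy as np

rng = np.random.default_rng(0)

def forward(X, W1, W2, noise_std=0.0):
    """Two-layer ReLU net; hidden units receive additive Gaussian noise
    while training (noise_std > 0) and none at inference (noise_std = 0)."""
    h = np.maximum(0.0, X @ W1)                            # hidden layer
    if noise_std > 0.0:
        h = h + rng.normal(0.0, noise_std, size=h.shape)   # injected noise
    return h @ W2

X = rng.normal(size=(8, 4))
W1 = 0.5 * rng.normal(size=(4, 16))
W2 = 0.5 * rng.normal(size=(16, 1))
train_out = forward(X, W1, W2, noise_std=0.1)  # noisy training pass
test_out = forward(X, W1, W2)                  # clean inference pass
print(train_out.shape, test_out.shape)
```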
- On the Theoretical Properties of Noise Correlation in Stochastic Optimization [6.970991851511823]
We show that fPGD (a variant of PGD driven by fractional Brownian motion noise) possesses exploration abilities favorable over PGD and Anti-PGD.
These results open the field to novel ways of exploiting noise for training machine learning models.
arXiv Detail & Related papers (2022-09-19T16:32:22Z)
- Guided Diffusion Model for Adversarial Purification [103.4596751105955]
Adversarial attacks disturb deep neural networks (DNNs) across a variety of algorithms and frameworks.
We propose a novel purification approach, referred to as the guided diffusion model for purification (GDMP).
In comprehensive experiments across various datasets, GDMP is shown to reduce the perturbations introduced by adversarial attacks to a shallow range.
arXiv Detail & Related papers (2022-05-30T10:11:15Z)
- Heavy-tailed denoising score matching [5.371337604556311]
We develop an iterative noise scaling algorithm to consistently initialise the multiple levels of noise in Langevin dynamics.
On the practical side, our use of heavy-tailed DSM leads to improved score estimation, controllable sampling convergence, and more balanced unconditional generative performance for imbalanced datasets.
arXiv Detail & Related papers (2021-12-17T22:04:55Z)
- Optimizing Information-theoretical Generalization Bounds via Anisotropic Noise in SGLD [73.55632827932101]
We optimize the information-theoretic generalization bound by manipulating the noise structure in SGLD.
We prove that, under a constraint guaranteeing low empirical risk, the optimal noise covariance is the square root of the expected gradient covariance; a minimal sketch follows this entry.
arXiv Detail & Related papers (2021-10-26T15:02:27Z)
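The stated optimum suggests a direct recipe: estimate the gradient covariance from per-example gradients, take its matrix square root, and inject noise with that covariance. In the NumPy sketch below, the empirical covariance estimator and the sqrt(lr) noise scaling are illustrative simplifications.

```python
import numpy as np

def sqrtm_psd(C, eps=1e-12):
    """Matrix square root of a symmetric PSD matrix via eigendecomposition."""
    vals, vecs = np.linalg.eigh(C)
    return (vecs * np.sqrt(np.clip(vals, eps, None))) @ vecs.T

def sgld_step(w, per_example_grads, lr=0.01, rng=None):
    """One SGLD-style step whose injected noise has covariance equal to
    the square root of the empirical gradient covariance."""
    if rng is None:
        rng = np.random.default_rng()
    G = np.asarray(per_example_grads)      # (n, d) per-example gradients
    g = G.mean(axis=0)                     # mini-batch gradient
    C = np.cov(G, rowvar=False)            # empirical gradient covariance
    Sigma = sqrtm_psd(C)                   # target noise covariance C^{1/2}
    noise = sqrtm_psd(Sigma) @ rng.normal(size=g.shape)   # cov(noise) = Sigma
    return w - lr * g + np.sqrt(lr) * noise

G = np.random.default_rng(1).normal(size=(16, 4))  # stand-in gradients
print(sgld_step(np.zeros(4), G))
```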
- Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called "implicit effect" of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an "implicit bias", which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z)
- When Does Preconditioning Help or Hurt Generalization? [74.25170084614098]
We show how the implicit bias of first- and second-order methods affects the comparison of generalization properties.
We discuss several approaches to manage the bias-variance tradeoff, and the potential benefit of interpolating between GD and NGD.
arXiv Detail & Related papers (2020-06-18T17:57:26Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not; a toy sketch follows this entry.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
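To see why parameter-dependent noise differs from spherical noise, consider label perturbation under a quadratically parameterized model f(x) = <w ⊙ w, x>: freshly perturbing the labels each step induces gradient noise proportional to 2w, i.e., noise whose shape depends on the current parameters. The model choice and hyperparameters in this sketch are illustrative assumptions.

```python
import numpy as np

def grad_sq_param(w, X, y):
    """Gradient of the mean squared loss for f(x) = <w*w, x>."""
    residual = X @ (w * w) - y
    return (X.T @ residual / len(y)) * 2.0 * w   # chain rule through w*w

def step(w, X, y, lr=0.01, sigma=0.1, rng=None, label_noise=True):
    """One perturbed GD step: label_noise=True re-perturbs the labels each
    step (parameter-dependent gradient noise, scaling with 2*w); otherwise
    spherical Gaussian noise is added directly to the update."""
    if rng is None:
        rng = np.random.default_rng()
    if label_noise:
        y_noisy = y + rng.normal(0.0, sigma, size=y.shape)
        return w - lr * grad_sq_param(w, X, y_noisy)
    return w - lr * grad_sq_param(w, X, y) + lr * sigma * rng.normal(size=w.shape)

rng = np.random.default_rng(0)
X = rng.normal(size=(64, 10))
y = X @ np.ones(10)          # targets realized by w*w = 1
w = 0.5 * np.ones(10)
for _ in range(200):
    w = step(w, X, y, rng=rng)
print("w*w:", w * w)
```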