Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
- URL: http://arxiv.org/abs/2102.07006v1
- Date: Sat, 13 Feb 2021 21:28:09 GMT
- Title: Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections
- Authors: Alexander Camuto, Xiaoyu Wang, Lingjiong Zhu, Chris Holmes, Mert Gürbüzbalaban, Umut Şimşekli
- Abstract summary: We focus on the so-called `implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an `implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
- Score: 73.95786440318369
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Gaussian noise injections (GNIs) are a family of simple and widely-used
regularisation methods for training neural networks, where one injects additive
or multiplicative Gaussian noise to the network activations at every iteration
of the optimisation algorithm, which is typically chosen as stochastic gradient
descent (SGD). In this paper we focus on the so-called `implicit effect' of
GNIs, which is the effect of the injected noise on the dynamics of SGD. We show
that this effect induces an asymmetric heavy-tailed noise on SGD gradient
updates. In order to model this modified dynamics, we first develop a
Langevin-like stochastic differential equation that is driven by a general
family of asymmetric heavy-tailed noise. Using this model we then formally
prove that GNIs induce an `implicit bias', which varies depending on the
heaviness of the tails and the level of asymmetry. Our empirical results
confirm that different types of neural networks trained with GNIs are
well-modelled by the proposed dynamics and that the implicit effect of these
injections induces a bias that degrades the performance of networks.
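For concreteness, the following is a minimal PyTorch-style sketch of additive GNI during SGD training, where Gaussian noise is added to the hidden activations at every optimisation step; the two-layer network, noise scale `sigma`, and synthetic data are illustrative assumptions rather than the paper's experimental setup.

```python
# Minimal sketch of additive Gaussian noise injection (GNI) into activations
# during SGD training. Network size, noise scale and data are illustrative only.
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)        # synthetic regression data

fc1, fc2 = nn.Linear(10, 64), nn.Linear(64, 1)
opt = torch.optim.SGD(list(fc1.parameters()) + list(fc2.parameters()), lr=1e-2)
sigma = 0.1                                              # std of the injected noise

for step in range(200):
    idx = torch.randint(0, 256, (32,))                   # mini-batch for SGD
    h = torch.relu(fc1(X[idx]))
    h = h + sigma * torch.randn_like(h)                  # additive GNI on activations
    loss = ((fc2(h) - y[idx]) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

A multiplicative injection would instead scale the activations, e.g. `h * (1 + sigma * torch.randn_like(h))`; either way the injected noise perturbs the gradient updates SGD computes, which is the implicit effect studied here.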
Related papers
- Doubly Stochastic Models: Learning with Unbiased Label Noises and
Inference Stability [85.1044381834036]
We investigate the implicit regularization effects of label noises under mini-batch sampling settings of gradient descent.
We find such implicit regularizer would favor some convergence points that could stabilize model outputs against perturbation of parameters.
Our work does not assume that SGD behaves as an Ornstein-Uhlenbeck-like process and achieves a more general result, with convergence of the approximation proved.
arXiv Detail & Related papers (2023-04-01T14:09:07Z) - Extracting stochastic dynamical systems with $\alpha$-stable Lévy noise from data [14.230182518492311]
We propose a data-driven method to extract systems with $\alpha$-stable Lévy noise from short-burst data.
More specifically, we first estimate the Lévy jump measure and noise intensity.
Then we approximate the drift coefficient by combining nonlocal Kramers-Moyal formulas with normalizing flows.
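To illustrate the kind of path data such methods operate on, here is a small sketch that simulates a one-dimensional SDE driven by $\alpha$-stable Lévy noise with an Euler scheme; the drift, stability index, skewness, and step size are assumptions chosen for illustration, not values from the cited paper.

```python
# Sketch: Euler-type simulation of a 1-D SDE driven by alpha-stable Levy noise.
# Drift f, stability index alpha, skewness beta and step size are illustrative.
import numpy as np
from scipy.stats import levy_stable

alpha, beta = 1.5, 0.0              # stability index and skewness of the noise
dt, n_steps = 1e-3, 2000
f = lambda x: -x                    # illustrative linear drift

x = np.zeros(n_steps)
# alpha-stable increments over a step of length dt have scale dt**(1/alpha)
dL = levy_stable.rvs(alpha, beta, scale=dt ** (1.0 / alpha),
                     size=n_steps - 1, random_state=0)
for t in range(n_steps - 1):
    x[t + 1] = x[t] + f(x[t]) * dt + dL[t]
```

For $\alpha < 2$ the increments are heavy-tailed, so the simulated path shows occasional large jumps; $\alpha = 2$ recovers Brownian motion up to scaling.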
arXiv Detail & Related papers (2021-09-30T06:57:42Z) - Positive-Negative Momentum: Manipulating Stochastic Gradient Noise to
Improve Generalization [89.7882166459412]
Stochastic gradient noise (SGN) acts as implicit regularization for deep learning.
Some works have attempted to artificially simulate SGN by injecting random noise in order to improve deep learning.
For simulating SGN at low computational costs and without changing the learning rate or batch size, we propose the Positive-Negative Momentum (PNM) approach.
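The rough idea, as far as the abstract describes it, is sketched below: two momentum buffers are updated on alternating mini-batches and combined with a positive and a negative coefficient, so the mean update is preserved while the gradient-noise variance grows. Coefficient names, values, and the exact update rule are assumptions here and may differ from the paper.

```python
# Hypothetical sketch of the positive-negative momentum idea: two momentum buffers
# updated on alternating mini-batches, combined with weights (1 + beta0) and -beta0.
# This keeps the mean update while amplifying gradient noise; the exact rule and
# hyper-parameters in the cited paper may differ.
import numpy as np

def pnm_step(w, grad, m_a, m_b, step, lr=1e-2, beta=0.9, beta0=1.0):
    if step % 2 == 0:                       # alternate which buffer sees this batch
        m_a = beta * m_a + (1 - beta) * grad
    else:
        m_b = beta * m_b + (1 - beta) * grad
    m = (1 + beta0) * m_a - beta0 * m_b     # positive-negative combination
    return w - lr * m, m_a, m_b
```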
arXiv Detail & Related papers (2021-03-31T16:08:06Z) - A Distributed Optimisation Framework Combining Natural Gradient with
Hessian-Free for Discriminative Sequence Training [16.83036203524611]
This paper presents a novel natural gradient and Hessian-free (NGHF) optimisation framework for neural network training.
It relies on the linear conjugate gradient (CG) algorithm to combine the natural gradient (NG) method with local curvature information from Hessian-free (HF) or other second-order methods.
Experiments are reported on the multi-genre broadcast data set for a range of different acoustic model types.
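The Hessian-free part of such frameworks relies on linear CG, which needs only matrix-vector products with the curvature matrix; the generic sketch below solves H d = g from a Hessian-vector-product callback and illustrates CG itself, not the specific NGHF combination proposed in the paper.

```python
# Generic conjugate-gradient sketch for the Hessian-free idea: solve H d = g using
# only Hessian-vector products, without ever forming H explicitly.
import numpy as np

def conjugate_gradient(hvp, g, iters=50, tol=1e-10):
    d = np.zeros_like(g)            # solution estimate
    r = g.copy()                    # residual g - H d (d = 0 initially)
    p = r.copy()                    # search direction
    rs = r @ r
    for _ in range(iters):
        Hp = hvp(p)
        a = rs / (p @ Hp)
        d += a * p
        r -= a * Hp
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return d

# toy usage: a small positive-definite matrix stands in for the curvature matrix
H = np.array([[3.0, 1.0], [1.0, 2.0]])
g = np.array([1.0, 1.0])
print(conjugate_gradient(lambda v: H @ v, g))   # close to np.linalg.solve(H, g)
```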
arXiv Detail & Related papers (2021-03-12T22:18:34Z) - Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN)
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
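The linearisation step can be made concrete with a toy example: around the MAP parameters the network is replaced by its first-order expansion, so the predictive for an input is Gaussian with the MAP prediction as mean and J Σ Jᵀ as variance, where J is the parameter Jacobian and Σ the Laplace posterior covariance. The tiny model, MAP estimate, and covariance below are illustrative assumptions, not the paper's setup.

```python
# Toy sketch of a linearised ("GLM") predictive: propagate the Laplace posterior
# covariance through the parameter Jacobian of the network at the MAP estimate.
# The scalar model, MAP parameters and covariance are illustrative assumptions.
import numpy as np

def f(x, theta):                            # toy "network": theta0 * tanh(theta1 * x)
    return theta[0] * np.tanh(theta[1] * x)

def param_jacobian(x, theta, eps=1e-6):     # finite-difference Jacobian w.r.t. theta
    J = np.zeros_like(theta)
    for i in range(len(theta)):
        t = theta.copy()
        t[i] += eps
        J[i] = (f(x, t) - f(x, theta)) / eps
    return J

theta_map = np.array([1.0, 0.5])                  # assumed MAP parameters
Sigma = np.array([[0.10, 0.00], [0.00, 0.20]])    # assumed Laplace posterior covariance

x = 2.0
J = param_jacobian(x, theta_map)
mean = f(x, theta_map)          # GLM predictive mean: the MAP prediction
var = J @ Sigma @ J             # GLM predictive variance: J Sigma J^T
print(mean, var)
```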
arXiv Detail & Related papers (2020-08-19T12:35:55Z) - Explicit Regularisation in Gaussian Noise Injections [64.11680298737963]
We study the regularisation induced in neural networks by Gaussian noise injections (GNIs)
We derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise.
We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.
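One way to see where an explicit regulariser of this kind comes from is the generic second-order expansion E_ε[ℓ(h + ε)] ≈ ℓ(h) + (σ²/2) tr ∇²ℓ(h) for small injected noise; the numerical check below illustrates that expansion on a toy function and is not the regulariser derived in the paper.

```python
# Numerical check of the generic second-order expansion behind noise-injection
# regularisers: E[loss(h + eps)] ~ loss(h) + (sigma**2 / 2) * trace(Hessian(h)).
# The toy loss is an illustrative assumption, not the paper's derived regulariser.
import numpy as np

loss = lambda h: np.sum(h ** 4)              # toy loss; Hessian diagonal is 12 h^2
hess_trace = lambda h: np.sum(12 * h ** 2)

rng = np.random.default_rng(0)
h, sigma = np.array([0.5, -1.0, 2.0]), 0.05
eps = sigma * rng.standard_normal((200_000, 3))

mc = np.mean(np.sum((h + eps) ** 4, axis=1))            # Monte Carlo estimate
taylor = loss(h) + 0.5 * sigma ** 2 * hess_trace(h)     # second-order expansion
print(mc, taylor)                                        # the two should nearly agree
```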
arXiv Detail & Related papers (2020-07-14T21:29:46Z) - Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in stochastic gradient descent provides a crucial implicit regularization effect for training over-parameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z) - A Data-Driven Approach for Discovering Stochastic Dynamical Systems with
Non-Gaussian Lévy Noise [5.17900889163564]
We develop a new data-driven approach to extract governing laws from noisy data sets.
First, we establish a feasible theoretical framework by expressing the drift coefficient, diffusion coefficient, and jump measure.
We then design a numerical algorithm to compute the drift, diffusion coefficient and jump measure, and thus extract a governing equation with Gaussian and non-Gaussian noise.
arXiv Detail & Related papers (2020-05-07T21:29:17Z)
This list is automatically generated from the titles and abstracts of the papers in this site.