On the Inherent Regularization Effects of Noise Injection During
Training
- URL: http://arxiv.org/abs/2102.07379v1
- Date: Mon, 15 Feb 2021 07:43:18 GMT
- Title: On the Inherent Regularization Effects of Noise Injection During
Training
- Authors: Oussama Dhifallah and Yue M. Lu
- Abstract summary: We present a theoretical study of one particular way of random perturbation, which corresponds to injecting artificial noise to the training data.
We provide a precise characterization of the training and generalization errors of such randomly perturbed learning problems on a random feature model.
- Score: 12.614901374282868
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Randomly perturbing networks during the training process is a commonly used
approach to improving generalization performance. In this paper, we present a
theoretical study of one particular way of random perturbation, which
corresponds to injecting artificial noise to the training data. We provide a
precise asymptotic characterization of the training and generalization errors
of such randomly perturbed learning problems on a random feature model. Our
analysis shows that Gaussian noise injection in the training process is
equivalent to introducing a weighted ridge regularization, when the number of
noise injections tends to infinity. The explicit form of the regularization is
also given. Numerical results corroborate our asymptotic predictions, showing
that they are accurate even in moderate problem dimensions. Our theoretical
predictions are based on a new correlated Gaussian equivalence conjecture that
generalizes recent results in the study of random feature models.
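The claimed equivalence can be checked numerically in the simplest special case: plain linear least squares with isotropic Gaussian noise added to the training inputs. (The paper itself treats a random feature model and obtains a weighted ridge penalty; the sketch below only covers the isotropic linear case, where averaging the squared loss over noise draws yields ordinary ridge regression with penalty lambda = n * sigma^2.)

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, sigma = 50, 10, 0.3

# Synthetic linear-regression data (illustrative, not from the paper).
X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

# Ridge solution with the penalty predicted by the equivalence:
# E_eps ||y - (X + E) w||^2 = ||y - X w||^2 + n * sigma^2 * ||w||^2
lam = n * sigma**2
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Minimizer of the noise-injected objective, approximated by averaging
# the normal equations over K independent noisy copies of X.
K = 20000
XtX = np.zeros((d, d))
Xty = np.zeros(d)
for _ in range(K):
    Xn = X + sigma * rng.standard_normal((n, d))
    XtX += Xn.T @ Xn
    Xty += Xn.T @ y
w_noise = np.linalg.solve(XtX / K, Xty / K)

# The two solutions agree up to Monte Carlo error, which shrinks as K grows.
gap = np.linalg.norm(w_noise - w_ridge)
print(gap)
```

As K grows the averaged normal equations converge to those of the ridge problem, since E[Xn.T @ Xn] = X.T @ X + n * sigma**2 * I and the noise is independent of y.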
Related papers
- May the Noise be with you: Adversarial Training without Adversarial
Examples [3.4673556247932225]
We investigate the question: can we obtain adversarially trained models without training on adversarial examples?
Our proposed approach incorporates inherent stochasticity by embedding Gaussian noise within the layers of the NN model at training time.
Our work yields adversarially trained networks through a completely different approach, with empirically similar robustness to adversarial training.
arXiv Detail & Related papers (2023-12-12T08:22:28Z) - A Heavy-Tailed Algebra for Probabilistic Programming [53.32246823168763]
We propose a systematic approach for analyzing the tails of random variables.
We show how this approach can be used during the static analysis (before drawing samples) pass of a probabilistic programming language compiler.
Our empirical results confirm that inference algorithms that leverage our heavy-tailed algebra attain superior performance across a number of density modeling and variational inference tasks.
arXiv Detail & Related papers (2023-06-15T16:37:36Z) - Learning Linear Causal Representations from Interventions under General
Nonlinear Mixing [52.66151568785088]
We prove strong identifiability results given unknown single-node interventions without access to the intervention targets.
This is the first instance of causal identifiability from non-paired interventions for deep neural network embeddings.
arXiv Detail & Related papers (2023-06-04T02:32:12Z) - Fluctuations, Bias, Variance & Ensemble of Learners: Exact Asymptotics
for Convex Losses in High-Dimension [25.711297863946193]
We develop a theory for the study of fluctuations in an ensemble of generalised linear models trained on different, but correlated, features.
We provide a complete description of the joint distribution of the empirical risk minimiser for generic convex loss and regularisation in the high-dimensional limit.
arXiv Detail & Related papers (2022-01-31T17:44:58Z) - Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called 'implicit effect' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an 'implicit bias', which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z) - Binary Classification of Gaussian Mixtures: Abundance of Support
Vectors, Benign Overfitting and Regularization [39.35822033674126]
We study binary linear classification under a generative Gaussian mixture model.
We derive novel non-asymptotic bounds on the classification error of the latter.
Our results extend to a noisy model with constant probability noise flips.
arXiv Detail & Related papers (2020-11-18T07:59:55Z) - Understanding Double Descent Requires a Fine-Grained Bias-Variance
Decomposition [34.235007566913396]
We describe an interpretable, symmetric decomposition of the variance into terms associated with the labels.
We find that the bias decreases monotonically with the network width, but the variance terms exhibit non-monotonic behavior.
We also analyze the strikingly rich phenomenology that arises.
arXiv Detail & Related papers (2020-11-04T21:04:02Z) - Explicit Regularisation in Gaussian Noise Injections [64.11680298737963]
We study the regularisation induced in neural networks by Gaussian noise injections (GNIs)
We derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise.
We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.
arXiv Detail & Related papers (2020-07-14T21:29:46Z) - Multiplicative noise and heavy tails in stochastic optimization [62.993432503309485]
Stochastic optimization is central to modern machine learning, but the precise role of the stochasticity in its success is still unclear.
We show that heavy tails commonly arise in the parameters as a consequence of multiplicative noise in the optimization dynamics.
A detailed analysis is conducted describing key factors, including step size and data, with similar results observed on state-of-the-art neural network models.
arXiv Detail & Related papers (2020-06-11T09:58:01Z) - Efficiently Sampling Functions from Gaussian Process Posteriors [76.94808614373609]
We propose an easy-to-use and general-purpose approach for fast posterior sampling.
We demonstrate how decoupled sample paths accurately represent Gaussian process posteriors at a fraction of the usual cost.
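The core identity behind pathwise GP sampling (Matheron's rule) can be sketched in a few lines: a posterior sample is a prior sample plus a data-dependent correction. The snippet below uses exact joint prior samples rather than the paper's efficient decoupled (Fourier-feature) prior, so it only illustrates the conditioning rule, not the speedup; the kernel, points, and noise level are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

def rbf(a, b, ls=1.0):
    # Squared-exponential kernel on 1-D inputs.
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

Xtr = np.array([-1.0, 0.0, 1.0])                      # training inputs
Xte = np.array([-1.7, -0.6, 0.3, 0.9, 1.8])            # test inputs
y = np.sin(Xtr)
noise = 0.01                                           # observation noise variance

Ktt = rbf(Xtr, Xtr) + noise * np.eye(len(Xtr))
Kst = rbf(Xte, Xtr)
Kss = rbf(Xte, Xte)

# Analytic GP posterior, for reference.
mean_exact = Kst @ np.linalg.solve(Ktt, y)
var_exact = np.diag(Kss - Kst @ np.linalg.solve(Ktt, Kst.T))

# Matheron's rule: f_post(*) = f_prior(*) + K(*,X) K(X,X)^-1 (y - f_prior(X) - eps)
Xall = np.concatenate([Xtr, Xte])
L = np.linalg.cholesky(rbf(Xall, Xall) + 1e-8 * np.eye(len(Xall)))
S = 20000
Z = L @ rng.standard_normal((len(Xall), S))            # joint prior draws
f_tr, f_te = Z[: len(Xtr)], Z[len(Xtr):]
eps = np.sqrt(noise) * rng.standard_normal((len(Xtr), S))
f_post = f_te + Kst @ np.linalg.solve(Ktt, y[:, None] - f_tr - eps)

# Sample statistics match the analytic posterior up to Monte Carlo error.
print(np.abs(f_post.mean(axis=1) - mean_exact).max())
print(np.abs(f_post.var(axis=1) - var_exact).max())
```

The update never forms posterior covariances explicitly; conditioning acts on sample paths, which is what makes the decoupled variant in the paper cheap when the prior can be sampled efficiently.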
arXiv Detail & Related papers (2020-02-21T14:03:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.