Ghost Noise for Regularizing Deep Neural Networks
- URL: http://arxiv.org/abs/2305.17205v2
- Date: Tue, 19 Dec 2023 15:12:37 GMT
- Title: Ghost Noise for Regularizing Deep Neural Networks
- Authors: Atli Kosson, Dongyang Fan, Martin Jaggi
- Abstract summary: Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.
We propose a new regularization technique called Ghost Noise Injection (GNI) that imitates the noise in GBN without incurring the detrimental train-test discrepancy effects of small batch training.
- Score: 38.08431828419127
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BN) is widely used to stabilize the optimization process
and improve the test performance of deep neural networks. The regularization
effect of BN depends on the batch size and explicitly using smaller batch sizes
with Batch Normalization, a method known as Ghost Batch Normalization (GBN),
has been found to improve generalization in many settings. We investigate the
effectiveness of GBN by disentangling the induced ``Ghost Noise'' from
normalization and quantitatively analyzing the distribution of noise as well as
its impact on model performance. Inspired by our analysis, we propose a new
regularization technique called Ghost Noise Injection (GNI) that imitates the
noise in GBN without incurring the detrimental train-test discrepancy effects
of small batch training. We experimentally show that GNI can provide a greater
generalization benefit than GBN. Ghost Noise Injection can also be beneficial
in otherwise non-noisy settings such as layer-normalized networks, providing
additional evidence of the usefulness of Ghost Noise in Batch Normalization as
a regularizer.
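To make the mechanism concrete, the core of Ghost Batch Normalization is to split the full batch into smaller "ghost" batches and normalize each one with its own statistics; the extra sampling noise in those per-ghost statistics is the "Ghost Noise" the paper studies. The sketch below is a minimal, hypothetical NumPy illustration of that splitting step (the function name, `ghost_size` parameter, and the plain normalization without learned scale/shift are simplifications, not the authors' implementation):

```python
import numpy as np

def ghost_batch_norm(x, ghost_size, eps=1e-5):
    """Illustrative Ghost Batch Normalization forward pass.

    Splits the batch (axis 0) into consecutive "ghost" batches of size
    `ghost_size` and normalizes each with its own mean and variance.
    Smaller ghost batches give noisier statistics, i.e. more regularizing
    "Ghost Noise".
    """
    n = x.shape[0]
    out = np.empty_like(x, dtype=np.float64)
    for start in range(0, n, ghost_size):
        g = x[start:start + ghost_size].astype(np.float64)
        mean = g.mean(axis=0)          # per-ghost-batch mean
        var = g.var(axis=0)            # per-ghost-batch variance
        out[start:start + ghost_size] = (g - mean) / np.sqrt(var + eps)
    return out
```

Ghost Noise Injection, as described in the abstract, would instead imitate this statistical noise directly, without actually normalizing with small-batch statistics and thus without the train-test discrepancy of small batches.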
Related papers
- Implicit Bias in Noisy-SGD: With Applications to Differentially Private
Training [9.618473763561418]
Training Deep Neural Networks (DNNs) with small batches using Stochastic Gradient Descent (SGD) yields superior test performance compared to larger batches.
DP-SGD, used to ensure differential privacy (DP) in DNNs' training, adds Gaussian noise to the clipped gradients.
Surprisingly, large-batch training still results in a significant decrease in performance, which poses an important challenge because strong DP guarantees necessitate the use of massive batches.
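The DP-SGD step this summary refers to is standard: clip each per-example gradient to a fixed L2 norm, average, and add Gaussian noise scaled by the clipping norm. A minimal sketch of that step (the function name and the particular noise scaling by batch size are illustrative assumptions, not the paper's exact formulation):

```python
import numpy as np

def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Illustrative DP-SGD gradient aggregation: clip, average, add noise."""
    clipped = []
    for g in per_example_grads:
        norm = np.linalg.norm(g)
        # Scale down any gradient whose L2 norm exceeds clip_norm.
        clipped.append(g * min(1.0, clip_norm / (norm + 1e-12)))
    mean_grad = np.mean(clipped, axis=0)
    # Gaussian noise calibrated to the clipping norm (one common convention:
    # divide by the batch size since the noise is added to the mean).
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_example_grads),
                       size=mean_grad.shape)
    return mean_grad + noise
```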
arXiv Detail & Related papers (2024-02-13T10:19:33Z)
- Feature Noise Boosts DNN Generalization under Label Noise [65.36889005555669]
The presence of label noise in the training data has a profound impact on the generalization of deep neural networks (DNNs).
In this study, we introduce and theoretically demonstrate a simple feature noise method, which directly adds noise to the features of training data.
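The feature-noise method summarized above amounts to perturbing the training inputs themselves. A one-line sketch of the additive-Gaussian variant (the function name and noise scale are illustrative; the paper's exact noise family may differ):

```python
import numpy as np

def add_feature_noise(x, std, rng):
    """Add zero-mean Gaussian noise directly to the input features."""
    return x + rng.normal(0.0, std, size=x.shape)
```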
arXiv Detail & Related papers (2023-08-03T08:31:31Z)
- Asymmetric Heavy Tails and Implicit Bias in Gaussian Noise Injections [73.95786440318369]
We focus on the so-called ``implicit effect'' of GNIs, which is the effect of the injected noise on the dynamics of stochastic gradient descent (SGD).
We show that this effect induces an asymmetric heavy-tailed noise on gradient updates.
We then formally prove that GNIs induce an ``implicit bias'', which varies depending on the heaviness of the tails and the level of asymmetry.
arXiv Detail & Related papers (2021-02-13T21:28:09Z)
- Batch Group Normalization [45.03388237812212]
Batch Normalization (BN) performs well at medium and large batch sizes.
BN degrades at small and extremely large batch sizes because the batch statistics become noisy or confused.
Batch Group Normalization (BGN) is proposed to remedy this unreliable statistics estimation at small and extremely large batch sizes.
arXiv Detail & Related papers (2020-12-04T18:57:52Z)
- Explicit Regularisation in Gaussian Noise Injections [64.11680298737963]
We study the regularisation induced in neural networks by Gaussian noise injections (GNIs).
We derive the explicit regulariser of GNIs, obtained by marginalising out the injected noise.
We show analytically and empirically that such regularisation produces calibrated classifiers with large classification margins.
arXiv Detail & Related papers (2020-07-14T21:29:46Z)
- Shape Matters: Understanding the Implicit Bias of the Noise Covariance [76.54300276636982]
Noise in stochastic gradient descent provides a crucial implicit regularization effect for training overparameterized models.
We show that parameter-dependent noise -- induced by mini-batches or label perturbation -- is far more effective than Gaussian noise.
Our analysis reveals that parameter-dependent noise introduces a bias towards local minima with smaller noise variance, whereas spherical Gaussian noise does not.
arXiv Detail & Related papers (2020-06-15T18:31:02Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
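The moving-average idea behind MABN can be sketched as follows: normalize with exponentially smoothed statistics accumulated over training steps rather than with the noisy statistics of each small batch. This is a hedged illustration of the forward pass only (class name, momentum value, and omission of learned affine parameters and the stabilized backward pass are simplifications, not the paper's method):

```python
import numpy as np

class MovingAverageBN:
    """Normalize with exponential moving averages of batch statistics."""

    def __init__(self, num_features, momentum=0.9, eps=1e-5):
        self.momentum = momentum
        self.eps = eps
        self.mean = np.zeros(num_features)
        self.var = np.ones(num_features)

    def __call__(self, x):
        # Update the moving statistics with the current batch, then
        # normalize with the smoothed statistics instead of the raw
        # (noisy) per-batch mean and variance.
        m = self.momentum
        self.mean = m * self.mean + (1 - m) * x.mean(axis=0)
        self.var = m * self.var + (1 - m) * x.var(axis=0)
        return (x - self.mean) / np.sqrt(self.var + self.eps)
```

With small batches, the smoothed statistics have much lower variance than per-batch statistics, which is the failure mode of vanilla BN that MABN targets.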
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated summaries (including all information) and is not responsible for any consequences.