A New Look at Ghost Normalization
- URL: http://arxiv.org/abs/2007.08554v1
- Date: Thu, 16 Jul 2020 18:23:52 GMT
- Title: A New Look at Ghost Normalization
- Authors: Neofytos Dimitriou, Ognjen Arandjelovic
- Abstract summary: Ghost normalization (GhostNorm) has been shown to improve upon BatchNorm on some datasets.
Our contributions are: (i) we uncover a source of regularization that is unique to GhostNorm, and not simply an extension from BatchNorm, and (ii) we describe three types of GhostNorm implementations.
- Score: 12.331754048486554
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Batch normalization (BatchNorm) is an effective yet poorly understood
technique for neural network optimization. It is often assumed that the
degradation in BatchNorm performance at smaller batch sizes stems from it
having to estimate layer statistics using smaller sample sizes. However,
recently, Ghost normalization (GhostNorm), a variant of BatchNorm that
explicitly uses smaller sample sizes for normalization, has been shown to
improve upon BatchNorm on some datasets. Our contributions are: (i) we uncover
a source of regularization that is unique to GhostNorm, and not simply an
extension from BatchNorm, (ii) we describe three types of GhostNorm
implementations, two of which employ BatchNorm as the underlying normalization
technique, (iii) by visualising the loss landscape of GhostNorm, we observe
that GhostNorm consistently decreases the smoothness when compared to
BatchNorm, (iv) we introduce Sequential Normalization (SeqNorm), and report
superior performance over state-of-the-art methodologies on both CIFAR-10 and
CIFAR-100 datasets.
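For intuition, the sketch below shows the ghost-batch idea in PyTorch: the mini-batch is split into smaller "ghost" batches and each chunk is normalized with its own statistics, which is the extra per-chunk noise the paper analyses as a distinct source of regularization. This is an illustrative reading only, not the authors' code; the class name GhostNorm2d and the num_ghost_batches parameter are assumptions, and running statistics for inference are omitted for brevity (the paper itself describes three implementations, two of which reuse BatchNorm internally).

```python
import torch
import torch.nn as nn

class GhostNorm2d(nn.Module):
    """Illustrative GhostNorm sketch: normalize each "ghost" batch
    (a chunk of the mini-batch) with its own per-channel statistics."""

    def __init__(self, num_features, num_ghost_batches=4, eps=1e-5):
        super().__init__()
        self.num_ghost_batches = num_ghost_batches
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(num_features))   # learnable scale (gamma)
        self.bias = nn.Parameter(torch.zeros(num_features))    # learnable shift (beta)

    def forward(self, x):  # x: (N, C, H, W)
        normalized = []
        for chunk in x.chunk(self.num_ghost_batches, dim=0):
            # Per-ghost-batch mean/variance over (N, H, W) for every channel.
            mean = chunk.mean(dim=(0, 2, 3), keepdim=True)
            var = chunk.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
            normalized.append((chunk - mean) / torch.sqrt(var + self.eps))
        y = torch.cat(normalized, dim=0)
        return y * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

With num_ghost_batches=1 this reduces to plain per-batch normalization, so the chunk count is the knob that trades statistical sample size against the extra regularizing noise.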
Related papers
- Ghost Noise for Regularizing Deep Neural Networks [38.08431828419127] (arXiv 2023-05-26)
Batch Normalization (BN) is widely used to stabilize the optimization process and improve the test performance of deep neural networks.
We propose a new regularization technique called Ghost Noise Injection (GNI) that imitates the noise in ghost batch normalization (GBN) without incurring the detrimental train-test discrepancy effects of small-batch training.
- An Empirical Analysis of the Shift and Scale Parameters in BatchNorm [3.198144010381572] (arXiv 2023-03-22)
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks.
This paper examines the relative contribution to the success of BatchNorm of the normalization step.
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474] (arXiv 2022-10-13)
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then pick the query that maximizes the trained NN's output.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
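As a rough, hedged sketch of the "train an NN, then maximize it" step described above (and not the paper's algorithm itself), one could fit a small MLP to the observations from a fresh random initialization, acting as a stand-in for drawing one Thompson sample, and pick the candidate input with the highest prediction. The function name, network size, optimizer, and candidate-set maximization are all illustrative assumptions.

```python
import torch
import torch.nn as nn

def propose_next_query(X_obs, y_obs, candidates, epochs=200, lr=1e-2):
    """Fit a freshly initialized MLP to the observed data, then return the
    candidate with the highest predicted value (illustrative sketch)."""
    surrogate = nn.Sequential(
        nn.Linear(X_obs.shape[1], 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, 1),
    )
    optimizer = torch.optim.Adam(surrogate.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(surrogate(X_obs).squeeze(-1), y_obs)
        loss.backward()
        optimizer.step()
    with torch.no_grad():
        # Choosing the query only requires maximizing the trained network
        # over the candidates; no large parameter matrix is ever inverted.
        scores = surrogate(candidates).squeeze(-1)
    return candidates[scores.argmax()]
```

Running this in a loop, evaluating the returned query, and appending the result to X_obs/y_obs gives a simple black-box optimization loop in the spirit of the summary above.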
- Kernel Normalized Convolutional Networks [15.997774467236352] (arXiv 2022-05-20)
BatchNorm, however, performs poorly with small batch sizes, and is inapplicable to differential privacy.
We propose KernelNorm and kernel normalized convolutional layers, and incorporate them into kernel normalized convolutional networks (KNConvNets).
KNConvNets achieve higher or competitive performance compared to BatchNorm counterparts in image classification and semantic segmentation.
- Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence [33.07255026021875] (arXiv 2020-10-21)
We study the statistical properties of Batch Norm and other common normalizers.
We propose two simple normalizers, PreLayerNorm and RegNorm, which better match these desirable properties without involving operations along the batch dimension.
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855] (arXiv 2020-10-19)
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime like a BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% in memory consumption.
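To make the "weight mean" phrase above concrete, here is a hedged sketch that interprets it as centering each convolution filter's weights to zero mean before the convolution; the class name MeanCenteredConv2d and the per-filter centering are assumptions, and the rest of MimicNorm (notably keeping only a last BN layer) is not reproduced here.

```python
import torch.nn as nn
import torch.nn.functional as F

class MeanCenteredConv2d(nn.Conv2d):
    """Conv2d whose filters are re-centered to zero mean at every forward
    pass (one possible reading of MimicNorm's weight mean operation)."""

    def forward(self, x):
        # Subtract each output filter's mean over its (in_channels, kH, kW) weights.
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

It can be used as a drop-in replacement for nn.Conv2d; each centered filter's weights sum to zero, which is the kind of activation-centering effect the summary's whitening argument refers to.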
- GraphNorm: A Principled Approach to Accelerating Graph Neural Network Training [101.3819906739515] (arXiv 2020-09-07)
We study what normalization is effective for Graph Neural Networks (GNNs).
Faster convergence is achieved with InstanceNorm compared to BatchNorm and LayerNorm.
GraphNorm also improves the generalization of GNNs, achieving better performance on graph classification benchmarks.
- Towards an Adversarially Robust Normalization Approach [8.744644782067368] (arXiv 2020-06-19)
Batch Normalization (BatchNorm) is effective for improving the performance and accelerating the training of deep neural networks.
It has also been shown to be a cause of adversarial vulnerability, i.e., networks without it are more robust to adversarial attacks.
We propose Robust Normalization (RobustNorm), an adversarially robust version of BatchNorm.
- Evolving Normalization-Activation Layers [100.82879448303805] (arXiv 2020-04-06)
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
- Cross-Iteration Batch Normalization [67.83430009388678] (arXiv 2020-02-13)
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
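A hedged sketch of the cross-iteration idea only: pool the current mini-batch's per-channel statistics with (detached) statistics remembered from the last few iterations, so normalization effectively sees a larger sample. The window size, class name, and simple averaging are assumptions, and CBN's central compensation for the weights having changed between iterations is deliberately left out, so this is not the paper's method.

```python
import collections
import torch
import torch.nn as nn

class CrossIterationNormSketch(nn.Module):
    """Normalize with per-channel statistics averaged over the current and
    the last few training iterations (compensation step omitted)."""

    def __init__(self, num_features, window=4, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))
        self.past_means = collections.deque(maxlen=window - 1)
        self.past_vars = collections.deque(maxlen=window - 1)

    def forward(self, x):  # x: (N, C, H, W)
        if self.training:
            cur_mean = x.mean(dim=(0, 2, 3))
            cur_var = x.var(dim=(0, 2, 3), unbiased=False)
            # Average current statistics with those stored from recent iterations.
            mean = torch.stack([cur_mean, *self.past_means]).mean(dim=0)
            var = torch.stack([cur_var, *self.past_vars]).mean(dim=0)
            self.past_means.append(cur_mean.detach())
            self.past_vars.append(cur_var.detach())
            with torch.no_grad():
                self.running_mean.lerp_(cur_mean, self.momentum)
                self.running_var.lerp_(cur_var, self.momentum)
        else:
            mean, var = self.running_mean, self.running_var
        x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(var.view(1, -1, 1, 1) + self.eps)
        return x_hat * self.weight.view(1, -1, 1, 1) + self.bias.view(1, -1, 1, 1)
```

The summary's point that naively reusing statistics from previous iterations underperforms is exactly why the omitted compensation matters: older statistics were computed under different network weights.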
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064] (arXiv 2020-01-19)
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.