An Empirical Analysis of the Shift and Scale Parameters in BatchNorm
- URL: http://arxiv.org/abs/2303.12818v1
- Date: Wed, 22 Mar 2023 12:41:12 GMT
- Title: An Empirical Analysis of the Shift and Scale Parameters in BatchNorm
- Authors: Yashna Peerthum and Mark Stamp
- Abstract summary: Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks.
This paper empirically examines the relative contributions to the success of BatchNorm of the normalization step and of the trainable shift and scale parameters.
- Score: 3.198144010381572
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Batch Normalization (BatchNorm) is a technique that improves the training of
deep neural networks, especially Convolutional Neural Networks (CNN). It has
been empirically demonstrated that BatchNorm increases performance, stability,
and accuracy, although the reasons for such improvements are unclear. BatchNorm
includes a normalization step as well as trainable shift and scale parameters.
In this paper, we empirically examine the relative contribution to the success
of BatchNorm of the normalization step, as compared to the re-parameterization
via shifting and scaling. To conduct our experiments, we implement two new
normalization layers in PyTorch, namely, a version of BatchNorm that we refer to as
AffineLayer, which includes the re-parameterization step without normalization,
and a version with just the normalization step, that we call BatchNorm-minus.
We compare the performance of our AffineLayer and BatchNorm-minus
implementations to standard BatchNorm, and we also compare these to the case
where no batch normalization is used. We experiment with four ResNet
architectures (ResNet18, ResNet34, ResNet50, and ResNet101) over a standard
image dataset and multiple batch sizes. Among other findings, we provide
empirical evidence that the success of BatchNorm may derive primarily from
improved weight initialization.
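To make the comparison concrete, below is a minimal PyTorch sketch of the three forward passes the abstract describes. The names AffineLayer and BatchNorm-minus come from the paper, but the code is an illustrative reconstruction under simplifying assumptions (2D feature maps, no running statistics for inference), not the authors' released implementation.

```python
import torch
import torch.nn as nn

class AffineLayer(nn.Module):
    """Trainable per-channel scale and shift only, with no normalization
    (the paper's AffineLayer); sketch assumes inputs of shape (N, C, H, W)."""
    def __init__(self, num_features: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(num_features))   # scale
        self.beta = nn.Parameter(torch.zeros(num_features))   # shift

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.gamma.view(1, -1, 1, 1) * x + self.beta.view(1, -1, 1, 1)

class BatchNormMinus(nn.Module):
    """Normalization step only, with no trainable scale or shift
    (the paper's BatchNorm-minus); running statistics omitted for brevity."""
    def __init__(self, num_features: int, eps: float = 1e-5):
        super().__init__()
        # num_features kept only for interface parity with nn.BatchNorm2d
        self.eps = eps

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        return (x - mean) / torch.sqrt(var + self.eps)

# Standard BatchNorm does both: normalize over the batch, then scale and shift.
standard_bn = nn.BatchNorm2d(num_features=64)
```

Dropping such layers into a ResNet block in place of nn.BatchNorm2d reflects how the paper's ablation is structured: AffineLayer isolates the re-parameterization, BatchNorm-minus isolates the normalization, and the no-normalization baseline omits the layer entirely.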
Related papers
- Patch-aware Batch Normalization for Improving Cross-domain Robustness [55.06956781674986]
Cross-domain tasks present a challenge in which a model's performance degrades when the training set and the test set follow different distributions.
We propose a novel method called patch-aware batch normalization (PBN).
By exploiting the differences between local patches of an image, our proposed PBN can effectively enhance the robustness of the model's parameters.
arXiv Detail & Related papers (2023-04-06T03:25:42Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then select the query that maximizes the trained NN's output.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- Kernel Normalized Convolutional Networks [15.997774467236352]
BatchNorm, however, performs poorly with small batch sizes, and is inapplicable to differential privacy.
We propose KernelNorm and kernel normalized convolutional layers, and incorporate them into kernel normalized convolutional networks (KNConvNets).
KNConvNets achieve higher or competitive performance compared to BatchNorm counterparts in image classification and semantic segmentation.
arXiv Detail & Related papers (2022-05-20T11:18:05Z)
- Is Batch Norm unique? An empirical investigation and prescription to emulate the best properties of common normalizers without batch dependence [33.07255026021875]
We study the statistical properties of Batch Norm and other common normalizers.
We propose two simple normalizers, PreLayerNorm and RegNorm, which better match these desirable properties without involving operations along the batch dimension.
arXiv Detail & Related papers (2020-10-21T00:41:38Z)
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, like a BN layer (a hedged sketch of this weight-centering idea appears after this list).
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with about a 20% reduction in memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
- A New Look at Ghost Normalization [12.331754048486554]
Ghost normalization (GhostNorm) has been shown to improve upon BatchNorm in some datasets.
Our contributions are: (i) we uncover a source of regularization that is unique to GhostNorm, not simply an extension of BatchNorm, and (ii) we describe three types of GhostNorm implementations.
arXiv Detail & Related papers (2020-07-16T18:23:52Z)
- Evolving Normalization-Activation Layers [100.82879448303805]
We develop efficient rejection protocols to quickly filter out candidate layers that do not work well.
Our method leads to the discovery of EvoNorms, a set of new normalization-activation layers with novel, and sometimes surprising structures.
Our experiments show that EvoNorms work well on image classification models including ResNets, MobileNets and EfficientNets.
arXiv Detail & Related papers (2020-04-06T19:52:48Z)
- Separating the Effects of Batch Normalization on CNN Training Speed and Stability Using Classical Adaptive Filter Theory [40.55789598448379]
Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability.
This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm.
arXiv Detail & Related papers (2020-02-25T05:25:40Z)
- Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance the quality of the estimated statistics.
CBN is found to outperform the original batch normalization as well as a direct calculation of statistics over previous iterations without the proposed compensation technique (a sketch of that naive pooling baseline appears after this list).
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
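Regarding the MimicNorm entry above: the summary mentions a "weight mean operation", which the sketch below interprets as centering each convolution filter before it is applied. This interpretation, and the class name MeanCenteredConv2d, are assumptions made for illustration and may not match MimicNorm's exact formulation.

```python
import torch.nn as nn
import torch.nn.functional as F

class MeanCenteredConv2d(nn.Conv2d):
    """Hedged sketch of a 'weight mean' operation: each output filter is
    centered (its mean subtracted) before convolution, assumed here to
    approximate the whitening effect described in the MimicNorm summary."""
    def forward(self, x):
        # Subtract each filter's mean over its (in_channels, kH, kW) support.
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```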
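The Cross-Iteration Batch Normalization entry above contrasts CBN with a naive baseline: directly pooling statistics over previous iterations without any compensation for stale network weights. Below is a minimal sketch of that naive baseline only (not of CBN itself); the window size and the decision to detach the pooled statistics are illustrative assumptions.

```python
import torch
import torch.nn as nn
from collections import deque

class NaiveCrossIterationNorm(nn.Module):
    """Naive baseline from the CBN entry: batch statistics are pooled directly
    over the last `window` iterations, with no compensation for the fact that
    older statistics were computed under older network weights."""
    def __init__(self, num_features: int, window: int = 4, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.means = deque(maxlen=window)      # per-iteration channel means
        self.sq_means = deque(maxlen=window)   # per-iteration E[x^2] per channel
        self.gamma = nn.Parameter(torch.ones(num_features))
        self.beta = nn.Parameter(torch.zeros(num_features))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, C, H, W). Record this iteration's statistics (detached, so no
        # gradient flows through the pooled statistics in this simplified sketch).
        self.means.append(x.mean(dim=(0, 2, 3)).detach())
        self.sq_means.append((x * x).mean(dim=(0, 2, 3)).detach())
        mean = torch.stack(list(self.means)).mean(dim=0)
        var = torch.stack(list(self.sq_means)).mean(dim=0) - mean ** 2
        x_hat = (x - mean.view(1, -1, 1, 1)) / torch.sqrt(
            var.view(1, -1, 1, 1) + self.eps)
        return self.gamma.view(1, -1, 1, 1) * x_hat + self.beta.view(1, -1, 1, 1)
```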