MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization
- URL: http://arxiv.org/abs/2010.09278v3
- Date: Wed, 27 Sep 2023 11:38:52 GMT
- Title: MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization
- Authors: Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
- Abstract summary: We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency of network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, as a BN layer does.
MimicNorm achieves accuracy comparable to BN for various network structures, including ResNets and lightweight networks like ShuffleNet, while reducing memory consumption by about 20%.
- Score: 60.36100335878855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Substantial experiments have validated the success of the Batch
Normalization (BN) layer in benefiting convergence and generalization. However,
BN requires extra memory and floating-point computation. Moreover, BN is
inaccurate on micro-batches, as it depends on batch statistics. In this paper,
we address these problems by simplifying BN regularization while keeping two
fundamental impacts of BN layers, i.e., data decorrelation and adaptive
learning rate. We propose a novel normalization method, named MimicNorm, to
improve the convergence and efficiency of network training. MimicNorm consists
of only two lightweight operations: a modified weight mean operation
(subtracting mean values from the weight parameter tensor) and one BN layer
before the loss function (the last BN layer). We leverage neural tangent kernel
(NTK) theory to prove that our weight mean operation whitens activations and
transitions the network into the chaotic regime, as a BN layer does, and
consequently leads to enhanced convergence. The last BN layer provides
autotuned learning rates and also improves accuracy. Experimental results show
that MimicNorm achieves accuracy comparable to BN for various network
structures, including ResNets and lightweight networks like ShuffleNet, while
reducing memory consumption by about 20%. The code is publicly available at
https://github.com/Kid-key/MimicNorm.
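To make the two operations concrete, here is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the authors' released implementation (see the repository above for that): the class names CenteredConv2d and TinyMimicNormNet, the toy architecture, and the per-filter centering are hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenteredConv2d(nn.Conv2d):
    """Convolution whose weights are re-centered to zero mean before each
    forward pass, standing in for the weight mean operation."""
    def forward(self, x):
        # Subtract each filter's mean over its (in_channels, kH, kW) slice;
        # centering over the whole weight tensor is another plausible reading.
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class TinyMimicNormNet(nn.Module):
    """Toy classifier: weight-centered convs, no per-layer BN, and a single
    BatchNorm1d on the logits (the 'last BN layer' before the loss)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            CenteredConv2d(3, 32, 3, padding=1, bias=False), nn.ReLU(),
            CenteredConv2d(32, 64, 3, stride=2, padding=1, bias=False), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)
        self.last_bn = nn.BatchNorm1d(num_classes)  # BN right before the loss

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.last_bn(self.fc(h))

# Usage: the logits feed into the loss exactly as with a BN network.
model = TinyMimicNormNet()
logits = model(torch.randn(8, 3, 32, 32))
loss = F.cross_entropy(logits, torch.randint(0, 10, (8,)))
```

With the per-layer BN layers removed, only the inexpensive logit-level BN remains, which is how the method avoids the per-layer memory and floating-point overhead of BN described in the abstract.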
Related papers
- Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates data that requires BN from data that does not.
arXiv Detail & Related papers (2022-11-03T12:12:56Z) - Understanding the Failure of Batch Normalization for Transformers in NLP [16.476194435004732]
Batch Normalization (BN) is a technique to accelerate the training of deep neural networks.
BN fails to defend its position in Natural Language Processing (NLP), which is dominated by Layer Normalization (LN)
Regularized BN (RBN) improves the performance of BN consistently and outperforms or is on par with LN on 17 out of 20 settings.
arXiv Detail & Related papers (2022-10-11T05:18:47Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Stochastic Whitening Batch Normalization [9.514475896906605]
Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs).
The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method.
We show that while SWBN improves convergence rate and generalization, its computational overhead is less than that of IterNorm.
arXiv Detail & Related papers (2021-06-03T20:45:42Z)
- BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics (a minimal illustrative sketch of this idea appears after the list below).
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond [18.14282813812512]
Batch Normalization (BN) poses a challenge for Quantized Neural Networks (QNNs).
We propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with a shared quantized scale (a second illustrative sketch appears after the list below).
Our method is verified by experiments at layer level on CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-08-30T09:33:29Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
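As referenced in the Memorized Batch Normalization entry above, here is a minimal sketch of that idea: normalize with statistics averaged over the few most recent batches instead of the current batch alone. The class name MemorizedBatchNorm1d, the window size, the equal-weight averaging, and the detached statistics are simplifying assumptions for illustration; the paper's actual method (which involves a double forward propagation) is more involved.

```python
import collections
import torch
import torch.nn as nn

class MemorizedBatchNorm1d(nn.Module):
    """Illustrative BN that normalizes with mean/variance averaged over the
    `window` most recent batches instead of the current batch only."""
    def __init__(self, num_features, window=4, eps=1e-5, momentum=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))
        self.window, self.eps, self.momentum = window, eps, momentum
        self.recent_means = collections.deque(maxlen=window)
        self.recent_vars = collections.deque(maxlen=window)

    def forward(self, x):
        if self.training:
            # Statistics are detached for simplicity; the real method treats
            # their gradients carefully (hence the double forward propagation).
            self.recent_means.append(x.mean(dim=0).detach())
            self.recent_vars.append(x.var(dim=0, unbiased=False).detach())
            mean = torch.stack(list(self.recent_means)).mean(dim=0)
            var = torch.stack(list(self.recent_vars)).mean(dim=0)
            # Keep running estimates for inference, as standard BN does.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias

# Drop-in usage in place of nn.BatchNorm1d; with window=1 this reduces to a
# detached-statistics variant of ordinary BN.
layer = MemorizedBatchNorm1d(16, window=4)
out = layer(torch.randn(8, 16))
```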
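Likewise, for the Optimal Quantization for Batch Normalization entry, the sketch below shows the stated idea in its simplest form: at inference BN folds into an affine map y = a*x + b, and the two floats a and b are put into fixed point with one shared scale so the operation becomes an integer multiply-add and shift. The function names, the choice of 8 fractional bits, and the omission of the activation's own quantization scale are assumptions for illustration only.

```python
import numpy as np

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    # At inference, BN reduces to an affine map y = a * x + b.
    a = gamma / np.sqrt(var + eps)
    b = beta - a * mean
    return a, b

def quantize_shared_scale(a, b, frac_bits=8):
    # Represent both floats in fixed point with one shared scale 2**frac_bits.
    scale = 1 << frac_bits
    return np.int64(np.round(a * scale)), np.int64(np.round(b * scale))

def fixed_point_bn(x_int, a_q, b_q, frac_bits=8):
    # Integer multiply-add plus an arithmetic right shift replaces the
    # floating-point affine transformation.
    return (x_int * a_q + b_q) >> frac_bits

# Quick check: the fixed-point output tracks the float affine map closely.
a, b = fold_bn(gamma=1.5, beta=-0.3, mean=2.0, var=4.0)
a_q, b_q = quantize_shared_scale(a, b)
x = np.arange(-8, 8, dtype=np.int64)
print(np.max(np.abs(fixed_point_bn(x, a_q, b_q) - np.floor(a * x + b))))
```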