MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization
- URL: http://arxiv.org/abs/2010.09278v3
- Date: Wed, 27 Sep 2023 11:38:52 GMT
- Title: MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization
- Authors: Wen Fei, Wenrui Dai, Chenglin Li, Junni Zou, Hongkai Xiong
- Abstract summary: We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency of network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, as a BN layer does.
MimicNorm achieves accuracy comparable to BN for various network structures, including ResNets and lightweight networks like ShuffleNet, while reducing memory consumption by about 20%.
- Score: 60.36100335878855
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Substantial experiments have validated the success of the Batch
Normalization (BN) layer in benefiting convergence and generalization. However,
BN requires extra memory and floating-point computation. Moreover, BN is
inaccurate on micro-batches, as it depends on batch statistics. In this paper,
we address these problems by simplifying BN regularization while keeping two
fundamental impacts of BN layers, i.e., data decorrelation and adaptive
learning rate. We propose a novel normalization method, named MimicNorm, to
improve the convergence and efficiency of network training. MimicNorm consists
of only two lightweight operations: a modified weight mean operation
(subtracting mean values from the weight parameter tensor) and one BN layer
before the loss function (the last BN layer). We leverage neural tangent kernel
(NTK) theory to prove that our weight mean operation whitens activations and
transitions the network into the chaotic regime, as a BN layer does, and
consequently leads to enhanced convergence. The last BN layer provides
autotuned learning rates and also improves accuracy. Experimental results show
that MimicNorm achieves accuracy comparable to BN for various network
structures, including ResNets and lightweight networks like ShuffleNet, while
reducing memory consumption by about 20%. The code is publicly available at
https://github.com/Kid-key/MimicNorm.
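To make the two operations concrete, here is a minimal PyTorch-style sketch. It is an illustration under stated assumptions, not the authors' released implementation (see the repository above for that): the class names CenteredConv2d and TinyMimicNormNet, the toy architecture, and the per-filter centering are hypothetical choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CenteredConv2d(nn.Conv2d):
    """Convolution whose weights are re-centered to zero mean before each
    forward pass, standing in for the weight mean operation."""
    def forward(self, x):
        # Subtract each filter's mean over its (in_channels, kH, kW) slice;
        # centering over the whole weight tensor is another plausible reading.
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)

class TinyMimicNormNet(nn.Module):
    """Toy classifier: weight-centered convs, no per-layer BN, and a single
    BatchNorm1d on the logits (the 'last BN layer' before the loss)."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            CenteredConv2d(3, 32, 3, padding=1, bias=False), nn.ReLU(),
            CenteredConv2d(32, 64, 3, stride=2, padding=1, bias=False), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, num_classes)
        self.last_bn = nn.BatchNorm1d(num_classes)  # BN right before the loss

    def forward(self, x):
        h = self.features(x).flatten(1)
        return self.last_bn(self.fc(h))

# Usage: the logits feed into the loss exactly as with a BN network.
model = TinyMimicNormNet()
logits = model(torch.randn(8, 3, 32, 32))
loss = F.cross_entropy(logits, torch.randint(0, 10, (8,)))
```

With the per-layer BN layers removed, only the inexpensive logit-level BN remains, which is how the method avoids the per-layer memory and floating-point overhead of BN described in the abstract.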
Related papers
- Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates data that requires BN from data that does not.
arXiv Detail & Related papers (2022-11-03T12:12:56Z) - Understanding the Failure of Batch Normalization for Transformers in NLP [16.476194435004732]
Batch Normalization (BN) is a technique to accelerate the training of deep neural networks.
BN fails to defend its position in Natural Language Processing (NLP), which is dominated by Layer Normalization (LN)
Regularized BN (RBN) improves the performance of BN consistently and outperforms or is on par with LN on 17 out of 20 settings.
arXiv Detail & Related papers (2022-10-11T05:18:47Z)
- BiTAT: Neural Network Binarization with Task-dependent Aggregated Transformation [116.26521375592759]
Quantization aims to transform high-precision weights and activations of a given neural network into low-precision weights/activations for reduced memory usage and computation.
Extreme quantization (1-bit weight/1-bit activations) of compactly-designed backbone architectures results in severe performance degeneration.
This paper proposes a novel Quantization-Aware Training (QAT) method that can effectively alleviate performance degeneration.
arXiv Detail & Related papers (2022-07-04T13:25:49Z)
- Stochastic Whitening Batch Normalization [9.514475896906605]
Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs).
The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method.
We show that while SWBN improves convergence rate and generalization, its computational overhead is less than that of IterNorm.
arXiv Detail & Related papers (2021-06-03T20:45:42Z)
- BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics (a minimal illustrative sketch of this idea appears after the list below).
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- Optimal Quantization for Batch Normalization in Neural Network Deployments and Beyond [18.14282813812512]
Batch Normalization (BN) poses a challenge for Quantized Neural Networks (QNNs).
We propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with a shared quantized scale (a second illustrative sketch appears after the list below).
Our method is verified by experiments at layer level on CIFAR and ImageNet datasets.
arXiv Detail & Related papers (2020-08-30T09:33:29Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
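As referenced in the Memorized Batch Normalization entry above, here is a minimal sketch of that idea: normalize with statistics averaged over the few most recent batches instead of the current batch alone. The class name MemorizedBatchNorm1d, the window size, the equal-weight averaging, and the detached statistics are simplifying assumptions for illustration; the paper's actual method (which involves a double forward propagation) is more involved.

```python
import collections
import torch
import torch.nn as nn

class MemorizedBatchNorm1d(nn.Module):
    """Illustrative BN that normalizes with mean/variance averaged over the
    `window` most recent batches instead of the current batch only."""
    def __init__(self, num_features, window=4, eps=1e-5, momentum=0.1):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))
        self.window, self.eps, self.momentum = window, eps, momentum
        self.recent_means = collections.deque(maxlen=window)
        self.recent_vars = collections.deque(maxlen=window)

    def forward(self, x):
        if self.training:
            # Statistics are detached for simplicity; the real method treats
            # their gradients carefully (hence the double forward propagation).
            self.recent_means.append(x.mean(dim=0).detach())
            self.recent_vars.append(x.var(dim=0, unbiased=False).detach())
            mean = torch.stack(list(self.recent_means)).mean(dim=0)
            var = torch.stack(list(self.recent_vars)).mean(dim=0)
            # Keep running estimates for inference, as standard BN does.
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias

# Drop-in usage in place of nn.BatchNorm1d; with window=1 this reduces to a
# detached-statistics variant of ordinary BN.
layer = MemorizedBatchNorm1d(16, window=4)
out = layer(torch.randn(8, 16))
```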
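Likewise, for the Optimal Quantization for Batch Normalization entry, the sketch below shows the stated idea in its simplest form: at inference BN folds into an affine map y = a*x + b, and the two floats a and b are put into fixed point with one shared scale so the operation becomes an integer multiply-add and shift. The function names, the choice of 8 fractional bits, and the omission of the activation's own quantization scale are assumptions for illustration only.

```python
import numpy as np

def fold_bn(gamma, beta, mean, var, eps=1e-5):
    # At inference, BN reduces to an affine map y = a * x + b.
    a = gamma / np.sqrt(var + eps)
    b = beta - a * mean
    return a, b

def quantize_shared_scale(a, b, frac_bits=8):
    # Represent both floats in fixed point with one shared scale 2**frac_bits.
    scale = 1 << frac_bits
    return np.int64(np.round(a * scale)), np.int64(np.round(b * scale))

def fixed_point_bn(x_int, a_q, b_q, frac_bits=8):
    # Integer multiply-add plus an arithmetic right shift replaces the
    # floating-point affine transformation.
    return (x_int * a_q + b_q) >> frac_bits

# Quick check: the fixed-point output tracks the float affine map closely.
a, b = fold_bn(gamma=1.5, beta=-0.3, mean=2.0, var=4.0)
a_q, b_q = quantize_shared_scale(a, b)
x = np.arange(-8, 8, dtype=np.int64)
print(np.max(np.abs(fixed_point_bn(x, a_q, b_q) - np.floor(a * x + b))))
```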