Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization
- URL: http://arxiv.org/abs/2001.06838v2
- Date: Wed, 8 Apr 2020 10:06:09 GMT
- Title: Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization
- Authors: Junjie Yan, Ruosi Wan, Xiangyu Zhang, Wei Zhang, Yichen Wei, Jian Sun
- Abstract summary: Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
- Score: 126.6252371899064
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BN) is one of the most widely used techniques in deep
learning, but its performance can degrade severely when the batch size is too
small. This weakness limits the use of BN in many computer vision tasks such as
detection and segmentation, where the batch size is usually small due to
memory constraints. Many modified normalization techniques have therefore been
proposed, but they either fail to fully restore the performance of BN or
introduce additional nonlinear operations in the inference procedure, which
substantially increases resource consumption. In this paper, we reveal that
two extra batch statistics are involved in the backward propagation of BN,
which have not been well discussed before. These extra batch statistics, which
are associated with gradients, can also severely affect the training of deep
neural networks. Based on our analysis, we propose a novel normalization
method, named Moving Average Batch Normalization (MABN). MABN can completely
restore the performance of vanilla BN in small batch cases, without introducing
any additional nonlinear operations in the inference procedure. We verify the
benefits of MABN through both theoretical analysis and experiments. Our
experiments demonstrate the effectiveness of MABN in multiple computer vision
tasks including ImageNet and COCO. The code has been released at
https://github.com/megvii-model/MABN.
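
To make the abstract's remark about "two extra batch statistics" in the backward pass concrete, the following is a minimal NumPy sketch of the standard (vanilla) BN forward and backward computation. It is not the authors' MABN implementation. The function names `bn_forward` and `bn_backward` and the variable names `g_mean` and `gx_mean` are illustrative assumptions and do not come from the paper or its released code.

```python
import numpy as np

def bn_forward(x, gamma, beta, eps=1e-5):
    """Vanilla BN over the batch axis (axis 0) for an input of shape (N, C)."""
    mu = x.mean(axis=0)                 # forward batch statistic 1: batch mean
    var = x.var(axis=0)                 # forward batch statistic 2: batch variance
    inv_std = 1.0 / np.sqrt(var + eps)
    x_hat = (x - mu) * inv_std
    y = gamma * x_hat + beta
    return y, (x_hat, inv_std, gamma)

def bn_backward(dy, cache):
    """Gradient of the loss w.r.t. x, given the upstream gradient dy = dL/dy."""
    x_hat, inv_std, gamma = cache
    # Two further batch statistics appear only in the backward pass
    # (names are hypothetical, chosen for this sketch):
    g_mean = dy.mean(axis=0)            # batch mean of the upstream gradient
    gx_mean = (dy * x_hat).mean(axis=0) # batch mean of gradient times normalized input
    dx = gamma * inv_std * (dy - g_mean - x_hat * gx_mean)
    return dx, (g_mean, gx_mean)
```

Like the forward mean and variance, `g_mean` and `gx_mean` are averages over the current batch, so with a small batch they are estimated from few samples. This is one way to read the abstract's claim that gradient-related batch statistics can also affect training.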
Related papers
- Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- Patch-aware Batch Normalization for Improving Cross-domain Robustness [55.06956781674986]
Cross-domain tasks present a challenge in which the model's performance will degrade when the training set and the test set follow different distributions.
We propose a novel method called patch-aware batch normalization (PBN).
By exploiting the differences between local patches of an image, our proposed PBN can effectively enhance the robustness of the model's parameters.
arXiv Detail & Related papers (2023-04-06T03:25:42Z)
- Rebalancing Batch Normalization for Exemplar-based Class-Incremental
Learning [23.621259845287824]
Batch Normalization (BN) has been extensively studied for neural nets in various computer vision tasks.
We develop a new update patch for BN, particularly tailored for exemplar-based class-incremental learning (CIL).
arXiv Detail & Related papers (2022-01-29T11:03:03Z)
- Revisiting Batch Normalization [0.0]
Batch normalization (BN) is essential for training deep neural networks.
We revisit the BN formulation and present a new method and update approach for BN to address the aforementioned issues.
Experimental results using the proposed alterations to BN show statistically significant performance gains in a variety of scenarios.
We also present a new online BN-based input data normalization technique to alleviate the need for other offline or fixed methods.
arXiv Detail & Related papers (2021-10-26T19:48:19Z)
- Batch Normalization Preconditioning for Neural Network Training [7.709342743709842]
Batch normalization (BN) is a popular and ubiquitous method in deep learning.
BN is not suitable for use with very small mini-batch sizes or online learning.
We propose a new method called Batch Normalization Preconditioning (BNP).
arXiv Detail & Related papers (2021-08-02T18:17:26Z)
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, as a BN layer does.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)