Double Forward Propagation for Memorized Batch Normalization
- URL: http://arxiv.org/abs/2010.04947v1
- Date: Sat, 10 Oct 2020 08:48:41 GMT
- Title: Double Forward Propagation for Memorized Batch Normalization
- Authors: Yong Guo, Qingyao Wu, Chaorui Deng, Jian Chen, Mingkui Tan
- Abstract summary: Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
- Score: 68.34268180871416
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BN) has been a standard component in designing deep
neural networks (DNNs). Although the standard BN can significantly accelerate
the training of DNNs and improve the generalization performance, it has several
underlying limitations which may hamper the performance in both training and
inference. In the training stage, BN relies on estimating the mean and variance
of data using a single minibatch. Consequently, BN can be unstable when the
batch size is very small or the data is poorly sampled. In the inference stage,
BN often uses the so-called moving mean and moving variance instead of batch
statistics, i.e., the training and inference rules in BN are not consistent.
To address these issues, we propose a memorized batch normalization (MBN), which
considers multiple recent batches to obtain more accurate and robust
statistics. Note that after the SGD update for each batch, the model parameters
change, and the features change accordingly, leading to a distribution shift for
the considered batch before and after the update. To
alleviate this issue, we present a simple Double-Forward scheme in MBN which
can further improve the performance. Compared to related methods, the proposed
MBN exhibits consistent behaviors in both training and inference. Empirical
results show that MBN-based models trained with the Double-Forward scheme
greatly reduce the sensitivity to data sampling and significantly improve the
generalization performance.
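To make the two ideas in the abstract concrete, here is a minimal PyTorch sketch of a batch normalization layer that averages per-channel statistics over a memory of recent batches, together with a Double-Forward training step that refreshes those statistics after each SGD update. This is only an illustration inferred from the abstract, not the authors' implementation: the class name `MemorizedBatchNorm2d`, the `memory_size` and `update_memory` attributes, the helper `set_update_memory`, and the exact ordering of the two forward passes are all assumptions.

```python
# Hypothetical sketch of memorized batch normalization (MBN) with a
# Double-Forward step, based only on the abstract above. Assumes NCHW inputs
# and a simple list-based memory of detached per-channel statistics.
import torch
import torch.nn as nn


class MemorizedBatchNorm2d(nn.Module):
    def __init__(self, num_features, memory_size=5, eps=1e-5):
        super().__init__()
        self.memory_size = memory_size      # how many recent batches to remember
        self.eps = eps
        self.update_memory = True           # toggled by the training step below
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.mean_memory, self.var_memory = [], []  # stats of recent batches

    def forward(self, x):
        if self.training:
            # Per-channel statistics of the current (N, C, H, W) batch.
            mean_cur = x.mean(dim=(0, 2, 3))
            var_cur = x.var(dim=(0, 2, 3), unbiased=False)
            # Average with memorized statistics of recent batches for a more
            # robust estimate when the batch size is small or poorly sampled.
            mean = torch.stack(self.mean_memory + [mean_cur]).mean(dim=0)
            var = torch.stack(self.var_memory + [var_cur]).mean(dim=0)
            if self.update_memory:
                self.mean_memory = (self.mean_memory + [mean_cur.detach()])[-self.memory_size:]
                self.var_memory = (self.var_memory + [var_cur.detach()])[-self.memory_size:]
        else:
            # Inference reuses the same memorized statistics (populated during
            # training), so the training and inference rules stay consistent.
            mean = torch.stack(self.mean_memory).mean(dim=0)
            var = torch.stack(self.var_memory).mean(dim=0)
        x_hat = (x - mean[None, :, None, None]) / torch.sqrt(var[None, :, None, None] + self.eps)
        return self.weight[None, :, None, None] * x_hat + self.bias[None, :, None, None]


def set_update_memory(model, flag):
    for m in model.modules():
        if isinstance(m, MemorizedBatchNorm2d):
            m.update_memory = flag


def double_forward_step(model, x, y, loss_fn, optimizer):
    """One training step with a (hypothetical) Double-Forward schedule: the first
    forward computes the loss; after the SGD update, a second gradient-free forward
    recomputes the batch statistics under the new parameters, so the memorized
    statistics do not carry the pre/post-update distribution shift."""
    model.train()
    set_update_memory(model, False)
    loss = loss_fn(model(x), y)        # forward #1: loss with gradients
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    set_update_memory(model, True)
    with torch.no_grad():
        model(x)                       # forward #2: refresh memorized statistics
    return loss.item()
```

In this sketch the memorized statistics are detached, so gradients flow only through the current batch's statistics; how the original method weights the memorized batches and handles their gradients is not specified in the abstract.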
Related papers
- Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- Overcoming Recency Bias of Normalization Statistics in Continual
Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z)
- An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates the data that requires BN from the data that does not.
arXiv Detail & Related papers (2022-11-03T12:12:56Z)
- Understanding the Failure of Batch Normalization for Transformers in NLP [16.476194435004732]
Batch Normalization (BN) is a technique to accelerate the training of deep neural networks.
However, BN fails to defend its position in Natural Language Processing (NLP), which is dominated by Layer Normalization (LN).
Regularized BN (RBN) improves the performance of BN consistently and outperforms or is on par with LN on 17 out of 20 settings.
arXiv Detail & Related papers (2022-10-11T05:18:47Z)
- Rebalancing Batch Normalization for Exemplar-based Class-Incremental
Learning [23.621259845287824]
Batch Normalization (BN) has been extensively studied for neural nets in various computer vision tasks.
We develop a new update patch for BN, particularly tailored for exemplar-based class-incremental learning (CIL).
arXiv Detail & Related papers (2022-01-29T11:03:03Z)
- Revisiting Batch Normalization [0.0]
Batch normalization (BN) is essential for training deep neural networks.
We revisit the BN formulation and present a new method and update approach for BN to address the aforementioned issues.
Experimental results using the proposed alterations to BN show statistically significant performance gains in a variety of scenarios.
We also present a new online BN-based input data normalization technique to alleviate the need for other offline or fixed methods.
arXiv Detail & Related papers (2021-10-26T19:48:19Z)
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and drives the network into the chaotic regime, as a BN layer does.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a roughly 20% reduction in memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.