Batch Group Normalization
- URL: http://arxiv.org/abs/2012.02782v2
- Date: Wed, 9 Dec 2020 01:26:51 GMT
- Title: Batch Group Normalization
- Authors: Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye, Xu Lan, Qijun Luo, Bo-Lin
Lai, Pedro Esperanca, Guang-Zhong Yang, Zhenguo Li
- Abstract summary: Batch Normalization (BN) performs well at medium and large batch sizes.
BN degrades at small batch sizes and saturates at extremely large ones due to noisy/confused statistic calculation.
BGN is proposed to resolve the noisy/confused statistic calculation of BN at small/extremely large batch sizes.
- Score: 45.03388237812212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Convolutional Neural Networks (DCNNs) are hard and time-consuming to
train, and normalization is one of the effective remedies. Among prior normalization
methods, Batch Normalization (BN) performs well at medium and large batch sizes and
generalizes well to multiple vision tasks, but its performance degrades significantly
at small batch sizes. In this paper, we find that BN also saturates at extremely large
batch sizes, i.e., 128 images per worker (GPU), and propose that the degradation/saturation
of BN at small/extremely large batch sizes is caused by noisy/confused statistic
calculation. Hence, without adding new trainable parameters, using multi-layer or
multi-iteration information, or introducing extra computation, Batch Group Normalization
(BGN) is proposed to resolve the noisy/confused statistic calculation of BN at
small/extremely large batch sizes by additionally drawing on the channel, height and
width dimensions to compensate. The grouping technique of Group Normalization (GN) is
adopted, and a hyper-parameter G controls the number of feature instances used for
statistic calculation, so that the statistics are neither noisy nor confused across
different batch sizes. We empirically demonstrate that BGN consistently outperforms BN,
Instance Normalization (IN), Layer Normalization (LN), GN, and Positional Normalization
(PN) across a wide spectrum of vision tasks, including image classification, Neural
Architecture Search (NAS), adversarial learning, Few-Shot Learning (FSL) and Unsupervised
Domain Adaptation (UDA), indicating its strong performance, robustness to batch size and
wide generalizability. For example, when training ResNet-50 on ImageNet with a batch size
of 2, BN achieves a Top-1 accuracy of 66.512% while BGN achieves 76.096%, a notable
improvement.
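Below is a minimal PyTorch sketch of the BGN idea as described in the abstract: statistics are pooled over the batch dimension and over groups of channel/height/width positions, with the hyper-parameter G controlling how many values enter each estimate. The module name, the handling of affine parameters, and the omission of running statistics for inference are assumptions of this sketch, not the authors' reference implementation.

```python
# Hypothetical sketch of the Batch Group Normalization (BGN) idea; not the official code.
import torch
import torch.nn as nn


class BatchGroupNorm(nn.Module):
    def __init__(self, num_channels: int, num_groups: int = 32, eps: float = 1e-5):
        super().__init__()
        assert num_channels % num_groups == 0, "channels must divide evenly into groups"
        self.G = num_groups
        self.eps = eps
        # Per-channel affine parameters, as in BN/GN.
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        N, C, H, W = x.shape
        # Split channels into G groups and pool statistics over batch, grouped
        # channels, height and width: one mean/variance per group, so even
        # small batches average over many values.
        xg = x.view(N, self.G, C // self.G, H, W)
        mean = xg.mean(dim=(0, 2, 3, 4), keepdim=True)
        var = xg.var(dim=(0, 2, 3, 4), unbiased=False, keepdim=True)
        xg = (xg - mean) / torch.sqrt(var + self.eps)
        return xg.view(N, C, H, W) * self.gamma + self.beta


# Usage: at batch size 2, choosing a small G keeps each statistic well estimated.
bgn = BatchGroupNorm(num_channels=64, num_groups=4)
y = bgn(torch.randn(2, 64, 56, 56))
```

Choosing G per batch size is the knob the abstract describes: fewer groups at small batch sizes (more values per statistic, less noise), more groups at very large batch sizes (less pooling, avoiding over-smoothed, "confused" statistics).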
Related papers
- Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification [2.6447365674762273]
Group Normalization is an easy alternative to Batch Normalization.
GN achieves a very low error rate of 10.6% compared to Batch Normalization.
arXiv Detail & Related papers (2024-04-01T06:10:11Z)
Batch Layer Normalization, A new normalization layer for CNNs and RNN [0.0]
This study introduces a new normalization layer termed Batch Layer Normalization (BLN).
As a combined version of batch and layer normalization, BLN adaptively puts appropriate weight on mini-batch and feature normalization based on the inverse size of mini-batches.
Test results indicate the application potential of BLN and its faster convergence than batch normalization and layer normalization in both Convolutional and Recurrent Neural Networks.
arXiv Detail & Related papers (2022-09-19T10:12:51Z)
BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and pushes the network into the chaotic regime, like a BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
Extended Batch Normalization [3.377000738091241]
Batch normalization (BN) has become a standard technique for training modern deep networks.
In this paper, we propose a simple but effective method called extended batch normalization (EBN).
Experiments show that extended batch normalization alleviates the problem of batch normalization with small batch size while achieving close performances to batch normalization with large batch size.
arXiv Detail & Related papers (2020-03-12T01:53:15Z)
Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique (a simplified sketch of this multi-iteration idea appears after this list).
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
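Several of the entries above (Memorized BN, Cross-Iteration BN, Moving Average BN) share one idea: stabilize small-batch statistics by drawing on several recent mini-batches rather than the current one alone. The sketch below illustrates only that shared idea; the window size, the plain averaging, and the omission of CBN's compensation for weight updates (and of MBN's corrections) are simplifications of mine, not the published methods.

```python
# Simplified illustration of normalizing with statistics pooled over recent
# mini-batches; hypothetical code, not the MBN/CBN/MABN reference implementations.
from collections import deque

import torch
import torch.nn as nn


class RecentBatchNorm(nn.Module):
    def __init__(self, num_channels: int, window: int = 4, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.history = deque(maxlen=window)  # detached (mean, var) of past batches
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Current-batch statistics, per channel.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        # Pool with the statistics remembered from the last few iterations.
        pooled_mean = torch.stack([m for m, _ in self.history] + [mean]).mean(dim=0)
        pooled_var = torch.stack([v for _, v in self.history] + [var]).mean(dim=0)
        self.history.append((mean.detach(), var.detach()))
        x_hat = (x - pooled_mean) / torch.sqrt(pooled_var + self.eps)
        return x_hat * self.gamma + self.beta
```

With a window of 1 this reduces to ordinary BN; larger windows trade freshness of the statistics (the published methods correct for stale network weights) for lower estimation noise at small batch sizes.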