Batch Group Normalization
- URL: http://arxiv.org/abs/2012.02782v2
- Date: Wed, 9 Dec 2020 01:26:51 GMT
- Title: Batch Group Normalization
- Authors: Xiao-Yun Zhou, Jiacheng Sun, Nanyang Ye, Xu Lan, Qijun Luo, Bo-Lin
Lai, Pedro Esperanca, Guang-Zhong Yang, Zhenguo Li
- Abstract summary: Batch Normalization (BN) performs well at medium and large batch sizes.
BN degrades at small batch sizes and saturates at extremely large ones due to noisy/confused statistic calculation.
BGN is proposed to resolve the noisy/confused statistic calculation of BN at small/extremely large batch sizes.
- Score: 45.03388237812212
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Deep Convolutional Neural Networks (DCNNs) are hard and time-consuming to
train, and normalization is one of the effective remedies. Among prior normalization
methods, Batch Normalization (BN) performs well at medium and large batch sizes and
generalizes well to multiple vision tasks, but its performance degrades significantly
at small batch sizes. In this paper, we find that BN also saturates at extremely large
batch sizes, i.e., 128 images per worker (GPU), and propose that the degradation/saturation
of BN at small/extremely large batch sizes is caused by noisy/confused statistic
calculation. Hence, without adding new trainable parameters, using multi-layer or
multi-iteration information, or introducing extra computation, Batch Group Normalization
(BGN) is proposed to resolve the noisy/confused statistic calculation of BN at
small/extremely large batch sizes by additionally drawing on the channel, height and
width dimensions to compensate. The grouping technique of Group Normalization (GN) is
adopted, and a hyper-parameter G controls the number of feature instances used for
statistic calculation, so that the statistics are neither noisy nor confused across
different batch sizes. We empirically demonstrate that BGN consistently outperforms BN,
Instance Normalization (IN), Layer Normalization (LN), GN, and Positional Normalization
(PN) across a wide spectrum of vision tasks, including image classification, Neural
Architecture Search (NAS), adversarial learning, Few-Shot Learning (FSL) and Unsupervised
Domain Adaptation (UDA), indicating its strong performance, robustness to batch size and
wide generalizability. For example, when training ResNet-50 on ImageNet with a batch size
of 2, BN achieves a Top-1 accuracy of 66.512% while BGN achieves 76.096%, a notable
improvement.
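Below is a minimal PyTorch sketch of the BGN idea as described in the abstract: statistics are pooled over the batch dimension and over groups of channel/height/width positions, with the hyper-parameter G controlling how many values enter each estimate. The module name, the handling of affine parameters, and the omission of running statistics for inference are assumptions of this sketch, not the authors' reference implementation.

```python
# Hypothetical sketch of the Batch Group Normalization (BGN) idea; not the official code.
import torch
import torch.nn as nn


class BatchGroupNorm(nn.Module):
    def __init__(self, num_channels: int, num_groups: int = 32, eps: float = 1e-5):
        super().__init__()
        assert num_channels % num_groups == 0, "channels must divide evenly into groups"
        self.G = num_groups
        self.eps = eps
        # Per-channel affine parameters, as in BN/GN.
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        N, C, H, W = x.shape
        # Split channels into G groups and pool statistics over batch, grouped
        # channels, height and width: one mean/variance per group, so even
        # small batches average over many values.
        xg = x.view(N, self.G, C // self.G, H, W)
        mean = xg.mean(dim=(0, 2, 3, 4), keepdim=True)
        var = xg.var(dim=(0, 2, 3, 4), unbiased=False, keepdim=True)
        xg = (xg - mean) / torch.sqrt(var + self.eps)
        return xg.view(N, C, H, W) * self.gamma + self.beta


# Usage: at batch size 2, choosing a small G keeps each statistic well estimated.
bgn = BatchGroupNorm(num_channels=64, num_groups=4)
y = bgn(torch.randn(2, 64, 56, 56))
```

Choosing G per batch size is the knob the abstract describes: fewer groups at small batch sizes (more values per statistic, less noise), more groups at very large batch sizes (less pooling, avoiding over-smoothed, "confused" statistics).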
Related papers
- Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification [2.6447365674762273]
Group Normalization is an easy alternative to Batch Normalization.
GN achieves a very low error rate of 10.6% compared to Batch Normalization.
arXiv Detail & Related papers (2024-04-01T06:10:11Z)
Batch Layer Normalization, A new normalization layer for CNNs and RNN [0.0]
This study introduces a new normalization layer termed Batch Layer Normalization (BLN).
As a combined version of batch and layer normalization, BLN adaptively puts appropriate weight on mini-batch and feature normalization based on the inverse size of mini-batches.
Test results indicate the application potential of BLN and its faster convergence than batch normalization and layer normalization in both Convolutional and Recurrent Neural Networks.
arXiv Detail & Related papers (2022-09-19T10:12:51Z)
BN-invariant sharpness regularizes the training model to better generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z)
MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and pushes the network into the chaotic regime, like a BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
Extended Batch Normalization [3.377000738091241]
Batch normalization (BN) has become a standard technique for training modern deep networks.
In this paper, we propose a simple but effective method called extended batch normalization (EBN).
Experiments show that extended batch normalization alleviates the problem of batch normalization with small batch size while achieving close performances to batch normalization with large batch size.
arXiv Detail & Related papers (2020-03-12T01:53:15Z)
Cross-Iteration Batch Normalization [67.83430009388678]
We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique (a simplified sketch of this multi-iteration idea appears after this list).
arXiv Detail & Related papers (2020-02-13T18:52:57Z)
Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
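Several of the entries above (Memorized BN, Cross-Iteration BN, Moving Average BN) share one idea: stabilize small-batch statistics by drawing on several recent mini-batches rather than the current one alone. The sketch below illustrates only that shared idea; the window size, the plain averaging, and the omission of CBN's compensation for weight updates (and of MBN's corrections) are simplifications of mine, not the published methods.

```python
# Simplified illustration of normalizing with statistics pooled over recent
# mini-batches; hypothetical code, not the MBN/CBN/MABN reference implementations.
from collections import deque

import torch
import torch.nn as nn


class RecentBatchNorm(nn.Module):
    def __init__(self, num_channels: int, window: int = 4, eps: float = 1e-5):
        super().__init__()
        self.eps = eps
        self.history = deque(maxlen=window)  # detached (mean, var) of past batches
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Current-batch statistics, per channel.
        mean = x.mean(dim=(0, 2, 3), keepdim=True)
        var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
        # Pool with the statistics remembered from the last few iterations.
        pooled_mean = torch.stack([m for m, _ in self.history] + [mean]).mean(dim=0)
        pooled_var = torch.stack([v for _, v in self.history] + [var]).mean(dim=0)
        self.history.append((mean.detach(), var.detach()))
        x_hat = (x - pooled_mean) / torch.sqrt(pooled_var + self.eps)
        return x_hat * self.gamma + self.beta
```

With a window of 1 this reduces to ordinary BN; larger windows trade freshness of the statistics (the published methods correct for stale network weights) for lower estimation noise at small batch sizes.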