Cross-Iteration Batch Normalization
- URL: http://arxiv.org/abs/2002.05712v3
- Date: Thu, 25 Mar 2021 06:57:36 GMT
- Title: Cross-Iteration Batch Normalization
- Authors: Zhuliang Yao, Yue Cao, Shuxin Zheng, Gao Huang, Stephen Lin
- Abstract summary: We present Cross-Iteration Batch Normalization (CBN), in which examples from multiple recent iterations are jointly utilized to enhance estimation quality.
CBN is found to outperform the original batch normalization and a direct calculation of statistics over previous iterations without the proposed compensation technique.
- Score: 67.83430009388678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A well-known issue of Batch Normalization is its significantly reduced
effectiveness in the case of small mini-batch sizes. When a mini-batch contains
few examples, the statistics upon which the normalization is defined cannot be
reliably estimated from it during a training iteration. To address this
problem, we present Cross-Iteration Batch Normalization (CBN), in which
examples from multiple recent iterations are jointly utilized to enhance
estimation quality. A challenge of computing statistics over multiple
iterations is that the network activations from different iterations are not
comparable to each other due to changes in network weights. We thus compensate
for the network weight changes via a proposed technique based on Taylor
polynomials, so that the statistics can be accurately estimated and batch
normalization can be effectively applied. On object detection and image
classification with small mini-batch sizes, CBN is found to outperform the
original batch normalization and a direct calculation of statistics over
previous iterations without the proposed compensation technique. Code is
available at https://github.com/Howal/Cross-iterationBatchNorm .
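To make the compensation idea concrete, here is a minimal, self-contained sketch of first-order Taylor compensation for a single linear layer and a window of one previous iteration. It is only an illustration of the idea in the abstract, not the authors' implementation (see the repository above for that); the function name `channel_mean`, the toy tensor shapes, and the window size are choices made here, and the paper compensates and aggregates the second-order statistic analogously.

```python
import torch

torch.manual_seed(0)

def channel_mean(weight, x):
    """Per-output-channel mean activation of a linear layer for one mini-batch."""
    return (x @ weight.t()).mean(dim=0)

# ----- iteration t-1: record statistics, weights, and d(mu)/d(theta) -----
theta_prev = torch.randn(4, 8)                 # layer weights at iteration t-1
x_prev = torch.randn(2, 8)                     # a small mini-batch (batch size 2)
mu_prev = channel_mean(theta_prev, x_prev)     # channel means, soon to be stale
jac_prev = torch.autograd.functional.jacobian( # d(mu_prev)/d(theta), shape (4, 4, 8)
    lambda w: channel_mean(w, x_prev), theta_prev)

# ----- iteration t: weights have changed (e.g. after an SGD step) -----
theta_now = theta_prev + 0.01 * torch.randn_like(theta_prev)
x_now = torch.randn(2, 8)
mu_now = channel_mean(theta_now, x_now)

# First-order Taylor compensation: approximate what iteration t-1's
# examples would produce under the *current* weights.
delta = theta_now - theta_prev
mu_prev_comp = mu_prev + torch.einsum("cij,ij->c", jac_prev, delta)

# Aggregate statistics across the two iterations (window size k = 1).
mu_cbn = 0.5 * (mu_now + mu_prev_comp)
print("compensated cross-iteration mean:", mu_cbn)
```

Averaging compensated statistics over several recent iterations in this way enlarges the effective batch size without the staleness problem of naively reusing statistics computed under old weights.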
Related papers
- Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification [2.6447365674762273]
Group Normalization (GN) is a straightforward alternative to Batch Normalization; a minimal drop-in sketch follows this entry.
GN achieves a low error rate of 10.6%, lower than that obtained with Batch Normalization.
arXiv Detail & Related papers (2024-04-01T06:10:11Z)
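To illustrate what "drop-in alternative" means in practice, here is a hedged, minimal PyTorch snippet swapping BatchNorm2d for GroupNorm; the channel count and group count are arbitrary choices, not taken from the paper above. GroupNorm never aggregates over the batch dimension, so small mini-batches do not degrade its statistics.

```python
import torch
import torch.nn as nn

x = torch.randn(2, 32, 16, 16)       # (batch, channels, H, W); batch can be tiny

bn = nn.BatchNorm2d(num_features=32)              # statistics pooled over the batch
gn = nn.GroupNorm(num_groups=8, num_channels=32)  # statistics per sample, per channel group

# Drop-in replacement: identical input/output shapes, no batch-size dependence for GN.
print(bn(x).shape, gn(x).shape)
```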
- Post-Training Quantization for Re-parameterization via Coarse & Fine Weight Splitting [13.270381125055275]
We propose a coarse & fine weight splitting (CFWS) method to reduce the quantization error of weights.
We develop an improved KL metric to determine optimal quantization scales for activations.
For example, the quantized RepVGG-A1 model exhibits a mere 0.3% accuracy loss.
arXiv Detail & Related papers (2023-12-17T02:31:20Z)
- Patch-aware Batch Normalization for Improving Cross-domain Robustness [55.06956781674986]
Cross-domain tasks are challenging because a model's performance degrades when the training set and the test set follow different distributions.
We propose a novel method called patch-aware batch normalization (PBN).
By exploiting the differences between local patches of an image, our proposed PBN can effectively enhance the robustness of the model's parameters.
arXiv Detail & Related papers (2023-04-06T03:25:42Z)
- Compound Batch Normalization for Long-tailed Image Classification [77.42829178064807]
We propose a compound batch normalization method based on a Gaussian mixture.
It can model the feature space more comprehensively and reduce the dominance of head classes.
The proposed method outperforms existing methods on long-tailed image classification.
arXiv Detail & Related papers (2022-12-02T07:31:39Z)
- Variance-Aware Weight Initialization for Point Convolutional Neural Networks [23.46612653627991]
We propose a framework to unify the multitude of continuous convolutions.
We show that this framework can avoid batch normalization while achieving similar and, in some cases, better performance.
arXiv Detail & Related papers (2021-12-07T15:47:14Z)
- Cluster-Promoting Quantization with Bit-Drop for Minimizing Network Quantization Loss [61.26793005355441]
Cluster-Promoting Quantization (CPQ) finds the optimal quantization grids for neural networks.
DropBits is a new bit-drop technique that revises the standard dropout regularization to randomly drop bits instead of neurons.
We experimentally validate our method on various benchmark datasets and network architectures.
arXiv Detail & Related papers (2021-09-05T15:15:07Z)
- Comparing Normalization Methods for Limited Batch Size Segmentation Neural Networks [0.0]
Batch Normalization works best with large batch sizes during training.
We show the effectiveness of Instance Normalization when training with limited batch sizes; a minimal drop-in sketch follows this entry.
We also show that the Instance Normalization implementation used in this experiment is efficient in computation time compared to the network without any normalization method.
arXiv Detail & Related papers (2020-11-23T17:13:24Z)
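As a hedged illustration of the entry above (not the paper's segmentation network; the layer sizes here are arbitrary), the snippet below shows that Instance Normalization computes its statistics per sample and per channel, so the batch size has no effect on the result.

```python
import torch
import torch.nn as nn

norm = nn.InstanceNorm2d(num_features=16, affine=True)

x1 = torch.randn(1, 16, 32, 32)     # batch size 1
x8 = x1.repeat(8, 1, 1, 1)          # the same sample repeated 8 times

# Per-sample statistics: the output for the first sample is unchanged
# regardless of how many other samples share the batch.
print(torch.allclose(norm(x1), norm(x8)[:1], atol=1e-6))   # True
```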
- WeightAlign: Normalizing Activations by Weight Alignment [16.85286948260155]
Batch normalization (BN) allows training very deep networks by normalizing activations by mini-batch sample statistics.
Small-batch alternatives that rely on single-sample statistics are less stable than BN, as they critically depend on the statistics of a single input sample.
We present WeightAlign: a method that normalizes the weights by the mean and scaled standard deviation computed within a filter, which normalizes activations without computing any sample statistics; a per-filter weight-standardization sketch follows this entry.
arXiv Detail & Related papers (2020-10-14T15:25:39Z)
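The following sketch illustrates per-filter weight standardization in the spirit of the entry above: each output filter's weights are centered by their mean and divided by a (scaled) standard deviation computed over the filter's fan-in, so activations come out normalized without any sample statistics. It is a generic sketch, not the paper's code; the `gain` factor and layer shapes are assumptions, and the paper's exact scaling may differ.

```python
import torch
import torch.nn.functional as F

def weight_standardize(weight, eps=1e-5, gain=1.0):
    """Center and rescale each output filter by its own mean and
    (scaled) standard deviation, computed over the filter's fan-in."""
    out_channels = weight.shape[0]
    w = weight.view(out_channels, -1)
    mean = w.mean(dim=1, keepdim=True)
    std = w.std(dim=1, keepdim=True)
    w = gain * (w - mean) / (std + eps)
    return w.view_as(weight)

conv = torch.nn.Conv2d(3, 16, kernel_size=3, padding=1, bias=False)
x = torch.randn(4, 3, 32, 32)
y = F.conv2d(x, weight_standardize(conv.weight), padding=1)
print(y.mean().item())   # roughly zero: zero-mean filters applied to zero-mean input
```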
- Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN), which considers multiple recent batches to obtain more accurate and robust statistics; a simplified aggregation sketch follows this entry.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z)
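As a hedged sketch of the general idea in the entry above (aggregating statistics over several recent mini-batches), the snippet below keeps a small FIFO buffer of per-batch means and variances and normalizes with their averages. It omits whatever correction the paper applies for weight updates between iterations; the window size, shapes, and class name are assumptions.

```python
from collections import deque
import torch

class RecentBatchNorm1d(torch.nn.Module):
    """Sketch: normalize with statistics averaged over the last `window`
    mini-batches, with no compensation for weight changes in between."""
    def __init__(self, num_features, window=4, eps=1e-5):
        super().__init__()
        self.num_features = num_features
        self.eps = eps
        self.means = deque(maxlen=window)
        self.vars = deque(maxlen=window)

    def forward(self, x):                          # x: (batch, num_features)
        self.means.append(x.mean(dim=0).detach())
        self.vars.append(x.var(dim=0, unbiased=False).detach())
        mu = torch.stack(list(self.means)).mean(dim=0)
        var = torch.stack(list(self.vars)).mean(dim=0)
        return (x - mu) / torch.sqrt(var + self.eps)

norm = RecentBatchNorm1d(num_features=8, window=4)
for _ in range(6):                   # statistics stabilize as the buffer fills
    y = norm(torch.randn(2, 8))      # mini-batch of only 2 examples
print(y.shape)
```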
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.