Stochastic Whitening Batch Normalization
- URL: http://arxiv.org/abs/2106.04413v1
- Date: Thu, 3 Jun 2021 20:45:42 GMT
- Title: Stochastic Whitening Batch Normalization
- Authors: Shengdong Zhang, Ehsan Nezhadarya, Homa Fashandi, Jiayi Liu, Darin
Graham, Mohak Shah
- Abstract summary: Batch Normalization (BN) is a popular technique for training Deep Neural Networks (DNNs).
The recently proposed Iterative Normalization (IterNorm) method improves these properties by whitening the activations iteratively using Newton's method.
We show that while SWBN improves convergence rate and generalization, its computational overhead is less than that of IterNorm.
- Score: 9.514475896906605
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Batch Normalization (BN) is a popular technique for training Deep Neural
Networks (DNNs). BN uses scaling and shifting to normalize activations of
mini-batches to accelerate convergence and improve generalization. The recently
proposed Iterative Normalization (IterNorm) method improves these properties by
whitening the activations iteratively using Newton's method. However, since
Newton's method initializes the whitening matrix independently at each training
step, no information is shared between consecutive steps. In this work, instead
of exact computation of whitening matrix at each time step, we estimate it
gradually during training in an online fashion, using our proposed Stochastic
Whitening Batch Normalization (SWBN) algorithm. We show that while SWBN
improves the convergence rate and generalization of DNNs, its computational
overhead is less than that of IterNorm. Due to the high efficiency of the
proposed method, it can be easily employed in most DNN architectures with a
large number of layers. We provide comprehensive experiments and comparisons
between BN, IterNorm, and SWBN layers to demonstrate the effectiveness of the
proposed technique in conventional (many-shot) image classification and
few-shot classification tasks.
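As a concrete illustration of the contrast described in the abstract, the sketch below shows a Newton-Schulz style iteration that rebuilds the whitening matrix from the current batch covariance (the IterNorm strategy), next to a single small stochastic update applied to a whitening matrix that persists across training steps (the SWBN idea). The specific update rule, step size `eta`, and function names are illustrative assumptions, not the exact algorithm from the paper.

```python
import numpy as np

def newton_whitening_matrix(cov, n_iters=5):
    """IterNorm-style sketch: recompute cov^(-1/2) from scratch with a
    Newton-Schulz iteration, restarting from the identity every time."""
    d = cov.shape[0]
    trace = np.trace(cov)
    sigma_n = cov / trace                      # scale so the iteration converges
    p = np.eye(d)
    for _ in range(n_iters):
        p = 0.5 * (3.0 * p - p @ p @ p @ sigma_n)
    return p / np.sqrt(trace)                  # approximates cov^(-1/2)

def stochastic_whitening_step(W, cov, eta=0.01):
    """SWBN-like idea (illustrative update, not the paper's exact rule):
    nudge a persistent matrix W so that W @ cov @ W.T moves toward I,
    sharing whitening information between consecutive training steps."""
    d = cov.shape[0]
    return W + eta * (np.eye(d) - W @ cov @ W.T) @ W

# Toy usage: estimate a whitening matrix online over many mini-batches.
rng = np.random.default_rng(0)
mixing = np.eye(8) + 0.1 * rng.normal(size=(8, 8))   # fixed correlation structure
W = np.eye(8)                                        # state carried across steps
for _ in range(2000):
    batch = rng.normal(size=(64, 8)) @ mixing.T
    batch -= batch.mean(axis=0)
    cov = batch.T @ batch / batch.shape[0] + 1e-5 * np.eye(8)
    W = stochastic_whitening_step(W, cov)

true_cov = mixing @ mixing.T
W_exact = newton_whitening_matrix(true_cov, n_iters=8)
print(np.linalg.norm(W_exact @ true_cov @ W_exact.T - np.eye(8)))  # Newton, from scratch
print(np.linalg.norm(W @ true_cov @ W.T - np.eye(8)))              # online estimate
```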
Related papers
- AskewSGD : An Annealed interval-constrained Optimisation method to train
Quantized Neural Networks [12.229154524476405]
We develop a new algorithm, Annealed Skewed SGD - AskewSGD - for training deep neural networks (DNNs) with quantized weights.
Unlike algorithms with active sets and feasible directions, AskewSGD avoids projections or optimization under the entire feasible set.
Experimental results show that the AskewSGD algorithm performs better than or on par with state-of-the-art methods on classical benchmarks.
arXiv Detail & Related papers (2022-11-07T18:13:44Z) - Revisiting Batch Normalization [0.0]
Batch normalization (BN) is essential for training deep neural networks.
We revisit the BN formulation and present a new method and update approach for BN to address the aforementioned issues.
Experimental results using the proposed alterations to BN show statistically significant performance gains in a variety of scenarios.
We also present a new online BN-based input data normalization technique to alleviate the need for other offline or fixed methods.
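For context, the baseline that this line of work modifies is standard batch normalization: per-feature standardization with mini-batch statistics during training, running averages at inference, and a learned scale and shift. The sketch below is the textbook transform only, not the altered formulation or update approach proposed in the paper above.

```python
import numpy as np

def batch_norm(x, gamma, beta, running_mean, running_var,
               momentum=0.1, eps=1e-5, training=True):
    """Textbook batch normalization over an (N, D) mini-batch."""
    if training:
        mean, var = x.mean(axis=0), x.var(axis=0)
        # running averages are what inference falls back on
        running_mean[:] = (1 - momentum) * running_mean + momentum * mean
        running_var[:] = (1 - momentum) * running_var + momentum * var
    else:
        mean, var = running_mean, running_var
    x_hat = (x - mean) / np.sqrt(var + eps)    # standardize per feature
    return gamma * x_hat + beta                # learned scale and shift
```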
arXiv Detail & Related papers (2021-10-26T19:48:19Z) - Batch Normalization Preconditioning for Neural Network Training [7.709342743709842]
Batch normalization (BN) is a popular and ubiquitous method in deep learning.
BN is not suitable for use with very small mini-batch sizes or online learning.
We propose a new method called Batch Normalization Preconditioning (BNP).
arXiv Detail & Related papers (2021-08-02T18:17:26Z) - BN-invariant sharpness regularizes the training model to better
generalization [72.97766238317081]
We propose a measure of sharpness, BN-Sharpness, which gives consistent value for equivalent networks under BN.
We use the BN-sharpness to regularize the training and design an algorithm to minimize the new regularized objective.
arXiv Detail & Related papers (2021-01-08T10:23:24Z) - MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch
Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, as a BN layer does.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
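The "weight mean" step mentioned above amounts to centering each unit's incoming weights; a minimal sketch of that operation is shown below. The placement of the single conventional BN layer at the end of the network follows the title's description, but the details here are assumptions rather than the paper's implementation.

```python
import numpy as np

def center_weights(w):
    """Weight mean operation (sketch): subtract each output unit's mean
    incoming weight, so pre-activations are zero-mean for centered inputs.
    w has shape (out_features, in_features)."""
    return w - w.mean(axis=1, keepdims=True)

# Assumed usage: center weights before each forward pass, and keep only
# one conventional BN layer at the end of the network (the "last BN layer").
w = np.random.default_rng(1).normal(size=(4, 16))
x = np.random.default_rng(2).normal(size=(32, 16))
pre_act = x @ center_weights(w).T              # (32, 4) pre-activations
```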
arXiv Detail & Related papers (2020-10-19T07:42:41Z) - Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
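A minimal sketch of the core idea, normalizing with statistics pooled over the k most recent mini-batches rather than only the current one, is given below. The real MBN also compensates for how the network parameters change between those batches (hence the "double forward propagation"), which this sketch omits; the class name and window size are assumptions.

```python
import numpy as np
from collections import deque

class MemorizedBatchStats:
    """Pool first and second moments over the k most recent batches
    (sketch only; MBN's correction for parameter drift is omitted)."""

    def __init__(self, k=5, eps=1e-5):
        self.means = deque(maxlen=k)
        self.sq_means = deque(maxlen=k)
        self.eps = eps

    def normalize(self, x):                          # x: (N, D)
        self.means.append(x.mean(axis=0))
        self.sq_means.append((x ** 2).mean(axis=0))
        mean = np.mean(np.stack(self.means), axis=0)
        var = np.mean(np.stack(self.sq_means), axis=0) - mean ** 2
        return (x - mean) / np.sqrt(var + self.eps)
```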
arXiv Detail & Related papers (2020-10-10T08:48:41Z) - Learning Low-rank Deep Neural Networks via Singular Vector Orthogonality
Regularization and Singular Value Sparsification [53.50708351813565]
We propose SVD training, the first method to explicitly achieve low-rank DNNs during training without applying SVD on every step.
We empirically show that SVD training can significantly reduce the rank of DNN layers and achieve higher reduction on computation load under the same accuracy.
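Reading from the title and summary, the two ingredients are (i) keeping each layer's weight in factored form U·diag(s)·Vᵀ so no explicit SVD is needed during training, and (ii) adding an orthogonality penalty on U and V plus a sparsity penalty on the singular values s so that low rank emerges. The loss forms and the weight `lam` below are illustrative assumptions, not the paper's exact objective.

```python
import numpy as np

def low_rank_forward(x, U, s, V):
    """Layer weight kept in factored form W = U @ diag(s) @ V.T, so
    training never runs an explicit SVD.
    Shapes: U (out, r), s (r,), V (in, r), x (N, in)."""
    return ((x @ V) * s) @ U.T                       # == x @ W.T

def svd_training_penalty(U, s, V, lam=1e-3):
    """Sketch of the regularizers suggested by the title: keep the factors
    near-orthogonal and push singular values toward sparsity."""
    ortho = (np.sum((U.T @ U - np.eye(U.shape[1])) ** 2)
             + np.sum((V.T @ V - np.eye(V.shape[1])) ** 2))
    return ortho + lam * np.sum(np.abs(s))           # L1 encourages rank reduction
```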
arXiv Detail & Related papers (2020-04-20T02:40:43Z) - PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z) - Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
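Below is a minimal sketch of the forward-pass side of the moving-average idea: normalizing with exponential moving averages of the batch moments instead of the noisy per-batch values, so that small batches stop destabilizing normalization. MABN also stabilizes the statistics that appear in the backward pass, which a forward-only sketch like this (with assumed class name and momentum) does not capture.

```python
import numpy as np

class MovingAverageNorm:
    """Normalize with exponential moving averages of the first and second
    moments rather than per-batch statistics (forward-pass sketch only)."""

    def __init__(self, dim, momentum=0.02, eps=1e-5):
        self.mean = np.zeros(dim)
        self.sq_mean = np.ones(dim)
        self.momentum = momentum
        self.eps = eps

    def __call__(self, x, training=True):            # x: (N, D)
        if training:
            self.mean += self.momentum * (x.mean(axis=0) - self.mean)
            self.sq_mean += self.momentum * ((x ** 2).mean(axis=0) - self.sq_mean)
        var = self.sq_mean - self.mean ** 2
        return (x - self.mean) / np.sqrt(np.maximum(var, 0.0) + self.eps)
```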
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.