An Investigation into the Stochasticity of Batch Whitening
- URL: http://arxiv.org/abs/2003.12327v1
- Date: Fri, 27 Mar 2020 11:06:32 GMT
- Title: An Investigation into the Stochasticity of Batch Whitening
- Authors: Lei Huang, Lei Zhao, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
- Abstract summary: This paper investigates the more general Batch Whitening (BW) operation.
We show that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs).
Our proposed BW algorithm improves the residual networks by a significant margin on ImageNet classification.
- Score: 95.54842420166862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BN) is extensively employed in various network
architectures by performing standardization within mini-batches.
A full understanding of the process has been a central target in the deep
learning communities.
Unlike existing works, which usually only analyze the standardization
operation, this paper investigates the more general Batch Whitening (BW). Our
work originates from the observation that while various whitening
transformations equivalently improve the conditioning, they show significantly
different behaviors in discriminative scenarios and training Generative
Adversarial Networks (GANs).
We attribute this phenomenon to the stochasticity that BW introduces.
We quantitatively investigate the stochasticity of different whitening
transformations and show that it correlates well with the optimization
behaviors during training.
We also investigate how stochasticity relates to the estimation of population
statistics during inference.
Based on our analysis, we provide a framework for designing and comparing BW
algorithms in different scenarios.
Our proposed BW algorithm improves the residual networks by a significant
margin on ImageNet classification.
We also show that the stochasticity of BW can improve GAN performance, albeit at the cost of training stability.
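As a concrete illustration of the whitening operation the abstract discusses, the sketch below implements ZCA batch whitening of a mini-batch with NumPy. This is a generic textbook formulation, not the paper's specific BW algorithm; the function name and the `eps` regularizer are illustrative assumptions.

```python
import numpy as np

def batch_whiten_zca(X, eps=1e-5):
    """ZCA-whiten a mini-batch X of shape (batch, features).

    Centers the batch, then multiplies by the inverse square root
    of the (regularized) covariance, so the whitened output has
    (approximately) identity covariance.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / X.shape[0] + eps * np.eye(X.shape[1])
    w, V = np.linalg.eigh(cov)            # eigendecomposition of covariance
    W = V @ np.diag(w ** -0.5) @ V.T      # symmetric inverse square root (ZCA)
    return Xc @ W
```

Other whitening transformations (e.g. PCA or Cholesky whitening) differ only in the choice of the matrix square root, which is exactly the design axis whose stochasticity the paper analyzes.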
Related papers
- Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z) - Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer from data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing Entropy loss.
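The entropy objective mentioned above can be sketched generically: test-time adaptation methods of this kind minimize the Shannon entropy of the model's softmax predictions to sharpen them on the test batch. The NumPy function below is a plain illustration of that loss, not GpreBN's actual implementation.

```python
import numpy as np

def entropy_loss(logits):
    """Mean Shannon entropy of softmax predictions (test-time objective sketch).

    logits: array of shape (batch, classes). Lower entropy means
    more confident predictions, which is what test-time BN methods
    optimize for on unlabeled test data.
    """
    z = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1).mean()
```

In such methods, only a small set of parameters (e.g. the BN affine parameters) is typically updated against this loss, keeping adaptation cheap and stable.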
arXiv Detail & Related papers (2022-05-20T14:33:39Z) - Rebalancing Batch Normalization for Exemplar-based Class-Incremental
Learning [23.621259845287824]
Batch Normalization (BN) has been extensively studied for neural nets in various computer vision tasks.
We develop a new update patch for BN, particularly tailored to the exemplar-based class-incremental learning (CIL) setting.
arXiv Detail & Related papers (2021-10-12T14:58:38Z)
- Gated Information Bottleneck for Generalization in Sequential Environments [13.795129636387623]
Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set.
We propose a new neural network-based IB approach, termed gated information bottleneck (GIB)
We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution detection.
arXiv Detail & Related papers (2021-10-12T14:58:38Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation, $\alpha$-BN, to calibrate the batch statistics.
We also present a novel loss function to form a unified test time adaptation framework Core.
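The batch-statistics calibration idea can be sketched as a convex mixture of stored training statistics and the current test batch's statistics. The function below is a hypothetical illustration of that general scheme; the `alpha` parameter name and the exact mixing rule are assumptions, not the paper's precise formulation.

```python
import numpy as np

def alpha_bn(x, mu_train, var_train, alpha=0.1, eps=1e-5):
    """Normalize x with a mixture of training and test-batch statistics.

    x: test batch of shape (batch, features).
    alpha=0 recovers standard inference-time BN (training statistics only);
    alpha=1 normalizes purely with the test batch's own statistics.
    """
    mu_b = x.mean(axis=0)
    var_b = x.var(axis=0)
    mu = alpha * mu_b + (1 - alpha) * mu_train
    var = alpha * var_b + (1 - alpha) * var_train
    return (x - mu) / np.sqrt(var + eps)
```

Interpolating between the two sources of statistics is what makes such a formulation robust under covariate shift: the stored statistics stabilize small test batches, while the test-batch statistics track the shifted distribution.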
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- More Is More -- Narrowing the Generalization Gap by Adding Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
- Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms [3.7050607140679026]
We propose a novel training framework that is conceptually comprehensible and can potentially be generalized to all feasible reinforcement learning algorithms.
We employ Monte Carlo sampling to obtain raw data inputs, and train on them in batches to obtain Markov decision process sequences.
We propose several algorithms embedded with our new framework to deal with typical discrete and continuous scenarios.
arXiv Detail & Related papers (2020-05-12T01:51:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.