An Investigation into the Stochasticity of Batch Whitening
- URL: http://arxiv.org/abs/2003.12327v1
- Date: Fri, 27 Mar 2020 11:06:32 GMT
- Title: An Investigation into the Stochasticity of Batch Whitening
- Authors: Lei Huang, Lei Zhao, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
- Abstract summary: This paper investigates the more general Batch Whitening (BW) operation.
We show that while various whitening transformations equivalently improve the conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs).
Our proposed BW algorithm improves the residual networks by a significant margin on ImageNet classification.
- Score: 95.54842420166862
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BN) is extensively employed in various network
architectures by performing standardization within mini-batches.
A full understanding of the process has been a central target in the deep
learning communities.
Unlike existing works, which usually only analyze the standardization
operation, this paper investigates the more general Batch Whitening (BW). Our
work originates from the observation that while various whitening
transformations equivalently improve the conditioning, they show significantly
different behaviors in discriminative scenarios and training Generative
Adversarial Networks (GANs).
We attribute this phenomenon to the stochasticity that BW introduces.
We quantitatively investigate the stochasticity of different whitening
transformations and show that it correlates well with the optimization
behaviors during training.
We also investigate how stochasticity relates to the estimation of population
statistics during inference.
Based on our analysis, we provide a framework for designing and comparing BW
algorithms in different scenarios.
Our proposed BW algorithm improves the residual networks by a significant
margin on ImageNet classification.
We also show that the stochasticity of BW can improve GAN performance, albeit at the cost of training stability.
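As a concrete illustration of the whitening operation the abstract discusses, the sketch below implements ZCA batch whitening of a mini-batch with NumPy. This is a generic textbook formulation, not the paper's specific BW algorithm; the function name and the `eps` regularizer are illustrative assumptions.

```python
import numpy as np

def batch_whiten_zca(X, eps=1e-5):
    """ZCA-whiten a mini-batch X of shape (batch, features).

    Centers the batch, then multiplies by the inverse square root
    of the (regularized) covariance, so the whitened output has
    (approximately) identity covariance.
    """
    Xc = X - X.mean(axis=0, keepdims=True)
    cov = Xc.T @ Xc / X.shape[0] + eps * np.eye(X.shape[1])
    w, V = np.linalg.eigh(cov)            # eigendecomposition of covariance
    W = V @ np.diag(w ** -0.5) @ V.T      # symmetric inverse square root (ZCA)
    return Xc @ W
```

Other whitening transformations (e.g. PCA or Cholesky whitening) differ only in the choice of the matrix square root, which is exactly the design axis whose stochasticity the paper analyzes.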
Related papers
- Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- Overcoming Recency Bias of Normalization Statistics in Continual Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$^2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z) - Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer from data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing Entropy loss.
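The entropy objective mentioned above can be sketched generically: test-time adaptation methods of this kind minimize the Shannon entropy of the model's softmax predictions to sharpen them on the test batch. The NumPy function below is a plain illustration of that loss, not GpreBN's actual implementation.

```python
import numpy as np

def entropy_loss(logits):
    """Mean Shannon entropy of softmax predictions (test-time objective sketch).

    logits: array of shape (batch, classes). Lower entropy means
    more confident predictions, which is what test-time BN methods
    optimize for on unlabeled test data.
    """
    z = logits - logits.max(axis=1, keepdims=True)     # numerical stability
    p = np.exp(z) / np.exp(z).sum(axis=1, keepdims=True)
    return -(p * np.log(p + 1e-12)).sum(axis=1).mean()
```

In such methods, only a small set of parameters (e.g. the BN affine parameters) is typically updated against this loss, keeping adaptation cheap and stable.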
arXiv Detail & Related papers (2022-05-20T14:33:39Z) - Rebalancing Batch Normalization for Exemplar-based Class-Incremental
Learning [23.621259845287824]
Batch Normalization (BN) has been extensively studied for neural nets in various computer vision tasks.
We develop a new update patch for BN, particularly tailored to the exemplar-based class-incremental learning (CIL) setting.
arXiv Detail & Related papers (2021-10-12T14:58:38Z)
- Gated Information Bottleneck for Generalization in Sequential Environments [13.795129636387623]
Deep neural networks suffer from poor generalization to unseen environments when the underlying data distribution is different from that in the training set.
We propose a new neural network-based IB approach, termed gated information bottleneck (GIB)
We empirically demonstrate the superiority of GIB over other popular neural network-based IB approaches in adversarial robustness and out-of-distribution detection.
arXiv Detail & Related papers (2021-10-12T14:58:38Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation, $\alpha$-BN, to calibrate the batch statistics.
We also present a novel loss function to form a unified test time adaptation framework Core.
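The batch-statistics calibration idea can be sketched as a convex mixture of stored training statistics and the current test batch's statistics. The function below is a hypothetical illustration of that general scheme; the `alpha` parameter name and the exact mixing rule are assumptions, not the paper's precise formulation.

```python
import numpy as np

def alpha_bn(x, mu_train, var_train, alpha=0.1, eps=1e-5):
    """Normalize x with a mixture of training and test-batch statistics.

    x: test batch of shape (batch, features).
    alpha=0 recovers standard inference-time BN (training statistics only);
    alpha=1 normalizes purely with the test batch's own statistics.
    """
    mu_b = x.mean(axis=0)
    var_b = x.var(axis=0)
    mu = alpha * mu_b + (1 - alpha) * mu_train
    var = alpha * var_b + (1 - alpha) * var_train
    return (x - mu) / np.sqrt(var + eps)
```

Interpolating between the two sources of statistics is what makes such a formulation robust under covariate shift: the stored statistics stabilize small test batches, while the test-batch statistics track the shifted distribution.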
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Decentralized Local Stochastic Extra-Gradient for Variational Inequalities [125.62877849447729]
We consider distributed variational inequalities (VIs) on domains with problem data that is heterogeneous (non-IID) and distributed across many devices.
We make a very general assumption on the computational network that covers the settings of fully decentralized calculations.
We theoretically analyze its convergence rate in the strongly-monotone, monotone, and non-monotone settings.
arXiv Detail & Related papers (2021-06-15T17:45:51Z)
- More Is More -- Narrowing the Generalization Gap by Adding Classification Heads [8.883733362171032]
We introduce an architecture enhancement for existing neural network models based on input transformations, termed 'TransNet'.
Our model can be employed during training time only and then pruned for prediction, resulting in an equivalent architecture to the base model.
arXiv Detail & Related papers (2021-02-09T16:30:33Z)
- Unbiased Deep Reinforcement Learning: A General Training Framework for Existing and Future Algorithms [3.7050607140679026]
We propose a novel training framework that is conceptually comprehensible and can potentially be generalized to all feasible reinforcement learning algorithms.
We employ Monte Carlo sampling to obtain raw data inputs, and train on them in batches to obtain Markov decision process sequences.
We propose several algorithms embedded with our new framework to deal with typical discrete and continuous scenarios.
arXiv Detail & Related papers (2020-05-12T01:51:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.