Group Whitening: Balancing Learning Efficiency and Representational
Capacity
- URL: http://arxiv.org/abs/2009.13333v4
- Date: Tue, 6 Apr 2021 04:17:27 GMT
- Title: Group Whitening: Balancing Learning Efficiency and Representational
Capacity
- Authors: Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
- Abstract summary: Group whitening (GW) exploits the advantages of the whitening operation and avoids the disadvantages of normalization within mini-batches.
We show that GW consistently improves the performance of different architectures, with absolute gains of $1.02\%$ $\sim$ $1.49\%$ in top-1 accuracy on ImageNet and $1.82\%$ $\sim$ $3.21\%$ in bounding box AP on COCO.
- Score: 98.52552448012598
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch normalization (BN) is an important technique commonly incorporated into
deep learning models to perform standardization within mini-batches. The merits
of BN in improving a model's learning efficiency can be further amplified by
applying whitening, while its drawbacks in estimating population statistics for
inference can be avoided through group normalization (GN). This paper proposes
group whitening (GW), which exploits the advantages of the whitening operation
and avoids the disadvantages of normalization within mini-batches. In addition,
we analyze the constraints imposed on features by normalization, and show how
the batch size (group number) affects the performance of batch (group)
normalized networks, from the perspective of a model's representational capacity.
This analysis provides theoretical guidance for applying GW in practice.
Finally, we apply the proposed GW to ResNet and ResNeXt architectures and
conduct experiments on the ImageNet and COCO benchmarks. Results show that GW
consistently improves the performance of different architectures, with absolute
gains of $1.02\%$ $\sim$ $1.49\%$ in top-1 accuracy on ImageNet and $1.82\%$
$\sim$ $3.21\%$ in bounding box AP on COCO.
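For a concrete picture of the operation the abstract describes, below is a minimal PyTorch sketch of group whitening: each sample's channels are split into groups and the group dimension is ZCA-whitened, so no mini-batch statistics are needed at inference. The function name `group_whitening`, the choice of `num_groups=64`, the `eps` value, and the omission of the learnable affine transform are illustrative assumptions, not the authors' reference implementation.

```python
import torch

def group_whitening(x, num_groups=64, eps=1e-5):
    """Sketch of per-sample group whitening (GW) for a conv feature map.

    The C channels of each sample are split into `num_groups` groups and the
    group dimension is whitened (zero mean, identity covariance) with ZCA
    whitening. Hyper-parameters here are assumptions for illustration.
    """
    n, c, h, w = x.shape
    g = num_groups
    # (N, g, c/g * H * W): the g groups play the role the batch plays in
    # batch whitening, so no statistics are shared across examples.
    xg = x.reshape(n, g, -1)
    m = xg.shape[-1]

    # Center each group.
    mean = xg.mean(dim=-1, keepdim=True)
    xc = xg - mean

    # Per-sample g x g covariance over the grouped features.
    cov = xc @ xc.transpose(-1, -2) / m
    cov = cov + eps * torch.eye(g, device=x.device, dtype=x.dtype)

    # ZCA whitening: Sigma^{-1/2} via eigendecomposition.
    evals, evecs = torch.linalg.eigh(cov)
    inv_sqrt = evecs @ torch.diag_embed(evals.clamp_min(eps).rsqrt()) @ evecs.transpose(-1, -2)

    xw = inv_sqrt @ xc
    return xw.reshape(n, c, h, w)


# Usage: whiten a random feature map; a learnable affine transform
# (as in BN/GN) would typically follow and is omitted here.
feat = torch.randn(8, 256, 14, 14)
out = group_whitening(feat, num_groups=64)
```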
Related papers
- Covariance-corrected Whitening Alleviates Network Degeneration on Imbalanced Classification [6.197116272789107]
Class imbalance is a critical issue in image classification that significantly affects the performance of deep recognition models.
We propose a novel framework called Whitening-Net to mitigate the degenerate solutions.
In scenarios with extreme class imbalance, the batch covariance statistic exhibits significant fluctuations, impeding the convergence of the whitening operation.
arXiv Detail & Related papers (2024-08-30T10:49:33Z)
- Generalized Logit Adjustment: Calibrating Fine-tuned Models by Removing Label Bias in Foundation Models [75.9543301303586]
Foundation models like CLIP allow zero-shot transfer on various tasks without additional training data.
Fine-tuning and ensembling are also commonly adopted to better fit the downstream tasks.
However, we argue that prior work has overlooked the inherent biases in foundation models.
arXiv Detail & Related papers (2023-10-12T08:01:11Z)
- Counterbalancing Teacher: Regularizing Batch Normalized Models for Robustness [15.395021925719817]
Batch normalization (BN) is a technique for training deep neural networks that accelerates their convergence to reach higher accuracy.
We show that BN incentivizes the model to rely on low-variance features that are highly specific to the training (in-domain) data.
We propose Counterbalancing Teacher (CT) to enforce the student network's learning of robust representations.
arXiv Detail & Related papers (2022-07-04T16:16:24Z)
- Test-time Batch Normalization [61.292862024903584]
Deep neural networks often suffer from data distribution shift between training and testing.
We revisit the batch normalization (BN) in the training process and reveal two key insights benefiting test-time optimization.
We propose a novel test-time BN layer design, GpreBN, which is optimized during testing by minimizing an entropy loss.
arXiv Detail & Related papers (2022-05-20T14:33:39Z)
- RawlsGCN: Towards Rawlsian Difference Principle on Graph Convolutional Network [102.27090022283208]
The Graph Convolutional Network (GCN) plays a pivotal role in many real-world applications.
GCN often exhibits performance disparity with respect to node degrees, resulting in worse predictive accuracy for low-degree nodes.
We formulate the problem of mitigating the degree-related performance disparity in GCN from the perspective of the Rawlsian difference principle.
arXiv Detail & Related papers (2022-02-28T05:07:57Z)
- Test-time Batch Statistics Calibration for Covariate Shift [66.7044675981449]
We propose to adapt the deep models to the novel environment during inference.
We present a general formulation, $\alpha$-BN, to calibrate the batch statistics.
We also present a novel loss function to form a unified test-time adaptation framework, Core.
arXiv Detail & Related papers (2021-10-06T08:45:03Z)
- Batch Group Normalization [45.03388237812212]
Batch Normalization (BN) performs well at medium and large batch sizes.
BN saturates at small and extremely large batch sizes because the statistic calculation becomes noisy or confused.
Batch Group Normalization (BGN) is proposed to solve this noisy/confused statistic calculation of BN at small and extremely large batch sizes.
arXiv Detail & Related papers (2020-12-04T18:57:52Z)
- An Investigation into the Stochasticity of Batch Whitening [95.54842420166862]
This paper investigates the more general Batch Whitening (BW) operation.
We show that while various whitening transformations equivalently improve conditioning, they show significantly different behaviors in discriminative scenarios and in training Generative Adversarial Networks (GANs).
Our proposed BW algorithm improves residual networks by a significant margin on ImageNet.
arXiv Detail & Related papers (2020-03-27T11:06:32Z)
This list is automatically generated from the titles and abstracts of the papers on this site.