Sandwich Batch Normalization
- URL: http://arxiv.org/abs/2102.11382v1
- Date: Mon, 22 Feb 2021 22:09:43 GMT
- Title: Sandwich Batch Normalization
- Authors: Xinyu Gong, Wuyang Chen, Tianlong Chen and Zhangyang Wang
- Abstract summary: We present Sandwich Batch Normalization (SaBN), an easy improvement of Batch Normalization (BN) with only a few lines of code changes.
Our SaBN factorizes the BN affine layer into one shared sandwich affine layer, cascaded by several parallel independent affine layers.
We demonstrate the prevailing effectiveness of SaBN as a drop-in replacement in four tasks.
- Score: 96.2529041037824
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present Sandwich Batch Normalization (SaBN), an embarrassingly easy
improvement of Batch Normalization (BN) with only a few lines of code changes.
SaBN is motivated by addressing the inherent feature distribution heterogeneity
that can be identified in many tasks, which can arise from data
heterogeneity (multiple input domains) or model heterogeneity (dynamic
architectures, model conditioning, etc.). Our SaBN factorizes the BN affine
layer into one shared sandwich affine layer, cascaded by several parallel
independent affine layers. Concrete analysis reveals that, during optimization,
SaBN promotes balanced gradient norms while still preserving diverse gradient
directions: a property that many application tasks seem to favor. We
demonstrate the prevailing effectiveness of SaBN as a drop-in replacement in
four tasks: $\textbf{conditional image generation}$, $\textbf{neural
architecture search}$ (NAS), $\textbf{adversarial training}$, and
$\textbf{arbitrary style transfer}$. Leveraging SaBN immediately achieves
better Inception Score and FID on CIFAR-10 and ImageNet conditional image
generation with three state-of-the-art GANs; boosts the performance of a
state-of-the-art weight-sharing NAS algorithm significantly on NAS-Bench-201;
substantially improves the robust and standard accuracies for adversarial
defense; and produces superior arbitrary stylized results. We also provide
visualizations and analysis to help understand why SaBN works. Codes are
available at https://github.com/VITA-Group/Sandwich-Batch-Normalization.
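To make the factorized affine concrete, below is a minimal PyTorch sketch of the idea: a BatchNorm layer with standardization only, followed by a shared "sandwich" affine and one of several parallel independent affines selected per sample. This is not the authors' reference code (which is in the linked repository); the module name SandwichBatchNorm2d, the num_branches argument, and the branch_idx selection mechanism are illustrative assumptions.

```python
import torch
import torch.nn as nn


class SandwichBatchNorm2d(nn.Module):
    """Sketch of Sandwich Batch Normalization (SaBN).

    Standardization is followed by a shared "sandwich" affine layer and
    then by one of several parallel, independent affine layers chosen per
    sample (e.g., by class label in a conditional GAN or by sub-network
    index in NAS). Hypothetical re-implementation for illustration only.
    """

    def __init__(self, num_features, num_branches):
        super().__init__()
        # Standardization only; the affine part is factorized below.
        self.bn = nn.BatchNorm2d(num_features, affine=False)
        # Shared sandwich affine parameters.
        self.shared_weight = nn.Parameter(torch.ones(num_features))
        self.shared_bias = nn.Parameter(torch.zeros(num_features))
        # Parallel independent affine parameters, one set per branch/condition.
        self.branch_weight = nn.Parameter(torch.ones(num_branches, num_features))
        self.branch_bias = nn.Parameter(torch.zeros(num_branches, num_features))

    def forward(self, x, branch_idx):
        # branch_idx: LongTensor of shape (N,) selecting the independent
        # affine for each sample (e.g., its class label).
        h = self.bn(x)
        # Shared sandwich affine.
        h = h * self.shared_weight.view(1, -1, 1, 1) + self.shared_bias.view(1, -1, 1, 1)
        # Per-sample independent affine.
        w = self.branch_weight[branch_idx].unsqueeze(-1).unsqueeze(-1)  # (N, C, 1, 1)
        b = self.branch_bias[branch_idx].unsqueeze(-1).unsqueeze(-1)
        return h * w + b


# Usage sketch: 10-class conditional setting, 64-channel features.
sabn = SandwichBatchNorm2d(num_features=64, num_branches=10)
x = torch.randn(8, 64, 16, 16)
labels = torch.randint(0, 10, (8,))
y = sabn(x, labels)  # same shape as x
```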
Related papers
- Which Frequencies do CNNs Need? Emergent Bottleneck Structure in Feature
Learning [12.351756386062291]
We describe the emergence of a Convolution Bottleneck (CBN) structure in CNNs.
We define the CBN rank, which describes the number and type of frequencies that are kept inside the bottleneck.
We show that any network with almost optimal parameter norm will exhibit a CBN structure in the weights.
arXiv Detail & Related papers (2024-02-12T19:18:50Z)
- Unified Batch Normalization: Identifying and Alleviating the Feature Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN)
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z)
- Patch-aware Batch Normalization for Improving Cross-domain Robustness [55.06956781674986]
Cross-domain tasks pose a challenge in which a model's performance degrades when the training set and the test set follow different distributions.
We propose a novel method called patch-aware batch normalization (PBN)
By exploiting the differences between local patches of an image, our proposed PBN can effectively enhance the robustness of the model's parameters.
arXiv Detail & Related papers (2023-04-06T03:25:42Z)
- Diagnosing Batch Normalization in Class Incremental Learning [39.70552266952221]
Batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence.
We propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias.
We show that BN Tricks can bring significant performance gains to all adopted baselines.
arXiv Detail & Related papers (2022-02-16T12:38:43Z)
- Rebalancing Batch Normalization for Exemplar-based Class-Incremental Learning [23.621259845287824]
Batch Normalization (BN) has been extensively studied for neural nets in various computer vision tasks.
We develop a new update patch for BN, particularly tailored for the exemplar-based class-incremental learning (CIL)
arXiv Detail & Related papers (2022-01-29T11:03:03Z)
- "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization [92.23297927690149]
Batch normalization (BN) is a key facilitator and is considered essential for state-of-the-art binary neural networks (BNNs).
We extend their framework to training BNNs, and for the first time demonstrate that BNs can be completely removed from BNN training and inference regimes.
arXiv Detail & Related papers (2021-04-16T16:46:57Z)
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage the neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transits the network into the chaotic regime like a BN layer.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with a reduction of about 20% memory consumption.
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences arising from its use.