How Does BN Increase Collapsed Neural Network Filters?
- URL: http://arxiv.org/abs/2001.11216v2
- Date: Fri, 31 Jan 2020 01:31:33 GMT
- Title: How Does BN Increase Collapsed Neural Network Filters?
- Authors: Sheng Zhou, Xinjiang Wang, Ping Luo, Litong Feng, Wenjie Li, Wei Zhang
- Abstract summary: Filter collapse is common in deep neural networks (DNNs) with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU)
We propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training.
- Score: 34.886702335022015
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Improving sparsity of deep neural networks (DNNs) is essential for network compression and has drawn much attention. In this work, we disclose a harmful sparsifying process called filter collapse, which is common in DNNs with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU). It occurs even without explicit sparsity-inducing regularizations such as $L_1$. This phenomenon is caused by the normalization effect of BN, which induces a non-trainable region in the parameter space and reduces the network capacity as a result. This phenomenon becomes more prominent when the network is trained with large learning rates (LR) or adaptive LR schedulers, and when the network is finetuned. We analytically prove that the parameters of BN tend to become sparser during SGD updates with high gradient noise and that the sparsifying probability is proportional to the square of learning rate and inversely proportional to the square of the scale parameter of BN. To prevent the undesirable collapsed filters, we propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training. With psBN, we can recover collapsed filters and increase the model performance in various tasks such as classification on CIFAR-10 and object detection on MS-COCO2017.
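The collapse described in the abstract can be monitored directly through the BN scale parameters. Below is a minimal sketch, not code from the paper: the helper name `count_collapsed_filters` and the threshold `eps` are illustrative assumptions. It flags channels whose BN scale $|\gamma|$ has nearly vanished, which with ReLU-like activations leaves the corresponding filter in the non-trainable region the abstract describes.

```python
# Minimal sketch (assumption, not the paper's code): flag "collapsed" filters
# by checking whether the BN scale parameter |gamma| of each channel has
# (nearly) vanished; with ReLU-family activations such a channel emits an
# almost constant signal and its filter stops receiving useful gradients.
import torch.nn as nn
import torchvision.models as models

def count_collapsed_filters(model: nn.Module, eps: float = 1e-3) -> dict:
    """Return {bn_layer_name: (num_collapsed_channels, num_channels)}."""
    report = {}
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            gamma = module.weight.detach().abs()
            report[name] = (int((gamma < eps).sum()), gamma.numel())
    return report

if __name__ == "__main__":
    net = models.resnet18(weights=None)  # any BN + ReLU CNN would do
    for layer, (collapsed, total) in count_collapsed_filters(net).items():
        print(f"{layer}: {collapsed}/{total} channels below threshold")
```

On a freshly initialized network the counts should be near zero; per the abstract, during noisy SGD updates each channel collapses with probability roughly proportional to $\mathrm{LR}^2/\gamma^2$, which is the regime psBN is designed to escape.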
Related papers
- BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning [2.563180814294141]
Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way.
Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNNs).
BN has been reported to hinder the performance of DNNs in heterogeneous FL.
We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting.
arXiv Detail & Related papers (2024-10-04T09:53:20Z)
- An Adaptive Batch Normalization in Deep Learning [0.0]
Batch Normalization (BN) is a way to accelerate and stabilize training in deep convolutional neural networks.
We propose a threshold-based adaptive BN approach that separates data that requires BN from data that does not.
arXiv Detail & Related papers (2022-11-03T12:12:56Z)
- Batch Normalization Explained [31.66311831317311]
We show that batch normalization (BN) boosts deep network (DN) learning and inference performance.
BN is an unsupervised learning technique that adapts the geometry of a DN's spline partition to match the data.
We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries.
arXiv Detail & Related papers (2022-09-29T13:41:27Z)
- Batch Normalization Tells You Which Filter is Important [49.903610684578716]
We propose a simple yet effective filter pruning method by evaluating the importance of each filter based on the BN parameters of pre-trained CNNs.
The experimental results on CIFAR-10 and ImageNet demonstrate that the proposed method can achieve outstanding performance.
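The summary does not spell out the exact importance score; a minimal sketch is given below under the common assumption that a filter's importance is the magnitude of its BN scale parameter $|\gamma|$, with the function names and pruning ratio being illustrative rather than the paper's.

```python
# Sketch under an assumption: score each filter of a pre-trained CNN by the
# magnitude of its BN scale parameter |gamma| and mark the lowest-scoring
# fraction as pruning candidates. This may differ from the paper's criterion.
import torch.nn as nn

def bn_filter_scores(model: nn.Module) -> list:
    """Return (layer_name, channel_index, |gamma|) tuples, lowest score first."""
    scores = []
    for name, module in model.named_modules():
        if isinstance(module, nn.BatchNorm2d):
            for idx, g in enumerate(module.weight.detach().abs().tolist()):
                scores.append((name, idx, g))
    return sorted(scores, key=lambda item: item[2])

def pruning_candidates(model: nn.Module, ratio: float = 0.3) -> list:
    """Lowest `ratio` of channels across all BN layers, ranked by |gamma|."""
    ranked = bn_filter_scores(model)
    return ranked[: int(len(ranked) * ratio)]
```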
arXiv Detail & Related papers (2021-12-02T12:04:59Z)
- Batch Normalization Preconditioning for Neural Network Training [7.709342743709842]
Batch normalization (BN) is a popular and ubiquitous method in deep learning.
BN is not suitable for use with very small mini-batch sizes or online learning.
We propose a new method called Batch Normalization Preconditioning (BNP)
arXiv Detail & Related papers (2021-08-02T18:17:26Z)
- Manipulating Identical Filter Redundancy for Efficient Pruning on Deep and Complicated CNN [126.88224745942456]
We propose a novel Centripetal SGD (C-SGD) to make some filters identical, resulting in ideal redundancy patterns.
Compared with existing methods, C-SGD delivers better performance because the redundancy is better organized.
arXiv Detail & Related papers (2021-07-30T06:18:19Z)
- "BNN - BN = ?": Training Binary Neural Networks without Batch Normalization [92.23297927690149]
Batch normalization (BN) is a key facilitator and considered essential for state-of-the-art binary neural networks (BNNs).
We extend their framework to training BNNs and, for the first time, demonstrate that BN can be completely removed from BNN training and inference regimes.
arXiv Detail & Related papers (2021-04-16T16:46:57Z)
- MimicNorm: Weight Mean and Last BN Layer Mimic the Dynamic of Batch Normalization [60.36100335878855]
We propose a novel normalization method, named MimicNorm, to improve the convergence and efficiency in network training.
We leverage neural tangent kernel (NTK) theory to prove that our weight mean operation whitens activations and transitions the network into the chaotic regime, as a BN layer does.
MimicNorm achieves similar accuracy for various network structures, including ResNets and lightweight networks like ShuffleNet, with about a 20% reduction in memory consumption.
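As a concrete illustration of the weight mean idea, here is a minimal sketch under the assumption that it amounts to re-centering each convolution filter's weights to zero mean on the fly; the class name `WeightCenteredConv2d` is illustrative and this is not the authors' implementation.

```python
# Sketch (assumption): one plausible form of a "weight mean" operation is to
# subtract each output filter's mean weight before convolving, so the layer
# itself centers activations instead of relying on BN's mean subtraction.
import torch.nn as nn
import torch.nn.functional as F

class WeightCenteredConv2d(nn.Conv2d):
    """Conv2d whose weights are re-centered to zero mean per output filter."""
    def forward(self, x):
        w = self.weight - self.weight.mean(dim=(1, 2, 3), keepdim=True)
        return F.conv2d(x, w, self.bias, self.stride,
                        self.padding, self.dilation, self.groups)
```

Such a layer could stand in for a plain convolution wherever BN's mean subtraction is to be mimicked by the weights themselves, with a single BN layer kept at the end of the network as the entry above describes.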
arXiv Detail & Related papers (2020-10-19T07:42:41Z)
- PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z)
- Towards Stabilizing Batch Statistics in Backward Propagation of Batch Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.