Batch Normalization Explained
- URL: http://arxiv.org/abs/2209.14778v1
- Date: Thu, 29 Sep 2022 13:41:27 GMT
- Title: Batch Normalization Explained
- Authors: Randall Balestriero, Richard G. Baraniuk
- Abstract summary: We explain why batch normalization (BN) boosts DN learning and inference performance.
BN is an unsupervised learning technique that adapts the geometry of a DN's spline partition to match the data.
We also show that the variation of BN statistics between mini-batches introduces a dropout-like random perturbation to the partition boundaries.
- Score: 31.66311831317311
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: A critically important, ubiquitous, and yet poorly understood ingredient in
modern deep networks (DNs) is batch normalization (BN), which centers and
normalizes the feature maps. To date, only limited progress has been made
understanding why BN boosts DN learning and inference performance; work has
focused exclusively on showing that BN smooths a DN's loss landscape. In this
paper, we study BN theoretically from the perspective of function
approximation; we exploit the fact that most of today's state-of-the-art DNs
are continuous piecewise affine (CPA) splines that fit a predictor to the
training data via affine mappings defined over a partition of the input space
(the so-called "linear regions"). {\em We demonstrate that BN is an
unsupervised learning technique that -- independent of the DN's weights or
gradient-based learning -- adapts the geometry of a DN's spline partition to
match the data.} BN provides a "smart initialization" that boosts the
performance of DN learning, because it adapts even a DN initialized with random
weights to align its spline partition with the data. We also show that the
variation of BN statistics between mini-batches introduces a dropout-like
random perturbation to the partition boundaries and hence the decision boundary
for classification problems. This per mini-batch perturbation reduces
overfitting and improves generalization by increasing the margin between the
training samples and the decision boundary.
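The paper's central claim, that BN adapts a DN's spline partition to the data without using labels or gradient steps, can be illustrated for a single randomly initialized ReLU layer: mean-centering each unit's pre-activation moves its partition boundary so that it passes through the batch mean. The NumPy sketch below is illustrative only; the toy data, layer sizes, and distance computation are our own choices, not code from the paper.
```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 2-D data cloud centered far from the origin.
X = rng.normal(loc=5.0, scale=1.0, size=(1000, 2))

# One randomly initialized layer: pre-activations Z = X @ W + b.
# For a ReLU unit k, the spline-partition boundary is the hyperplane
# {x : w_k @ x + b_k = 0} in input space.
W = rng.normal(size=(2, 8))
b = np.zeros(8)
Z = X @ W + b

# Distance from each unit's boundary to the empirical data mean.
mu = X.mean(axis=0)
dist_before = np.abs(mu @ W + b) / np.linalg.norm(W, axis=0)

# Batch normalization subtracts the per-unit batch mean of Z (no labels,
# no gradient step), so unit k's boundary becomes {x : w_k @ x + b_k = mean_k},
# a hyperplane that passes exactly through the batch mean.
dist_after = np.abs(mu @ W + b - Z.mean(axis=0)) / np.linalg.norm(W, axis=0)

print("boundary-to-data distance, before BN: %.3f" % dist_before.mean())  # a few units away
print("boundary-to-data distance, after  BN: %.3f" % dist_after.mean())   # ~0
```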
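The second claim, that the variation of BN statistics across mini-batches acts as a dropout-like perturbation of the partition (and hence decision) boundaries, can be made concrete by tracking how far a single unit's boundary moves as the mini-batch changes. Again a minimal sketch under assumed toy data and batch size, not the paper's experiments.
```python
import numpy as np

rng = np.random.default_rng(1)

# One fixed ReLU unit with weights w. Under BN, its partition boundary in
# input space is {x : w @ x = m_B}, where m_B is the pre-activation mean of
# the current mini-batch B, so the boundary moves whenever the batch changes.
X = rng.normal(loc=5.0, scale=1.0, size=(10_000, 2))   # assumed toy training set
w = rng.normal(size=2)

batch_means = []
for _ in range(200):                                    # 200 simulated training steps
    batch = X[rng.choice(len(X), size=64, replace=False)]
    batch_means.append((batch @ w).mean())              # statistic BN subtracts this step

jitter = np.std(batch_means) / np.linalg.norm(w)        # boundary displacement in input units
print("std of the boundary position across mini-batches: %.4f" % jitter)
# The non-zero std is the per-step random perturbation of the boundary;
# at inference time BN's fixed running statistics remove it.
```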
Related papers
- BN-SCAFFOLD: controlling the drift of Batch Normalization statistics in Federated Learning [2.563180814294141]
Federated Learning (FL) is gaining traction as a learning paradigm for training Machine Learning (ML) models in a decentralized way.
Batch Normalization (BN) is ubiquitous in Deep Neural Networks (DNNs).
BN has been reported to hinder the performance of DNNs in heterogeneous FL.
We introduce a unified theoretical framework for analyzing the convergence of variance reduction algorithms in the BN-DNN setting.
arXiv Detail & Related papers (2024-10-04T09:53:20Z) - Unified Batch Normalization: Identifying and Alleviating the Feature
Condensation in Batch Normalization and a Unified Framework [55.22949690864962]
Batch Normalization (BN) has become an essential technique in contemporary neural network design.
We propose a two-stage unified framework called Unified Batch Normalization (UBN).
UBN significantly enhances performance across different visual backbones and different vision tasks.
arXiv Detail & Related papers (2023-11-27T16:41:31Z) - Overcoming Recency Bias of Normalization Statistics in Continual
Learning: Balance and Adaptation [67.77048565738728]
Continual learning involves learning a sequence of tasks and balancing their knowledge appropriately.
We propose Adaptive Balance of BN (AdaB$2$N), which appropriately incorporates a Bayesian-based strategy to adapt task-wise contributions.
Our approach achieves significant performance gains across a wide range of benchmarks.
arXiv Detail & Related papers (2023-10-13T04:50:40Z) - Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z) - Why Batch Normalization Damage Federated Learning on Non-IID Data? [34.06900591666005]
Federated learning (FL) involves training deep neural network (DNN) models at the network edge while protecting the privacy of the edge clients.
Batch normalization (BN) has been regarded as a simple and effective means to accelerate training and improve generalization.
Recent findings indicate that BN can significantly impair the performance of FL in the presence of non-i.i.d. data.
We present the first convergence analysis to show that, under non-i.i.d. data, the mismatch between the local and global statistical parameters in BN causes a gradient deviation between the local and global models.
arXiv Detail & Related papers (2023-01-08T05:24:12Z) - Diagnosing Batch Normalization in Class Incremental Learning [39.70552266952221]
Batch normalization (BN) standardizes intermediate feature maps and has been widely validated to improve training stability and convergence.
We propose BN Tricks to address the issue by training a better feature extractor while eliminating classification bias.
We show that BN Tricks can bring significant performance gains to all adopted baselines.
arXiv Detail & Related papers (2022-02-16T12:38:43Z) - Batch Normalization Preconditioning for Neural Network Training [7.709342743709842]
Batch normalization (BN) is a popular and ubiquitous method in deep learning.
BN is not suitable for use with very small mini-batch sizes or online learning.
We propose a new method called Batch Normalization Preconditioning (BNP).
arXiv Detail & Related papers (2021-08-02T18:17:26Z) - Double Forward Propagation for Memorized Batch Normalization [68.34268180871416]
Batch Normalization (BN) has been a standard component in designing deep neural networks (DNNs).
We propose a memorized batch normalization (MBN) which considers multiple recent batches to obtain more accurate and robust statistics.
Compared to related methods, the proposed MBN exhibits consistent behaviors in both training and inference.
arXiv Detail & Related papers (2020-10-10T08:48:41Z) - PowerNorm: Rethinking Batch Normalization in Transformers [96.14956636022957]
The standard normalization method for neural network (NN) models used in Natural Language Processing (NLP) is layer normalization (LN).
LN is preferred due to the empirical observation that a (naive/vanilla) use of BN leads to significant performance degradation for NLP tasks.
We propose Power Normalization (PN), a novel normalization scheme that resolves this issue.
arXiv Detail & Related papers (2020-03-17T17:50:26Z) - How Does BN Increase Collapsed Neural Network Filters? [34.886702335022015]
Filter collapse is common in deep neural networks (DNNs) with batch normalization (BN) and rectified linear activation functions (e.g. ReLU, Leaky ReLU).
We propose a simple yet effective approach named post-shifted BN (psBN), which has the same representation ability as BN while being able to automatically make BN parameters trainable again as they saturate during training.
arXiv Detail & Related papers (2020-01-30T09:00:08Z) - Towards Stabilizing Batch Statistics in Backward Propagation of Batch
Normalization [126.6252371899064]
Moving Average Batch Normalization (MABN) is a novel normalization method.
We show that MABN can completely restore the performance of vanilla BN in small batch cases.
Our experiments demonstrate the effectiveness of MABN in multiple computer vision tasks including ImageNet and COCO.
arXiv Detail & Related papers (2020-01-19T14:41:22Z)
This list is automatically generated from the titles and abstracts of the papers in this site.