Separating the Effects of Batch Normalization on CNN Training Speed and
Stability Using Classical Adaptive Filter Theory
- URL: http://arxiv.org/abs/2002.10674v2
- Date: Tue, 1 Jun 2021 11:22:32 GMT
- Title: Separating the Effects of Batch Normalization on CNN Training Speed and
Stability Using Classical Adaptive Filter Theory
- Authors: Elaina Chai, Mert Pilanci, Boris Murmann
- Abstract summary: Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability.
This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm.
- Score: 40.55789598448379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BatchNorm) is commonly used in Convolutional Neural
Networks (CNNs) to improve training speed and stability. However, there is
still limited consensus on why this technique is effective. This paper uses
concepts from the traditional adaptive filter domain to provide insight into
the dynamics and inner workings of BatchNorm. First, we show that the
convolution weight updates have natural modes whose stability and convergence
speed are tied to the eigenvalues of the input autocorrelation matrices, which
are controlled by BatchNorm through the convolution layers' channel-wise
structure. Furthermore, our experiments demonstrate that the speed and
stability benefits are distinct effects. At low learning rates, it is
BatchNorm's amplification of the smallest eigenvalues that improves convergence
speed, while at high learning rates, it is BatchNorm's suppression of the
largest eigenvalues that ensures stability. Lastly, we prove that in the first
training step, when normalization is needed most, BatchNorm satisfies the same
optimization as Normalized Least Mean Square (NLMS), while it continues to
approximate this condition in subsequent steps. The analyses provided in this
paper lay the groundwork for gaining further insight into the operation of
modern neural network structures using adaptive filter theory.
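As context for the abstract's claims, here is a minimal worked sketch of the standard LMS/NLMS results from classical adaptive filter theory that the paper builds on (our notation, not equations reproduced from the paper). For an LMS filter with weights $w_k$, input $x_k$, desired response $d_k$, and error $e_k = d_k - w_k^\top x_k$, the update is

    w_{k+1} = w_k + \mu \, e_k \, x_k .

Writing the weight error in the eigenbasis of the input autocorrelation matrix $R = \mathbb{E}[x_k x_k^\top] = Q \Lambda Q^\top$, i.e. $v_k = Q^\top (w_k - w_*)$, the mean of each natural mode evolves under the usual independence assumptions as

    \mathbb{E}[v_{k+1,i}] = (1 - \mu \lambda_i) \, \mathbb{E}[v_{k,i}] ,

so stability requires $0 < \mu < 2/\lambda_{\max}$, while the slowest mode decays at rate $(1 - \mu \lambda_{\min})$; the eigenvalue spread $\lambda_{\max}/\lambda_{\min}$ therefore sets the speed/stability trade-off that the abstract ties to BatchNorm's control of these eigenvalues. NLMS removes the dependence on input power by normalizing the step size,

    w_{k+1} = w_k + \frac{\mu}{\|x_k\|^2 + \epsilon} \, e_k \, x_k ,

and the paper's first-step result is that BatchNorm satisfies this same optimization exactly at the first training step, with the channel-wise input statistics playing the role of $\|x_k\|^2$.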
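To make the eigenvalue claim concrete, the following is a small self-contained numpy sketch (our illustration, not the authors' code; the shapes and synthetic data are hypothetical). It compares the eigenvalue spread of the input autocorrelation matrix seen by a 3x3 convolution before and after BatchNorm-style channel-wise normalization of its inputs:

    # Toy illustration: effect of channel-wise normalization on the eigenvalue
    # spread of the input autocorrelation matrix of a convolution layer.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic feature maps: N samples, C channels, H x W spatial positions,
    # with deliberately mismatched per-channel scales and offsets.
    N, C, H, W = 256, 8, 16, 16
    scales = 10.0 ** rng.uniform(-1.0, 1.0, size=C)   # per-channel scale mismatch
    offsets = rng.uniform(-5.0, 5.0, size=C)          # per-channel mean offsets
    x = rng.standard_normal((N, C, H, W)) * scales[None, :, None, None] \
        + offsets[None, :, None, None]

    def eig_spread(feat, k=3):
        """Eigenvalue spread of R = E[p p^T] over k x k input patches (im2col style)."""
        n, c, h, w = feat.shape
        patches = []
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patches.append(feat[:, :, i:i + k, j:j + k].reshape(n, -1))
        p = np.concatenate(patches, axis=0)            # (num_patches, c * k * k)
        R = p.T @ p / p.shape[0]                       # input autocorrelation matrix
        eig = np.linalg.eigvalsh(R)
        return eig.max() / max(eig.min(), 1e-12)

    # BatchNorm's normalization step (no learned scale/shift), applied channel-wise.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_bn = (x - mean) / np.sqrt(var + 1e-5)

    print("eigenvalue spread, raw inputs:        %.1f" % eig_spread(x))
    print("eigenvalue spread, normalized inputs: %.1f" % eig_spread(x_bn))

On data with mismatched per-channel statistics, the normalized inputs yield a much smaller eigenvalue spread, which is the mechanism the abstract describes: the largest eigenvalues are suppressed and the smallest are relatively amplified.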
Related papers
- Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
arXiv Detail & Related papers (2023-05-25T15:32:21Z)
- An Empirical Analysis of the Shift and Scale Parameters in BatchNorm [3.198144010381572]
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks.
This paper examines the relative contribution of the normalization step to the success of BatchNorm.
arXiv Detail & Related papers (2023-03-22T12:41:12Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the output of the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- TO-FLOW: Efficient Continuous Normalizing Flows with Temporal
Optimization adjoint with Moving Speed [12.168241245313164]
Continuous normalizing flows (CNFs) construct invertible mappings between an arbitrarily complex distribution and an isotropic Gaussian distribution.
Training CNFs has not been tractable on large datasets due to the incremental complexity of neural ODE training.
In this paper, a temporal optimization is proposed that optimizes the evolution time of the forward propagation in neural ODE training.
arXiv Detail & Related papers (2022-03-19T14:56:41Z)
- Demystifying Batch Normalization in ReLU Networks: Equivalent Convex
Optimization Models and Implicit Regularization [29.411334761836958]
We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN.
Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes.
arXiv Detail & Related papers (2021-03-02T06:36:31Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.