Separating the Effects of Batch Normalization on CNN Training Speed and
Stability Using Classical Adaptive Filter Theory
- URL: http://arxiv.org/abs/2002.10674v2
- Date: Tue, 1 Jun 2021 11:22:32 GMT
- Title: Separating the Effects of Batch Normalization on CNN Training Speed and
Stability Using Classical Adaptive Filter Theory
- Authors: Elaina Chai, Mert Pilanci, Boris Murmann
- Abstract summary: Batch Normalization (BatchNorm) is commonly used in Convolutional Neural Networks (CNNs) to improve training speed and stability.
This paper uses concepts from the traditional adaptive filter domain to provide insight into the dynamics and inner workings of BatchNorm.
- Score: 40.55789598448379
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Batch Normalization (BatchNorm) is commonly used in Convolutional Neural
Networks (CNNs) to improve training speed and stability. However, there is
still limited consensus on why this technique is effective. This paper uses
concepts from the traditional adaptive filter domain to provide insight into
the dynamics and inner workings of BatchNorm. First, we show that the
convolution weight updates have natural modes whose stability and convergence
speed are tied to the eigenvalues of the input autocorrelation matrices, which
are controlled by BatchNorm through the convolution layers' channel-wise
structure. Furthermore, our experiments demonstrate that the speed and
stability benefits are distinct effects. At low learning rates, it is
BatchNorm's amplification of the smallest eigenvalues that improves convergence
speed, while at high learning rates, it is BatchNorm's suppression of the
largest eigenvalues that ensures stability. Lastly, we prove that in the first
training step, when normalization is needed most, BatchNorm satisfies the same
optimization as Normalized Least Mean Square (NLMS), while it continues to
approximate this condition in subsequent steps. The analyses provided in this
paper lay the groundwork for gaining further insight into the operation of
modern neural network structures using adaptive filter theory.
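As context for the abstract's claims, here is a minimal worked sketch of the standard LMS/NLMS results from classical adaptive filter theory that the paper builds on (our notation, not equations reproduced from the paper). For an LMS filter with weights $w_k$, input $x_k$, desired response $d_k$, and error $e_k = d_k - w_k^\top x_k$, the update is

    w_{k+1} = w_k + \mu \, e_k \, x_k .

Writing the weight error in the eigenbasis of the input autocorrelation matrix $R = \mathbb{E}[x_k x_k^\top] = Q \Lambda Q^\top$, i.e. $v_k = Q^\top (w_k - w_*)$, the mean of each natural mode evolves under the usual independence assumptions as

    \mathbb{E}[v_{k+1,i}] = (1 - \mu \lambda_i) \, \mathbb{E}[v_{k,i}] ,

so stability requires $0 < \mu < 2/\lambda_{\max}$, while the slowest mode decays at rate $(1 - \mu \lambda_{\min})$; the eigenvalue spread $\lambda_{\max}/\lambda_{\min}$ therefore sets the speed/stability trade-off that the abstract ties to BatchNorm's control of these eigenvalues. NLMS removes the dependence on input power by normalizing the step size,

    w_{k+1} = w_k + \frac{\mu}{\|x_k\|^2 + \epsilon} \, e_k \, x_k ,

and the paper's first-step result is that BatchNorm satisfies this same optimization exactly at the first training step, with the channel-wise input statistics playing the role of $\|x_k\|^2$.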
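To make the eigenvalue claim concrete, the following is a small self-contained numpy sketch (our illustration, not the authors' code; the shapes and synthetic data are hypothetical). It compares the eigenvalue spread of the input autocorrelation matrix seen by a 3x3 convolution before and after BatchNorm-style channel-wise normalization of its inputs:

    # Toy illustration: effect of channel-wise normalization on the eigenvalue
    # spread of the input autocorrelation matrix of a convolution layer.
    import numpy as np

    rng = np.random.default_rng(0)

    # Synthetic feature maps: N samples, C channels, H x W spatial positions,
    # with deliberately mismatched per-channel scales and offsets.
    N, C, H, W = 256, 8, 16, 16
    scales = 10.0 ** rng.uniform(-1.0, 1.0, size=C)   # per-channel scale mismatch
    offsets = rng.uniform(-5.0, 5.0, size=C)          # per-channel mean offsets
    x = rng.standard_normal((N, C, H, W)) * scales[None, :, None, None] \
        + offsets[None, :, None, None]

    def eig_spread(feat, k=3):
        """Eigenvalue spread of R = E[p p^T] over k x k input patches (im2col style)."""
        n, c, h, w = feat.shape
        patches = []
        for i in range(h - k + 1):
            for j in range(w - k + 1):
                patches.append(feat[:, :, i:i + k, j:j + k].reshape(n, -1))
        p = np.concatenate(patches, axis=0)            # (num_patches, c * k * k)
        R = p.T @ p / p.shape[0]                       # input autocorrelation matrix
        eig = np.linalg.eigvalsh(R)
        return eig.max() / max(eig.min(), 1e-12)

    # BatchNorm's normalization step (no learned scale/shift), applied channel-wise.
    mean = x.mean(axis=(0, 2, 3), keepdims=True)
    var = x.var(axis=(0, 2, 3), keepdims=True)
    x_bn = (x - mean) / np.sqrt(var + 1e-5)

    print("eigenvalue spread, raw inputs:        %.1f" % eig_spread(x))
    print("eigenvalue spread, normalized inputs: %.1f" % eig_spread(x_bn))

On data with mismatched per-channel statistics, the normalized inputs yield a much smaller eigenvalue spread, which is the mechanism the abstract describes: the largest eigenvalues are suppressed and the smallest are relatively amplified.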
Related papers
- Efficient Bound of Lipschitz Constant for Convolutional Layers by Gram
Iteration [122.51142131506639]
We introduce a precise, fast, and differentiable upper bound for the spectral norm of convolutional layers using circulant matrix theory.
We show through a comprehensive set of experiments that our approach outperforms other state-of-the-art methods in terms of precision, computational cost, and scalability.
It proves highly effective for the Lipschitz regularization of convolutional neural networks, with competitive results against concurrent approaches.
arXiv Detail & Related papers (2023-05-25T15:32:21Z)
- An Empirical Analysis of the Shift and Scale Parameters in BatchNorm [3.198144010381572]
Batch Normalization (BatchNorm) is a technique that improves the training of deep neural networks.
This paper examines the relative contribution of the normalization step to the success of BatchNorm.
arXiv Detail & Related papers (2023-03-22T12:41:12Z)
- Towards Practical Control of Singular Values of Convolutional Layers [65.25070864775793]
Convolutional neural networks (CNNs) are easy to train, but their essential properties, such as generalization error and adversarial robustness, are hard to control.
Recent research demonstrated that singular values of convolutional layers significantly affect such elusive properties.
We offer a principled approach to alleviating constraints of the prior art at the expense of an insignificant reduction in layer expressivity.
arXiv Detail & Related papers (2022-11-24T19:09:44Z)
- A Stable, Fast, and Fully Automatic Learning Algorithm for Predictive
Coding Networks [65.34977803841007]
Predictive coding networks are neuroscience-inspired models with roots in both Bayesian statistics and neuroscience.
We show how simply changing the temporal scheduling of the update rule for the synaptic weights leads to an algorithm that is much more efficient and stable than the original one.
arXiv Detail & Related papers (2022-11-16T00:11:04Z)
- Sample-Then-Optimize Batch Neural Thompson Sampling [50.800944138278474]
We introduce two algorithms for black-box optimization based on the Thompson sampling (TS) policy.
To choose an input query, we only need to train an NN and then choose the query by maximizing the output of the trained NN.
Our algorithms sidestep the need to invert the large parameter matrix yet still preserve the validity of the TS policy.
arXiv Detail & Related papers (2022-10-13T09:01:58Z)
- TO-FLOW: Efficient Continuous Normalizing Flows with Temporal
Optimization adjoint with Moving Speed [12.168241245313164]
Continuous normalizing flows (CNFs) construct invertible mappings between an arbitrarily complex distribution and an isotropic Gaussian distribution.
Training CNFs has not been tractable on large datasets due to the incremental complexity of neural ODE training.
In this paper, a temporal optimization is proposed that optimizes the evolution time of the forward propagation in neural ODE training.
arXiv Detail & Related papers (2022-03-19T14:56:41Z)
- Demystifying Batch Normalization in ReLU Networks: Equivalent Convex
Optimization Models and Implicit Regularization [29.411334761836958]
We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN.
Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes.
arXiv Detail & Related papers (2021-03-02T06:36:31Z)
- Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.