Demystifying Batch Normalization in ReLU Networks: Equivalent Convex
Optimization Models and Implicit Regularization
- URL: http://arxiv.org/abs/2103.01499v1
- Date: Tue, 2 Mar 2021 06:36:31 GMT
- Title: Demystifying Batch Normalization in ReLU Networks: Equivalent Convex
Optimization Models and Implicit Regularization
- Authors: Tolga Ergen, Arda Sahiner, Batu Ozturkler, John Pauly, Morteza
Mardani, Mert Pilanci
- Abstract summary: We introduce an analytic framework based on convex duality to obtain exact convex representations of weight-decay regularized ReLU networks with BN.
Our analyses also show that optimal layer weights can be obtained as simple closed-form formulas in the high-dimensional and/or overparameterized regimes.
- Score: 29.411334761836958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Batch Normalization (BN) is a commonly used technique to accelerate and
stabilize training of deep neural networks. Despite its empirical success, a
full theoretical understanding of BN is yet to be developed. In this work, we
analyze BN through the lens of convex optimization. We introduce an analytic
framework based on convex duality to obtain exact convex representations of
weight-decay regularized ReLU networks with BN, which can be trained in
polynomial-time. Our analyses also show that optimal layer weights can be
obtained as simple closed-form formulas in the high-dimensional and/or
overparameterized regimes. Furthermore, we find that Gradient Descent provides
an algorithmic bias effect on the standard non-convex BN network, and we design
an approach to explicitly encode this implicit regularization into the convex
objective. Experiments with CIFAR image classification highlight the
effectiveness of this explicit regularization for mimicking and substantially
improving the performance of standard BN networks.
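For intuition, the flavor of these convex representations can be sketched from the authors' earlier convex-duality results for plain two-layer ReLU networks with weight decay (a hedged sketch: the BN variant in this paper modifies the data matrix, and the symbols X, y, D_i, beta, and P below are assumed notation rather than taken from the abstract):

```latex
\min_{\{v_i, w_i\}_{i=1}^{P}} \;
\frac{1}{2}\Big\|\sum_{i=1}^{P} D_i X (v_i - w_i) - y\Big\|_2^2
+ \beta \sum_{i=1}^{P} \big(\|v_i\|_2 + \|w_i\|_2\big)
\quad \text{s.t.} \quad (2D_i - I) X v_i \ge 0,\;\; (2D_i - I) X w_i \ge 0 \;\; \forall i
```

Here each fixed diagonal 0/1 matrix D_i encodes one ReLU activation pattern of the data X, so the non-convex hidden-layer training problem is traded for a finite-dimensional group-regularized convex program, which is what makes polynomial-time training statements possible.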
Related papers
- The Convex Landscape of Neural Networks: Characterizing Global Optima
and Stationary Points via Lasso Models [75.33431791218302]
Deep Neural Network (DNN) training objectives are non-convex, and their optimization landscapes are difficult to characterize.
In this paper, the authors examine convex Lasso-type reformulations of neural network training.
They show that the stationary points of the non-convex objective can be characterized as global optima of subsampled convex programs.
arXiv Detail & Related papers (2023-12-19T23:04:56Z) - Lipschitz Continuity Retained Binary Neural Network [52.17734681659175]
We introduce Lipschitz continuity as a rigorous criterion for defining the robustness of BNNs.
We then propose retaining Lipschitz continuity as a regularization term to improve model robustness.
Our experiments show that this BNN-specific regularization method can effectively strengthen the robustness of BNNs.
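As a rough sketch of one way such a regularizer can be computed (not the paper's BNN-specific construction; the power-iteration estimator and the penalty coefficient below are assumptions for illustration):

```python
import numpy as np

def spectral_norm(w, iters=20):
    """Estimate the largest singular value of weight matrix `w`
    (its Lipschitz constant as a linear map) by power iteration."""
    v = np.random.randn(w.shape[1])
    for _ in range(iters):
        u = w @ v
        u /= np.linalg.norm(u)
        v = w.T @ u
        v /= np.linalg.norm(v)
    return float(u @ w @ v)

def lipschitz_penalty(weights, coeff=1e-3):
    """Sum of per-layer spectral norms, added to the training loss to
    discourage a large end-to-end Lipschitz constant."""
    return coeff * sum(spectral_norm(w) for w in weights)
```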
arXiv Detail & Related papers (2022-07-13T22:55:04Z) - Path Regularization: A Convexity and Sparsity Inducing Regularization
for Parallel ReLU Networks [75.33431791218302]
We study the training problem of deep neural networks and introduce an analytic approach to unveil hidden convexity in the optimization landscape.
We consider a deep parallel ReLU network architecture, which also includes standard deep networks and ResNets as its special cases.
arXiv Detail & Related papers (2021-10-18T18:00:36Z) - Explicit regularization and implicit bias in deep network classifiers
trained with the square loss [2.8935588665357077]
Deep ReLU networks trained with the square loss have been observed to perform well in classification tasks.
We show that convergence to a solution with the absolute minimum norm is expected when normalization techniques are used together with Weight Decay.
arXiv Detail & Related papers (2020-12-31T21:07:56Z) - Optimal Quantization for Batch Normalization in Neural Network
Deployments and Beyond [18.14282813812512]
Batch Normalization (BN) poses a challenge for Quantized Neural Networks (QNNs).
We propose a novel method to quantize BN by converting an affine transformation of two floating points to a fixed-point operation with shared quantized scale.
Our method is verified by experiments at layer level on CIFAR and ImageNet datasets.
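A hedged sketch of the general folding idea (the function names, int64 accumulation, and power-of-two shared scale below are assumptions for illustration, not the paper's exact scheme):

```python
import numpy as np

def fold_bn_to_fixed_point(gamma, beta, mean, var, eps=1e-5, frac_bits=8):
    """Fold BN's inference-time affine map y = a*x + c into integers with
    one shared power-of-two scale (illustrative rounding and bit width)."""
    a = gamma / np.sqrt(var + eps)      # per-channel scale
    c = beta - a * mean                 # per-channel shift
    scale = 2 ** frac_bits              # shared quantization scale
    a_q = np.round(a * scale).astype(np.int64)
    c_q = np.round(c * scale).astype(np.int64)
    return a_q, c_q, scale

def bn_fixed_point(x_q, a_q, c_q, scale):
    """Apply folded BN to integer activations: floor((a_q*x_q + c_q)/scale)."""
    return (a_q * x_q + c_q) // scale
```

Folding the two floating-point BN parameters into one integer multiply and one integer add per channel is what removes the floating-point affine transformation from the quantized inference path.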
arXiv Detail & Related papers (2020-08-30T09:33:29Z) - Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers [82.61117235807606]
The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
arXiv Detail & Related papers (2020-06-11T23:55:54Z) - Iterative Network for Image Super-Resolution [69.07361550998318]
Single image super-resolution (SISR) has been greatly revitalized by the recent development of convolutional neural networks (CNNs).
This paper provides new insight into conventional SISR algorithms and proposes a substantially different approach relying on iterative optimization.
A novel iterative super-resolution network (ISRN) is proposed on top of the iterative optimization.
arXiv Detail & Related papers (2020-05-20T11:11:47Z) - Layer-wise Conditioning Analysis in Exploring the Learning Dynamics of
DNNs [115.35745188028169]
We extend conditioning analysis to deep neural networks (DNNs) in order to investigate their learning dynamics.
We show that batch normalization (BN) can stabilize training, but sometimes results in the false impression of a local minimum.
We experimentally observe that BN can improve the layer-wise conditioning of the optimization problem.
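One hypothetical way to probe layer-wise conditioning (the covariance-spectrum proxy below is an assumption for illustration, not the paper's exact measure):

```python
import numpy as np

def layer_condition_number(activations, eps=1e-12):
    """Condition number of a layer's centered activation covariance --
    a rough proxy for how well-conditioned that layer's optimization is."""
    a = activations - activations.mean(axis=0)   # (batch, features)
    cov = a.T @ a / a.shape[0]
    eig = np.linalg.eigvalsh(cov)
    return eig.max() / max(eig.min(), eps)
```

Comparing this quantity per layer, with and without BN, is one simple way to see the conditioning effect the blurb describes.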
arXiv Detail & Related papers (2020-02-25T11:40:27Z) - Provable Benefit of Orthogonal Initialization in Optimizing Deep Linear
Networks [39.856439772974454]
We show that the width needed for efficient convergence to a global minimum is independent of the depth.
Our results suggest an explanation for the recent empirical successes found by initializing very deep non-linear networks according to the principle of dynamical isometry.
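For reference, a minimal sketch of the standard orthogonal initialization recipe referred to here (the QR-based sampler and `gain` parameter are conventional choices, not taken from the paper):

```python
import numpy as np

def orthogonal_init(rows, cols, gain=1.0):
    """Sample a (rows, cols) matrix with orthonormal rows or columns
    via QR decomposition of a Gaussian matrix."""
    a = np.random.randn(max(rows, cols), min(rows, cols))
    q, r = np.linalg.qr(a)
    q *= np.sign(np.diag(r))   # sign fix for a uniform (Haar) distribution
    if rows < cols:
        q = q.T
    return gain * q
```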
arXiv Detail & Related papers (2020-01-16T18:48:34Z)