Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers
- URL: http://arxiv.org/abs/2006.06878v1
- Date: Thu, 11 Jun 2020 23:55:54 GMT
- Title: Optimization Theory for ReLU Neural Networks Trained with Normalization
Layers
- Authors: Yonatan Dukler, Quanquan Gu, Guido Montúfar
- Abstract summary: The success of deep neural networks is in part due to the use of normalization layers.
Our analysis shows how the introduction of normalization changes the optimization landscape and can enable faster convergence.
- Score: 82.61117235807606
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The success of deep neural networks is in part due to the use of
normalization layers. Normalization layers like Batch Normalization, Layer
Normalization and Weight Normalization are ubiquitous in practice, as they
improve generalization performance and speed up training significantly.
Nonetheless, the vast majority of current deep learning theory and non-convex
optimization literature focuses on the un-normalized setting, where the
functions under consideration do not exhibit the properties of commonly
normalized neural networks. In this paper, we bridge this gap by giving the
first global convergence result for two-layer neural networks with ReLU
activations trained with a normalization layer, namely Weight Normalization.
Our analysis shows how the introduction of normalization layers changes the
optimization landscape and can enable faster convergence as compared with
un-normalized neural networks.
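For readers who prefer code, the setting described in the abstract can be written down directly. Below is a minimal sketch, assuming a PyTorch setup with squared loss and plain gradient descent; the width, initialization, and learning rate are illustrative choices rather than the authors' exact construction. The key line is the Weight Normalization reparameterization w_k = g_k * v_k / ||v_k|| of each hidden-layer weight vector.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerReLUWeightNorm(nn.Module):
    """Two-layer ReLU network whose hidden weights use Weight Normalization."""

    def __init__(self, d_in: int, width: int):
        super().__init__()
        self.v = nn.Parameter(torch.randn(width, d_in))        # direction parameters
        self.g = nn.Parameter(torch.ones(width))                # length (scale) parameters
        self.a = nn.Parameter(torch.randn(width) / width**0.5)  # output-layer weights

    def forward(self, x):
        # Weight Normalization: w_k = g_k * v_k / ||v_k||_2
        w = self.g.unsqueeze(1) * F.normalize(self.v, dim=1)
        return F.relu(x @ w.t()) @ self.a                        # one scalar output per example

# Toy training loop (illustrative hyperparameters, not the paper's experiments).
model = TwoLayerReLUWeightNorm(d_in=5, width=200)
x, y = torch.randn(32, 5), torch.randn(32)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
for _ in range(200):
    opt.zero_grad()
    F.mse_loss(model(x), y).backward()
    opt.step()
```

PyTorch also ships a generic wrapper, `torch.nn.utils.weight_norm`, that applies the same reparameterization to an existing layer; the explicit form above simply makes the decoupling of length `g` from direction `v/||v||`, which the paper's landscape analysis revolves around, easy to see.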
Related papers
- Unsupervised Adaptive Normalization [0.07499722271664146]
Unsupervised Adaptive Normalization (UAN) is an innovative algorithm that seamlessly integrates clustering for normalization with deep neural network learning.
UAN outperforms classical methods by adapting to the target task, and is effective in classification and domain adaptation.
arXiv Detail & Related papers (2024-09-07T08:14:11Z) - Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks [20.135740969953723]
Representing signals using coordinate networks has recently come to dominate the area of inverse problems.
However, coordinate networks suffer from a spectral bias that limits their capacity to learn high-frequency components.
We find that this pathological distribution can be improved using classical normalization techniques.
arXiv Detail & Related papers (2024-07-25T07:45:28Z) - Convergence Analysis for Learning Orthonormal Deep Linear Neural
Networks [27.29463801531576]
We provide convergence analysis for training orthonormal deep linear neural networks.
Our results shed light on how increasing the number of hidden layers can impact the convergence speed.
arXiv Detail & Related papers (2023-11-24T18:46:54Z) - Normalization-Equivariant Neural Networks with Application to Image
Denoising [3.591122855617648]
We propose a methodology for adapting existing neural networks so that normalization-equivariance holds by design.
Our main claim is that not only ordinary convolutional layers, but also all activation functions, should be completely removed from neural networks and replaced by better conditioned alternatives.
Experimental results in image denoising show that normalization-equivariant neural networks, in addition to their better conditioning, also provide much better generalization across noise levels.
arXiv Detail & Related papers (2023-06-08T08:42:08Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Subquadratic Overparameterization for Shallow Neural Networks [60.721751363271146]
We provide an analytical framework that allows us to adopt standard neural training strategies.
We achieve the desiderata via Polyak-Łojasiewicz, smoothness, and standard assumptions.
arXiv Detail & Related papers (2021-11-02T20:24:01Z) - Backward Gradient Normalization in Deep Neural Networks [68.8204255655161]
We introduce a new technique for gradient normalization during neural network training.
The gradients are rescaled during the backward pass using normalization layers introduced at certain points within the network architecture (an illustrative sketch of this idea appears at the end of this list).
Results on tests with very deep neural networks show that the new technique can effectively control the gradient norm.
arXiv Detail & Related papers (2021-06-17T13:24:43Z) - Normalization Techniques in Training DNNs: Methodology, Analysis and
Application [111.82265258916397]
Normalization techniques are essential for accelerating the training and improving the generalization of deep neural networks (DNNs).
This paper reviews and comments on the past, present and future of normalization methods in the context of training.
arXiv Detail & Related papers (2020-09-27T13:06:52Z) - Revisiting Initialization of Neural Networks [72.24615341588846]
We propose a rigorous estimation of the global curvature of weights across layers by approximating and controlling the norm of their Hessian matrix.
Our experiments on Word2Vec and the MNIST/CIFAR image classification tasks confirm that tracking the Hessian norm is a useful diagnostic tool.
arXiv Detail & Related papers (2020-04-20T18:12:56Z)
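As a companion to the Backward Gradient Normalization entry above, here is a toy sketch, again assuming a PyTorch setup, of what rescaling gradients at chosen points during the backward pass can look like. It uses a plain tensor hook and is only an illustration of the general idea, not the paper's actual algorithm.

```python
import torch
import torch.nn as nn

class GradNormPoint(nn.Module):
    """Identity in the forward pass; rescales the gradient flowing back through
    this point to unit L2 norm (illustrative, not the paper's method)."""

    def __init__(self, eps: float = 1e-12):
        super().__init__()
        self.eps = eps

    def forward(self, x):
        if x.requires_grad:
            # Hook runs during backward() and replaces the incoming gradient.
            x.register_hook(lambda g: g / (g.norm() + self.eps))
        return x

# Place the normalization point between layers of a (here, shallow) network.
model = nn.Sequential(
    nn.Linear(32, 64), nn.ReLU(),
    GradNormPoint(),               # gradient is rescaled here on the way back
    nn.Linear(64, 10),
)

x = torch.randn(8, 32)
model(x).pow(2).mean().backward()  # layers before GradNormPoint see a unit-norm gradient signal
```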
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.