Convergence of Deep ReLU Networks
- URL: http://arxiv.org/abs/2107.12530v1
- Date: Tue, 27 Jul 2021 00:33:53 GMT
- Title: Convergence of Deep ReLU Networks
- Authors: Yuesheng Xu and Haizhang Zhang
- Abstract summary: We explore convergence of deep neural networks with the popular ReLU activation function.
We identify the convergence of the ReLU networks as convergence of a class of infinite products of matrices.
These results provide mathematical insight into the design strategy of the well-known deep residual networks in image classification.
- Score: 2.5991265608180396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore convergence of deep neural networks with the popular ReLU
activation function, as the depth of the networks tends to infinity. To this
end, we introduce the notion of activation domains and activation matrices of a
ReLU network. By replacing applications of the ReLU activation function by
multiplications with activation matrices on activation domains, we obtain an
explicit expression of the ReLU network. We then identify the convergence of
the ReLU networks as convergence of a class of infinite products of matrices.
Sufficient and necessary conditions for convergence of these infinite products
of matrices are studied. As a result, we establish necessary conditions for
ReLU networks to converge: the sequence of weight matrices must converge to the
identity matrix and the sequence of bias vectors must converge to zero as the
depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient
conditions, in terms of the weight matrices and bias vectors at the hidden layers,
for pointwise convergence of deep ReLU networks. These results provide
mathematical insight into the design strategy of the well-known deep residual
networks in image classification.
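The following minimal NumPy sketch (not from the paper; the parameter choices are hypothetical) illustrates the core device described in the abstract: applying ReLU to a vector z equals multiplying z by a diagonal 0/1 activation matrix D(z), so on a fixed activation domain a deep ReLU network reduces to a product of matrices D_l W_l plus accumulated bias terms. The layer weights are taken as W_l = I + A_l / l^2 with biases of order 1/l^2, mimicking the necessary conditions (weights tending to the identity, biases tending to zero) under which such an infinite product can converge.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def activation_matrix(z):
    # Diagonal 0/1 matrix D(z) with D_ii = 1 if z_i > 0 else 0,
    # so that relu(z) == D(z) @ z on the corresponding activation domain.
    return np.diag((z > 0).astype(float))

rng = np.random.default_rng(0)
d, depth = 4, 50
x = rng.standard_normal(d)

out = x
prod = np.eye(d)  # running product of D_l @ W_l (the "infinite product" view)
for l in range(1, depth + 1):
    # Hypothetical layer parameters: W_l = I + A_l / l^2, b_l ~ 1/l^2,
    # i.e. weights approach the identity and biases approach zero with depth.
    W = np.eye(d) + rng.standard_normal((d, d)) / l**2
    b = rng.standard_normal(d) / l**2
    z = W @ out + b
    D = activation_matrix(z)
    # Applying ReLU equals multiplying by the activation matrix:
    assert np.allclose(relu(z), D @ z)
    out = D @ z
    prod = D @ W @ prod

print("output after", depth, "layers:", out)
print("spectral norm of the running product of D_l W_l:",
      np.linalg.norm(prod, 2))
```

With these summable perturbations the outputs stabilize as the depth grows, whereas replacing 1/l^2 by a constant typically makes them blow up or collapse, which is the behavior the paper's conditions are designed to rule in or out.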
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features [54.83898311047626]
We consider neural networks with piecewise linear activations ranging from 2 to an arbitrary but finite number of layers.
We first show that two-layer networks with piecewise linear activations are Lasso models using a discrete dictionary of ramp depths.
arXiv Detail & Related papers (2024-03-02T00:33:45Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Uniform Convergence of Deep Neural Networks with Lipschitz Continuous
Activation Functions and Variable Widths [3.0069322256338906]
We consider deep neural networks with a Lipschitz continuous activation function and with weight matrices of variable widths.
In particular, as convolutional neural networks are special deep neural networks with weight matrices of increasing widths, we put forward conditions on the mask sequence.
The Lipschitz continuity assumption on the activation functions allows us to include in our theory most of the activation functions commonly used in applications.
arXiv Detail & Related papers (2023-06-02T17:07:12Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Convergence Analysis of Deep Residual Networks [3.274290296343038]
Deep Residual Networks (ResNets) are of particular importance because they have demonstrated great usefulness in computer vision.
We aim at characterizing the convergence of deep ResNets as the depth tends to infinity in terms of the parameters of the networks.
arXiv Detail & Related papers (2022-05-13T11:53:09Z) - Convergence of Deep Neural Networks with General Activation Functions
and Pooling [5.316908050163474]
Convergence of deep neural networks is a fundamental issue in building the mathematical foundation for deep learning.
We study the convergence of deep neural networks as the depth tends to infinity for two other activation functions: the leaky ReLU and the sigmoid function.
arXiv Detail & Related papers (2022-05-13T11:49:03Z) - Convergence of Deep Convolutional Neural Networks [2.5991265608180396]
Convergence of deep neural networks as the depth of the networks tends to infinity is fundamental in building the mathematical foundation for deep learning.
We first study convergence of general ReLU networks with increasing widths and then apply the results obtained to deep convolutional neural networks.
arXiv Detail & Related papers (2021-09-28T07:48:17Z) - When does gradient descent with logistic loss interpolate using deep
networks with smoothed ReLU activations? [51.1848572349154]
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero.
Our analysis applies for smoothed approximations to the ReLU, such as Swish and the Huberized ReLU.
arXiv Detail & Related papers (2021-02-09T18:04:37Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, within a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz
Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper-bound of the Lipschitz constant.
We numerically compare our scheme with standard ReLU networks and their variants, PReLU and LeakyReLU.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)