Convergence of Deep ReLU Networks
- URL: http://arxiv.org/abs/2107.12530v1
- Date: Tue, 27 Jul 2021 00:33:53 GMT
- Title: Convergence of Deep ReLU Networks
- Authors: Yuesheng Xu and Haizhang Zhang
- Abstract summary: We explore convergence of deep neural networks with the popular ReLU activation function.
We identify the convergence of the ReLU networks as convergence of a class of infinite products of matrices.
These results provide mathematical insight into the design strategy of the well-known deep residual networks in image classification.
- Score: 2.5991265608180396
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We explore convergence of deep neural networks with the popular ReLU
activation function, as the depth of the networks tends to infinity. To this
end, we introduce the notion of activation domains and activation matrices of a
ReLU network. By replacing applications of the ReLU activation function by
multiplications with activation matrices on activation domains, we obtain an
explicit expression of the ReLU network. We then identify the convergence of
the ReLU networks as convergence of a class of infinite products of matrices.
Sufficient and necessary conditions for convergence of these infinite products
of matrices are studied. As a result, we establish necessary conditions for
ReLU networks to converge: the sequence of weight matrices must converge to the
identity matrix and the sequence of bias vectors must converge to zero as the
depth of the ReLU networks increases to infinity. Moreover, we obtain sufficient
conditions, in terms of the weight matrices and bias vectors at the hidden layers,
for pointwise convergence of deep ReLU networks. These results provide
mathematical insight into the design strategy of the well-known deep residual
networks in image classification.
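The following minimal NumPy sketch (not from the paper; the parameter choices are hypothetical) illustrates the core device described in the abstract: applying ReLU to a vector z equals multiplying z by a diagonal 0/1 activation matrix D(z), so on a fixed activation domain a deep ReLU network reduces to a product of matrices D_l W_l plus accumulated bias terms. The layer weights are taken as W_l = I + A_l / l^2 with biases of order 1/l^2, mimicking the necessary conditions (weights tending to the identity, biases tending to zero) under which such an infinite product can converge.

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def activation_matrix(z):
    # Diagonal 0/1 matrix D(z) with D_ii = 1 if z_i > 0 else 0,
    # so that relu(z) == D(z) @ z on the corresponding activation domain.
    return np.diag((z > 0).astype(float))

rng = np.random.default_rng(0)
d, depth = 4, 50
x = rng.standard_normal(d)

out = x
prod = np.eye(d)  # running product of D_l @ W_l (the "infinite product" view)
for l in range(1, depth + 1):
    # Hypothetical layer parameters: W_l = I + A_l / l^2, b_l ~ 1/l^2,
    # i.e. weights approach the identity and biases approach zero with depth.
    W = np.eye(d) + rng.standard_normal((d, d)) / l**2
    b = rng.standard_normal(d) / l**2
    z = W @ out + b
    D = activation_matrix(z)
    # Applying ReLU equals multiplying by the activation matrix:
    assert np.allclose(relu(z), D @ z)
    out = D @ z
    prod = D @ W @ prod

print("output after", depth, "layers:", out)
print("spectral norm of the running product of D_l W_l:",
      np.linalg.norm(prod, 2))
```

With these summable perturbations the outputs stabilize as the depth grows, whereas replacing 1/l^2 by a constant typically makes them blow up or collapse, which is the behavior the paper's conditions are designed to rule in or out.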
Related papers
- Generalization of Scaled Deep ResNets in the Mean-Field Regime [55.77054255101667]
We investigate scaled ResNet in the limit of infinitely deep and wide neural networks.
Our results offer new insights into the generalization ability of deep ResNet beyond the lazy training regime.
arXiv Detail & Related papers (2024-03-14T21:48:00Z) - A Library of Mirrors: Deep Neural Nets in Low Dimensions are Convex Lasso Models with Reflection Features [54.83898311047626]
We consider neural networks with piecewise linear activations ranging from 2 to an arbitrary but finite number of layers.
We first show that two-layer networks with piecewise linear activations are Lasso models using a discrete dictionary of ramp depths.
arXiv Detail & Related papers (2024-03-02T00:33:45Z) - Optimization Guarantees of Unfolded ISTA and ADMM Networks With Smooth
Soft-Thresholding [57.71603937699949]
We study optimization guarantees, i.e., achieving near-zero training loss with the increase in the number of learning epochs.
We show that the threshold on the number of training samples increases with the increase in the network width.
arXiv Detail & Related papers (2023-09-12T13:03:47Z) - Uniform Convergence of Deep Neural Networks with Lipschitz Continuous
Activation Functions and Variable Widths [3.0069322256338906]
We consider deep neural networks with a Lipschitz continuous activation function and with weight matrices of variable widths.
In particular, as convolutional neural networks are special deep neural networks with weight matrices of increasing widths, we put forward conditions on the mask sequence.
The Lipschitz continuity assumption on the activation functions allows us to include in our theory most of the activation functions commonly used in applications.
arXiv Detail & Related papers (2023-06-02T17:07:12Z) - On the Effective Number of Linear Regions in Shallow Univariate ReLU
Networks: Convergence Guarantees and Implicit Bias [50.84569563188485]
We show that gradient flow converges in direction when labels are determined by the sign of a target network with $r$ neurons.
Our result may already hold for mild over-parameterization, where the width is $\tilde{\mathcal{O}}(r)$ and independent of the sample size.
arXiv Detail & Related papers (2022-05-18T16:57:10Z) - Convergence Analysis of Deep Residual Networks [3.274290296343038]
Deep Residual Networks (ResNets) are of particular importance because they have demonstrated great usefulness in computer vision.
We aim at characterizing the convergence of deep ResNets as the depth tends to infinity in terms of the parameters of the networks.
arXiv Detail & Related papers (2022-05-13T11:53:09Z) - Convergence of Deep Neural Networks with General Activation Functions
and Pooling [5.316908050163474]
Convergence of deep neural networks is a fundamental issue in building the mathematical foundation for deep learning.
We study the convergence of deep neural networks as the depth tends to infinity for two other activation functions: the leaky ReLU and the sigmoid function.
arXiv Detail & Related papers (2022-05-13T11:49:03Z) - Convergence of Deep Convolutional Neural Networks [2.5991265608180396]
Convergence of deep neural networks as the depth of the networks tends to infinity is fundamental in building the mathematical foundation for deep learning.
We first study convergence of general ReLU networks with increasing widths and then apply the results obtained to deep convolutional neural networks.
arXiv Detail & Related papers (2021-09-28T07:48:17Z) - When does gradient descent with logistic loss interpolate using deep
networks with smoothed ReLU activations? [51.1848572349154]
We establish conditions under which gradient descent applied to fixed-width deep networks drives the logistic loss to zero.
Our analysis applies for smoothed approximations to the ReLU, such as Swish and the Huberized ReLU.
arXiv Detail & Related papers (2021-02-09T18:04:37Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural
Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, within a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Deep Neural Networks with Trainable Activations and Controlled Lipschitz
Constant [26.22495169129119]
We introduce a variational framework to learn the activation functions of deep neural networks.
Our aim is to increase the capacity of the network while controlling an upper-bound of the Lipschitz constant.
We numerically compare our scheme with standard ReLU networks and their variants, PReLU and LeakyReLU.
arXiv Detail & Related papers (2020-01-17T12:32:55Z)