Convergence of Deep Neural Networks with General Activation Functions
and Pooling
- URL: http://arxiv.org/abs/2205.06570v1
- Date: Fri, 13 May 2022 11:49:03 GMT
- Title: Convergence of Deep Neural Networks with General Activation Functions
and Pooling
- Authors: Wentao Huang, Yuesheng Xu, Haizhang Zhang
- Abstract summary: Convergence of deep neural networks is a fundamental issue in building the mathematical foundation for deep learning.
We study the convergence of deep neural networks as the depth tends to infinity for two other activation functions: the leaky ReLU and the sigmoid function.
- Score: 5.316908050163474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks, as a powerful system for representing
high-dimensional complex functions, play a key role in deep learning.
Convergence of deep neural networks is a fundamental issue in building the
mathematical foundation for deep learning. We investigated the convergence of
deep ReLU networks and deep convolutional neural networks in two recent
studies (arXiv:2107.12530, arXiv:2109.13542). Only the Rectified Linear Unit
(ReLU) activation was studied therein, and the important pooling strategy was
not considered. In the current work, we study the convergence of deep neural
networks as the depth tends to infinity for two other important activation
functions: the leaky ReLU and the sigmoid function. Pooling is also studied.
As a result, we prove that the sufficient condition established in
arXiv:2107.12530 and arXiv:2109.13542 remains sufficient for leaky ReLU
networks. For contractive activation functions such as the sigmoid function,
we establish a weaker sufficient condition for uniform convergence of deep
neural networks.
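As a quick numerical illustration of why contractive activations are special, the sketch below (not the paper's construction, and a much simpler setting than the varying-weight networks it analyzes) repeats a single sigmoid layer, so the depth-n network is a fixed-point iteration. Because the sigmoid's derivative is bounded by 1/4, the layer map is a contraction whenever the spectral norm of W is below 4, and the outputs stabilize as the depth grows. The dimension, weights, and depths are arbitrary illustrative choices.

```python
# Toy illustration only: identical sigmoid layers give the fixed-point
# iteration x_{n+1} = sigmoid(W x_n + b).  Since sup |sigmoid'(t)| = 1/4,
# the layer map is a contraction whenever ||W||_2 < 4, so successive
# outputs converge as the depth tends to infinity.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
d = 5
W = rng.standard_normal((d, d))
W *= 3.0 / np.linalg.norm(W, 2)   # rescale so the spectral norm is 3 < 4
b = rng.standard_normal(d)
out = rng.standard_normal(d)      # network input

for depth in range(1, 81):
    new = sigmoid(W @ out + b)    # apply one more hidden layer
    if depth in (1, 5, 10, 20, 40, 80):
        # change contributed by the latest layer; decays geometrically
        print(depth, np.linalg.norm(new - out))
    out = new
```

The leaky ReLU is 1-Lipschitz rather than 1/4-Lipschitz, so this simple contraction argument does not carry over to it; the abstract's distinction between leaky ReLU networks and contractive activations is related to this gap.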
Related papers
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Optimal Learning Rates of Deep Convolutional Neural Networks: Additive
Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions (recalled in a note after this list), convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal minimax rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z) - Convergence of Deep Convolutional Neural Networks [2.5991265608180396]
Convergence of deep neural networks as the depth of the networks tends to infinity is fundamental in building the mathematical foundation for deep learning.
We first study convergence of general ReLU networks with increasing widths and then apply the results obtained to deep convolutional neural networks.
arXiv Detail & Related papers (2021-09-28T07:48:17Z) - Towards Lower Bounds on the Depth of ReLU Neural Networks [7.355977594790584]
We investigate whether the class of exactly representable functions strictly increases by adding more layers.
We settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative.
We present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
arXiv Detail & Related papers (2021-05-31T09:49:14Z) - Theoretical Analysis of the Advantage of Deepening Neural Networks [0.0]
It is important to understand the expressivity of the functions computable by deep neural networks.
Under two expressivity criteria, we show that increasing the number of layers is more effective than increasing the number of units per layer for improving the expressivity of deep neural networks.
arXiv Detail & Related papers (2020-09-24T04:10:50Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Rational neural networks [3.4376560669160394]
We consider neural networks with rational activation functions.
We prove that rational neural networks approximate smooth functions more efficiently than ReLU networks, requiring exponentially smaller depth; a minimal sketch of a rational activation appears in a note after this list.
arXiv Detail & Related papers (2020-04-04T10:36:11Z)
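On the "Rational neural networks" entry just above: a rational activation is a ratio of two low-degree polynomials whose coefficients are treated as trainable parameters. The sketch below only illustrates the functional form; the coefficients are hypothetical placeholders, not values from that paper.

```python
# Minimal sketch of a rational activation r(t) = p(t) / q(t) with low-degree
# polynomials.  The coefficients here are hypothetical placeholders; in the
# rational-network setting they would be trainable parameters.
import numpy as np

def rational_activation(t, p=(0.0, 1.0, 0.5, 0.02), q=(1.0, 0.0, 0.1)):
    """Evaluate p(t)/q(t); p and q list coefficients from lowest degree up."""
    num = np.polyval(p[::-1], t)   # p0 + p1*t + p2*t^2 + p3*t^3
    den = np.polyval(q[::-1], t)   # q0 + q1*t + q2*t^2, positive for all t
    return num / den

t = np.linspace(-3.0, 3.0, 7)
print(rational_activation(t))
```

Keeping the denominator strictly positive, as in this sketch, avoids poles on the real line.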
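Earlier in the list, the "Optimal Learning Rates of Deep Convolutional Neural Networks" entry refers to additive ridge functions. In the usual sense, these are sums of univariate functions of one-dimensional projections of the input,

\[ f(x) = \sum_{j=1}^{J} g_j(a_j \cdot x), \qquad a_j \in \mathbb{R}^d, \quad g_j : \mathbb{R} \to \mathbb{R}; \]

the precise smoothness assumptions on the g_j, under which the minimax rates are stated, are given in that paper.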
This list is automatically generated from the titles and abstracts of the papers on this site.