Convergence of Deep Neural Networks with General Activation Functions
and Pooling
- URL: http://arxiv.org/abs/2205.06570v1
- Date: Fri, 13 May 2022 11:49:03 GMT
- Title: Convergence of Deep Neural Networks with General Activation Functions
and Pooling
- Authors: Wentao Huang, Yuesheng Xu, Haizhang Zhang
- Abstract summary: Convergence of deep neural networks is a fundamental issue in building the mathematical foundation for deep learning.
We study the convergence of deep neural networks as the depth tends to infinity for two other activation functions: the leaky ReLU and the sigmoid function.
- Score: 5.316908050163474
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Deep neural networks, as a powerful system for representing
high-dimensional complex functions, play a key role in deep learning.
Convergence of deep neural networks is a fundamental issue in building the
mathematical foundation for deep learning. We investigated the convergence of
deep ReLU networks and deep convolutional neural networks in two recent
studies (arXiv:2107.12530, arXiv:2109.13542). Only the Rectified Linear Unit
(ReLU) activation was studied therein, and the important pooling strategy was
not considered. In the current work, we study the convergence of deep neural
networks as the depth tends to infinity for two other important activation
functions: the leaky ReLU and the sigmoid function. Pooling is also studied.
As a result, we prove that the sufficient condition established in
arXiv:2107.12530 and arXiv:2109.13542 remains sufficient for leaky ReLU
networks. For contractive activation functions such as the sigmoid function,
we establish a weaker sufficient condition for uniform convergence of deep
neural networks.
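As a quick numerical illustration of why contractive activations are special, the sketch below (not the paper's construction, and a much simpler setting than the varying-weight networks it analyzes) repeats a single sigmoid layer, so the depth-n network is a fixed-point iteration. Because the sigmoid's derivative is bounded by 1/4, the layer map is a contraction whenever the spectral norm of W is below 4, and the outputs stabilize as the depth grows. The dimension, weights, and depths are arbitrary illustrative choices.

```python
# Toy illustration only: identical sigmoid layers give the fixed-point
# iteration x_{n+1} = sigmoid(W x_n + b).  Since sup |sigmoid'(t)| = 1/4,
# the layer map is a contraction whenever ||W||_2 < 4, so successive
# outputs converge as the depth tends to infinity.
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
d = 5
W = rng.standard_normal((d, d))
W *= 3.0 / np.linalg.norm(W, 2)   # rescale so the spectral norm is 3 < 4
b = rng.standard_normal(d)
out = rng.standard_normal(d)      # network input

for depth in range(1, 81):
    new = sigmoid(W @ out + b)    # apply one more hidden layer
    if depth in (1, 5, 10, 20, 40, 80):
        # change contributed by the latest layer; decays geometrically
        print(depth, np.linalg.norm(new - out))
    out = new
```

The leaky ReLU is 1-Lipschitz rather than 1/4-Lipschitz, so this simple contraction argument does not carry over to it; the abstract's distinction between leaky ReLU networks and contractive activations is related to this gap.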
Related papers
- Addressing caveats of neural persistence with deep graph persistence [54.424983583720675]
We find that the variance of network weights and spatial concentration of large weights are the main factors that impact neural persistence.
We propose an extension of the filtration underlying neural persistence to the whole neural network instead of single layers.
This yields our deep graph persistence measure, which implicitly incorporates persistent paths through the network and alleviates variance-related issues.
arXiv Detail & Related papers (2023-07-20T13:34:11Z) - Globally Optimal Training of Neural Networks with Threshold Activation
Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Neural Networks with Sparse Activation Induced by Large Bias: Tighter Analysis with Bias-Generalized NTK [86.45209429863858]
We study training one-hidden-layer ReLU networks in the neural tangent kernel (NTK) regime.
We show that the neural networks possess a different limiting kernel, which we call the bias-generalized NTK.
We also study various properties of the neural networks with this new kernel.
arXiv Detail & Related papers (2023-01-01T02:11:39Z) - Rank Diminishing in Deep Neural Networks [71.03777954670323]
The rank of a neural network measures the information flowing across its layers.
It is an instance of a key structural condition that applies across broad domains of machine learning.
For neural networks, however, the intrinsic mechanism that yields low-rank structures remains unclear.
arXiv Detail & Related papers (2022-06-13T12:03:32Z) - Optimal Learning Rates of Deep Convolutional Neural Networks: Additive
Ridge Functions [19.762318115851617]
We consider the mean squared error analysis for deep convolutional neural networks.
We show that, for additive ridge functions (recalled in a note after this list), convolutional neural networks followed by one fully connected layer with ReLU activation functions can reach optimal minimax rates.
arXiv Detail & Related papers (2022-02-24T14:22:32Z) - Convergence of Deep Convolutional Neural Networks [2.5991265608180396]
Convergence of deep neural networks as the depth of the networks tends to infinity is fundamental in building the mathematical foundation for deep learning.
We first study convergence of general ReLU networks with increasing widths and then apply the results obtained to deep convolutional neural networks.
arXiv Detail & Related papers (2021-09-28T07:48:17Z) - Towards Lower Bounds on the Depth of ReLU Neural Networks [7.355977594790584]
We investigate whether the class of exactly representable functions strictly increases by adding more layers.
We settle an old conjecture about piecewise linear functions by Wang and Sun (2005) in the affirmative.
We present upper bounds on the sizes of neural networks required to represent functions with logarithmic depth.
arXiv Detail & Related papers (2021-05-31T09:49:14Z) - Theoretical Analysis of the Advantage of Deepening Neural Networks [0.0]
It is important to understand the expressivity of the functions computable by deep neural networks.
Under two expressivity criteria, we show that increasing the number of layers is more effective than increasing the number of units per layer for improving the expressivity of deep neural networks.
arXiv Detail & Related papers (2020-09-24T04:10:50Z) - Learning Connectivity of Neural Networks from a Topological Perspective [80.35103711638548]
We propose a topological perspective that represents a network as a complete graph for analysis.
By assigning learnable parameters to the edges which reflect the magnitude of connections, the learning process can be performed in a differentiable manner.
This learning process is compatible with existing networks and adapts to larger search spaces and different tasks.
arXiv Detail & Related papers (2020-08-19T04:53:31Z) - Modeling from Features: a Mean-field Framework for Over-parameterized
Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z) - Rational neural networks [3.4376560669160394]
We consider neural networks with rational activation functions.
We prove that rational neural networks approximate smooth functions more efficiently than ReLU networks, requiring exponentially smaller depth; a minimal sketch of a rational activation appears in a note after this list.
arXiv Detail & Related papers (2020-04-04T10:36:11Z)
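On the "Rational neural networks" entry just above: a rational activation is a ratio of two low-degree polynomials whose coefficients are treated as trainable parameters. The sketch below only illustrates the functional form; the coefficients are hypothetical placeholders, not values from that paper.

```python
# Minimal sketch of a rational activation r(t) = p(t) / q(t) with low-degree
# polynomials.  The coefficients here are hypothetical placeholders; in the
# rational-network setting they would be trainable parameters.
import numpy as np

def rational_activation(t, p=(0.0, 1.0, 0.5, 0.02), q=(1.0, 0.0, 0.1)):
    """Evaluate p(t)/q(t); p and q list coefficients from lowest degree up."""
    num = np.polyval(p[::-1], t)   # p0 + p1*t + p2*t^2 + p3*t^3
    den = np.polyval(q[::-1], t)   # q0 + q1*t + q2*t^2, positive for all t
    return num / den

t = np.linspace(-3.0, 3.0, 7)
print(rational_activation(t))
```

Keeping the denominator strictly positive, as in this sketch, avoids poles on the real line.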
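Earlier in the list, the "Optimal Learning Rates of Deep Convolutional Neural Networks" entry refers to additive ridge functions. In the usual sense, these are sums of univariate functions of one-dimensional projections of the input,

\[ f(x) = \sum_{j=1}^{J} g_j(a_j \cdot x), \qquad a_j \in \mathbb{R}^d, \quad g_j : \mathbb{R} \to \mathbb{R}; \]

the precise smoothness assumptions on the g_j, under which the minimax rates are stated, are given in that paper.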
This list is automatically generated from the titles and abstracts of the papers on this site.