On the asymptotics of wide networks with polynomial activations
- URL: http://arxiv.org/abs/2006.06687v1
- Date: Thu, 11 Jun 2020 18:00:01 GMT
- Title: On the asymptotics of wide networks with polynomial activations
- Authors: Kyle Aitken, Guy Gur-Ari
- Abstract summary: We consider an existing conjecture addressing the behavior of neural networks in the large width limit.
We prove the conjecture for deep networks with polynomial activation functions.
We point out a difference in the behavior of networks with analytic (and non-linear) activation functions and those with piecewise-linear activations such as ReLU.
- Score: 12.509746979383701
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider an existing conjecture addressing the asymptotic behavior of
neural networks in the large width limit. The results that follow from this
conjecture include tight bounds on the behavior of wide networks during
stochastic gradient descent, and a derivation of their finite-width dynamics.
We prove the conjecture for deep networks with polynomial activation functions,
greatly extending the validity of these results. Finally, we point out a
difference in the asymptotic behavior of networks with analytic (and
non-linear) activation functions and those with piecewise-linear activations
such as ReLU.
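To make the setting concrete, here is a minimal sketch (not the paper's code): it builds a two-layer network in the NTK-style parameterization with a polynomial activation and measures how much its empirical neural tangent kernel moves after a single SGD step, at several widths. The activation phi(x) = x + x^2/2, the widths, the input, and the learning rate are illustrative assumptions; one expected consequence of the conjectured asymptotics is that this kernel drift shrinks roughly like 1/n as the width n grows.

```python
# Illustrative sketch (assumptions, not the paper's code): empirical NTK drift after
# one SGD step for a two-layer network with a polynomial activation, at several widths.
import jax
import jax.numpy as jnp

def init_params(key, n, d):
    k1, k2 = jax.random.split(key)
    return {"W": jax.random.normal(k1, (n, d)),   # hidden weights, O(1) entries
            "v": jax.random.normal(k2, (n,))}     # readout weights, O(1) entries

def forward(params, x, n):
    phi = lambda z: z + 0.5 * z ** 2              # a simple polynomial activation (assumption)
    h = phi(params["W"] @ x / jnp.sqrt(x.shape[0]))   # NTK-style 1/sqrt(fan-in) scaling
    return params["v"] @ h / jnp.sqrt(n)

def empirical_ntk(params, x, n):
    # Theta(x, x) = <grad_theta f(x), grad_theta f(x)>
    g = jax.grad(forward)(params, x, n)
    return sum(jnp.vdot(g[k], g[k]) for k in g)

key = jax.random.PRNGKey(0)
x = jnp.ones(4) / 2.0
y = 1.0
for n in [64, 256, 1024, 4096]:
    params = init_params(key, n, d=4)
    loss = lambda p: 0.5 * (forward(p, x, n) - y) ** 2
    grads = jax.grad(loss)(params)
    new_params = {k: params[k] - 1.0 * grads[k] for k in params}  # one SGD step, lr = 1
    drift = abs(empirical_ntk(new_params, x, n) - empirical_ntk(params, x, n))
    print(n, float(drift))   # expected to decay roughly like 1/n under the conjecture
```

Swapping phi for a ReLU is where the abstract's last point becomes relevant: the paper's proof covers polynomial (hence analytic) activations, and it notes that the asymptotic behavior with piecewise-linear activations can differ.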
Related papers
- Activation thresholds and expressiveness of polynomial neural networks [0.0]
Polynomial neural networks have been implemented in a range of applications.
In this work, we introduce the notion of the activation threshold of a network architecture.
arXiv Detail & Related papers (2024-08-08T16:28:56Z) - Uniform Convergence of Deep Neural Networks with Lipschitz Continuous Activation Functions and Variable Widths [3.0069322256338906]
We consider deep neural networks with a Lipschitz continuous activation function and with weight matrices of variable widths.
In particular, as convolutional neural networks are special deep neural networks with weight matrices of increasing widths, we put forward conditions on the mask sequence.
The Lipschitz continuity assumption on the activation functions allows us to include in our theory most of the activation functions commonly used in applications.
arXiv Detail & Related papers (2023-06-02T17:07:12Z) - Data Topology-Dependent Upper Bounds of Neural Network Widths [52.58441144171022]
We first show that a three-layer neural network can be designed to approximate an indicator function over a compact set.
This is then extended to a simplicial complex, deriving width upper bounds based on its topological structure.
We prove the universal approximation property of three-layer ReLU networks using our topological approach.
arXiv Detail & Related papers (2023-05-25T14:17:15Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - Bayesian Interpolation with Deep Linear Networks [92.1721532941863]
Characterizing how neural network depth, width, and dataset size jointly impact model quality is a central problem in deep learning theory.
We show that linear networks make provably optimal predictions at infinite depth.
We also show that with data-agnostic priors, Bayesian model evidence in wide linear networks is maximized at infinite depth.
arXiv Detail & Related papers (2022-12-29T20:57:46Z) - Qualitative neural network approximation over R and C: Elementary proofs for analytic and polynomial activation [0.0]
We prove approximation results for classes of deep and shallow neural networks with analytic activation functions.
We show that fully connected and residual networks of large depth with such activation functions can, under certain width requirements, approximate functions in these classes.
arXiv Detail & Related papers (2022-03-25T01:36:13Z) - Decimation technique for open quantum systems: a case study with driven-dissipative bosonic chains [62.997667081978825]
Unavoidable coupling of quantum systems to external degrees of freedom leads to dissipative (non-unitary) dynamics.
We introduce a method to deal with these systems based on the calculation of the (dissipative) lattice Green's function.
We illustrate the power of this method with several examples of driven-dissipative bosonic chains of increasing complexity.
arXiv Detail & Related papers (2022-02-15T19:00:09Z) - Convex Analysis of the Mean Field Langevin Dynamics [49.66486092259375]
A convergence rate analysis of the mean field Langevin dynamics is presented.
The proximal Gibbs distribution $p_q$ associated with the dynamics allows us to develop a convergence theory parallel to classical results in convex optimization.
arXiv Detail & Related papers (2022-01-25T17:13:56Z) - Deep neural network approximation of analytic functions [91.3755431537592]
An entropy bound is given for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - The Representation Power of Neural Networks: Breaking the Curse of Dimensionality [0.0]
We prove upper bounds on the number of parameters required by shallow and deep neural networks to approximate Korobov functions.
We further prove that these bounds nearly match the minimal number of parameters any continuous function approximator needs to approximate Korobov functions.
arXiv Detail & Related papers (2020-12-10T04:44:07Z) - Analytical aspects of non-differentiable neural networks [0.0]
We discuss the expressivity of quantized neural networks and approximation techniques for non-differentiable networks.
We show that QNNs have the same expressivity as DNNs in terms of approximation of Lipschitz functions in the $L^\infty$ norm.
We also consider networks defined by means of Heaviside-type activation functions, and prove for them a pointwise approximation result by means of smooth networks.
arXiv Detail & Related papers (2020-11-03T17:20:43Z)
This list is automatically generated from the titles and abstracts of the papers on this site.