Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks
- URL: http://arxiv.org/abs/2106.14997v1
- Date: Mon, 28 Jun 2021 22:01:42 GMT
- Title: Sharp Lower Bounds on the Approximation Rate of Shallow Neural Networks
- Authors: Jonathan W. Siegel, Jinchao Xu
- Abstract summary: We prove sharp lower bounds on the approximation rates for shallow neural networks.
These lower bounds apply to both sigmoidal activation functions with bounded variation and to activation functions which are a power of the ReLU.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We consider the approximation rates of shallow neural networks with respect
to the variation norm. Upper bounds on these rates have been established for
sigmoidal and ReLU activation functions, but it has remained an important open
problem whether these rates are sharp. In this article, we provide a solution
to this problem by proving sharp lower bounds on the approximation rates for
shallow neural networks, which are obtained by lower bounding the $L^2$-metric
entropy of the convex hull of the neural network basis functions. In addition,
our methods also give sharp lower bounds on the Kolmogorov $n$-widths of this
convex hull, which show that the variation spaces corresponding to shallow
neural networks cannot be efficiently approximated by linear methods. These
lower bounds apply to both sigmoidal activation functions with bounded
variation and to activation functions which are a power of the ReLU. Our
results also quantify how much stronger the Barron spectral norm is than the
variation norm and, combined with previous results, give the asymptotics of the
$L^\infty$-metric entropy up to logarithmic factors in the case of the ReLU
activation function.
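As a schematic illustration of the quantities involved (the dictionary $\mathbb{D}$ of neuron basis functions, the ReLU power $k$, the input dimension $d$, and the variation norm $\|\cdot\|_{\mathcal{K}_1(\mathbb{D})}$ are standard notation from the shallow-network approximation literature, and the exponent below is recalled from that literature rather than stated in this abstract), the sharp rate has the form

```latex
% Notation assumed: \Sigma_n denotes shallow networks with n neurons drawn
% from the dictionary \mathbb{D} of ReLU^k ridge functions on a domain in R^d.
\[
  \inf_{f_n \in \Sigma_n} \| f - f_n \|_{L^2}
    \;\asymp\; n^{-\frac{1}{2} - \frac{2k+1}{2d}} \, \| f \|_{\mathcal{K}_1(\mathbb{D})},
\]
% with the matching lower bound obtained from a metric entropy estimate of the form
\[
  \epsilon_n\!\left(\overline{\mathrm{conv}}(\mathbb{D})\right)_{L^2}
    \;\gtrsim\; n^{-\frac{1}{2} - \frac{2k+1}{2d}} .
\]
```

The connection is that an $n$-term approximation rate faster than the entropy decay of the convex hull would contradict the entropy lower bound, so the entropy estimate forces the approximation rate to be sharp.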
Related papers
- Approximation Error and Complexity Bounds for ReLU Networks on Low-Regular Function Spaces [0.0]
We consider the approximation of a large class of bounded functions, with minimal regularity assumptions, by ReLU neural networks.
We show that the approximation error can be bounded from above by a quantity proportional to the uniform norm of the target function.
arXiv Detail & Related papers (2024-05-10T14:31:58Z) - A Mean-Field Analysis of Neural Stochastic Gradient Descent-Ascent for Functional Minimax Optimization [90.87444114491116]
This paper studies minimax optimization problems defined over infinite-dimensional function classes of overparametricized two-layer neural networks.
We address (i) the convergence of the gradient descent-ascent algorithm and (ii) the representation learning of the neural networks.
Results show that the feature representation induced by the neural networks is allowed to deviate from the initial one by the magnitude of $O(\alpha^{-1})$, measured in terms of the Wasserstein distance.
arXiv Detail & Related papers (2024-04-18T16:46:08Z) - Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of lookahead-style linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
arXiv Detail & Related papers (2023-10-20T12:45:12Z) - Promises and Pitfalls of the Linearized Laplace in Bayesian Optimization [73.80101701431103]
The linearized-Laplace approximation (LLA) has been shown to be effective and efficient in constructing Bayesian neural networks.
We study the usefulness of the LLA in Bayesian optimization and highlight its strong performance and flexibility.
arXiv Detail & Related papers (2023-04-17T14:23:43Z) - Globally Optimal Training of Neural Networks with Threshold Activation Functions [63.03759813952481]
We study weight decay regularized training problems of deep neural networks with threshold activations.
We derive a simplified convex optimization formulation when the dataset can be shattered at a certain layer of the network.
arXiv Detail & Related papers (2023-03-06T18:59:13Z) - How do noise tails impact on deep ReLU networks? [2.5889847253961418]
We show how the optimal rate of convergence depends on $p$, the degree of smoothness, and the intrinsic dimension in a class of nonparametric regression functions.
We also contribute some new results on the approximation theory of deep ReLU neural networks.
arXiv Detail & Related papers (2022-03-20T00:27:32Z) - Approximation bounds for norm constrained neural networks with applications to regression and GANs [9.645327615996914]
We prove upper and lower bounds on the approximation error of ReLU neural networks with norm constraint on the weights.
We apply these approximation bounds to analyze the convergences of regression using norm constrained neural networks and distribution estimation by GANs.
arXiv Detail & Related papers (2022-01-24T02:19:05Z) - Near-Minimax Optimal Estimation With Shallow ReLU Neural Networks [19.216784367141972]
We study the problem of estimating an unknown function from noisy data using shallow (single-hidden layer) ReLU neural networks.
We quantify the performance of these neural network estimators when the data-generating function belongs to the space of functions of second-order bounded variation in the Radon domain.
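As an illustrative (and much simplified) sketch of such an estimator, the snippet below fits only the outer-layer coefficients of a single-hidden-layer ReLU network with randomly drawn, fixed inner weights, using ridge-regularized least squares. The target function, sample size, width, and regularization constant are all hypothetical choices for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy samples of an unknown 1-D target function (illustrative choice).
x = rng.uniform(-1, 1, size=(200, 1))
y = np.sin(3 * x[:, 0]) + 0.1 * rng.standard_normal(200)

# Shallow ReLU network: n fixed random inner weights/biases;
# only the outer coefficients are estimated.
n = 50
w = rng.standard_normal((1, n))
b = rng.uniform(-1, 1, n)
features = np.maximum(x @ w + b, 0.0)  # ReLU(w·x + b), shape (200, n)

# Ridge-regularized least squares for the outer layer.
coef = np.linalg.solve(features.T @ features + 1e-3 * np.eye(n),
                       features.T @ y)

def f_hat(x_new):
    """Evaluate the fitted shallow ReLU network."""
    return np.maximum(x_new @ w + b, 0.0) @ coef

# Training error should be on the order of the noise variance.
mse = np.mean((f_hat(x) - y) ** 2)
```

Fixing the inner weights turns the fit into a convex problem, which keeps the sketch short; the estimators analyzed in the paper train all weights, so this is only a rough analogue.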
arXiv Detail & Related papers (2021-09-18T05:56:06Z) - A Dynamical Central Limit Theorem for Shallow Neural Networks [48.66103132697071]
We prove that the fluctuations around the mean limit remain bounded in mean square throughout training.
If the mean-field dynamics converges to a measure that interpolates the training data, we prove that the deviation eventually vanishes in the CLT scaling.
arXiv Detail & Related papers (2020-08-21T18:00:50Z) - Measuring Model Complexity of Neural Networks with Curve Activation Functions [100.98319505253797]
We propose the linear approximation neural network (LANN) to approximate a given deep model with curve activation function.
We experimentally explore the training process of neural networks and detect overfitting.
We find that the $L^1$ and $L^2$ regularizations suppress the increase of model complexity.
arXiv Detail & Related papers (2020-06-16T07:38:06Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it contains and is not responsible for any consequences of its use.