Optimal Approximation Rates and Metric Entropy of ReLU$^k$ and Cosine
Networks
- URL: http://arxiv.org/abs/2101.12365v1
- Date: Fri, 29 Jan 2021 02:29:48 GMT
- Title: Optimal Approximation Rates and Metric Entropy of ReLU$^k$ and Cosine
Networks
- Authors: Jonathan W. Siegel, Jinchao Xu
- Abstract summary: We show that the largest Banach space of functions which can be efficiently approximated by the corresponding shallow neural networks is the space whose norm is given by the gauge of the closed convex hull of the set $\{\pm\sigma(\omega\cdot x + b)\}$.
We establish the precise asymptotics of the $L^2$-metric entropy of the unit ball of these gauge spaces and, as a consequence, the optimal approximation rates for shallow ReLU$^k$ networks.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This article addresses several fundamental issues associated with the
approximation theory of neural networks, including the characterization of
approximation spaces, the determination of the metric entropy of these spaces,
and approximation rates of neural networks. For any activation function
$\sigma$, we show that the largest Banach space of functions which can be
efficiently approximated by the corresponding shallow neural networks is the
space whose norm is given by the gauge of the closed convex hull of the set
$\{\pm\sigma(\omega\cdot x + b)\}$. We characterize this space for the ReLU$^k$
and cosine activation functions and, in particular, show that the resulting
gauge space is equivalent to the spectral Barron space if $\sigma=\cos$ and is
equivalent to the Barron space when $\sigma={\rm ReLU}$. Our main result
establishes the precise asymptotics of the $L^2$-metric entropy of the unit
ball of these gauge spaces and, as a consequence, the optimal approximation
rates for shallow ReLU$^k$ networks. The sharpest previous results hold only in
the special case that $k=0$ and $d=2$, where the metric entropy has been
determined up to logarithmic factors. When $k > 0$ or $d > 2$, there is a
significant gap between the previous best upper and lower bounds. We close all
of these gaps and determine the precise asymptotics of the metric entropy for
all $k \geq 0$ and $d\geq 2$, including removing the logarithmic factors
previously mentioned. Finally, we use these results to quantify how much is
lost by Barron's spectral condition relative to the convex hull of
$\{\pm\sigma(\omega\cdot x + b)\}$ when $\sigma={\rm ReLU}^k$.
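For concreteness, the display below writes out the standard objects the abstract refers to: the dictionary of neurons generated by the activation $\sigma$, the gauge (Minkowski functional) norm of its closed convex hull, the class of shallow networks with $n$ neurons, and the $L^2$-metric entropy of a set. The symbols $\mathbb{D}_\sigma$, $\|\cdot\|_{\mathcal{K}_\sigma}$, $\Sigma_n^{\sigma}$ and $\epsilon_n(\cdot)_{L^2}$ are illustrative notation chosen here, not necessarily the paper's.

```latex
\documentclass{article}
\usepackage{amsmath, amssymb}
\begin{document}

% Dictionary of shallow-network neurons generated by the activation \sigma
% (the symbol \mathbb{D}_\sigma is illustrative, not taken from the paper).
\[
  \mathbb{D}_\sigma \;=\; \bigl\{\pm\,\sigma(\omega\cdot x + b) \;:\; \omega\in\mathbb{R}^d,\ b\in\mathbb{R}\bigr\}.
\]

% Gauge (Minkowski functional) of the closed convex hull of the dictionary:
% per the abstract, this is the norm of the largest Banach space that shallow
% \sigma-networks approximate efficiently.
\[
  \|f\|_{\mathcal{K}_\sigma} \;=\; \inf\bigl\{\, t>0 \;:\; f \in t\cdot\overline{\operatorname{conv}}\,(\mathbb{D}_\sigma) \,\bigr\}.
\]

% Shallow networks with n neurons and activation \sigma.
\[
  \Sigma_n^{\sigma} \;=\; \Bigl\{\, \sum_{j=1}^{n} a_j\,\sigma(\omega_j\cdot x + b_j) \;:\; a_j\in\mathbb{R},\ \omega_j\in\mathbb{R}^d,\ b_j\in\mathbb{R} \,\Bigr\}.
\]

% L^2-metric entropy of a set K: the smallest radius at which K can be
% covered by 2^n balls in L^2.
\[
  \epsilon_n(K)_{L^2} \;=\; \inf\bigl\{\, \epsilon>0 \;:\; K \text{ is covered by } 2^n \text{ balls of $L^2$-radius } \epsilon \,\bigr\}.
\]

\end{document}
```

For orientation, a classical Maurey-type argument already gives, for any bounded dictionary in a Hilbert space, an $O(n^{-1/2})$ approximation rate on the unit ball of this gauge norm by $n$-term convex combinations; the abstract's contribution is the sharp entropy asymptotics and matching optimal rates for the ReLU$^k$ and cosine dictionaries specifically.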
Related papers
- Approximation Rates for Shallow ReLU$^k$ Neural Networks on Sobolev Spaces via the Radon Transform [4.096453902709292]
We consider the problem of how efficiently shallow neural networks with the ReLU$^k$ activation function can approximate functions from Sobolev spaces.
We provide a simple proof of nearly optimal approximation rates in a variety of cases, including when $q\leq p$, $p\geq 2$, and $s \leq k + (d+1)/2$.
arXiv Detail & Related papers (2024-08-20T16:43:45Z) - Learning with Norm Constrained, Over-parameterized, Two-layer Neural Networks [54.177130905659155]
Recent studies show that a reproducing kernel Hilbert space (RKHS) is not a suitable space to model functions by neural networks.
In this paper, we study a suitable function space for over-parameterized two-layer neural networks with bounded norms.
arXiv Detail & Related papers (2024-04-29T15:04:07Z) - A Unified Framework for Uniform Signal Recovery in Nonlinear Generative
Compressed Sensing [68.80803866919123]
Under nonlinear measurements, most prior results are non-uniform, i.e., they hold with high probability for a fixed $\mathbf{x}^*$ rather than for all $\mathbf{x}^*$ simultaneously.
Our framework accommodates GCS with 1-bit/uniformly quantized observations and single index models as canonical examples.
We also develop a concentration inequality that produces tighter bounds for product processes whose index sets have low metric entropy.
arXiv Detail & Related papers (2023-09-25T17:54:19Z) - Optimal Approximation of Zonoids and Uniform Approximation by Shallow
Neural Networks [2.7195102129095003]
We study the following two related problems.
The first is to determine the error to which an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments.
The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces.
arXiv Detail & Related papers (2023-07-28T03:43:17Z) - Generalization and Stability of Interpolating Neural Networks with
Minimal Width [37.908159361149835]
We investigate the generalization and optimization of shallow neural networks trained by gradient descent in the interpolating regime.
We prove that $m=\Omega(\log^4(n))$ neurons and $T\approx n$ iterations suffice to minimize the training loss.
With $m=\Omega(\log^4(n))$ neurons and $T\approx n$, we bound the test loss by $\tilde{O}(1/)$.
arXiv Detail & Related papers (2023-02-18T05:06:15Z) - Optimal Approximation Rates for Deep ReLU Neural Networks on Sobolev and Besov Spaces [2.7195102129095003]
Deep neural networks with the ReLU activation function can approximate functions in the Sobolev spaces $W^s(L_q(\Omega))$ and Besov spaces $B^s_r(L_q(\Omega))$.
This problem is important when studying the application of neural networks in a variety of fields.
arXiv Detail & Related papers (2022-11-25T23:32:26Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) \leq C\cdot\mathrm{OPT} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - A Law of Robustness beyond Isoperimetry [84.33752026418045]
We prove a lower bound of $\Omega(\sqrt{n/p})$ on the Lipschitz constant of interpolating neural networks, holding for arbitrary data distributions.
We then show the potential benefit of overparametrization for smooth data when $n=\mathrm{poly}(d)$.
We disprove the potential existence of an $O(1)$-Lipschitz robust interpolating function when $n=\exp(\omega(d))$.
arXiv Detail & Related papers (2022-02-23T16:10:23Z) - On quantitative Laplace-type convergence results for some exponential
probability measures, with two applications [2.9189409618561966]
We find the limit of a sequence of measures $(\pi_\varepsilon)_{\varepsilon>0}$, given by their densities $(\mathrm{d}\pi_\varepsilon)_{\varepsilon>0}$ with respect to the Lebesgue measure.
arXiv Detail & Related papers (2021-10-25T13:00:25Z) - Deep neural network approximation of analytic functions [91.3755431537592]
We obtain an entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Spectral density estimation with the Gaussian Integral Transform [91.3755431537592]
The spectral density operator $\hat{\rho}(\omega)=\delta(\omega-\hat{H})$ plays a central role in linear response theory.
We describe a near optimal quantum algorithm providing an approximation to the spectral density.
arXiv Detail & Related papers (2020-04-10T03:14:38Z)