Related papers: Rates of Approximation by ReLU Shallow Neural Networks

URL: http://arxiv.org/abs/2307.12461v1
Date: Mon, 24 Jul 2023 00:16:50 GMT
Title: Rates of Approximation by ReLU Shallow Neural Networks
Authors: Tong Mao and Ding-Xuan Zhou
Abstract summary: We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the H\"older space $W_\infty^r([-1, 1]^d)$ with rates $O((\log m)^{\frac{1}{2} +d}m^{-\frac{r}{d}\frac{d+2}{d+4}})$ when $r<d/2 +2$. Such rates are very close to the optimal one $O(m^{-\frac{r}{d}})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$ when the dimension $d$ is large.
Score: 8.22379888383833
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: Neural networks activated by the rectified linear unit (ReLU) play a central role in the recent development of deep learning. The topic of approximating functions from H\"older spaces by these networks is crucial for understanding the efficiency of the induced learning algorithms. Although the topic has been well investigated in the setting of deep neural networks with many layers of hidden neurons, it is still open for shallow networks having only one hidden layer. In this paper, we provide rates of uniform approximation by these networks. We show that ReLU shallow neural networks with $m$ hidden neurons can uniformly approximate functions from the H\"older space $W_\infty^r([-1, 1]^d)$ with rates $O((\log m)^{\frac{1}{2} +d}m^{-\frac{r}{d}\frac{d+2}{d+4}})$ when $r<d/2 +2$. Such rates are very close to the optimal one $O(m^{-\frac{r}{d}})$ in the sense that $\frac{d+2}{d+4}$ is close to $1$, when the dimension $d$ is large.
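As a rough numerical illustration of the claim in the abstract, the sketch below compares the exponent of $m$ in the stated rate, $\frac{r}{d}\frac{d+2}{d+4}$, with the optimal exponent $\frac{r}{d}$. It ignores the logarithmic factor $(\log m)^{\frac{1}{2}+d}$ entirely, and the function names are hypothetical helpers chosen for this example, not anything taken from the paper.

```python
def exponent_ratio(d: int) -> float:
    """Factor (d+2)/(d+4) separating the stated exponent from the optimal one."""
    return (d + 2) / (d + 4)


def optimal_exponent(r: float, d: int) -> float:
    """Exponent r/d of the optimal rate O(m^{-r/d})."""
    return r / d


def achieved_exponent(r: float, d: int) -> float:
    """Exponent (r/d)(d+2)/(d+4) of m in the stated rate (log factor ignored)."""
    if not r < d / 2 + 2:
        # The abstract states the rate only under this condition.
        raise ValueError("the stated rate assumes r < d/2 + 2")
    return optimal_exponent(r, d) * exponent_ratio(d)


if __name__ == "__main__":
    # The ratio approaches 1 as the dimension d grows.
    for d in (1, 2, 10, 100, 1000):
        print(f"d={d:5d}  (d+2)/(d+4)={exponent_ratio(d):.4f}")
```

For small $d$ the gap is visible (the ratio is $2/3$ at $d=2$), while for $d=100$ the ratio is $102/104$, which is the sense in which the abstract calls the rates very close to optimal in high dimension.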

Related papers

Neural Networks Learn Generic Multi-Index Models Near Information-Theoretic Limit [66.20349460098275]
We study the gradient descent learning of a general Gaussian Multi-index model $f(\boldsymbol{x})=g(\boldsymbol{U}\boldsymbol{x})$ with hidden subspace $\boldsymbol{U}\in \mathbb{R}^{r\times d}$. We prove that under generic non-degenerate assumptions on the link function, a standard two-layer neural network trained via layer-wise gradient descent can agnostically learn the target with $o_d(1)$ test error.
arXiv Detail & Related papers (2025-11-19T04:46:47Z)
Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements. We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z)
On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks [2.4112990554464235]
We show that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition. A key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth.
arXiv Detail & Related papers (2024-09-02T02:26:01Z)
Bayesian Inference with Deep Weakly Nonlinear Networks [57.95116787699412]
We show at a physics level of rigor that Bayesian inference with a fully connected neural network is solvable. We provide techniques to compute the model evidence and posterior to arbitrary order in $1/N$ and at arbitrary temperature.
arXiv Detail & Related papers (2024-05-26T17:08:04Z)
On Expressivity of Height in Neural Networks [29.49793694185358]
We call a neural network characterized by width, depth, and height a 3D network. We show via bound estimation and explicit construction that given the same number of neurons and parameters, a 3D ReLU network of width $W$, depth $K$, and height $H$ has greater expressive power than a 2D network of width $H\times W$ and depth $K$.
arXiv Detail & Related papers (2023-05-11T11:54:36Z)
Generalization Ability of Wide Neural Networks on $\mathbb{R}$ [8.508360765158326]
We study the generalization ability of the wide two-layer ReLU neural network on $\mathbb{R}$. We show that: $i)$ when the width $m\rightarrow\infty$, the neural network kernel (NNK) uniformly converges to the NTK; $ii)$ the minimax rate of regression over the RKHS associated to $K_1$ is $n^{-2/3}$; $iii)$ if one adopts the early stopping strategy in training a wide neural network, the resulting neural network achieves the minimax rate; $iv
arXiv Detail & Related papers (2023-02-12T15:07:27Z)
Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_{\min}$, for the universal approximation property (UAP) of neural networks. In particular, the critical width, $w^*_{\min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z)
Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL). We focus on the value-based algorithm with the $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces. Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-iid setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}[\ldots]$
arXiv Detail & Related papers (2022-08-17T08:14:52Z)
Expressive power of binary and ternary neural networks [91.3755431537592]
We show that deep sparse ReLU networks with ternary weights and deep ReLU networks with binary weights can approximate $\beta$-H\"older functions on $[0,1]^d$.
arXiv Detail & Related papers (2022-06-27T13:16:08Z)
An Exponential Improvement on the Memorization Capacity of Deep Threshold Networks [40.489350374378645]
We prove that $\widetilde{\mathcal{O}}(e^{1/\delta^2}+\sqrt{n})$ neurons and $\widetilde{\mathcal{O}}(\frac{d}{\delta}+n)$ weights are sufficient. We also prove new lower bounds by connecting memorization in neural networks to the purely geometric problem of separating $n$ points on a sphere using hyperplanes.
arXiv Detail & Related papers (2021-06-14T19:42:32Z)
Function approximation by deep neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$ [91.3755431537592]
It is shown that $C_\beta$-smooth functions can be approximated by neural networks with parameters $\{0,\pm \frac{1}{2}, \pm 1, 2\}$. The depth, width and the number of active parameters of constructed networks have, up to a logarithmic factor, the same dependence on the approximation error as the networks with parameters in $[-1,1]$.
arXiv Detail & Related papers (2021-03-15T19:10:02Z)
Approximating smooth functions by deep neural networks with sigmoid activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function. We show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z)
Learning Over-Parametrized Two-Layer ReLU Neural Networks beyond NTK [58.5766737343951]
We consider the dynamics of gradient descent for learning a two-layer neural network. We show that an over-parametrized two-layer neural network trained by gradient descent can provably learn the ground truth network with small loss, using polynomially many samples.
arXiv Detail & Related papers (2020-07-09T07:09:28Z)
Towards Understanding Hierarchical Learning: Benefits of Neural Representations [160.33479656108926]
In this work, we demonstrate that intermediate neural representations add more flexibility to neural networks. We show that neural representation can achieve improved sample complexities compared with the raw input. Our results characterize when neural representations are beneficial, and may provide a new perspective on why depth is important in deep learning.
arXiv Detail & Related papers (2020-06-24T02:44:54Z)
Network size and weights size for memorization with two-layers neural networks [15.333300054767726]
We propose a new training procedure for ReLU networks, based on complex (as opposed to real) recombination of the neurons. We show approximate memorization with both $O\left(\frac{n}{d} \cdot \frac{\log(1/\epsilon)}{\epsilon}\right)$ neurons, as well as nearly-optimal size of the weights.
arXiv Detail & Related papers (2020-06-04T13:44:57Z)

This list is automatically generated from the titles and abstracts of the papers in this site.