Deep Network Approximation for Smooth Functions
- URL: http://arxiv.org/abs/2001.03040v8
- Date: Fri, 24 Sep 2021 13:11:12 GMT
- Title: Deep Network Approximation for Smooth Functions
- Authors: Jianfeng Lu, Zuowei Shen, Haizhao Yang, Shijun Zhang
- Abstract summary: We show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error.
Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
- Score: 9.305095040004156
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper establishes the (nearly) optimal approximation error
characterization of deep rectified linear unit (ReLU) networks for smooth
functions in terms of both width and depth simultaneously. To that end, we
first prove that multivariate polynomials can be approximated by deep ReLU
networks of width $\mathcal{O}(N)$ and depth $\mathcal{O}(L)$ with an
approximation error $\mathcal{O}(N^{-L})$. Through local Taylor expansions and
their deep ReLU network approximations, we show that deep ReLU networks of
width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate
$f\in C^s([0,1]^d)$ with a nearly optimal approximation error
$\mathcal{O}(\|f\|_{C^s([0,1]^d)}N^{-2s/d}L^{-2s/d})$. Our estimate is
non-asymptotic in the sense that it is valid for arbitrary width and depth
specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
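As a rough illustration of the first step in the abstract (approximating polynomials with ReLU networks), the sketch below uses the classical sawtooth construction for $x^2$ on $[0,1]$. This is a swapped-in, simpler technique and not the paper's width-$\mathcal{O}(N)$, depth-$\mathcal{O}(L)$ construction; the function names `relu`, `tent`, `approx_square`, and `max_error` are hypothetical and chosen only for this example.

```python
def relu(t):
    """Rectified linear unit applied to a scalar."""
    return max(0.0, t)


def tent(x):
    """Tent map on [0, 1] built from two ReLU units:
    2x on [0, 1/2] and 2(1 - x) on [1/2, 1]."""
    return 2.0 * relu(x) - 4.0 * relu(x - 0.5)


def approx_square(x, depth):
    """Approximate x**2 on [0, 1] by x - sum_{k=1}^{depth} tent^k(x) / 4**k.

    Each loop iteration plays the role of one extra hidden layer;
    the result is the piecewise-linear interpolant of x**2 on a
    uniform grid with 2**depth intervals.
    """
    out = x
    g = x
    for k in range(1, depth + 1):
        g = tent(g)            # k-fold composition of the tent map
        out -= g / 4.0 ** k    # subtract the k-th sawtooth correction
    return out


def max_error(depth, samples=4097):
    """Largest absolute error against x**2 on a uniform grid of [0, 1]."""
    grid = [i / (samples - 1) for i in range(samples)]
    return max(abs(approx_square(x, depth) - x * x) for x in grid)


if __name__ == "__main__":
    for m in range(1, 6):
        print(m, max_error(m))
```

In this simplified setting the maximum error is $2^{-2m-2}$ for depth $m$, so it decays exponentially in depth at fixed width. The paper's bound $\mathcal{O}(N^{-L})$ for polynomials is a statement of the same flavor, but in terms of both width $N$ and depth $L$.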
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks [2.4112990554464235]
We show that the rate $\mathcal{O}((WL)^{-2s/d})$ indeed holds under the Sobolev embedding condition.
A key tool in our proof is a novel encoding of sparse vectors using deep ReLU neural networks with varied width and depth.
arXiv Detail & Related papers (2024-09-02T02:26:01Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - Optimal Approximation of Zonoids and Uniform Approximation by Shallow
Neural Networks [2.7195102129095003]
We study the following two related problems.
The first is to determine the error with which an arbitrary zonoid in $\mathbb{R}^{d+1}$ can be approximated in the Hausdorff distance by a sum of $n$ line segments.
The second is to determine optimal approximation rates in the uniform norm for shallow ReLU$^k$ neural networks on their variation spaces.
arXiv Detail & Related papers (2023-07-28T03:43:17Z) - Most Neural Networks Are Almost Learnable [52.40331776572531]
We show that for any fixed $\epsilon>0$ and depth $i$, there is a poly-time algorithm that learns random Xavier networks of depth $i$.
The algorithm runs in time and sample complexity of $(\bar{d})^{\mathrm{poly}(\epsilon^{-1})}$, where $\bar{d}$ is the size of the network.
For some cases of sigmoid and ReLU-like activations the bound can be improved to $(\bar{d})^{\mathrm{polylog}(\epsilon\ldots)}$
arXiv Detail & Related papers (2023-05-25T22:27:42Z) - Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}[\ldots]$
arXiv Detail & Related papers (2022-08-17T08:14:52Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C, \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Optimal Approximation Rate of ReLU Networks in terms of Width and Depth [5.37133760455631]
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a Hölder continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln\ldots\big)$
arXiv Detail & Related papers (2021-02-28T13:15:55Z) - Small Covers for Near-Zero Sets of Polynomials and Learning Latent
Variable Models [56.98280399449707]
We show that there exists an $\epsilon$-cover for $S$ of cardinality $M = (k/\epsilon)^{O_d(k^{1/d})}$.
Building on our structural result, we obtain significantly improved learning algorithms for several fundamental high-dimensional probabilistic models with hidden variables.
arXiv Detail & Related papers (2020-12-14T18:14:08Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced.
This network is built with the Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z)