Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
- URL: http://arxiv.org/abs/2103.00502v1
- Date: Sun, 28 Feb 2021 13:15:55 GMT
- Title: Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
- Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang
- Abstract summary: This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$.
- Score: 5.37133760455631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper concentrates on the approximation power of deep feed-forward
neural networks in terms of width and depth. It is proved by construction that
ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\,
N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous
function on $[0,1]^d$ with an approximation rate
$\mathcal{O}\big(\lambda\sqrt{d} (N^2L^2\ln N)^{-\alpha/d}\big)$, where
$\alpha\in (0,1]$ and $\lambda>0$ are H\"older order and constant,
respectively. Such a rate is optimal up to a constant in terms of width and
depth separately, while existing results are only nearly optimal without the
logarithmic factor in the approximation rate. More generally, for an arbitrary
continuous function $f$ on $[0,1]^d$, the approximation rate becomes
$\mathcal{O}\big(\,\sqrt{d}\,\omega_f\big( (N^2L^2\ln N)^{-1/d}\big)\,\big)$,
where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our
analysis to any continuous function $f$ on a bounded set. Particularly, if ReLU
networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate
one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz
constant $\lambda>0$, the approximation rate in terms of the total number of
parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}(\tfrac{\lambda}{W\ln
W})$, which has not been discovered in the literature for fixed-depth ReLU
networks.
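To make the stated rates concrete, the following is a small Python sketch (not from the paper; the function names and the sample values of $N$, $L$, $d$, $\alpha$, and $\lambda$ are illustrative) that evaluates the two bounds above, up to their hidden constants: the H\"older rate $\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}$ and the fixed-depth one-dimensional rate $\lambda/(W\ln W)$.

```python
import math

def holder_rate_bound(d, N, L, alpha=1.0, lam=1.0):
    # Claimed rate, up to the hidden O(.) constant, for a Holder function of
    # order alpha and constant lam on [0,1]^d, approximated by a ReLU network
    # of width O(max{d*floor(N^(1/d)), N+2}) and depth O(L):
    #     lam * sqrt(d) * (N^2 * L^2 * ln N)^(-alpha/d)
    return lam * math.sqrt(d) * (N**2 * L**2 * math.log(N)) ** (-alpha / d)

def fixed_depth_1d_bound(W, lam=1.0):
    # One-dimensional Lipschitz case with depth 31 and width O(N), expressed
    # in the total number of parameters W = O(N^2):  lam / (W * ln W).
    return lam / (W * math.log(W))

# Width/depth trade-off for a Lipschitz target (alpha = 1) in d = 8 dimensions.
for N, L in [(16, 4), (32, 4), (16, 8)]:
    print(f"N={N:3d}, L={L:2d}, bound ~ {holder_rate_bound(d=8, N=N, L=L):.4f}")

# Fixed-depth one-dimensional bound for a few parameter counts.
for W in [10**3, 10**4, 10**5]:
    print(f"W={W:7d}, bound ~ {fixed_depth_1d_bound(W):.2e}")
```

Because $N$ and $L$ enter the rate symmetrically through $N^2L^2$, doubling either the width parameter $N$ or the depth parameter $L$ shrinks the bound by roughly a factor of $4^{-\alpha/d}$ (ignoring the slowly varying $\ln N$ term), which is the sense in which the rate is optimal in width and depth separately.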
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega\big(\frac{1}{k}\cdot n^2\log|\mathbb{F}|\big)$.
As an application, we obtain an $\Omega\big(\frac{1}{k}\cdot n^2\log|\mathbb{F}|\big)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks [2.4112990554464235]
We show that the rate $\mathcal{O}\big((WL)^{-2s/d}\big)$ indeed holds under the Sobolev embedding condition.
A key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth.
arXiv Detail & Related papers (2024-09-02T02:26:01Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\big[\|\nabla f(x^k)\|_1\big] \le O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$ measured by the $\ell_1$ norm.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\big[\|\nabla f(x^k)\|_2\big] \le O\big(\frac{1}{T^{1/4}}\big)$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $\delta>0$, we construct CNF and DNF formulas of polynomial size with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$.
We show that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega\big((n/4^k k^2)^{1-\delta}\big)$, and $\Omega\big((n/4^k k^2)^{1-\delta}\big)$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on H\"older Class [6.476766717110237]
We construct neural networks with ReLU, sine, and $2^x$ as activation functions.
In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - Neural Network Approximation: Three Hidden Layers Are Enough [4.468952886990851]
A three-hidden-layer neural network with super approximation power is introduced.
The network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron.
arXiv Detail & Related papers (2020-10-25T18:30:57Z) - Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced.
This network is built with the Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles and achieves complexity bounds of $\tilde{O}\big(n^2+n\sqrt{L/\mu}\big)\log(1/\epsilon)$ and $O\big(n\sqrt{L/\epsilon}\big)$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z) - Deep Network Approximation for Smooth Functions [9.305095040004156]
We show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error.
Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
arXiv Detail & Related papers (2020-01-09T15:06:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.