Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
- URL: http://arxiv.org/abs/2103.00502v1
- Date: Sun, 28 Feb 2021 13:15:55 GMT
- Title: Optimal Approximation Rate of ReLU Networks in terms of Width and Depth
- Authors: Zuowei Shen, Haizhao Yang, Shijun Zhang
- Abstract summary: This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$.
- Score: 5.37133760455631
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper concentrates on the approximation power of deep feed-forward
neural networks in terms of width and depth. It is proved by construction that
ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\,
N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous
function on $[0,1]^d$ with an approximation rate
$\mathcal{O}\big(\lambda\sqrt{d} (N^2L^2\ln N)^{-\alpha/d}\big)$, where
$\alpha\in (0,1]$ and $\lambda>0$ are H\"older order and constant,
respectively. Such a rate is optimal up to a constant in terms of width and
depth separately, while existing results are only nearly optimal without the
logarithmic factor in the approximation rate. More generally, for an arbitrary
continuous function $f$ on $[0,1]^d$, the approximation rate becomes
$\mathcal{O}\big(\,\sqrt{d}\,\omega_f\big( (N^2L^2\ln N)^{-1/d}\big)\,\big)$,
where $\omega_f(\cdot)$ is the modulus of continuity. We also extend our
analysis to any continuous function $f$ on a bounded set. Particularly, if ReLU
networks with depth $31$ and width $\mathcal{O}(N)$ are used to approximate
one-dimensional Lipschitz continuous functions on $[0,1]$ with a Lipschitz
constant $\lambda>0$, the approximation rate in terms of the total number of
parameters, $W=\mathcal{O}(N^2)$, becomes $\mathcal{O}(\tfrac{\lambda}{W\ln
W})$, which has not been discovered in the literature for fixed-depth ReLU
networks.
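To make the stated rates concrete, the following is a small Python sketch (not from the paper; the function names and the sample values of $N$, $L$, $d$, $\alpha$, and $\lambda$ are illustrative) that evaluates the two bounds above, up to their hidden constants: the H\"older rate $\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}$ and the fixed-depth one-dimensional rate $\lambda/(W\ln W)$.

```python
import math

def holder_rate_bound(d, N, L, alpha=1.0, lam=1.0):
    # Claimed rate, up to the hidden O(.) constant, for a Holder function of
    # order alpha and constant lam on [0,1]^d, approximated by a ReLU network
    # of width O(max{d*floor(N^(1/d)), N+2}) and depth O(L):
    #     lam * sqrt(d) * (N^2 * L^2 * ln N)^(-alpha/d)
    return lam * math.sqrt(d) * (N**2 * L**2 * math.log(N)) ** (-alpha / d)

def fixed_depth_1d_bound(W, lam=1.0):
    # One-dimensional Lipschitz case with depth 31 and width O(N), expressed
    # in the total number of parameters W = O(N^2):  lam / (W * ln W).
    return lam / (W * math.log(W))

# Width/depth trade-off for a Lipschitz target (alpha = 1) in d = 8 dimensions.
for N, L in [(16, 4), (32, 4), (16, 8)]:
    print(f"N={N:3d}, L={L:2d}, bound ~ {holder_rate_bound(d=8, N=N, L=L):.4f}")

# Fixed-depth one-dimensional bound for a few parameter counts.
for W in [10**3, 10**4, 10**5]:
    print(f"W={W:7d}, bound ~ {fixed_depth_1d_bound(W):.2e}")
```

Because $N$ and $L$ enter the rate symmetrically through $N^2L^2$, doubling either the width parameter $N$ or the depth parameter $L$ shrinks the bound by roughly a factor of $4^{-\alpha/d}$ (ignoring the slowly varying $\ln N$ term), which is the sense in which the rate is optimal in width and depth separately.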
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega\big(\frac{1}{k}\cdot n^2\log|\mathbb{F}|\big)$.
As an application, we obtain an $\Omega\big(\frac{1}{k}\cdot n^2\log|\mathbb{F}|\big)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - On the optimal approximation of Sobolev and Besov functions using deep ReLU neural networks [2.4112990554464235]
We show that the rate $\mathcal{O}\big((WL)^{-2s/d}\big)$ indeed holds under the Sobolev embedding condition.
A key tool in our proof is a novel encoding of sparse vectors by using deep ReLU neural networks with varied width and depth.
arXiv Detail & Related papers (2024-09-02T02:26:01Z) - Nearly Optimal Regret for Decentralized Online Convex Optimization [53.433398074919]
Decentralized online convex optimization (D-OCO) aims to minimize a sequence of global loss functions using only local computations and communications.
We develop novel D-OCO algorithms that can respectively reduce the regret bounds for convex and strongly convex functions.
Our algorithms are nearly optimal in terms of $T$, $n$, and $\rho$.
arXiv Detail & Related papers (2024-02-14T13:44:16Z) - On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\big[\|\nabla f(x^k)\|_1\big] \le O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$ measured by the $\ell_1$ norm.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^T \mathbb{E}\big[\|\nabla f(x^k)\|_2\big] \le O\big(\frac{1}{T^{1/4}}\big)$ rate of SGD measured by the $\ell_2$ norm.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $\delta>0$, we construct CNF and DNF formulas of polynomial size with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$.
We show that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega\big((n/4^k k^2)^{1-\delta}\big)$, and $\Omega\big((n/4^k k^2)^{1-\delta}\big)$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w}) = C\,\mathrm{opt} + \epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on H\"older Class [6.476766717110237]
We construct neural networks with ReLU, sine, and $2^x$ as activation functions.
In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - Neural Network Approximation: Three Hidden Layers Are Enough [4.468952886990851]
A three-hidden-layer neural network with super approximation power is introduced.
The network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron.
arXiv Detail & Related papers (2020-10-25T18:30:57Z) - Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced.
This network is built with the Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles and achieves complexity bounds of $\tilde{O}\big(n^2+n\sqrt{L/\mu}\big)\log(1/\epsilon)$ and $O\big(n\sqrt{L/\epsilon}\big)$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z) - Deep Network Approximation for Smooth Functions [9.305095040004156]
We show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error.
Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
arXiv Detail & Related papers (2020-01-09T15:06:10Z)
This list is automatically generated from the titles and abstracts of the papers in this site.