Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth
- URL: http://arxiv.org/abs/2006.12231v6
- Date: Tue, 27 Oct 2020 01:51:52 GMT
- Title: Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth
- Authors: Zuowei Shen and Haizhao Yang and Shijun Zhang
- Abstract summary: A new network with super approximation power is introduced.
This network is built with Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron.
- Score: 4.468952886990851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A new network with super approximation power is introduced. This network is
built with Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation
function in each neuron and hence we call such networks Floor-ReLU networks.
For any hyper-parameters $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, it is shown
that Floor-ReLU networks with width $\max\{d,\, 5N+13\}$ and depth $64dL+3$ can
uniformly approximate a H\"older function $f$ on $[0,1]^d$ with an
approximation error $3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}$, where $\alpha
\in(0,1]$ and $\lambda$ are the H\"older order and constant, respectively. More
generally for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus
of continuity $\omega_f(\cdot)$, the constructive approximation rate is
$\omega_f(\sqrt{d}\,N^{-\sqrt{L}})+2\omega_f(\sqrt{d}){N^{-\sqrt{L}}}$. As a
consequence, this new class of networks overcomes the curse of dimensionality
in approximation power when the variation of $\omega_f(r)$ as $r\to 0$ is
moderate (e.g., $\omega_f(r) \lesssim r^\alpha$ for H\"older continuous
functions), since the major term to be considered in our approximation rate is
essentially $\sqrt{d}$ times a function of $N$ and $L$ independent of $d$
within the modulus of continuity.
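For readers checking the constants, the H\"older rate stated above follows from the general modulus-of-continuity rate: a H\"older function of order $\alpha$ and constant $\lambda$ satisfies $\omega_f(r)\le\lambda r^\alpha$, and $N^{-\sqrt{L}}\le N^{-\alpha\sqrt{L}}$ for $N\ge 1$ and $\alpha\in(0,1]$, so
$$\omega_f\big(\sqrt{d}\,N^{-\sqrt{L}}\big)+2\,\omega_f\big(\sqrt{d}\big)N^{-\sqrt{L}} \;\le\; \lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}+2\lambda d^{\alpha/2}N^{-\sqrt{L}} \;\le\; 3\lambda d^{\alpha/2}N^{-\alpha\sqrt{L}}.$$
The sketch below is a minimal, hypothetical illustration of the Floor-ReLU architecture and of evaluating this error bound. The width $\max\{d,\,5N+13\}$ and depth $64dL+3$ match the theorem, but the random weights (and the helper names `build_floor_relu_network`, `forward`, `holder_error_bound`) are placeholders for illustration only, not the constructive weights from the paper.
```python
import numpy as np

# Minimal sketch of a Floor-ReLU network: every hidden neuron applies either
# floor(x) or max{0, x} to its pre-activation. Width and depth follow the
# theorem's architecture (width max{d, 5N+13}, depth 64dL+3), but the weights
# below are random placeholders, not the constructive weights from the paper.

def build_floor_relu_network(d, N, L, rng):
    width, depth = max(d, 5 * N + 13), 64 * d * L + 3
    layers, in_dim = [], d
    for _ in range(depth):
        W = rng.standard_normal((width, in_dim)) / np.sqrt(in_dim)
        b = rng.standard_normal(width)
        is_floor = rng.random(width) < 0.5   # per-neuron choice: floor or ReLU
        layers.append((W, b, is_floor))
        in_dim = width
    W_out = rng.standard_normal((1, in_dim)) / np.sqrt(in_dim)
    return layers, W_out

def forward(x, layers, W_out):
    h = x
    for W, b, is_floor in layers:
        z = W @ h + b
        h = np.where(is_floor, np.floor(z), np.maximum(0.0, z))
    return (W_out @ h).item()

def holder_error_bound(d, N, L, alpha, lam):
    # Theorem's uniform approximation error: 3 * lambda * d^(alpha/2) * N^(-alpha*sqrt(L))
    return 3.0 * lam * d ** (alpha / 2) * N ** (-alpha * np.sqrt(L))

rng = np.random.default_rng(0)
d, N, L = 2, 4, 9                                     # width = 33, depth = 1155
layers, W_out = build_floor_relu_network(d, N, L, rng)
print(forward(np.full(d, 0.5), layers, W_out))          # one forward pass at x = (0.5, 0.5)
print(holder_error_bound(d, N, L, alpha=1.0, lam=1.0))  # 3*sqrt(2)*4^{-3} ≈ 0.0663
```
For example, with $d=2$, $N=4$, $L=9$ (so $\sqrt{L}=3$) and a Lipschitz function ($\alpha=1$, constant $\lambda$), the bound evaluates to $3\lambda\sqrt{2}\cdot 4^{-3}\approx 0.066\,\lambda$, while the network has width $33$ and depth $1155$.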
Related papers
- The Communication Complexity of Approximating Matrix Rank [50.6867896228563]
We show that this problem has randomized communication complexity $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$.
As an application, we obtain an $\Omega(\frac{1}{k}\cdot n^2\log|\mathbb{F}|)$ space lower bound for any streaming algorithm with $k$ passes.
arXiv Detail & Related papers (2024-10-26T06:21:42Z) - $\ell_p$-Regression in the Arbitrary Partition Model of Communication [59.89387020011663]
We consider the randomized communication complexity of the distributed $\ell_p$-regression problem in the coordinator model.
For $p = 2$, i.e., least squares regression, we give the first optimal bound of $\tilde{\Theta}(sd^2 + sd/\epsilon)$ bits.
For $p \in (1,2)$, we obtain an $\tilde{O}(sd^2/\epsilon + sd/\mathrm{poly}(\epsilon))$ upper bound.
arXiv Detail & Related papers (2023-07-11T08:51:53Z) - The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $\delta>0$, we construct CNF and DNF formulas with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$.
We show that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega\big((n/4^k k^2)^{1-\delta}\big)$, and $\Omega\big((n/4^k k^2)^{1-\delta}\big)$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z) - Expressive power of binary and ternary neural networks [91.3755431537592]
We show that deep sparse ReLU networks with ternary weights and deep ReLU networks with binary weights can approximate $\beta$-H\"older functions on $[0,1]^d$.
arXiv Detail & Related papers (2022-06-27T13:16:08Z) - ReLU Network Approximation in Terms of Intrinsic Parameters [5.37133760455631]
We study the approximation error of ReLU networks in terms of the number of intrinsic parameters.
We design a ReLU network with only three intrinsic parameters to approximate H\"older continuous functions with an arbitrarily small error.
arXiv Detail & Related papers (2021-11-15T18:20:38Z) - Neural networks with superexpressive activations and integer weights [91.3755431537592]
An example of an activation function $\sigma$ is given such that networks with activations $\{\sigma, \lfloor\cdot\rfloor\}$, integer weights, and a fixed architecture can approximate H\"older continuous functions.
The range of integer weights required for $\varepsilon$-approximation of H\"older continuous functions is derived.
arXiv Detail & Related papers (2021-05-20T17:29:08Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on H\"older Class [6.476766717110237]
We construct neural networks with ReLU, sine, and $2^x$ as activation functions.
In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - Optimal Approximation Rate of ReLU Networks in terms of Width and Depth [5.37133760455631]
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous function of order $\alpha$ on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$.
arXiv Detail & Related papers (2021-02-28T13:15:55Z) - Neural Network Approximation: Three Hidden Layers Are Enough [4.468952886990851]
A three-hidden-layer neural network with super approximation power is introduced.
The network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron.
arXiv Detail & Related papers (2020-10-25T18:30:57Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with such oracles and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log(1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z) - Deep Network Approximation for Smooth Functions [9.305095040004156]
We show that deep ReLU networks of width $\mathcal{O}(N\ln N)$ and depth $\mathcal{O}(L\ln L)$ can approximate $f\in C^s([0,1]^d)$ with a nearly optimal approximation error.
Our estimate is non-asymptotic in the sense that it is valid for arbitrary width and depth specified by $N\in\mathbb{N}^+$ and $L\in\mathbb{N}^+$, respectively.
arXiv Detail & Related papers (2020-01-09T15:06:10Z)