Neural Network Approximation: Three Hidden Layers Are Enough
- URL: http://arxiv.org/abs/2010.14075v4
- Date: Mon, 19 Apr 2021 16:53:46 GMT
- Title: Neural Network Approximation: Three Hidden Layers Are Enough
- Authors: Zuowei Shen and Haizhao Yang and Shijun Zhang
- Abstract summary: A three-hidden-layer neural network with super approximation power is introduced.
The network is built with the floor function ($\lfloor x\rfloor$), the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their compositions as the activation function in each neuron.
- Score: 4.468952886990851
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: A three-hidden-layer neural network with super approximation power is
introduced. This network is built with the floor function ($\lfloor x\rfloor$),
the exponential function ($2^x$), the step function ($1_{x\geq 0}$), or their
compositions as the activation function in each neuron; hence we call such
networks Floor-Exponential-Step (FLES) networks. For any width
hyper-parameter $N\in\mathbb{N}^+$, it is shown that FLES networks with width
$\max\{d,N\}$ and three hidden layers can uniformly approximate a H\"older
continuous function $f$ on $[0,1]^d$ with an exponential approximation rate
$3\lambda (2\sqrt{d})^{\alpha} 2^{-\alpha N}$, where $\alpha \in(0,1]$ and
$\lambda>0$ are the H\"older order and constant, respectively. More generally
for an arbitrary continuous function $f$ on $[0,1]^d$ with a modulus of
continuity $\omega_f(\cdot)$, the constructive approximation rate is
$2\omega_f(2\sqrt{d}){2^{-N}}+\omega_f(2\sqrt{d}\,2^{-N})$. Moreover, we extend
such a result to general bounded continuous functions on a bounded set
$E\subseteq\mathbb{R}^d$. As a consequence, this new class of networks
overcomes the curse of dimensionality in approximation power when the variation
of $\omega_f(r)$ as $r\rightarrow 0$ is moderate (e.g., $\omega_f(r)\lesssim
r^\alpha$ for H\"older continuous functions), since the major term of concern
in our approximation rate is essentially $\sqrt{d}$ times a function
of $N$ independent of $d$ within the modulus of continuity. Finally, we extend
our analysis to derive similar approximation results in the $L^p$-norm for
$p\in[1,\infty)$ via replacing Floor-Exponential-Step activation functions by
continuous activation functions.
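As a rough illustration only (not the authors' construction; the function names and the example values of $d$, $\alpha$, and $\lambda$ below are assumptions made for the sketch), the following snippet evaluates the three FLES activations named above and the claimed uniform rate $3\lambda (2\sqrt{d})^{\alpha} 2^{-\alpha N}$ for a few widths $N$:

```python
import math

# The three FLES activations from the abstract: floor(x), 2^x, and the step 1_{x >= 0}.
def floor_act(x: float) -> float:
    return math.floor(x)

def exp2_act(x: float) -> float:
    return 2.0 ** x

def step_act(x: float) -> float:
    return 1.0 if x >= 0 else 0.0

def fles_holder_rate(d: int, N: int, alpha: float = 1.0, lam: float = 1.0) -> float:
    """Claimed uniform rate 3*lam*(2*sqrt(d))^alpha * 2^(-alpha*N) for a
    (lam, alpha)-Holder function on [0,1]^d, with width max{d, N} and three hidden layers."""
    return 3.0 * lam * (2.0 * math.sqrt(d)) ** alpha * 2.0 ** (-alpha * N)

if __name__ == "__main__":
    # Exercise the activations on a sample input.
    print(floor_act(2.7), exp2_act(3.0), step_act(-1.0))
    d = 10  # illustrative input dimension
    for N in (4, 8, 16, 32):
        print(f"N={N:2d}  width=max(d,N)={max(d, N):2d}  rate={fles_holder_rate(d, N):.3e}")
```

The only point of the printed table is that the claimed rate decays exponentially in $N$, with the dimension entering through a single $\sqrt{d}$ factor.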
Related papers
- On the $O(\frac{\sqrt{d}}{T^{1/4}})$ Convergence Rate of RMSProp and Its Momentum Extension Measured by $\ell_1$ Norm [59.65871549878937]
This paper considers RMSProp and its momentum extension and establishes the convergence rate $\frac{1}{T}\sum_{k=1}^{T}\|\nabla f(x^k)\|_1 \leq O\big(\frac{\sqrt{d}}{T^{1/4}}\big)$.
Our convergence rate matches the lower bound with respect to all the coefficients except the dimension $d$.
Our convergence rate can be considered analogous to the $\frac{1}{T}\sum_{k=1}^{T}\|\nabla f(x^k)\|_2 \leq O\big(\frac{1}{T^{1/4}}\big)$ rate of SGD.
arXiv Detail & Related papers (2024-02-01T07:21:32Z) - The Approximate Degree of DNF and CNF Formulas [95.94432031144716]
For every $\delta>0$, we construct CNF and DNF formulas of polynomial size with approximate degree $\Omega(n^{1-\delta})$, essentially matching the trivial upper bound of $n$.
We show that for every $\delta>0$, these models require $\Omega(n^{1-\delta})$, $\Omega\big((n/4^{k}k^{2})^{1-\delta}\big)$, and $\Omega\big((n/4^{k}k^{2})^{1-\delta}\big)$, respectively.
arXiv Detail & Related papers (2022-09-04T10:01:39Z) - Expressive power of binary and ternary neural networks [91.3755431537592]
We show that deep sparse ReLU networks with ternary weights and deep ReLU networks with binary weights can approximate $\beta$-H\"older functions on $[0,1]^d$.
arXiv Detail & Related papers (2022-06-27T13:16:08Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $\mathbf{x}\mapsto\sigma(\mathbf{w}\cdot\mathbf{x})$ for monotone activations.
The goal of the learner is to output a hypothesis vector $\mathbf{w}$ such that $F(\mathbf{w})=C\,\mathrm{OPT}+\epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - ReLU Network Approximation in Terms of Intrinsic Parameters [5.37133760455631]
We study the approximation error of ReLU networks in terms of the number of intrinsic parameters.
We design a ReLU network with only three intrinsic parameters to approximate H\"older continuous functions with an arbitrarily small error.
arXiv Detail & Related papers (2021-11-15T18:20:38Z) - Neural networks with superexpressive activations and integer weights [91.3755431537592]
An example of an activation function $\sigma$ is given such that networks with activations $\{\sigma, \lfloor\cdot\rfloor\}$, integer weights, and a fixed architecture can approximate continuous functions.
The range of integer weights required for $\varepsilon$-approximation of H\"older continuous functions is derived.
arXiv Detail & Related papers (2021-05-20T17:29:08Z) - Deep Neural Networks with ReLU-Sine-Exponential Activations Break Curse
of Dimensionality on H\"older Class [6.476766717110237]
We construct neural networks with ReLU, sine, and $2^x$ as activation functions.
In addition to their super expressive power, functions implemented by ReLU-sine-$2^x$ networks are (generalized) differentiable.
arXiv Detail & Related papers (2021-02-28T15:57:42Z) - Optimal Approximation Rate of ReLU Networks in terms of Width and Depth [5.37133760455631]
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $\mathcal{O}\big(\max\{d\lfloor N^{1/d}\rfloor,\, N+2\}\big)$ and depth $\mathcal{O}(L)$ can approximate a H\"older continuous function on $[0,1]^d$ with an approximation rate $\mathcal{O}\big(\lambda\sqrt{d}\,(N^2L^2\ln N)^{-\alpha/d}\big)$; a rough numerical comparison with the FLES rate above appears in the sketch after this list.
arXiv Detail & Related papers (2021-02-28T13:15:55Z) - Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced.
This network is built with the Floor ($\lfloor x\rfloor$) or ReLU ($\max\{0,x\}$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n^2)$ upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both compatible with oracles and achieves complexity bounds of $\tilde{O}(n^2+n\sqrt{L/\mu})\log(1/\epsilon)$ and $O(n\sqrt{L/\epsilon})$, for $\mu>0$ and $\mu=0$, respectively.
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
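To make the curse-of-dimensionality remark in the main abstract concrete, here is a small comparison sketch; it is only illustrative, ignores the unspecified constants hidden in the $\mathcal{O}(\cdot)$ notation, and fixes $N=L=32$ and $\alpha=\lambda=1$ as assumed example values.

```python
import math

def fles_rate(d: int, N: int, alpha: float = 1.0, lam: float = 1.0) -> float:
    # Exponential FLES rate from the main abstract: 3*lam*(2*sqrt(d))^alpha * 2^(-alpha*N).
    return 3.0 * lam * (2.0 * math.sqrt(d)) ** alpha * 2.0 ** (-alpha * N)

def relu_rate(d: int, N: int, L: int, alpha: float = 1.0, lam: float = 1.0) -> float:
    # ReLU rate (up to constants) quoted in the related paper: lam*sqrt(d)*(N^2*L^2*ln N)^(-alpha/d).
    return lam * math.sqrt(d) * (N ** 2 * L ** 2 * math.log(N)) ** (-alpha / d)

if __name__ == "__main__":
    N = L = 32
    for d in (2, 10, 50):
        print(f"d={d:2d}  FLES ~ {fles_rate(d, N):.2e}   ReLU ~ {relu_rate(d, N, L):.2e}")
```

The exponent of $N$ in the FLES rate is independent of $d$, while the ReLU exponent $-\alpha/d$ flattens as $d$ grows, which is the contrast described above.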
This list is automatically generated from the titles and abstracts of the papers on this site.