Minimum width for universal approximation using ReLU networks on compact
domain
- URL: http://arxiv.org/abs/2309.10402v2
- Date: Tue, 5 Mar 2024 06:55:28 GMT
- Title: Minimum width for universal approximation using ReLU networks on compact
domain
- Authors: Namjun Kim, Chanho Min, Sejun Park
- Abstract summary: We show that the minimum width for $Lp$ approximation of $Lp$ functions is exactly $maxd_x,d_y,2$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus)
Compared to the known result for ReLU networks, $w_min=maxd_x+1,d_y$ when the domain is $smashmathbb Rd_x$, our result first shows that approximation on a compact domain requires smaller width than on
- Score: 8.839687029212673
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: It has been shown that deep neural networks of a large enough width are
universal approximators but they are not if the width is too small. There were
several attempts to characterize the minimum width $w_{\min}$ enabling the
universal approximation property; however, only a few of them found the exact
values. In this work, we show that the minimum width for $L^p$ approximation of
$L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly
$\max\{d_x,d_y,2\}$ if an activation function is ReLU-Like (e.g., ReLU, GELU,
Softplus). Compared to the known result for ReLU networks,
$w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\smash{\mathbb R^{d_x}}$, our
result first shows that approximation on a compact domain requires smaller
width than on $\smash{\mathbb R^{d_x}}$. We next prove a lower bound on
$w_{\min}$ for uniform approximation using general activation functions
including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le2d_x$. Together with our
first result, this shows a dichotomy between $L^p$ and uniform approximations
for general activation functions and input/output dimensions.
Related papers
- Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal
Approximation [10.249623880822055]
This paper examines a uniform UAP for the function class $C(K,mathbbRd_y)$.
It gives the exact minimum width of the leaky-ReLU NN as $w_min=max(d_x,d_y)+Delta (d_x, d_y)$.
arXiv Detail & Related papers (2023-05-29T06:51:16Z) - Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_min$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w*_min$, for $Lp$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z) - Learning a Single Neuron with Adversarial Label Noise via Gradient
Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations.
The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z) - TURF: A Two-factor, Universal, Robust, Fast Distribution Learning
Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $ell$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d_$.
We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z) - Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector.
We show that this dependence on $d$ is optimal, up to logarithmic factors.
We also provide the first total sensitivity upper bound $O(dmax1,p/2log2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z) - On minimal representations of shallow ReLU networks [0.0]
We show that the minimal representation for $f$ uses either $n$, $n+1$ or $n+2$ neurons.
In particular, where the input layer is one-dimensional, minimal representations always use at most $n+1$ neurons but in all higher dimensional settings there are functions for which $n+2$ neurons are needed.
arXiv Detail & Related papers (2021-08-12T10:22:24Z) - Optimal Approximation Rate of ReLU Networks in terms of Width and Depth [5.37133760455631]
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth.
It is proved that ReLU networks with width $mathcalObig(maxdlfloor N1/drfloor,, N+2big)$ and depth $mathcalO(L)$ can approximate a H"older continuous function on $[0,1]d$ with an approximation rate $mathcalObig(lambdasqrtd (N2L2ln
arXiv Detail & Related papers (2021-02-28T13:15:55Z) - Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov
Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $textUCRL-VTR+$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z) - Deep Network with Approximation Error Being Reciprocal of Width to Power
of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced.
This network is built with Floor ($lfloor xrfloor$) or ReLU ($max0,x$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z) - Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of the $Lp$ functions is exactly $maxd_x+1,d_y$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z) - On the Complexity of Minimizing Convex Finite Sums Without Using the
Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model.
Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.