Related papers: Minimum width for universal approximation using ReLU networks on compact domain

Minimum width for universal approximation using ReLU networks on compact domain

URL: http://arxiv.org/abs/2309.10402v2
Date: Tue, 5 Mar 2024 06:55:28 GMT
Title: Minimum width for universal approximation using ReLU networks on compact domain
Authors: Namjun Kim, Chanho Min, Sejun Park
Abstract summary: We show that the minimum width for $Lp$ approximation of $Lp$ functions is exactly $maxd_x,d_y,2$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus) Compared to the known result for ReLU networks, $w_min=maxd_x+1,d_y$ when the domain is $smashmathbb Rd_x$, our result first shows that approximation on a compact domain requires smaller width than on
Score: 8.839687029212673
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: It has been shown that deep neural networks of a large enough width are universal approximators but they are not if the width is too small. There were several attempts to characterize the minimum width $w_{\min}$ enabling the universal approximation property; however, only a few of them found the exact values. In this work, we show that the minimum width for $L^p$ approximation of $L^p$ functions from $[0,1]^{d_x}$ to $\mathbb R^{d_y}$ is exactly $\max\{d_x,d_y,2\}$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus). Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\smash{\mathbb R^{d_x}}$, our result first shows that approximation on a compact domain requires smaller width than on $\smash{\mathbb R^{d_x}}$. We next prove a lower bound on $w_{\min}$ for uniform approximation using general activation functions including ReLU: $w_{\min}\ge d_y+1$ if $d_x<d_y\le2d_x$. Together with our first result, this shows a dichotomy between $L^p$ and uniform approximations for general activation functions and input/output dimensions.

Related papers

Sharp Gap-Dependent Variance-Aware Regret Bounds for Tabular MDPs [54.28273395444243]
We show that the Monotonic Value Omega (MVP) algorithm achieves a variance-aware gap-dependent regret bound of $$tildeOleft(left(sum_Delta_h(s,a)>0 fracH2 log K land mathttVar_maxtextc$.
arXiv Detail & Related papers (2025-06-06T20:33:57Z)
Guessing Efficiently for Constrained Subspace Approximation [49.83981776254246]
We introduce a general framework for constrained subspace approximation.<n>We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering, and projected non-negative matrix factorization.
arXiv Detail & Related papers (2025-04-29T15:56:48Z)
Minimum width for universal approximation using squashable activation functions [9.418401219498223]
We study the minimum width of networks using general activation functions. We show that for networks using a squashable activation function to universally approximate $Lp$ functions, the minimum width is $maxd_x,d_y,2$ unless $d_x=d_y=1$.
arXiv Detail & Related papers (2025-04-10T01:23:24Z)
New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $Lp$ functions. We broaden our results to show that smooth invertible neural networks can approximate $Lp(mathbbRd,mathbbRd)$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z)
Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation [10.249623880822055]
This paper examines a uniform UAP for the function class $C(K,mathbbRd_y)$. It gives the exact minimum width of the leaky-ReLU NN as $w_min=max(d_x,d_y)+Delta (d_x, d_y)$.
arXiv Detail & Related papers (2023-05-29T06:51:16Z)
Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_min$, for the universal approximation property (UAP) of neural networks. In particular, the critical width, $w*_min$, for $Lp$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z)
Learning a Single Neuron with Adversarial Label Noise via Gradient Descent [50.659479930171585]
We study a function of the form $mathbfxmapstosigma(mathbfwcdotmathbfx)$ for monotone activations. The goal of the learner is to output a hypothesis vector $mathbfw$ that $F(mathbbw)=C, epsilon$ with high probability.
arXiv Detail & Related papers (2022-06-17T17:55:43Z)
TURF: A Two-factor, Universal, Robust, Fast Distribution Learning Algorithm [64.13217062232874]
One of its most powerful and successful modalities approximates every distribution to an $ell$ distance essentially at most a constant times larger than its closest $t$-piece degree-$d_$. We provide a method that estimates this number near-optimally, hence helps approach the best possible approximation.
arXiv Detail & Related papers (2022-02-15T03:49:28Z)
Active Sampling for Linear Regression Beyond the $\ell_2$ Norm [70.49273459706546]
We study active sampling algorithms for linear regression, which aim to query only a small number of entries of a target vector. We show that this dependence on $d$ is optimal, up to logarithmic factors. We also provide the first total sensitivity upper bound $O(dmax1,p/2log2 n)$ for loss functions with at most degree $p$ growth.
arXiv Detail & Related papers (2021-11-09T00:20:01Z)
Optimal Approximation Rate of ReLU Networks in terms of Width and Depth [5.37133760455631]
This paper concentrates on the approximation power of deep feed-forward neural networks in terms of width and depth. It is proved that ReLU networks with width $mathcalObig(maxdlfloor N1/drfloor,, N+2big)$ and depth $mathcalO(L)$ can approximate a H"older continuous function on $[0,1]d$ with an approximation rate $mathcalObig(lambdasqrtd (N2L2ln
arXiv Detail & Related papers (2021-02-28T13:15:55Z)
Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model. We propose a new, computationally efficient algorithm with linear function approximation named $textUCRL-VTR+$ for the aforementioned linear mixture MDPs. To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
Deep Network with Approximation Error Being Reciprocal of Width to Power of Square Root of Depth [4.468952886990851]
A new network with super approximation power is introduced. This network is built with Floor ($lfloor xrfloor$) or ReLU ($max0,x$) activation function in each neuron.
arXiv Detail & Related papers (2020-06-22T13:27:33Z)
Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of the $Lp$ functions is exactly $maxd_x+1,d_y$. We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z)
On the Complexity of Minimizing Convex Finite Sums Without Using the Indices of the Individual Functions [62.01594253618911]
We exploit the finite noise structure of finite sums to derive a matching $O(n2)$-upper bound under the global oracle model. Following a similar approach, we propose a novel adaptation of SVRG which is both emphcompatible with oracles, and achieves complexity bounds of $tildeO(n2+nsqrtL/mu)log (1/epsilon)$ and $O(nsqrtL/epsilon)$, for $mu>0$ and $mu=0$
arXiv Detail & Related papers (2020-02-09T03:39:46Z)

This list is automatically generated from the titles and abstracts of the papers in this site.