Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal
Approximation
- URL: http://arxiv.org/abs/2305.18460v3
- Date: Thu, 1 Feb 2024 03:36:36 GMT
- Title: Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal
Approximation
- Authors: Li'ang Li, Yifei Duan, Guanghua Ji, Yongqiang Cai
- Abstract summary: This paper examines a uniform UAP for the function class $C(K,\mathbb{R}^{d_y})$.
It gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x,d_y)+\Delta(d_x,d_y)$.
- Score: 10.249623880822055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of universal approximation properties (UAP) for neural networks
(NN) has a long history. When the network width is unlimited, only a single
hidden layer is sufficient for UAP. In contrast, when the depth is unlimited,
the width for UAP needs to be not less than the critical width
$w^*_{\min}=\max(d_x,d_y)$, where $d_x$ and $d_y$ are the dimensions of the
input and output, respectively. Recently, \cite{cai2022achieve} showed that a
leaky-ReLU NN with this critical width can achieve UAP for $L^p$ functions on a
compact domain ${K}$, i.e., the UAP for $L^p({K},\mathbb{R}^{d_y})$.
This paper examines a uniform UAP for the function class
$C({K},\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU
NN as $w_{\min}=\max(d_x,d_y)+\Delta (d_x, d_y)$, where $\Delta (d_x, d_y)$ denotes
the additional dimensions needed for approximating continuous functions with
diffeomorphisms via embedding. To obtain this result, we propose a novel
lift-flow-discretization approach that shows that the uniform UAP has a deep
connection with topological theory.
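As a concrete illustration of the networks the theorem concerns, the sketch below builds a deep leaky-ReLU MLP whose every hidden layer has a width of the form $\max(d_x,d_y)+\Delta(d_x,d_y)$. Since the abstract does not state the value of $\Delta(d_x,d_y)$, it is passed in as an argument; the helper name `narrow_leaky_relu_mlp` and the choice of PyTorch are illustrative assumptions, not the paper's construction.
```python
# Illustrative sketch (not the paper's construction): a deep, fixed-width
# leaky-ReLU network whose hidden width has the form of the theorem,
# w_min = max(d_x, d_y) + Delta(d_x, d_y). Delta is an argument here because
# the abstract does not state its value.
import torch
import torch.nn as nn


def narrow_leaky_relu_mlp(d_x: int, d_y: int, delta: int, depth: int) -> nn.Sequential:
    """Build a leaky-ReLU MLP whose every hidden layer has width max(d_x, d_y) + delta."""
    width = max(d_x, d_y) + delta  # width of the form max(d_x, d_y) + Delta(d_x, d_y)
    layers = [nn.Linear(d_x, width), nn.LeakyReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.LeakyReLU()]
    layers.append(nn.Linear(width, d_y))  # linear read-out to the output dimension
    return nn.Sequential(*layers)


# Usage: a fixed-width network from R^3 to R^2; depth is the resource that
# grows when approximating a target in C(K, R^{d_y}) more accurately.
net = narrow_leaky_relu_mlp(d_x=3, d_y=2, delta=1, depth=8)
x = torch.randn(16, 3)
print(net(x).shape)  # torch.Size([16, 2])
```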
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements in $M$ classes.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - A Minimal Control Family of Dynamical Systems for Universal Approximation [6.164223149261533]
We show that a control family can generate flow maps that can approximate diffeomorphisms of $\mathbb{R}^d$ on any compact domain.
Our result reveals an underlying connection between the approximation power of neural networks and control systems.
arXiv Detail & Related papers (2023-12-20T10:36:55Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Minimum width for universal approximation using ReLU networks on compact
domain [8.839687029212673]
We show that the minimum width for $L^p$ approximation of $L^p$ functions is exactly $\max\{d_x,d_y,2\}$ if an activation function is ReLU-like (e.g., ReLU, GELU, Softplus).
Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result first shows that approximation on a compact domain requires smaller width than on $\mathbb{R}^{d_x}$.
arXiv Detail & Related papers (2023-09-19T08:04:48Z) - Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE); a hedged sketch of such layers appears at the end of this list.
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Depth Dependence of $\mu$P Learning Rates in ReLU MLPs [72.14317069090407]
We study the dependence of the maximal update ($\mu$P) learning rate on the network width $n$ and depth $L$.
We find that it has a non-trivial dependence on $L$, scaling like $L^{-3/2}$.
arXiv Detail & Related papers (2023-05-13T01:10:49Z) - Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_{\min}$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w^*_{\min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of the $L^p$ functions is exactly $\max\{d_x+1,d_y\}$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z) - A closer look at the approximation capabilities of neural networks [6.09170287691728]
A feedforward neural network with one hidden layer is able to approximate any continuous function $f$ to any given approximation threshold $\varepsilon$.
We show that this uniform approximation property still holds even under seemingly strong conditions imposed on the weights.
arXiv Detail & Related papers (2020-02-16T04:58:43Z)
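The set-representation entry above names two element embeddings, LP (linear + power activation) and LE (linear + exponential activation). The sketch below is one hedged reading of how such embeddings could sit inside a standard DeepSets sum-pooling model; the class names, the element-wise form of the activations, and the exponent are assumptions made for illustration, not details taken from that paper.
```python
# Hedged sketch of DeepSets-style set representation with the two element
# embeddings named above: LP (linear + power activation) and LE (linear +
# exponential activation). The element-wise form of the activations and the
# exponent are illustrative assumptions.
import torch
import torch.nn as nn


class LPEmbedding(nn.Module):
    """Linear map followed by an element-wise power activation (assumed form)."""

    def __init__(self, d_in: int, d_out: int, power: int = 2):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.power = power

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) ** self.power


class LEEmbedding(nn.Module):
    """Linear map followed by an element-wise exponential activation (assumed form)."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.exp(self.linear(x))


def deepsets_forward(elements: torch.Tensor, phi: nn.Module, rho: nn.Module) -> torch.Tensor:
    """Standard DeepSets form: embed each element, sum-pool, then decode."""
    pooled = phi(elements).sum(dim=-2)  # sum over the set dimension
    return rho(pooled)


# Example: a set of 5 elements in R^4, embedded to width 16, decoded to R^3.
phi = LPEmbedding(d_in=4, d_out=16)
rho = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
print(deepsets_forward(torch.randn(5, 4), phi, rho).shape)  # torch.Size([3])
```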
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.