Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal
Approximation
- URL: http://arxiv.org/abs/2305.18460v3
- Date: Thu, 1 Feb 2024 03:36:36 GMT
- Title: Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal
Approximation
- Authors: Li'ang Li, Yifei Duan, Guanghua Ji, Yongqiang Cai
- Abstract summary: This paper examines a uniform UAP for the function class $C(K,\mathbb{R}^{d_y})$.
It gives the exact minimum width of the leaky-ReLU NN as $w_{\min}=\max(d_x,d_y)+\Delta(d_x,d_y)$.
- Score: 10.249623880822055
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The study of universal approximation properties (UAP) for neural networks
(NN) has a long history. When the network width is unlimited, only a single
hidden layer is sufficient for UAP. In contrast, when the depth is unlimited,
the width for UAP needs to be not less than the critical width
$w^*_{\min}=\max(d_x,d_y)$, where $d_x$ and $d_y$ are the dimensions of the
input and output, respectively. Recently, \cite{cai2022achieve} showed that a
leaky-ReLU NN with this critical width can achieve UAP for $L^p$ functions on a
compact domain ${K}$, i.e., the UAP for $L^p({K},\mathbb{R}^{d_y})$.
This paper examines a uniform UAP for the function class
$C({K},\mathbb{R}^{d_y})$ and gives the exact minimum width of the leaky-ReLU
NN as $w_{\min}=\max(d_x,d_y)+\Delta (d_x, d_y)$, where $\Delta (d_x, d_y)$ denotes
the additional dimensions needed for approximating continuous functions with
diffeomorphisms via embedding. To obtain this result, we propose a novel
lift-flow-discretization approach that shows that the uniform UAP has a deep
connection with topological theory.
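As a concrete illustration of the networks the theorem concerns, the sketch below builds a deep leaky-ReLU MLP whose every hidden layer has a width of the form $\max(d_x,d_y)+\Delta(d_x,d_y)$. Since the abstract does not state the value of $\Delta(d_x,d_y)$, it is passed in as an argument; the helper name `narrow_leaky_relu_mlp` and the choice of PyTorch are illustrative assumptions, not the paper's construction.
```python
# Illustrative sketch (not the paper's construction): a deep, fixed-width
# leaky-ReLU network whose hidden width has the form of the theorem,
# w_min = max(d_x, d_y) + Delta(d_x, d_y). Delta is an argument here because
# the abstract does not state its value.
import torch
import torch.nn as nn


def narrow_leaky_relu_mlp(d_x: int, d_y: int, delta: int, depth: int) -> nn.Sequential:
    """Build a leaky-ReLU MLP whose every hidden layer has width max(d_x, d_y) + delta."""
    width = max(d_x, d_y) + delta  # width of the form max(d_x, d_y) + Delta(d_x, d_y)
    layers = [nn.Linear(d_x, width), nn.LeakyReLU()]
    for _ in range(depth - 1):
        layers += [nn.Linear(width, width), nn.LeakyReLU()]
    layers.append(nn.Linear(width, d_y))  # linear read-out to the output dimension
    return nn.Sequential(*layers)


# Usage: a fixed-width network from R^3 to R^2; depth is the resource that
# grows when approximating a target in C(K, R^{d_y}) more accurately.
net = narrow_leaky_relu_mlp(d_x=3, d_y=2, delta=1, depth=8)
x = torch.randn(16, 3)
print(net(x).shape)  # torch.Size([16, 2])
```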
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements in $M$ classes.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z) - A Minimal Control Family of Dynamical Systems for Universal Approximation [6.164223149261533]
We show that a control family can generate flow maps that can approximate diffeomorphisms of $\mathbb{R}^d$ on any compact domain.
Our result reveals an underlying connection between the approximation power of neural networks and control systems.
arXiv Detail & Related papers (2023-12-20T10:36:55Z) - Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks.
For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error.
This work demonstrates the ability of three-layer neural networks to learn complex features and as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z) - Minimum width for universal approximation using ReLU networks on compact
domain [8.839687029212673]
We show that the minimum width for $L^p$ approximation of $L^p$ functions is exactly $\max\{d_x,d_y,2\}$ if an activation function is ReLU-like (e.g., ReLU, GELU, Softplus).
Compared to the known result for ReLU networks, $w_{\min}=\max\{d_x+1,d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result first shows that approximation on a compact domain requires smaller width than on $\mathbb{R}^{d_x}$.
arXiv Detail & Related papers (2023-09-19T08:04:48Z) - Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activation (LE); a hedged sketch of such layers appears at the end of this list.
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Depth Dependence of $\mu$P Learning Rates in ReLU MLPs [72.14317069090407]
We study the dependence of the maximal update ($\mu$P) learning rate on the network width $n$ and depth $L$.
We find that it has a non-trivial dependence on $L$, scaling like $L^{-3/2}$.
arXiv Detail & Related papers (2023-05-13T01:10:49Z) - Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_{\min}$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w^*_{\min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z) - A deep network construction that adapts to intrinsic dimensionality
beyond the domain [79.23797234241471]
We study the approximation of two-layer compositions $f(x) = g(\phi(x))$ via deep networks with ReLU activation.
We focus on two intuitive and practically relevant choices for $\phi$: the projection onto a low-dimensional embedded submanifold and a distance to a collection of low-dimensional sets.
arXiv Detail & Related papers (2020-08-06T09:50:29Z) - Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of the $L^p$ functions is exactly $\max\{d_x+1,d_y\}$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z) - On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z) - A closer look at the approximation capabilities of neural networks [6.09170287691728]
A feedforward neural network with one hidden layer is able to approximate any continuous function $f$ to any given approximation threshold $\varepsilon$.
We show that this uniform approximation property still holds even under seemingly strong conditions imposed on the weights.
arXiv Detail & Related papers (2020-02-16T04:58:43Z)
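The set-representation entry above names two element embeddings, LP (linear + power activation) and LE (linear + exponential activation). The sketch below is one hedged reading of how such embeddings could sit inside a standard DeepSets sum-pooling model; the class names, the element-wise form of the activations, and the exponent are assumptions made for illustration, not details taken from that paper.
```python
# Hedged sketch of DeepSets-style set representation with the two element
# embeddings named above: LP (linear + power activation) and LE (linear +
# exponential activation). The element-wise form of the activations and the
# exponent are illustrative assumptions.
import torch
import torch.nn as nn


class LPEmbedding(nn.Module):
    """Linear map followed by an element-wise power activation (assumed form)."""

    def __init__(self, d_in: int, d_out: int, power: int = 2):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)
        self.power = power

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.linear(x) ** self.power


class LEEmbedding(nn.Module):
    """Linear map followed by an element-wise exponential activation (assumed form)."""

    def __init__(self, d_in: int, d_out: int):
        super().__init__()
        self.linear = nn.Linear(d_in, d_out)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.exp(self.linear(x))


def deepsets_forward(elements: torch.Tensor, phi: nn.Module, rho: nn.Module) -> torch.Tensor:
    """Standard DeepSets form: embed each element, sum-pool, then decode."""
    pooled = phi(elements).sum(dim=-2)  # sum over the set dimension
    return rho(pooled)


# Example: a set of 5 elements in R^4, embedded to width 16, decoded to R^3.
phi = LPEmbedding(d_in=4, d_out=16)
rho = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 3))
print(deepsets_forward(torch.randn(5, 4), phi, rho).shape)  # torch.Size([3])
```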
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.