Related papers: Minimum Width of Deep Narrow Networks for Universal Approximation

URL: http://arxiv.org/abs/2511.06837v1
Date: Mon, 10 Nov 2025 08:29:14 GMT
Title: Minimum Width of Deep Narrow Networks for Universal Approximation
Authors: Xiao-Song Yang, Qi Zhou, Xuan Zhou
Abstract summary: We study the lower bounds and upper bounds of the minimum width required for fully connected neural networks to have universal approximation capability. We present a new proof of the inequality $w_{min}\ge d_y+\mathbf{1}_{d_x<d_y\leq2d_x}$ by constructing a more intuitive example.
Score: 9.00733527455972
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Determining the minimum width of fully connected neural networks has become a fundamental problem in recent theoretical studies of deep neural networks. In this paper, we study the lower bounds and upper bounds of the minimum width required for fully connected neural networks to have universal approximation capability, which is important in network design and training. We show that $w_{min}\leq\max(2d_x+1, d_y)$ for networks with ELU or SELU activations, and that the upper bound of this inequality is attained when $d_y=2d_x$, where $d_x$ and $d_y$ denote the input and output dimensions, respectively. In addition, we show that $d_x+1\leq w_{min}\leq d_x+d_y$ for networks with LeakyReLU, ELU, CELU, SELU, or Softplus activations, by proving that ReLU can be approximated by these activation functions. Finally, in the case that the activation function is injective or can be uniformly approximated by a sequence of injective functions (e.g., ReLU), we present a new proof of the inequality $w_{min}\ge d_y+\mathbf{1}_{d_x<d_y\leq2d_x}$ by constructing a more intuitive example via a new geometric approach based on the Poincaré-Miranda theorem.
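As a rough illustration of how the three bounds in the abstract relate for concrete dimensions, the following minimal Python sketch evaluates each formula for given $d_x$ and $d_y$. The function name `width_bounds` and the dictionary keys are hypothetical names chosen for this example, not notation from the paper. The bounds hold under different assumptions on the activation function, so they are reported separately rather than merged into a single interval.

```python
def width_bounds(d_x: int, d_y: int) -> dict:
    """Evaluate the width bounds quoted in the abstract.

    d_x, d_y are the input and output dimensions. The key names are
    hypothetical labels for this sketch, not the paper's notation.
    """
    if d_x < 1 or d_y < 1:
        raise ValueError("dimensions must be positive integers")

    # Indicator 1_{d_x < d_y <= 2 d_x} from the lower bound.
    indicator = 1 if d_x < d_y <= 2 * d_x else 0

    return {
        # w_min <= max(2 d_x + 1, d_y), stated for ELU and SELU.
        "upper_elu_selu": max(2 * d_x + 1, d_y),
        # d_x + 1 <= w_min <= d_x + d_y, stated for LeakyReLU, ELU,
        # CELU, SELU, and Softplus.
        "lower_relu_like": d_x + 1,
        "upper_relu_like": d_x + d_y,
        # w_min >= d_y + 1_{d_x < d_y <= 2 d_x}, stated for injective
        # activations or uniform limits of injective functions.
        "lower_injective": d_y + indicator,
    }


if __name__ == "__main__":
    for dims in [(2, 4), (3, 2)]:
        print(dims, width_bounds(*dims))
```

For $d_x=2$, $d_y=4$ (so $d_y=2d_x$), the ELU/SELU upper bound and the injective-activation lower bound both evaluate to 5, which is consistent with the abstract's remark that the upper bound is attained when $d_y=2d_x$.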

Related papers

A Depth Hierarchy for Computing the Maximum in ReLU Networks via Extremal Graph Theory [3.0120086446979877]
We consider the problem of exact computation of the maximum function over $d$ real inputs using ReLU neural networks. We show that a sufficiently narrow network cannot capture the non-linearities of the maximum. This suggests that despite its simple nature, the maximum function possesses an inherent complexity.
arXiv Detail & Related papers (2026-01-04T07:40:42Z)
New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions. We broaden our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^d,\mathbb{R}^d)$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z)
Neural Networks and (Virtual) Extended Formulations [8.185918509343818]
We prove lower bounds on the size of neural networks that optimize over $P$. We show that $\mathrm{xc}(P)$ is a lower bound on the size of any monotone or input-convex neural network that solves the linear optimization problem over $P$.
arXiv Detail & Related papers (2024-11-05T11:12:11Z)
Learning Hierarchical Polynomials with Three-Layer Neural Networks [56.71223169861528]
We study the problem of learning hierarchical functions over the standard Gaussian distribution with three-layer neural networks. For a large subclass of degree $k$ polynomials $p$, a three-layer neural network trained via layerwise gradient descent on the square loss learns the target $h$ up to vanishing test error. This work demonstrates the ability of three-layer neural networks to learn complex features and, as a result, learn a broad class of hierarchical functions.
arXiv Detail & Related papers (2023-11-23T02:19:32Z)
Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation [10.249623880822055]
This paper examines a uniform UAP for the function class $C(K,\mathbb{R}^{d_y})$. It gives the exact minimum width of the leaky-ReLU NN as $w_{min}=\max(d_x,d_y)+\Delta(d_x,d_y)$.
arXiv Detail & Related papers (2023-05-29T06:51:16Z)
Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_{min}$, for the universal approximation property (UAP) of neural networks. In particular, the critical width, $w^*_{min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z)
Shallow neural network representation of polynomials [91.3755431537592]
We show that $d$-variate polynomials of degree $R$ can be represented on $[0,1]^d$ as shallow neural networks of width $d+1+\sum_{r=2}^{R}\binom{r+d-1}{d-1}\bigl[\binom{r+d-1}{d-1}\cdots\bigr]$.
arXiv Detail & Related papers (2022-08-17T08:14:52Z)
Minimax Optimal Quantization of Linear Models: Information-Theoretic Limits and Efficient Algorithms [59.724977092582535]
We consider the problem of quantizing a linear model learned from measurements. We derive an information-theoretic lower bound for the minimax risk under this setting. We show that our method and upper bounds can be extended to two-layer ReLU neural networks.
arXiv Detail & Related papers (2022-02-23T02:39:04Z)
Size and Depth Separation in Approximating Natural Functions with Neural Networks [52.73592689730044]
We show the benefits of size and depth for approximation of natural functions with ReLU networks. We show a complexity-theoretic barrier to proving such results beyond size $O(d)$. We also show an explicit natural function that can be approximated with networks of size $O(d)$.
arXiv Detail & Related papers (2021-01-30T21:30:11Z)
Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of the $L^p$ functions is exactly $\max\{d_x+1,d_y\}$. We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z)
On the Modularity of Hypernetworks [103.1147622394852]
We show that for a structured target function, the overall number of trainable parameters in a hypernetwork is smaller by orders of magnitude than the number of trainable parameters of a standard neural network and an embedding method.
arXiv Detail & Related papers (2020-02-23T22:51:52Z)

This list is automatically generated from the titles and abstracts of the papers on this site.