Minimum Width for Deep, Narrow MLP: A Diffeomorphism Approach
- URL: http://arxiv.org/abs/2308.15873v2
- Date: Tue, 7 Nov 2023 11:18:44 GMT
- Title: Minimum Width for Deep, Narrow MLP: A Diffeomorphism Approach
- Authors: Geonho Hwang
- Abstract summary: We propose a framework that simplifies finding the minimum width for deep, narrow MLPs into determining a purely geometrical function denoted as $w(d_x, d_y)$.
We provide an upper bound for the minimum width, given by $\operatorname{max}(2d_x+1, d_y) + \alpha(\sigma)$, where $0 \leq \alpha(\sigma) \leq 2$ represents a constant depending on the activation function.
- Score: 3.218087085276242
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recently, there has been a growing focus on determining the minimum width
requirements for achieving the universal approximation property in deep, narrow
Multi-Layer Perceptrons (MLPs). Among these, a particularly challenging task is
approximating a continuous function under the uniform norm, as indicated by the
significant gap between the known lower and upper bounds on the minimum width.
To address this problem, we propose a framework that simplifies finding the
minimum width for deep, narrow MLPs into determining a purely geometrical
function denoted as $w(d_x, d_y)$. This function relies solely on the input and
output dimensions, represented as $d_x$ and $d_y$, respectively. Two key steps
support this framework. First, we demonstrate that deep, narrow MLPs, when
provided with a small additional width, can approximate a $C^2$-diffeomorphism.
Subsequently, using this result, we prove that $w(d_x, d_y)$ equals the
optimal minimum width required for deep, narrow MLPs to achieve universality.
By employing the aforementioned framework and the Whitney embedding theorem, we
provide an upper bound for the minimum width, given by
$\operatorname{max}(2d_x+1, d_y) + \alpha(\sigma)$, where $0 \leq
\alpha(\sigma) \leq 2$ represents a constant depending on the activation
function. Furthermore, we provide a lower bound of $4$ for the minimum width in
cases where the input and output dimensions are both equal to two.
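For a purely illustrative instantiation of these bounds (evaluating only the formulas quoted above, not an additional result of the paper), consider the case $d_x = d_y = 2$: the upper bound becomes $\operatorname{max}(2 \cdot 2 + 1, 2) + \alpha(\sigma) = 5 + \alpha(\sigma) \leq 7$, while the stated lower bound is $4$, so the exact minimum width under the uniform norm for this case lies between $4$ and $5 + \alpha(\sigma)$.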
Related papers
- Guessing Efficiently for Constrained Subspace Approximation [49.83981776254246]
We introduce a general framework for constrained subspace approximation.
We show it provides new algorithms for partition-constrained subspace approximation with applications to $k$-means clustering, and projected non-negative matrix factorization.
arXiv Detail & Related papers (2025-04-29T15:56:48Z)
- Minimum width for universal approximation using squashable activation functions [9.418401219498223]
We study the minimum width of networks using general activation functions.
We show that for networks using a squashable activation function to universally approximate $L^p$ functions, the minimum width is $\max\{d_x, d_y, 2\}$ unless $d_x = d_y = 1$.
arXiv Detail & Related papers (2025-04-10T01:23:24Z)
- New advances in universal approximation with neural networks of minimal width [4.424170214926035]
We show that autoencoders with leaky ReLU activations are universal approximators of $L^p$ functions.
We broaden our results to show that smooth invertible neural networks can approximate $L^p(\mathbb{R}^d, \mathbb{R}^d)$ on compacta.
arXiv Detail & Related papers (2024-11-13T16:17:16Z)
- Fourier Sliced-Wasserstein Embedding for Multisets and Measures [3.396731589928944]
We present a novel method to embed multisets and measures over $\mathbb{R}^d$ into Euclidean space.
We show that our method yields superior representations of input multisets and offers practical advantages for learning on multiset data.
arXiv Detail & Related papers (2024-05-26T11:04:41Z)
- Minimum width for universal approximation using ReLU networks on compact domain [8.839687029212673]
We show that the minimum width for $L^p$ approximation of $L^p$ functions is exactly $\max\{d_x, d_y, 2\}$ if the activation function is ReLU-like (e.g., ReLU, GELU, Softplus).
Compared to the known result for ReLU networks, $w_{\min} = \max\{d_x+1, d_y\}$ when the domain is $\mathbb{R}^{d_x}$, our result first shows that approximation on a compact domain requires smaller width than on $\mathbb{R}^{d_x}$.
arXiv Detail & Related papers (2023-09-19T08:04:48Z)
- Minimum Width of Leaky-ReLU Neural Networks for Uniform Universal Approximation [10.249623880822055]
This paper examines a uniform UAP for the function class $C(K, \mathbb{R}^{d_y})$.
It gives the exact minimum width of the leaky-ReLU NN as $w_{\min} = \max(d_x, d_y) + \Delta(d_x, d_y)$.
arXiv Detail & Related papers (2023-05-29T06:51:16Z)
- Pseudonorm Approachability and Applications to Regret Minimization [73.54127663296906]
We convert high-dimensional $\ell_\infty$-approachability problems to low-dimensional pseudonorm approachability problems.
We develop an algorithmic theory of pseudonorm approachability, analogous to previous work on approachability for $\ell$ and other norms.
arXiv Detail & Related papers (2023-02-03T03:19:14Z)
- Estimating the minimizer and the minimum value of a regression function under passive design [72.85024381807466]
We propose a new method for estimating the minimizer $\boldsymbol{x}^*$ and the minimum value $f^*$ of a smooth and strongly convex regression function $f$.
We derive non-asymptotic upper bounds for the quadratic risk and optimization error of $\boldsymbol{z}_n$, and for the risk of estimating $f^*$.
arXiv Detail & Related papers (2022-11-29T18:38:40Z)
- Best Policy Identification in Linear MDPs [70.57916977441262]
We investigate the problem of best policy identification in discounted linear Markov Decision Processes in the fixed confidence setting under a generative model.
The lower bound, expressed as the solution of an intricate non-convex optimization program, can be used as the starting point to devise such algorithms.
arXiv Detail & Related papers (2022-08-11T04:12:50Z)
- Approximation Algorithms for ROUND-UFP and ROUND-SAP [0.06875312133832077]
We study ROUND-UFP and ROUND-SAP, two generalizations of the classical BIN PACKING problem.
In ROUND-UFP, the goal is to find a packing of all rectangles into a minimum number of copies (rounds) of the given path.
In ROUND-SAP, the tasks are considered to be rectangles and the goal is to find a non-overlapping packing of these rectangles into a minimum number of rounds.
arXiv Detail & Related papers (2022-02-07T20:15:15Z)
- Nearly Minimax Optimal Reinforcement Learning for Linear Mixture Markov Decision Processes [91.38793800392108]
We study reinforcement learning with linear function approximation where the underlying transition probability kernel of the Markov decision process (MDP) is a linear mixture model.
We propose a new, computationally efficient algorithm with linear function approximation named $\text{UCRL-VTR}^{+}$ for the aforementioned linear mixture MDPs.
To the best of our knowledge, these are the first computationally efficient, nearly minimax optimal algorithms for RL with linear function approximation.
arXiv Detail & Related papers (2020-12-15T18:56:46Z)
- Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of $L^p$ functions is exactly $\max\{d_x+1, d_y\}$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z)
- On Suboptimality of Least Squares with Application to Estimation of Convex Bodies [74.39616164169131]
We settle an open problem regarding optimality of Least Squares in estimating a convex set from noisy support function measurements in dimension $d \geq 6$.
We establish that Least Squares is sub-optimal, and achieves a rate of $\tilde{\Theta}_d(n^{-2/(d-1)})$ whereas the minimax rate is $\Theta_d(n^{-4/(d+3)})$.
arXiv Detail & Related papers (2020-06-07T05:19:00Z)
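As a hedged, illustrative comparison of the minimum-width formulas quoted in the related papers above, again at $d_x = d_y = 2$: the $L^p$ minimum width on $\mathbb{R}^{d_x}$ is $\max\{d_x+1, d_y\} = 3$; on a compact domain with a ReLU-like activation it is $\max\{d_x, d_y, 2\} = 2$; and for uniform-norm approximation the present paper bounds it between $4$ and $\operatorname{max}(2d_x+1, d_y) + \alpha(\sigma) \leq 7$.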