Minimum Width for Universal Approximation
- URL: http://arxiv.org/abs/2006.08859v1
- Date: Tue, 16 Jun 2020 01:24:21 GMT
- Title: Minimum Width for Universal Approximation
- Authors: Sejun Park, Chulhee Yun, Jaeho Lee, Jinwoo Shin
- Abstract summary: We prove that the minimum width required for the universal approximation of the $Lp$ functions is exactly $maxd_x+1,d_y$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
- Score: 91.02689252671291
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: The universal approximation property of width-bounded networks has been
studied as a dual of classical universal approximation results on depth-bounded
networks. However, the critical width enabling the universal approximation has
not been exactly characterized in terms of the input dimension $d_x$ and the
output dimension $d_y$. In this work, we provide the first definitive result in
this direction for networks using the ReLU activation functions: The minimum
width required for the universal approximation of the $L^p$ functions is
exactly $\max\{d_x+1,d_y\}$. We also prove that the same conclusion does not
hold for the uniform approximation with ReLU, but does hold with an additional
threshold activation function. Our proof technique can be also used to derive a
tighter upper bound on the minimum width required for the universal
approximation using networks with general activation functions.
Related papers
- Minimum width for universal approximation using ReLU networks on compact
domain [8.839687029212673]
We show that the minimum width for $Lp$ approximation of $Lp$ functions is exactly $maxd_x,d_y,2$ if an activation function is ReLU-Like (e.g., ReLU, GELU, Softplus)
Compared to the known result for ReLU networks, $w_min=maxd_x+1,d_y$ when the domain is $smashmathbb Rd_x$, our result first shows that approximation on a compact domain requires smaller width than on
arXiv Detail & Related papers (2023-09-19T08:04:48Z) - Polynomial Width is Sufficient for Set Representation with
High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE)
arXiv Detail & Related papers (2023-07-08T16:00:59Z) - Universal approximation with complex-valued deep narrow neural networks [0.0]
We show that deep narrow complex-valued networks are universal if and only if their activation function is neither holomorphic, nor antiholomorphic, nor $mathbbR$-affine.
We prove, however, that a width of $n+m+4$ suffices for a rich subclass of the admissible activation functions.
arXiv Detail & Related papers (2023-05-26T13:22:14Z) - Minimal Width for Universal Property of Deep RNN [6.744583770038476]
A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data.
We prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data.
arXiv Detail & Related papers (2022-11-25T02:43:54Z) - Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_min$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w*_min$, for $Lp$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z) - Deep neural network approximation of analytic functions [91.3755431537592]
entropy bound for the spaces of neural networks with piecewise linear activation functions.
We derive an oracle inequality for the expected error of the considered penalized deep neural network estimators.
arXiv Detail & Related papers (2021-04-05T18:02:04Z) - Nearly Minimax Optimal Regret for Learning Infinite-horizon
Average-reward MDPs with Linear Function Approximation [95.80683238546499]
We propose a new algorithm UCRL2-VTR, which can be seen as an extension of the UCRL2 algorithm with linear function approximation.
We show that UCRL2-VTR with Bernstein-type bonus can achieve a regret of $tildeO(dsqrtDT)$, where $d$ is the dimension of the feature mapping.
We also prove a matching lower bound $tildeOmega(dsqrtDT)$, which suggests that the proposed UCRL2-VTR is minimax optimal up to logarithmic factors
arXiv Detail & Related papers (2021-02-15T02:08:39Z) - Quantitative Rates and Fundamental Obstructions to Non-Euclidean
Universal Approximation with Deep Narrow Feed-Forward Networks [3.8073142980733]
We quantify the number of narrow layers required for "deep geometric feed-forward neural networks"
We find that both the global and local universal approximation guarantees can only coincide when approximating null-homotopic functions.
arXiv Detail & Related papers (2021-01-13T23:29:40Z) - Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
IUA shows that neural networks can approximate any continuous function $f$ as we have known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis.
arXiv Detail & Related papers (2020-07-12T20:43:56Z) - Reinforcement Learning with General Value Function Approximation:
Provably Efficient Approach via Bounded Eluder Dimension [124.7752517531109]
We establish a provably efficient reinforcement learning algorithm with general value function approximation.
We show that our algorithm achieves a regret bound of $widetildeO(mathrmpoly(dH)sqrtT)$ where $d$ is a complexity measure.
Our theory generalizes recent progress on RL with linear value function approximation and does not make explicit assumptions on the model of the environment.
arXiv Detail & Related papers (2020-05-21T17:36:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.