Minimal Width for Universal Property of Deep RNN
- URL: http://arxiv.org/abs/2211.13866v2
- Date: Wed, 29 Mar 2023 02:25:24 GMT
- Title: Minimal Width for Universal Property of Deep RNN
- Authors: Chang hoon Song, Geonho Hwang, Jun ho Lee, Myungjoo Kang
- Abstract summary: A recurrent neural network (RNN) is a widely used deep-learning network for dealing with sequential data.
We prove the universality of deep narrow RNNs and show that the upper bound of the minimum width for universality can be independent of the length of the data.
- Score: 6.744583770038476
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: A recurrent neural network (RNN) is a widely used deep-learning network for
dealing with sequential data. Imitating a dynamical system, an infinite-width
RNN can approximate any open dynamical system in a compact domain. In general,
deep networks with bounded widths are more effective than wide networks in
practice; however, the universal approximation theorem for deep narrow
structures has yet to be extensively studied. In this study, we prove the
universality of deep narrow RNNs and show that the upper bound of the minimum
width for universality can be independent of the length of the data.
Specifically, we show that a deep RNN with ReLU activation can approximate any
continuous function or $L^p$ function with the widths $d_x+d_y+2$ and
$\max\{d_x+1,d_y\}$, respectively, where the target function maps a finite
sequence of vectors in $\mathbb{R}^{d_x}$ to a finite sequence of vectors in
$\mathbb{R}^{d_y}$. We also compute the additional width required if the
activation function is $\tanh$ or more. In addition, we prove the universality
of other recurrent networks, such as bidirectional RNNs. Bridging a multi-layer
perceptron and an RNN, our theory and proof technique can be an initial step
toward further research on deep RNNs.
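To make the width bound concrete, below is a minimal sketch (not from the paper) of the architecture class the theorem refers to: a stack of narrow ReLU recurrent layers, each with hidden width $d_x+d_y+2$, mapping a finite sequence in $\mathbb{R}^{d_x}$ to a sequence in $\mathbb{R}^{d_y}$. The weights are random placeholders and the final linear read-out is an assumption added for illustration; the theorem only asserts that some choice of weights and depth approximates a given continuous target on a compact domain.

```python
# Sketch of a deep narrow RNN whose per-layer width matches the paper's
# upper bound d_x + d_y + 2 for approximating continuous sequence-to-sequence
# maps. Weights are random placeholders, not the paper's construction.
import numpy as np

rng = np.random.default_rng(0)

d_x, d_y = 3, 2          # input / output vector dimensions
width = d_x + d_y + 2    # = 7, the width bound for continuous targets
depth = 5                # number of stacked recurrent layers (free parameter)
seq_len = 10             # length of the input sequence

def relu(z):
    return np.maximum(z, 0.0)

def rnn_layer(xs, W_in, W_rec, b):
    """One recurrent layer: h_t = ReLU(W_in x_t + W_rec h_{t-1} + b)."""
    h = np.zeros(W_rec.shape[0])
    out = []
    for x in xs:
        h = relu(W_in @ x + W_rec @ h + b)
        out.append(h)
    return out

# Build `depth` narrow layers; the first consumes d_x-dim inputs,
# the rest consume the previous layer's width-dim hidden states.
layers = []
in_dim = d_x
for _ in range(depth):
    layers.append((rng.standard_normal((width, in_dim)) * 0.1,
                   rng.standard_normal((width, width)) * 0.1,
                   np.zeros(width)))
    in_dim = width

# Assumed linear read-out to R^{d_y} at every time step, for illustration.
W_out = rng.standard_normal((d_y, width)) * 0.1

xs = [rng.standard_normal(d_x) for _ in range(seq_len)]
hs = xs
for W_in, W_rec, b in layers:
    hs = rnn_layer(hs, W_in, W_rec, b)
ys = [W_out @ h for h in hs]    # output sequence in R^{d_y}

print(f"width per layer = {width}, outputs = {len(ys)} vectors of dim {ys[0].shape[0]}")
```

Note that the per-layer width is fixed by $d_x$ and $d_y$ alone, consistent with the abstract's claim that the bound can be independent of the sequence length; only the depth is left free.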
Related papers
- Deep Neural Networks: Multi-Classification and Universal Approximation [0.0]
We demonstrate that a ReLU deep neural network with a width of $2$ and a depth of $2N+4M-1$ layers can achieve finite sample memorization for any dataset comprising $N$ elements.
We also provide depth estimates for approximating $W^{1,p}$ functions and width estimates for approximating $L^p(\Omega;\mathbb{R}^m)$ for $m\geq 1$.
arXiv Detail & Related papers (2024-09-10T14:31:21Z)
- Polynomial Width is Sufficient for Set Representation with High-dimensional Features [69.65698500919869]
DeepSets is the most widely used neural network architecture for set representation.
We present two set-element embedding layers: (a) linear + power activation (LP) and (b) linear + exponential activations (LE)
arXiv Detail & Related papers (2023-07-08T16:00:59Z)
- Universal approximation with complex-valued deep narrow neural networks [0.0]
We show that deep narrow complex-valued networks are universal if and only if their activation function is neither holomorphic, nor antiholomorphic, nor $\mathbb{R}$-affine.
We prove, however, that a width of $n+m+4$ suffices for a rich subclass of the admissible activation functions.
arXiv Detail & Related papers (2023-05-26T13:22:14Z)
- The Onset of Variance-Limited Behavior for Networks in the Lazy and Rich Regimes [75.59720049837459]
We study the transition from infinite-width behavior to this variance limited regime as a function of sample size $P$ and network width $N$.
We find that finite-size effects can become relevant for very small datasets on the order of $P^* \sim \sqrt{N}$ for regression with ReLU networks.
arXiv Detail & Related papers (2022-12-23T04:48:04Z)
- Achieve the Minimum Width of Neural Networks for Universal Approximation [1.52292571922932]
We study the exact minimum width, $w_{\min}$, for the universal approximation property (UAP) of neural networks.
In particular, the critical width, $w^*_{\min}$, for $L^p$-UAP can be achieved by leaky-ReLU networks.
arXiv Detail & Related papers (2022-09-23T04:03:50Z)
- Understanding Deep Neural Function Approximation in Reinforcement Learning via $\epsilon$-Greedy Exploration [53.90873926758026]
This paper provides a theoretical study of deep neural function approximation in reinforcement learning (RL)
We focus on the value-based algorithm with $\epsilon$-greedy exploration via deep (and two-layer) neural networks endowed by Besov (and Barron) function spaces.
Our analysis reformulates the temporal difference error in an $L^2(\mathrm{d}\mu)$-integrable space over a certain averaged measure $\mu$, and transforms it to a generalization problem under the non-i.i.d. setting.
arXiv Detail & Related papers (2022-09-15T15:42:47Z)
- Neural Network Approximation of Continuous Functions in High Dimensions with Applications to Inverse Problems [6.84380898679299]
Current theory predicts that networks should scale exponentially in the dimension of the problem.
We provide a general method for bounding the complexity required for a neural network to approximate a Hölder (or uniformly) continuous function.
arXiv Detail & Related papers (2022-08-28T22:44:07Z)
- A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks with quadratic widths in the sample size and linear in their depth at a time logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z)
- Approximating smooth functions by deep neural networks with sigmoid activation function [0.0]
We study the power of deep neural networks (DNNs) with sigmoid activation function.
We show that DNNs with fixed depth and a width of order $M^d$ achieve an approximation rate of $M^{-2p}$.
arXiv Detail & Related papers (2020-10-08T07:29:31Z)
- Interval Universal Approximation for Neural Networks [47.767793120249095]
We introduce the interval universal approximation (IUA) theorem.
The IUA theorem shows that neural networks can approximate any continuous function $f$, as has been known for decades.
We study the computational complexity of constructing neural networks that are amenable to precise interval analysis.
arXiv Detail & Related papers (2020-07-12T20:43:56Z) - Minimum Width for Universal Approximation [91.02689252671291]
We prove that the minimum width required for the universal approximation of the $Lp$ functions is exactly $maxd_x+1,d_y$.
We also prove that the same conclusion does not hold for the uniform approximation with ReLU, but does hold with an additional threshold activation function.
arXiv Detail & Related papers (2020-06-16T01:24:21Z)