Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
- URL: http://arxiv.org/abs/2305.19190v4
- Date: Tue, 6 Feb 2024 13:19:57 GMT
- Title: Inverse Approximation Theory for Nonlinear Recurrent Neural Networks
- Authors: Shida Wang, Zhong Li and Qianxiao Li
- Abstract summary: We prove an inverse approximation theorem for the approximation of nonlinear sequence-to-sequence relationships using recurrent neural networks (RNNs).
We show that nonlinear sequence relationships that can be stably approximated by nonlinear RNNs must have an exponentially decaying memory structure.
This extends the previously identified curse of memory in linear RNNs into the general nonlinear setting.
- Score: 28.840757822712195
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We prove an inverse approximation theorem for the approximation of nonlinear
sequence-to-sequence relationships using recurrent neural networks (RNNs). This
is a so-called Bernstein-type result in approximation theory, which deduces
properties of a target function under the assumption that it can be effectively
approximated by a hypothesis space. In particular, we show that nonlinear
sequence relationships that can be stably approximated by nonlinear RNNs must
have an exponentially decaying memory structure - a notion that can be made
precise. This extends the previously identified curse of memory in linear RNNs
into the general nonlinear setting, and quantifies the essential limitations of
the RNN architecture for learning sequential relationships with long-term
memory. Based on the analysis, we propose a principled reparameterization
method to overcome the limitations. Our theoretical results are confirmed by
numerical experiments. The code has been released at
https://github.com/radarFudan/Curse-of-memory
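The reparameterization idea can be illustrated with a minimal sketch: instead of training the recurrent eigenvalues directly, they are generated from unconstrained parameters through a stability-preserving map. The diagonal form and the particular map exp(-exp(w)) below are illustrative assumptions, not necessarily the exact scheme of the paper; see the linked repository for the authors' implementation.

```python
import torch
import torch.nn as nn

class ReparameterizedLinearRNN(nn.Module):
    """Diagonal linear RNN whose recurrent eigenvalues come from a
    stability-preserving map of unconstrained parameters (illustrative
    sketch only, not necessarily the paper's exact parameterization)."""

    def __init__(self, input_dim: int, hidden_dim: int):
        super().__init__()
        self.w = nn.Parameter(torch.randn(hidden_dim))  # unconstrained recurrent parameters
        self.U = nn.Parameter(torch.randn(hidden_dim, input_dim) / input_dim ** 0.5)
        self.c = nn.Parameter(torch.randn(hidden_dim) / hidden_dim ** 0.5)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, input_dim) -> y: (batch, time)
        lam = torch.exp(-torch.exp(self.w))             # eigenvalues constrained to (0, 1)
        h = x.new_zeros(x.shape[0], self.w.shape[0])
        ys = []
        for t in range(x.shape[1]):
            h = lam * h + x[:, t] @ self.U.T            # stable recurrent update
            ys.append(h @ self.c)                       # linear readout
        return torch.stack(ys, dim=1)
```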
Related papers
- Approximation Bounds for Recurrent Neural Networks with Application to Regression [7.723218675113336]
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs.
We derive upper bounds on the approximation error of RNNs for Hölder smooth functions.
Our results provide statistical guarantees on the performance of RNNs.
arXiv Detail & Related papers (2024-09-09T13:02:50Z)
- Metric-Entropy Limits on Nonlinear Dynamical System Learning [4.069144210024563]
We show that recurrent neural networks (RNNs) are capable of learning nonlinear systems that satisfy a Lipschitz property and forget past inputs fast enough in a metric-entropy optimal manner.
As the sets of sequence-to-sequence maps we consider are significantly more massive than function classes generally considered in deep neural network approximation theory, a refined metric-entropy characterization is needed.
arXiv Detail & Related papers (2024-07-01T12:57:03Z)
- Matrix Completion via Nonsmooth Regularization of Fully Connected Neural Networks [7.349727826230864]
It has been shown that enhanced performance could be attained by using nonlinear estimators such as deep neural networks.
In this paper, we control over-fitting by regularizing the FCNN model in terms of the norm of its intermediate representations.
Our simulations indicate the superiority of the proposed algorithm in comparison with existing linear and nonlinear algorithms.
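As a rough illustration of that regularization idea, the sketch below adds a nonsmooth norm penalty on an intermediate representation to a masked reconstruction loss. The choice of nuclear norm and the tiny architecture are assumptions made for the example, not the paper's exact setup.

```python
import torch
import torch.nn as nn

class CompletionFCNN(nn.Module):
    """FCNN that maps partially observed rows to completed rows and exposes
    its intermediate representation so it can be regularized (sketch only)."""

    def __init__(self, n_cols: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_cols, hidden), nn.ReLU())
        self.decoder = nn.Linear(hidden, n_cols)

    def forward(self, rows: torch.Tensor):
        z = self.encoder(rows)              # intermediate representation, shape (batch, hidden)
        return self.decoder(z), z

def completion_loss(model, M_obs, mask, lam=1e-3):
    pred, z = model(M_obs * mask)
    fit = ((pred - M_obs) * mask).pow(2).sum() / mask.sum()   # error on observed entries only
    reg = torch.linalg.matrix_norm(z, ord="nuc")              # nonsmooth penalty (nuclear norm assumed)
    return fit + lam * reg
```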
arXiv Detail & Related papers (2024-03-15T12:00:37Z)
- Stable Nonconvex-Nonconcave Training via Linear Interpolation [51.668052890249726]
This paper presents a theoretical analysis of linear interpolation as a principled method for stabilizing (large-scale) neural network training.
We argue that instabilities in the optimization process are often caused by the nonmonotonicity of the loss landscape and show how linear interpolation can help by leveraging the theory of nonexpansive operators.
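A minimal sketch of such a linear-interpolation (lookahead-style) outer update: run a few inner optimizer steps, then move only part of the way toward the resulting iterate. The step count and interpolation weight are placeholders, and the paper's precise algorithm and assumptions may differ.

```python
import torch

def interpolated_update(params, inner_step, k: int = 5, alpha: float = 0.5):
    """One outer iteration: take k inner optimization steps, then linearly
    interpolate between the starting point and the inner iterate (sketch of
    a lookahead-style scheme, not necessarily the paper's exact method)."""
    params = list(params)                     # allow model.parameters() generators
    anchor = [p.detach().clone() for p in params]
    for _ in range(k):
        inner_step()                          # e.g. one SGD/Adam step on the training loss
    with torch.no_grad():
        for p, a in zip(params, anchor):
            p.copy_(a + alpha * (p - a))      # damped move toward the inner result
```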
arXiv Detail & Related papers (2023-10-20T12:45:12Z)
- Benign Overfitting in Deep Neural Networks under Lazy Training [72.28294823115502]
We show that when the data distribution is well-separated, DNNs can achieve Bayes-optimal test error for classification.
Our results indicate that interpolating with smoother functions leads to better generalization.
arXiv Detail & Related papers (2023-05-30T19:37:44Z)
- Kernel-based off-policy estimation without overlap: Instance optimality beyond semiparametric efficiency [53.90687548731265]
We study optimal procedures for estimating a linear functional based on observational data.
For any convex and symmetric function class $\mathcal{F}$, we derive a non-asymptotic local minimax bound on the mean-squared error.
arXiv Detail & Related papers (2023-01-16T02:57:37Z)
- Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z)
- Implicit Bias of Linear RNNs [27.41989861342218]
Linear recurrent neural networks (RNNs) do not perform well on tasks requiring long-term memory.
This paper provides a rigorous explanation of this property in the special case of linear RNNs.
Using recently-developed kernel regime analysis, our main result shows that linear RNNs are functionally equivalent to a certain weighted 1D-convolutional network.
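The equivalence can be seen by unrolling the recursion: for h_t = W h_{t-1} + U x_t and y_t = c^T h_t one gets y_t = sum_j (c^T W^j U) x_{t-j}, i.e. a 1D convolution whose kernel weights decay when W is stable. A small numerical check (dimensions chosen arbitrarily for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
d_h, d_x, T = 8, 3, 20
W = 0.8 * np.linalg.qr(rng.standard_normal((d_h, d_h)))[0]   # stable recurrent matrix (spectral radius 0.8)
U = rng.standard_normal((d_h, d_x))
c = rng.standard_normal(d_h)
x = rng.standard_normal((T, d_x))

# recurrent evaluation of the linear RNN
h = np.zeros(d_h)
y_rnn = []
for t in range(T):
    h = W @ h + U @ x[t]
    y_rnn.append(c @ h)

# equivalent 1D convolution with kernel k_j = c^T W^j U applied to x_{t-j}
K = np.stack([c @ np.linalg.matrix_power(W, j) @ U for j in range(T)])
y_conv = [sum(K[j] @ x[t - j] for j in range(t + 1)) for t in range(T)]

assert np.allclose(y_rnn, y_conv)
```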
arXiv Detail & Related papers (2021-01-19T19:39:28Z)
- On the Curse of Memory in Recurrent Neural Networks: Approximation and Optimization Analysis [30.75240284934018]
We consider the simple but representative setting of using continuous-time linear RNNs to learn from data generated by linear relationships.
We prove a universal approximation theorem of such linear functionals, and characterize the approximation rate and its relation with memory.
A unifying theme uncovered is the non-trivial effect of memory, a notion that can be made precise in our framework.
arXiv Detail & Related papers (2020-09-16T16:48:28Z)
- Improving predictions of Bayesian neural nets via local linearization [79.21517734364093]
We argue that the Gauss-Newton approximation should be understood as a local linearization of the underlying Bayesian neural network (BNN).
Because we use this linearized model for posterior inference, we should also predict using this modified model instead of the original one.
We refer to this modified predictive as "GLM predictive" and show that it effectively resolves common underfitting problems of the Laplace approximation.
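A sketch of that linearized prediction, assuming the network weights have been flattened into a single vector: predictions come from the model linearized at the MAP estimate, and posterior weight samples are pushed through the linearization with Jacobian-vector products. The `unflatten` helper is a user-supplied placeholder, and a real implementation would also apply the likelihood's link function.

```python
import torch

def glm_predictive_samples(model, x, theta_map, posterior_samples, unflatten):
    """Predict with the network linearized at the MAP parameters:
        f_lin(x; theta) = f(x; theta_map) + J(x) (theta - theta_map)
    Sketch only; `unflatten` maps a flat vector to a named-parameter dict,
    and the likelihood/link function is omitted."""

    def f(theta_vec):
        return torch.func.functional_call(model, unflatten(theta_vec), (x,))

    f_map = f(theta_map)
    preds = []
    for theta_s in posterior_samples:                         # samples from the Laplace posterior over weights
        _, jvp = torch.func.jvp(f, (theta_map,), (theta_s - theta_map,))
        preds.append(f_map + jvp)                             # linearized ("GLM") prediction
    return torch.stack(preds)
```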
arXiv Detail & Related papers (2020-08-19T12:35:55Z)
- Provably Efficient Neural Estimation of Structural Equation Model: An Adversarial Approach [144.21892195917758]
We study estimation in a class of generalized structural equation models (SEMs).
We formulate the linear operator equation as a min-max game, where both players are parameterized by neural networks (NNs), and learn the parameters of these neural networks using gradient descent.
For the first time we provide a tractable estimation procedure for SEMs based on NNs with provable convergence and without the need for sample splitting.
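A rough sketch of the min-max formulation, with the structural function f and an adversarial test function g both parameterized by small networks and trained by alternating gradient steps. The particular objective below (a standard adversarial conditional-moment criterion) and the architectures are assumptions for illustration, not the paper's exact construction.

```python
import torch
import torch.nn as nn

def adversarial_sem_fit(x, y, z, steps: int = 2000, lr: float = 1e-3):
    """Alternating gradient descent/ascent on a min-max objective in which the
    adversary g probes the moment restriction E[y - f(x) | z] = 0.
    Illustrative sketch only; x, y, z are 2D float tensors with matching rows."""
    f = nn.Sequential(nn.Linear(x.shape[1], 32), nn.Tanh(), nn.Linear(32, 1))
    g = nn.Sequential(nn.Linear(z.shape[1], 32), nn.Tanh(), nn.Linear(32, 1))
    opt_f = torch.optim.Adam(f.parameters(), lr=lr)
    opt_g = torch.optim.Adam(g.parameters(), lr=lr)

    def objective():
        residual = y - f(x)
        return (g(z) * residual).mean() - 0.5 * g(z).pow(2).mean()

    for _ in range(steps):
        opt_g.zero_grad(); (-objective()).backward(); opt_g.step()   # adversary: ascent step
        opt_f.zero_grad(); objective().backward(); opt_f.step()      # model: descent step
    return f
```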
arXiv Detail & Related papers (2020-07-02T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers on this site.