Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
- URL: http://arxiv.org/abs/2308.14555v2
- Date: Wed, 15 May 2024 15:21:17 GMT
- Title: Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
- Authors: Samuel Chun-Hei Lam, Justin Sirignano, Konstantinos Spiliopoulos
- Abstract summary: We characterize the asymptotics of recurrent neural networks (RNNs) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity.
These methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
- Score: 0.0
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Mathematical methods are developed to characterize the asymptotics of recurrent neural networks (RNN) as the number of hidden units, data samples in the sequence, hidden state updates, and training steps simultaneously grow to infinity. In the case of an RNN with a simplified weight matrix, we prove the convergence of the RNN to the solution of an infinite-dimensional ODE coupled with the fixed point of a random algebraic equation. The analysis requires addressing several challenges which are unique to RNNs. In typical mean-field applications (e.g., feedforward neural networks), discrete updates are of magnitude $\mathcal{O}(\frac{1}{N})$ and the number of updates is $\mathcal{O}(N)$. Therefore, the system can be represented as an Euler approximation of an appropriate ODE/PDE, which it will converge to as $N \rightarrow \infty$. However, the RNN hidden layer updates are $\mathcal{O}(1)$. Therefore, RNNs cannot be represented as a discretization of an ODE/PDE and standard mean-field techniques cannot be applied. Instead, we develop a fixed point analysis for the evolution of the RNN memory states, with convergence estimates in terms of the number of update steps and the number of hidden units. The RNN hidden layer is studied as a function in a Sobolev space, whose evolution is governed by the data sequence (a Markov chain), the parameter updates, and its dependence on the RNN hidden layer at the previous time step. Due to the strong correlation between updates, a Poisson equation must be used to bound the fluctuations of the RNN around its limit equation. These mathematical methods give rise to the neural tangent kernel (NTK) limits for RNNs trained on data sequences as the number of data samples and size of the neural network grow to infinity.
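To make the abstract's two mechanisms concrete, here is a rough, hedged sketch rather than the paper's construction: it assumes an Elman-style RNN with a simplified element-wise (diagonal-style) recurrence with contractive weights, a tanh nonlinearity, a two-state Markov chain as the ergodic data source, and only the readout vector treated as trainable. It illustrates (i) that the hidden-state updates are O(1) yet the memory state forgets its initialization geometrically, which is the fixed-point behaviour the analysis relies on instead of an ODE/PDE discretization, and (ii) an empirical NTK-style kernel computed from gradients of a 1/sqrt(N)-scaled readout.
```python
# Illustrative sketch only: the scalings and the element-wise recurrence are assumptions,
# not the paper's exact model.
import numpy as np

rng = np.random.default_rng(0)
N = 2000   # hidden units (large, standing in for the N -> infinity limit)
T = 50     # hidden-state updates along one data sequence

# Ergodic data source: a two-state Markov chain emitting scalar inputs -1 / +1.
P = np.array([[0.9, 0.1], [0.2, 0.8]])
emit = np.array([-1.0, 1.0])

def sample_sequence(length):
    s, xs = 0, []
    for _ in range(length):
        s = rng.choice(2, p=P[s])
        xs.append(emit[s])
    return np.array(xs)

# "Simplified weight matrix": element-wise recurrence with |w| < 1 (contractive),
# fixed random input weights, and a trainable readout c with 1/sqrt(N) scaling.
W_in = rng.normal(size=N)
W_rec = rng.uniform(-0.9, 0.9, size=N)
c = rng.normal(size=N)

def hidden_states(xs, h0):
    h = h0.copy()
    for x in xs:
        h = np.tanh(W_rec * h + W_in * x)   # O(1) update: no 1/N step size
    return h

# (i) Fixed-point / forgetting behaviour: two different initial memory states driven by
#     the same ergodic input sequence contract towards each other geometrically.
xs = sample_sequence(T)
hA, hB = rng.normal(size=N), rng.normal(size=N)
gaps = []
for x in xs:
    hA = np.tanh(W_rec * hA + W_in * x)
    hB = np.tanh(W_rec * hB + W_in * x)
    gaps.append(np.linalg.norm(hA - hB) / np.sqrt(N))
print("per-unit state gap at steps 1, 10, 50:",
      [round(g, 6) for g in (gaps[0], gaps[9], gaps[-1])])

# (ii) Empirical NTK-style kernel restricted to the trainable readout c:
#      f(x) = c . h(x) / sqrt(N), so df/dc = h(x)/sqrt(N) and K(x, x') = h(x).h(x')/N.
xs2 = sample_sequence(T)
g1 = hidden_states(xs, np.zeros(N)) / np.sqrt(N)
g2 = hidden_states(xs2, np.zeros(N)) / np.sqrt(N)
print("empirical kernel entries:",
      round(float(g1 @ g1), 3), round(float(g1 @ g2), 3), round(float(g2 @ g2), 3))
```
Under these assumptions the state gap shrinks by roughly a constant factor per step while the kernel entries stay O(1), which is the qualitative picture the abstract describes: O(1) hidden-state updates handled by a fixed-point argument, with an NTK-type limit for the trainable parameters.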
Related papers
- Revising the Structure of Recurrent Neural Networks to Eliminate Numerical Derivatives in Forming Physics Informed Loss Terms with Respect to Time [0.0]
Mutual Interval RNN (MI-RNN) is applied to solve three different benchmarks: the Burgers equation, unsteady heat conduction in an irregular domain, and the Green vortex problem.
Our results demonstrate that MI-RNN approximates the exact solution more accurately than existing RNN models.
arXiv Detail & Related papers (2024-09-16T15:24:25Z) - Approximation Bounds for Recurrent Neural Networks with Application to Regression [7.723218675113336]
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs.
We derive upper bounds on the approximation error of RNNs for Hölder smooth functions.
Our results provide statistical guarantees on the performance of RNNs.
arXiv Detail & Related papers (2024-09-09T13:02:50Z) - On Generalization Bounds for Deep Compound Gaussian Neural Networks [1.4425878137951238]
Unrolled deep neural networks (DNNs) provide better interpretability and superior empirical performance compared to standard DNNs.
We develop novel generalization error bounds for a class of unrolled DNNs informed by a compound Gaussian prior.
Under realistic conditions, we show that, at worst, the generalization error scales as $\mathcal{O}(n\sqrt{n})$ in the signal dimension $n$ and $\mathcal{O}((\text{Network Size})^{3/2})$ in the network size.
arXiv Detail & Related papers (2024-02-20T16:01:39Z) - Neural Networks for Singular Perturbations [0.0]
We prove expressivity rate bounds for solution sets of a model class of singularly perturbed, elliptic two-point boundary value problems.
We establish expression rate bounds in Sobolev norms in terms of the NN size.
arXiv Detail & Related papers (2024-01-12T16:02:18Z) - On the Computational Complexity and Formal Hierarchy of Second Order Recurrent Neural Networks [59.85314067235965]
We extend the theoretical foundation for second-order recurrent networks (2nd-order RNNs).
We prove there exists a class of 2nd-order RNNs that is Turing-complete with bounded time.
We also demonstrate that 2nd-order RNNs, without memory, outperform modern-day models such as vanilla RNNs and gated recurrent units in recognizing regular grammars.
arXiv Detail & Related papers (2023-09-26T06:06:47Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Training Feedback Spiking Neural Networks by Implicit Differentiation on the Equilibrium State [66.2457134675891]
Spiking neural networks (SNNs) are brain-inspired models that enable energy-efficient implementation on neuromorphic hardware.
Most existing methods imitate the backpropagation framework and feedforward architectures for artificial neural networks.
We propose a novel training method that does not rely on the exact reverse of the forward computation.
arXiv Detail & Related papers (2021-09-29T07:46:54Z) - Fully Spiking Variational Autoencoder [66.58310094608002]
Spiking neural networks (SNNs) can be run on neuromorphic devices with ultra-high speed and ultra-low energy consumption.
In this study, we build a variational autoencoder (VAE) with SNN to enable image generation.
arXiv Detail & Related papers (2021-09-26T06:10:14Z) - Framing RNN as a kernel method: A neural ODE approach [11.374487003189468]
We show that the solution of an RNN can be viewed as a linear function of a specific feature set of the input sequence, known as the signature (an illustrative sketch of this idea follows the list below).
We obtain theoretical guarantees on generalization and stability for a large class of recurrent networks.
arXiv Detail & Related papers (2021-06-02T14:46:40Z) - Online Limited Memory Neural-Linear Bandits with Likelihood Matching [53.18698496031658]
We study neural-linear bandits for solving problems where both exploration and representation learning play an important role.
We propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online.
arXiv Detail & Related papers (2021-02-07T14:19:07Z) - Modeling from Features: a Mean-field Framework for Over-parameterized Deep Neural Networks [54.27962244835622]
This paper proposes a new mean-field framework for over-parameterized deep neural networks (DNNs).
In this framework, a DNN is represented by probability measures and functions over its features in the continuous limit.
We illustrate the framework via the standard DNN and the Residual Network (Res-Net) architectures.
arXiv Detail & Related papers (2020-07-03T01:37:16Z)
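The entry "Framing RNN as a kernel method: A neural ODE approach" above states that an RNN's output can be viewed as a linear function of the signature of the input sequence. The sketch below only illustrates that idea under stated assumptions (paths in R^2, signature truncated at depth 2 via Chen's identity, a small random tanh RNN as the target, ordinary least squares for the linear readout); it is not the construction or the guarantees of that paper.
```python
# Hedged illustration: linear regression on depth-2 signature features vs. a random RNN.
import numpy as np

rng = np.random.default_rng(1)
d, T, n_paths = 2, 20, 400

def signature_level2(path):
    """Depth-2 truncated signature of a piecewise-linear path in R^d (Chen's identity)."""
    s1 = np.zeros(d)
    s2 = np.zeros((d, d))
    for k in range(len(path) - 1):
        inc = path[k + 1] - path[k]
        s2 += np.outer(s1, inc) + 0.5 * np.outer(inc, inc)  # cross term + segment level-2
        s1 += inc                                            # level-1 increments
    return np.concatenate([[1.0], s1, s2.ravel()])           # levels 0, 1, 2 as features

# Target to be explained: outputs of a small random tanh RNN driven by the same paths.
H = 16
W_in = rng.normal(size=(H, d)) / np.sqrt(d)
W_rec = 0.5 * rng.normal(size=(H, H)) / np.sqrt(H)
w_out = rng.normal(size=H) / np.sqrt(H)

def rnn_output(path):
    h = np.zeros(H)
    for k in range(len(path) - 1):
        h = np.tanh(W_in @ (path[k + 1] - path[k]) + W_rec @ h)
    return w_out @ h

paths = rng.normal(scale=0.3, size=(n_paths, T, d)).cumsum(axis=1)  # random-walk inputs
X = np.stack([signature_level2(p) for p in paths])  # signature features
y = np.array([rnn_output(p) for p in paths])        # RNN outputs

coef, *_ = np.linalg.lstsq(X, y, rcond=None)        # linear readout on the signature
r2 = 1.0 - np.sum((X @ coef - y) ** 2) / np.sum((y - y.mean()) ** 2)
print(f"R^2 of a linear fit on depth-2 signature features: {r2:.3f}")
```
The printed R^2 measures how much of the random RNN's output variance a purely linear readout on depth-2 signature features already explains; in the spirit of the kernel framing above, higher truncation depths would give a richer feature set for the linear map.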