Convergence Analysis of Real-time Recurrent Learning (RTRL) for a class of Recurrent Neural Networks
- URL: http://arxiv.org/abs/2501.08040v1
- Date: Tue, 14 Jan 2025 11:46:36 GMT
- Title: Convergence Analysis of Real-time Recurrent Learning (RTRL) for a class of Recurrent Neural Networks
- Authors: Samuel Chun-Hei Lam, Justin Sirignano, Konstantinos Spiliopoulos
- Abstract summary: Real-time recurrent learning (RTRL) is an online optimization algorithm.
It asymptotically follows the true gradient of the loss on the data sequence as the number of sequence time steps tends to infinity.
One potential application area for RTRL is the analysis of financial data.
- Score: 0.0
- Abstract: Recurrent neural networks (RNNs) are commonly trained with the truncated backpropagation-through-time (TBPTT) algorithm. For computational tractability, the TBPTT algorithm truncates the chain rule and calculates the gradient on a finite block of the overall data sequence. Such an approximation can lead to significant inaccuracies, as the block length for the truncated backpropagation is typically limited to be much smaller than the overall sequence length. In contrast, real-time recurrent learning (RTRL) is an online optimization algorithm which asymptotically follows the true gradient of the loss on the data sequence as the number of sequence time steps $t \rightarrow \infty$. RTRL forward propagates the derivatives of the RNN hidden/memory units with respect to the parameters and, using these forward derivatives, performs online updates of the parameters at each time step in the data sequence. RTRL's online forward propagation allows for exact optimization over extremely long data sequences, although it can be computationally costly for models with large numbers of parameters. We prove convergence of the RTRL algorithm for a class of RNNs. The convergence analysis establishes a fixed point for the joint distribution of the data sequence, the RNN hidden layer, and the RNN hidden layer forward derivatives as the number of data samples from the sequence and the number of training steps tend to infinity. We prove convergence of the RTRL algorithm to a stationary point of the loss. Numerical studies illustrate our theoretical results. One potential application area for RTRL is the analysis of financial data, which typically involves long time series and models with a small to medium number of parameters. This makes RTRL computationally tractable and a potentially appealing optimization method for training such models. Thus, we include an example of RTRL applied to limit order book data.
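To make the forward-propagation step concrete, below is a minimal RTRL sketch for a vanilla tanh RNN with a linear readout, trained online with a squared loss. It is purely illustrative, not the model class analysed in the paper: the network sizes, the synthetic data stream, and the choice to carry sensitivities only for the recurrent matrix W (the input and readout weights are handled analogously) are assumptions made for brevity.

```python
# Minimal RTRL sketch (illustrative only): vanilla RNN
#   h_t = tanh(W h_{t-1} + U x_t),  y_hat_t = V h_t,
# trained online with a squared loss on a synthetic data stream.
# S[i, j, k] stores the forward sensitivity d h_i / d W_jk.
import numpy as np

rng = np.random.default_rng(0)
n_h, n_x, lr, T = 8, 3, 1e-2, 2000            # hidden size, input size, step size, steps

W = rng.normal(scale=0.3, size=(n_h, n_h))    # recurrent weights
U = rng.normal(scale=0.3, size=(n_h, n_x))    # input weights
V = rng.normal(scale=0.3, size=(1, n_h))      # readout weights

h = np.zeros(n_h)
S = np.zeros((n_h, n_h, n_h))                 # forward sensitivities for W

for t in range(T):
    x = rng.normal(size=n_x)                  # placeholder input stream
    y = np.sin(0.1 * t)                       # placeholder target

    a = W @ h + U @ x                         # pre-activation
    h_new = np.tanh(a)
    D = 1.0 - h_new ** 2                      # tanh'(a), elementwise

    # Forward propagation of the sensitivities:
    #   d a_i / d W_jk = 1{i=j} * h_{t-1,k} + sum_m W_im * S[m, j, k]
    S_new = np.einsum('im,mjk->ijk', W, S)
    for j in range(n_h):
        S_new[j, j, :] += h                   # direct dependence of a_j on row j of W
    S_new *= D[:, None, None]                 # chain through the nonlinearity

    y_hat = (V @ h_new).item()
    err = y_hat - y                           # d(0.5 * err^2) / d y_hat

    # Online parameter update using the forward sensitivities
    grad_W = err * np.einsum('i,ijk->jk', V[0], S_new)
    grad_V = err * h_new[None, :]
    W -= lr * grad_W
    V -= lr * grad_V

    h, S = h_new, S_new
```

Note that the einsum propagating the sensitivities costs on the order of the fourth power of the hidden size per time step, which is the computational burden the abstract alludes to and the reason models with small to medium numbers of parameters (e.g. financial time-series models) are an attractive fit for RTRL.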
Related papers
- Approximation Bounds for Recurrent Neural Networks with Application to Regression [7.723218675113336]
We study the approximation capacity of deep ReLU recurrent neural networks (RNNs) and explore the convergence properties of nonparametric least squares regression using RNNs.
We derive upper bounds on the approximation error of RNNs for Hölder smooth functions.
Our results provide statistical guarantees on the performance of RNNs.
arXiv Detail & Related papers (2024-09-09T13:02:50Z) - Efficient Real Time Recurrent Learning through combined activity and parameter sparsity [0.5076419064097732]
Backpropagation through time (BPTT) is the standard algorithm for training recurrent neural networks (RNNs).
BPTT is unsuited for online learning and presents a challenge for implementation on low-resource real-time systems.
We show that recurrent networks exhibiting high activity sparsity can reduce the computational cost of Real-Time Recurrent Learning (RTRL).
arXiv Detail & Related papers (2023-03-10T01:09:04Z) - Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks [19.248060562241296]
We propose two constraints that make real-time recurrent learning scalable.
We show that by either decomposing the network into independent modules or learning the network in stages, we can make RTRL scale linearly with the number of parameters.
We demonstrate the effectiveness of our approach over Truncated-BPTT on a prediction benchmark inspired by animal learning and on policy evaluation of pre-trained policies for Atari 2600 games.
arXiv Detail & Related papers (2023-01-20T23:17:48Z) - Learning Low Dimensional State Spaces with Overparameterized Recurrent Neural Nets [57.06026574261203]
We provide theoretical evidence for learning low-dimensional state spaces, which can also model long-term memory.
Experiments corroborate our theory, demonstrating extrapolation via learning low-dimensional state spaces with both linear and non-linear RNNs.
arXiv Detail & Related papers (2022-10-25T14:45:15Z) - Efficient Dataset Distillation Using Random Feature Approximation [109.07737733329019]
We propose a novel algorithm that uses a random feature approximation (RFA) of the Neural Network Gaussian Process (NNGP) kernel.
Our algorithm provides at least a 100-fold speedup over KIP and can run on a single GPU.
Our new method, termed RFA Distillation (RFAD), performs competitively with KIP and other dataset condensation algorithms in accuracy over a range of large-scale datasets.
arXiv Detail & Related papers (2022-10-21T15:56:13Z) - SreaMRAK a Streaming Multi-Resolution Adaptive Kernel Algorithm [60.61943386819384]
Existing implementations of KRR require that all the data is stored in the main memory.
We propose StreaMRAK - a streaming version of KRR.
We present a showcase study on two synthetic problems and the prediction of the trajectory of a double pendulum.
arXiv Detail & Related papers (2021-08-23T21:03:09Z) - Deep Cellular Recurrent Network for Efficient Analysis of Time-Series Data with Spatial Information [52.635997570873194]
This work proposes a novel deep cellular recurrent neural network (DCRNN) architecture to process complex multi-dimensional time series data with spatial information.
The proposed architecture achieves state-of-the-art performance while using substantially fewer trainable parameters than comparable methods in the literature.
arXiv Detail & Related papers (2021-01-12T20:08:18Z) - A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks [56.084798078072396]
We take a step towards closing the gap between theory and practice by significantly improving the known theoretical bounds on both the network width and the convergence time.
We show that convergence to a global minimum is guaranteed for networks whose width is quadratic in the sample size and linear in their depth, within a number of training steps logarithmic in both.
Our analysis and convergence bounds are derived via the construction of a surrogate network with fixed activation patterns that can be transformed at any time to an equivalent ReLU network of a reasonable size.
arXiv Detail & Related papers (2021-01-12T00:40:45Z) - Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z) - A Practical Sparse Approximation for Real Time Recurrent Learning [38.19296522866088]
Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates.
We introduce the Sparse n-step Approximation (SnAp) to the RTRL influence matrix, which only keeps entries that are nonzero within n steps of the recurrent core; a sketch of this masking idea appears after this list.
For highly sparse networks, SnAp with n=2 remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.
arXiv Detail & Related papers (2020-06-12T14:38:15Z) - Achieving Online Regression Performance of LSTMs with Simple RNNs [0.0]
We introduce a first-order training algorithm with a linear time complexity in the number of parameters.
We show that when SRNNs are trained with our algorithm, they provide regression performance very similar to LSTMs in two to three times shorter training time.
arXiv Detail & Related papers (2020-05-16T11:41:13Z)
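As noted in the SnAp entry above, the Sparse n-step Approximation amounts to masking the RTRL sensitivities. The sketch below is a hedged illustration, not code from that paper: it reuses the S[i, j, k] = d h_i / d W_jk layout of the RTRL sketch earlier, and the reachability-based construction of the mask is an assumption about how "nonzero within n steps of the recurrent core" can be realised for a sparse recurrent matrix.

```python
# Illustrative SnAp-style mask (assumed construction, not the paper's code):
# keep S[i, j, k] only if unit i can be influenced by unit j within n recurrent
# steps, since W_jk first perturbs unit j and then propagates along W.
import numpy as np

def snap_mask(W: np.ndarray, n: int) -> np.ndarray:
    """Boolean mask over S[i, j, k] = d h_i / d W_jk for an n-step approximation."""
    n_h = W.shape[0]
    conn = (W != 0).astype(int) + np.eye(n_h, dtype=int)   # recurrent connectivity plus self-links
    reach = np.eye(n_h, dtype=int)                          # reachability after 0 propagation steps
    for _ in range(n - 1):                                  # n - 1 further propagation steps
        reach = (conn @ reach > 0).astype(int)
    keep = reach > 0                                        # keep[i, j]: unit i reachable from unit j
    # The input index k does not change which units are reached, so broadcast over k.
    return np.broadcast_to(keep[:, :, None], (n_h, n_h, n_h)).copy()

# Inside the RTRL loop, immediately after S_new is formed, one would apply
#   S_new = np.where(mask, S_new, 0.0)
# For n = 1 this keeps only the d h_j / d W_jk entries; for a dense W the mask
# saturates quickly, which is why the approach pays off mainly for sparse networks.
```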