A Taxonomy of Recurrent Learning Rules
- URL: http://arxiv.org/abs/2207.11439v2
- Date: Tue, 08 Oct 2024 15:29:00 GMT
- Title: A Taxonomy of Recurrent Learning Rules
- Authors: Guillermo Martín-Sánchez, Sander Bohté, Sebastian Otte
- Abstract summary: Backpropagation through time (BPTT) is the de facto standard for training recurrent neural networks (RNNs).
E-prop was proposed as a causal, local, and efficient practical alternative to BPTT and real-time recurrent learning (RTRL).
We derive RTRL from BPTT using a detailed notation that brings intuition and clarity to how the two are connected.
- Abstract: Backpropagation through time (BPTT) is the de facto standard for training recurrent neural networks (RNNs), but it is non-causal and non-local. Real-time recurrent learning (RTRL) is a causal alternative, but it is highly inefficient. Recently, e-prop was proposed as a causal, local, and efficient practical alternative to these algorithms, providing an approximation of the exact gradient by radically pruning the recurrent dependencies carried over time. Here, we derive RTRL from BPTT using a detailed notation that brings intuition and clarity to how they are connected. Furthermore, we frame e-prop within this picture, formalising what it approximates. Finally, we derive a family of algorithms of which e-prop is a special case.
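The contrast between the full RTRL sensitivity recursion and an e-prop-style pruned trace can be made concrete with a toy example. The sketch below uses a vanilla tanh RNN in NumPy; it is illustrative only, not the paper's notation or setting (the paper treats more general recurrent and spiking units). It carries the full O(n^3) sensitivity tensor forward in time alongside an O(n^2) trace that keeps only each unit's dependence on its own incoming weights, which is one simple way to read the "radical pruning" of recurrent dependencies.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, T = 4, 3, 10
W = rng.normal(scale=0.3, size=(n, n))   # recurrent weights
U = rng.normal(scale=0.3, size=(n, m))   # input weights

h = np.zeros(n)
S_rtrl = np.zeros((n, n, n))   # S[i, j, k] = d h[i] / d W[j, k]; O(n^3) state
e_trace = np.zeros((n, n))     # e[j, k] ~ d h[j] / d W[j, k]; O(n^2) state

for t in range(T):
    x = rng.normal(size=m)
    a = W @ h + U @ x
    h_new = np.tanh(a)
    d = 1.0 - h_new ** 2       # tanh'(a)

    # RTRL: S_t = diag(d) (W S_{t-1} + dA/dW), with (dA/dW)[i, j, k] = delta_ij * h_{t-1}[k].
    dA_dW = np.einsum('ij,k->ijk', np.eye(n), h)
    S_rtrl = d[:, None, None] * (np.einsum('il,ljk->ijk', W, S_rtrl) + dA_dW)

    # Pruned trace: keep only unit j's dependence on its own incoming weight
    # W[j, k]; cross-unit recurrent terms (i != j) are dropped.
    e_trace = d[:, None] * (np.diag(W)[:, None] * e_trace + h[None, :])

    h = h_new

diag_rtrl = np.einsum('jjk->jk', S_rtrl)   # exact counterpart of the pruned trace
print("pruning error:", np.abs(diag_rtrl - e_trace).max())
```

The printed value is the gap between the pruned trace and the corresponding slice of the exact sensitivity tensor; it comes from the cross-unit recurrent terms that the pruning discards.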
Related papers
- The Cascaded Forward Algorithm for Neural Network Training [61.06444586991505]
We propose a new learning framework for neural networks, namely the Cascaded Forward (CaFo) algorithm, which, like the Forward-Forward (FF) algorithm, does not rely on BP optimization.
Unlike FF, our framework directly outputs a label distribution at each cascaded block and does not require generating additional negative samples.
In our framework each block can be trained independently, so it can be easily deployed into parallel acceleration systems.
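As a rough illustration of block-wise training without cross-block backpropagation, the sketch below is toy NumPy code, not the authors' implementation: here the blocks are frozen random ReLU feature maps and only each block's linear head is trained on its own loss, and the per-block label distributions are averaged at inference.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

n_in, n_hidden, n_classes, lr = 20, 16, 5, 0.1
X = rng.normal(size=(64, n_in))
y = rng.integers(0, n_classes, size=64)
Y = np.eye(n_classes)[y]

blocks, feats = [], X
for _ in range(3):
    W_block = rng.normal(scale=0.3, size=(feats.shape[1], n_hidden))
    h = np.maximum(feats @ W_block, 0.0)          # this block's forward pass

    # Train only this block's head on its own cross-entropy loss; no error
    # signal is sent back to earlier blocks.
    W_head = np.zeros((n_hidden, n_classes))
    for _ in range(200):
        p = softmax(h @ W_head)
        W_head -= lr * h.T @ (p - Y) / len(X)

    blocks.append((W_block, W_head))
    feats = h                                     # the next block consumes these features

# Inference: average the label distributions produced by all heads.
h, probs = X, []
for W_block, W_head in blocks:
    h = np.maximum(h @ W_block, 0.0)
    probs.append(softmax(h @ W_head))
print("toy ensemble accuracy:", (np.mean(probs, axis=0).argmax(axis=1) == y).mean())
```

Because each head's update depends only on that block's own features and loss, the blocks could be trained on separate devices, which is the parallelism point made above.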
arXiv Detail & Related papers (2023-03-17T02:01:11Z) - Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
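One way to picture the efficiency gain described above is that gradients are taken only through each time step's spatial computation. The sketch below is a hedged PyTorch reading of that idea, using a simplified non-spiking leaky unit with illustrative constants rather than the authors' SLTT implementation: the carried state is detached at every step, so the backward pass never traverses the time recursion and its memory cost no longer grows with sequence length.

```python
import torch

torch.manual_seed(0)
n_in, n_out, T, beta = 8, 4, 20, 0.9
W = (torch.randn(n_in, n_out) * 0.3).requires_grad_()

x = torch.randn(T, n_in)
target = torch.randn(n_out)

v = torch.zeros(n_out)
loss = torch.zeros(())
for t in range(T):
    # Detaching the carried state cuts the backward-through-time path: the
    # backward pass only traverses the current step's spatial computation.
    v = beta * v.detach() + x[t] @ W
    out = torch.sigmoid(v)                 # smooth stand-in for a spiking nonlinearity
    loss = loss + ((out - target) ** 2).mean() / T

loss.backward()
print(W.grad.shape)   # gradients accumulated from per-step, spatial-only paths
```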
arXiv Detail & Related papers (2023-02-28T05:01:01Z) - Efficient LSTM Training with Eligibility Traces [0.5801044612920815]
Training recurrent neural networks is predominantly achieved via backpropagation through time (BPTT).
A more efficient and biologically plausible alternative to BPTT is e-prop.
We show that e-prop is a suitable optimization algorithm for LSTMs by comparing it to BPTT on two benchmarks for supervised learning.
arXiv Detail & Related papers (2022-09-30T14:47:04Z) - Improving the Efficiency of Off-Policy Reinforcement Learning by Accounting for Past Decisions [20.531576904743282]
Off-policy estimation bias is corrected in a per-decision manner.
Off-policy algorithms such as Tree Backup and Retrace rely on this mechanism.
We propose a multistep operator that permits arbitrary past-dependent traces.
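The per-decision mechanism can be illustrated with the classic members of this family. The sketch below is plain NumPy with illustrative names, and it implements Retrace-style and Tree-Backup-style corrections rather than the paper's proposed past-dependent operator: each TD error is weighted by a running product of per-decision coefficients, and swapping the trace function is what changes the algorithm.

```python
import numpy as np

def multistep_target(q, rewards, actions, pi, mu, gamma, trace_fn, lam=1.0):
    """q, pi, mu: [T+1, A] action values and target/behavior policies;
    rewards, actions: a length-T trajectory collected under mu."""
    T = len(rewards)
    g = q[0, actions[0]]
    coef = 1.0
    for t in range(T):
        a = actions[t]
        if t > 0:
            coef *= lam * trace_fn(pi[t, a], mu[t, a])   # per-decision trace c_t
        delta = rewards[t] + gamma * pi[t + 1] @ q[t + 1] - q[t, a]
        g += (gamma ** t) * coef * delta
    return g

retrace = lambda p, m: min(1.0, p / m)   # truncated importance ratio
tree_backup = lambda p, m: p             # uses the target policy only

rng = np.random.default_rng(0)
T, A = 5, 3
q = rng.normal(size=(T + 1, A))
pi = rng.dirichlet(np.ones(A), size=T + 1)
mu = rng.dirichlet(np.ones(A), size=T + 1)
actions = rng.integers(0, A, size=T)
rewards = rng.normal(size=T)
print("Retrace target:    ", multistep_target(q, rewards, actions, pi, mu, 0.99, retrace))
print("Tree Backup target:", multistep_target(q, rewards, actions, pi, mu, 0.99, tree_backup))
```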
arXiv Detail & Related papers (2021-12-23T00:07:28Z) - Backpropagation Through Time For Networks With Long-Term Dependencies [0.0]
Backpropagation through time (BPTT) is a technique for updating the tuned parameters of recurrent neural networks (RNNs).
We propose using the 'discrete forward sensitivity equation' and a variant of it for single and multiple interacting recurrent loops, respectively.
This solution is exact and also allows the network's parameters to vary between subsequent steps; however, it requires the computation of a Jacobian.
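For a single recurrent loop, the forward-sensitivity recursion is compact enough to spell out. The sketch below uses toy notation (a scalar tanh loop, not the paper's formulation): it carries S_t = (df/dh)_t * S_{t-1} + (df/dtheta)_t forward in time and checks the resulting gradient against finite differences; the per-step Jacobian terms are the extra computation mentioned above.

```python
import numpy as np

theta, h, S, T = 0.7, 0.1, 0.0, 30
xs = np.sin(np.arange(T))

for x in xs:
    a = theta * h + x
    h_new = np.tanh(a)
    J_h = (1 - h_new ** 2) * theta     # df/dh at this step
    J_theta = (1 - h_new ** 2) * h     # df/dtheta at this step (h is still h_{t-1})
    S = J_h * S + J_theta              # forward sensitivity dh_t/dtheta
    h = h_new

dL_dh = 2 * (h - 1.0)                  # e.g. L = (h_T - 1)^2
print("forward-sensitivity gradient:", dL_dh * S)

# Finite-difference check of the same gradient.
def run(th):
    hh = 0.1
    for x in xs:
        hh = np.tanh(th * hh + x)
    return (hh - 1.0) ** 2

eps = 1e-6
print("finite-difference gradient: ", (run(theta + eps) - run(theta - eps)) / (2 * eps))
```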
arXiv Detail & Related papers (2021-03-26T15:55:54Z) - Neural Pruning via Growing Regularization [82.9322109208353]
We extend regularization to tackle two central problems of pruning: pruning schedule and weight importance scoring.
Specifically, we propose an L2 regularization variant with rising penalty factors and show it can bring significant accuracy gains.
The proposed algorithms are easy to implement and scalable to large datasets and networks in both structured and unstructured pruning.
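A minimal sketch of the rising-penalty idea, under the assumption (mine, not the paper's actual schedule or importance score) that weights selected for removal receive an L2 coefficient that grows by a fixed increment each step until they can be zeroed out:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)
penalty = np.zeros_like(w)                    # per-weight L2 coefficient
candidates = np.argsort(np.abs(w))[:30]       # toy importance score: smallest magnitudes

lr, delta = 0.05, 1e-2
for step in range(500):
    task_grad = 0.1 * (w - 1.0)               # stand-in for the task-loss gradient
    w -= lr * (task_grad + penalty * w)       # the L2 term grows over training
    penalty[candidates] += delta              # rising penalty on removal candidates

residual = np.abs(w[candidates]).max()
w[candidates] = 0.0                           # weights driven toward zero are finally pruned
print("largest candidate magnitude just before removal:", residual)
```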
arXiv Detail & Related papers (2020-12-16T20:16:28Z) - Temporal Surrogate Back-propagation for Spiking Neural Networks [2.291640606078406]
Spiking neural networks (SNNs) are usually more energy-efficient than artificial neural networks (ANNs).
Back-propagation (BP) has proven to be a powerful tool for training ANNs in recent years.
However, since spike behavior is non-differentiable, BP cannot be applied to SNNs directly.
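The standard workaround alluded to here is a surrogate gradient: spikes stay binary in the forward pass while the backward pass substitutes a smooth derivative. The sketch below is a common PyTorch recipe; the fast-sigmoid-style surrogate and all constants are illustrative and not necessarily this paper's exact formulation.

```python
import torch

class SurrogateSpike(torch.autograd.Function):
    """Heaviside spike in the forward pass, smooth surrogate in the backward pass."""

    @staticmethod
    def forward(ctx, v):
        ctx.save_for_backward(v)
        return (v > 0).float()                           # non-differentiable spike

    @staticmethod
    def backward(ctx, grad_out):
        (v,) = ctx.saved_tensors
        surrogate = 1.0 / (1.0 + 10.0 * v.abs()) ** 2    # fast-sigmoid-style derivative
        return grad_out * surrogate

spike = SurrogateSpike.apply

# Tiny usage example: one leaky-integrate-and-fire-like layer unrolled over time.
torch.manual_seed(0)
W = (torch.randn(6, 3) * 0.5).requires_grad_()
x = torch.randn(20, 6)
v, beta = torch.zeros(3), 0.9

spike_count = torch.zeros(())
for t in range(20):
    v = beta * v + x[t] @ W
    s = spike(v)
    v = v - s.detach()                 # reset after a spike (reset path not differentiated)
    spike_count = spike_count + s.sum()

spike_count.backward()                 # possible only because of the surrogate backward
print(W.grad.abs().sum())
```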
arXiv Detail & Related papers (2020-11-18T08:22:47Z) - Activation Relaxation: A Local Dynamical Approximation to Backpropagation in the Brain [62.997667081978825]
Activation Relaxation (AR) is motivated by constructing the backpropagation gradient as the equilibrium point of a dynamical system.
Our algorithm converges rapidly and robustly to the correct backpropagation gradients, requires only a single type of computational unit, and can operate on arbitrary computation graphs.
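The fixed-point idea can be checked numerically. The sketch below is a hedged reading of the construction, not the authors' exact neuron-level dynamics: per-layer error units relax toward e_l = J_l^T e_{l+1}, and at equilibrium that recursion is exactly backpropagation.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [5, 4, 3]
Ws = [rng.normal(scale=0.5, size=(sizes[i], sizes[i + 1])) for i in range(2)]

x = rng.normal(size=sizes[0])
hs = [x]
for W in Ws:
    hs.append(np.tanh(hs[-1] @ W))
target = rng.normal(size=sizes[-1])

# Step Jacobians J_l = d h_{l+1} / d h_l and the loss gradient at the output.
Js = [W * (1 - hs[i + 1] ** 2) for i, W in enumerate(Ws)]   # shape (in, out)
e = [np.zeros(s) for s in sizes]
e[-1] = 2 * (hs[-1] - target)                               # dL/dh_L for squared error

eta = 0.2
for _ in range(200):                                        # relaxation phase
    for l in range(len(sizes) - 2, -1, -1):
        e[l] += eta * (Js[l] @ e[l + 1] - e[l])             # converges to J^T chaining

# Compare with one explicit backprop sweep.
bp = e[-1]
for l in range(len(sizes) - 2, -1, -1):
    bp = Js[l] @ bp
print("max deviation from backprop:", np.abs(e[0] - bp).max())
```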
arXiv Detail & Related papers (2020-09-11T11:56:34Z) - AdaS: Adaptive Scheduling of Stochastic Gradients [50.80697760166045]
We introduce the notions of "knowledge gain" and "mapping condition" and propose a new algorithm called Adaptive Scheduling (AdaS).
Experimentation reveals that, using the derived metrics, AdaS exhibits: (a) faster convergence and superior generalization over existing adaptive learning methods; and (b) lack of dependence on a validation set to determine when to stop training.
arXiv Detail & Related papers (2020-06-11T16:36:31Z) - Predictive Coding Approximates Backprop along Arbitrary Computation Graphs [68.8204255655161]
We develop a strategy to translate core machine learning architectures into their predictive coding equivalents.
Our models perform equivalently to backprop on challenging machine learning benchmarks.
Our method raises the potential that standard machine learning algorithms could in principle be directly implemented in neural circuitry.
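A minimal sketch of a predictive-coding network in this spirit, using a standard formulation (clamped input and output, an inference phase on value nodes, then purely local weight updates) rather than necessarily the authors' exact scheme:

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [6, 5, 4]
Ws = [rng.normal(scale=0.5, size=(sizes[l + 1], sizes[l])) for l in range(2)]
f = np.tanh
df = lambda v: 1 - np.tanh(v) ** 2

x = rng.normal(size=sizes[0])
y = rng.normal(size=sizes[-1])

# Initialize value nodes with a feedforward sweep, then clamp input and output.
v = [x]
for W in Ws:
    v.append(W @ f(v[-1]))
v[-1] = y

# Inference phase: relax hidden nodes on the energy F = sum_l ||eps_l||^2 / 2,
# where eps_l = v_l - W_{l-1} f(v_{l-1}).
for _ in range(100):
    eps = [None] + [v[l] - Ws[l - 1] @ f(v[l - 1]) for l in range(1, len(v))]
    for l in range(1, len(v) - 1):                          # input and output stay clamped
        v[l] += 0.1 * (-eps[l] + df(v[l]) * (Ws[l].T @ eps[l + 1]))

# Learning phase: purely local, Hebbian-like weight-update directions.
eps = [None] + [v[l] - Ws[l - 1] @ f(v[l - 1]) for l in range(1, len(v))]
updates = [np.outer(eps[l + 1], f(v[l])) for l in range(len(Ws))]
print([u.shape for u in updates])
```

Every quantity used in the weight update is available at the connection itself (the post-synaptic error and the pre-synaptic activity), which is what the circuit-level claim above rests on.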
arXiv Detail & Related papers (2020-06-07T15:35:47Z)