A Practical Sparse Approximation for Real Time Recurrent Learning
- URL: http://arxiv.org/abs/2006.07232v1
- Date: Fri, 12 Jun 2020 14:38:15 GMT
- Title: A Practical Sparse Approximation for Real Time Recurrent Learning
- Authors: Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan,
Alex Graves
- Abstract summary: Real Time Recurrent Learning (RTRL) eliminates the need for history storage and allows for online weight updates.
We introduce the Sparse n-step Approximation (SnAp) to the RTRL influence matrix, which only keeps entries that are nonzero within n steps of the recurrent core.
For highly sparse networks, SnAp with n=2 remains tractable and can outperform backpropagation through time in terms of learning speed when updates are done online.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Current methods for training recurrent neural networks are based on
backpropagation through time, which requires storing a complete history of
network states, and prohibits updating the weights `online' (after every
timestep). Real Time Recurrent Learning (RTRL) eliminates the need for history
storage and allows for online weight updates, but does so at the expense of
computational costs that are quartic in the state size. This renders RTRL
training intractable for all but the smallest networks, even ones that are made
highly sparse.
We introduce the Sparse n-step Approximation (SnAp) to the RTRL influence
matrix, which only keeps entries that are nonzero within n steps of the
recurrent core. SnAp with n=1 is no more expensive than backpropagation, and we
find that it substantially outperforms other RTRL approximations with
comparable costs such as Unbiased Online Recurrent Optimization. For highly
sparse networks, SnAp with n=2 remains tractable and can outperform
backpropagation through time in terms of learning speed when updates are done
online. SnAp becomes equivalent to RTRL when n is large.
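The idea above can be made concrete with a toy sketch. This is a hypothetical minimal example (dense tanh RNN, made-up sizes), not the paper's implementation: it runs the exact RTRL influence-matrix recursion J_t = D_t J_{t-1} + I_t alongside a SnAp-1 variant that, after each update, prunes the influence matrix to the entries that are nonzero within one step of the recurrent core.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, n_in = 8, 4                       # toy hidden/input sizes (hypothetical)
n_p = n_h * (n_h + n_in)               # number of recurrent-core parameters
W = rng.normal(0.0, 0.3, (n_h, n_h + n_in))

def rtrl_step(h, x, W, J):
    """One RNN step plus the RTRL influence-matrix recursion
    J_t = D_t @ J_{t-1} + I_t, where D_t = dh_t/dh_{t-1} and I_t is the
    immediate Jacobian dh_t/dvec(W) with h_{t-1} held fixed."""
    a = np.concatenate([h, x])
    h_new = np.tanh(W @ a)
    d = 1.0 - h_new ** 2               # tanh'(z)
    D = d[:, None] * W[:, :n_h]        # dh_t/dh_{t-1}
    I = np.zeros((n_h, n_p))
    for i in range(n_h):               # dh_i/dW_{jk} = d_i * [i == j] * a_k
        I[i, i * (n_h + n_in):(i + 1) * (n_h + n_in)] = d[i] * a
    return h_new, D @ J + I

# SnAp-1 keeps only influence entries that are nonzero within one step of the
# recurrent core: for this dense RNN, that is the block-diagonal pattern of I.
mask = np.zeros((n_h, n_p), dtype=bool)
for i in range(n_h):
    mask[i, i * (n_h + n_in):(i + 1) * (n_h + n_in)] = True

h = np.zeros(n_h)
J_full = np.zeros((n_h, n_p))          # exact RTRL influence matrix
J_snap = np.zeros((n_h, n_p))          # SnAp-1 approximation
for t in range(5):
    x = rng.normal(size=n_in)
    h_next, J_full = rtrl_step(h, x, W, J_full)
    _, J_upd = rtrl_step(h, x, W, J_snap)
    J_snap = np.where(mask, J_upd, 0.0)  # prune to the fixed sparsity pattern
    h = h_next

# An online gradient for a loss L(h_t) is then (dL/dh_t) @ J, with no stored
# history of past states.
```

With a sparse recurrent core, the mask (and hence the retained influence matrix) would itself be sparse, which is what keeps SnAp-1 no more expensive than backpropagation.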
Related papers
- Real-Time Recurrent Learning using Trace Units in Reinforcement Learning [27.250024431890477]
Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments.
For agents that learn online and continually interact with the environment, it is desirable to train RNNs with real-time recurrent learning (RTRL).
We build on these insights to provide a lightweight but effective approach for training RNNs in online RL.
arXiv Detail & Related papers (2024-09-02T20:08:23Z)
- Exploiting Symmetric Temporally Sparse BPTT for Efficient RNN Training [20.49255973077044]
This work describes a training algorithm for Delta RNNs that exploits temporal sparsity in the backward propagation phase to reduce computational requirements for training on the edge.
Results show a reduction of ~80% in matrix operations for training a 56k parameter Delta LSTM on the Fluent Speech Commands dataset with negligible accuracy loss.
We show that the proposed Delta RNN training will be useful for online incremental learning on edge devices with limited computing resources.
arXiv Detail & Related papers (2023-12-14T23:07:37Z)
- Efficient Real Time Recurrent Learning through combined activity and parameter sparsity [0.5076419064097732]
Backpropagation through time (BPTT) is the standard algorithm for training recurrent neural networks (RNNs).
BPTT is unsuited for online learning and presents a challenge for implementation on low-resource real-time systems.
We show that recurrent networks exhibiting high activity sparsity can reduce the computational cost of Real-Time Recurrent Learning (RTRL).
arXiv Detail & Related papers (2023-03-10T01:09:04Z)
- Towards Memory- and Time-Efficient Backpropagation for Training Spiking Neural Networks [70.75043144299168]
Spiking Neural Networks (SNNs) are promising energy-efficient models for neuromorphic computing.
We propose the Spatial Learning Through Time (SLTT) method that can achieve high performance while greatly improving training efficiency.
Our method achieves state-of-the-art accuracy on ImageNet, while the memory cost and training time are reduced by more than 70% and 50%, respectively, compared with BPTT.
arXiv Detail & Related papers (2023-02-28T05:01:01Z)
- Scalable Real-Time Recurrent Learning Using Columnar-Constructive Networks [19.248060562241296]
We propose two constraints that make real-time recurrent learning scalable.
We show that by either decomposing the network into independent modules or learning the network in stages, we can make RTRL scale linearly with the number of parameters.
We demonstrate the effectiveness of our approach over Truncated-BPTT on a prediction benchmark inspired by animal learning and by doing policy evaluation of pre-trained policies for Atari 2600 games.
arXiv Detail & Related papers (2023-01-20T23:17:48Z)
- Intelligence Processing Units Accelerate Neuromorphic Learning [52.952192990802345]
Spiking neural networks (SNNs) have achieved orders of magnitude improvement in terms of energy consumption and latency.
We present an IPU-optimized release of our custom SNN Python package, snnTorch.
arXiv Detail & Related papers (2022-11-19T15:44:08Z)
- Online Training Through Time for Spiking Neural Networks [66.7744060103562]
Spiking neural networks (SNNs) are promising brain-inspired energy-efficient models.
Recent progress in training methods has enabled successful deep SNNs on large-scale tasks with low latency.
We propose online training through time (OTTT) for SNNs, which is derived from BPTT to enable forward-in-time learning.
arXiv Detail & Related papers (2022-10-09T07:47:56Z)
- Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems.
One way to tackle this problem is to prune neural networks leaving only the necessary parameters.
We close the gap between RL and single-shot pruning techniques and present a general pruning approach to the Offline RL.
arXiv Detail & Related papers (2021-12-31T18:10:02Z)
- Continual Learning in Recurrent Neural Networks [67.05499844830231]
We evaluate the effectiveness of continual learning methods for processing sequential data with recurrent neural networks (RNNs).
We shed light on the particularities that arise when applying weight-importance methods, such as elastic weight consolidation, to RNNs.
We show that the performance of weight-importance methods is not directly affected by the length of the processed sequences, but rather by high working memory requirements.
arXiv Detail & Related papers (2020-06-22T10:05:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.