Exploring the Promise and Limits of Real-Time Recurrent Learning
- URL: http://arxiv.org/abs/2305.19044v3
- Date: Wed, 28 Feb 2024 16:40:38 GMT
- Title: Exploring the Promise and Limits of Real-Time Recurrent Learning
- Authors: Kazuki Irie, Anand Gopalakrishnan, Jürgen Schmidhuber
- Abstract summary: Real-time recurrent learning (RTRL) for sequence-processing recurrent neural networks (RNNs) offers certain conceptual advantages over backpropagation through time (BPTT).
We study actor-critic methods that combine RTRL and policy gradients, and test them in several subsets of DMLab-30, ProcGen, and Atari-2600 environments.
Our system trained on fewer than 1.2 B environmental frames is competitive with or outperforms well-known IMPALA and R2D2 baselines trained on 10 B frames.
- Score: 14.162274619299902
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Real-time recurrent learning (RTRL) for sequence-processing recurrent neural
networks (RNNs) offers certain conceptual advantages over backpropagation
through time (BPTT). RTRL requires neither caching past activations nor
truncating context, and enables online learning. However, RTRL's time and space
complexity make it impractical. To overcome this problem, most recent work on
RTRL focuses on approximation theories, while experiments are often limited to
diagnostic settings. Here we explore the practical promise of RTRL in more
realistic settings. We study actor-critic methods that combine RTRL and policy
gradients, and test them in several subsets of DMLab-30, ProcGen, and
Atari-2600 environments. On DMLab memory tasks, our system trained on fewer
than 1.2 B environmental frames is competitive with or outperforms well-known
IMPALA and R2D2 baselines trained on 10 B frames. To scale to such challenging
tasks, we focus on certain well-known neural architectures with element-wise
recurrence, allowing for tractable RTRL without approximation. Importantly, we
also discuss rarely addressed limitations of RTRL in real-world applications,
such as its complexity in the multi-layer case.
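The element-wise recurrence mentioned in the abstract is what keeps exact RTRL affordable: when each hidden unit's recurrence depends only on its own previous state, the RTRL sensitivities collapse from the O(n^3) Jacobian tensor of a fully connected recurrence to per-unit vectors that can be carried forward online at roughly the cost of the forward pass. Below is a minimal numpy sketch of exact RTRL for a toy diagonal-recurrence tanh cell with a linear readout; the cell, loss, and data stream are illustrative assumptions, not the architecture or training setup used in the paper.

```python
# Minimal sketch of exact RTRL for a toy element-wise (diagonal) recurrent cell
#     h_t = tanh(w * h_{t-1} + U @ x_t),   y_t = V @ h_t
# With element-wise recurrence, unit i depends only on w_i and row i of U, so
# the carried sensitivities are just an n-vector and an n x d matrix instead of
# the O(n^3) sensitivity tensor required for a fully connected recurrence.
# Illustrative toy only, not the paper's architecture.
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 8, 4, 2                       # hidden, input, output sizes
w = rng.normal(scale=0.5, size=n)       # element-wise recurrent weights
U = rng.normal(scale=0.5, size=(n, d))  # input weights
V = rng.normal(scale=0.5, size=(k, n))  # readout weights

h = np.zeros(n)
s_w = np.zeros(n)        # s_w[i]    = d h[i] / d w[i]
s_U = np.zeros((n, d))   # s_U[i, j] = d h[i] / d U[i, j]
lr = 1e-2

for t in range(100):                    # stand-in online stream of (x, target)
    x = rng.normal(size=d)
    target = rng.normal(size=k)

    a = w * h + U @ x                   # pre-activation
    h_new = np.tanh(a)
    fp = 1.0 - h_new ** 2               # tanh'(a)

    # RTRL sensitivity updates (exact here, because the recurrence is diagonal)
    s_w = fp * (h + w * s_w)
    s_U = fp[:, None] * (x[None, :] + w[:, None] * s_U)
    h = h_new

    # Instantaneous squared-error loss at the readout
    y = V @ h
    err = y - target
    g_h = V.T @ err                     # dL_t / dh_t

    # Online gradient step from the carried sensitivities: no caching of past
    # activations, no truncation, no backpropagation through time. As usual in
    # fully online RTRL, the sensitivities assume parameters change slowly.
    w -= lr * g_h * s_w
    U -= lr * g_h[:, None] * s_U
    V -= lr * np.outer(err, h)
```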
Related papers
- Retrieval-Augmented Decision Transformer: External Memory for In-context RL [20.06696368770274]
We introduce Retrieval-Augmented Decision Transformer (RA-DT).
RA-DT employs an external memory mechanism to store past experiences from which it retrieves only sub-trajectories relevant for the current situation.
We evaluate the capabilities of RA-DT on grid-world environments, robotics simulations, and procedurally-generated video games.
arXiv Detail & Related papers (2024-10-09T17:15:30Z) - Tractable Offline Learning of Regular Decision Processes [50.11277112628193]
This work studies offline Reinforcement Learning (RL) in a class of non-Markovian environments called Regular Decision Processes (RDPs).
In RDPs, the unknown dependency of future observations and rewards on past interactions can be captured by a finite automaton.
Many algorithms first reconstruct this unknown dependency using automata learning techniques.
arXiv Detail & Related papers (2024-09-04T14:26:58Z) - Real-Time Recurrent Learning using Trace Units in Reinforcement Learning [27.250024431890477]
Recurrent Neural Networks (RNNs) are used to learn representations in partially observable environments.
For agents that learn online and continually interact with the environment, it is desirable to train RNNs with real-time recurrent learning (RTRL).
We build on these insights to provide a lightweight but effective approach for training RNNs in online RL.
arXiv Detail & Related papers (2024-09-02T20:08:23Z) - Continuous Control with Coarse-to-fine Reinforcement Learning [15.585706638252441]
We present a framework that trains RL agents to zoom into a continuous action space in a coarse-to-fine manner.
We introduce a concrete, value-based algorithm within the framework called Coarse-to-fine Q-Network (CQN).
CQN robustly learns to solve real-world manipulation tasks within a few minutes of online training.
arXiv Detail & Related papers (2024-07-10T16:04:08Z) - Real-Time Recurrent Reinforcement Learning [7.737685867200335]
RTRRL consists of three parts: (1) a Meta-RL RNN architecture, implementing on its own an actor-critic algorithm; (2) an outer reinforcement learning algorithm, exploiting temporal difference learning and Dutch eligibility traces to train the Meta-RL network; and (3) random-feedback local-online (RFLO) learning, an online automatic differentiation algorithm for computing the gradients with respect to the parameters of the network (a minimal sketch of the Dutch-trace TD component appears after this list).
arXiv Detail & Related papers (2023-11-08T16:56:16Z) - Action-Quantized Offline Reinforcement Learning for Robotic Skill
Learning [68.16998247593209]
The offline reinforcement learning (RL) paradigm provides a recipe to convert static behavior datasets into policies that can perform better than the policy that collected the data.
In this paper, we propose an adaptive scheme for action quantization.
We show that several state-of-the-art offline RL methods such as IQL, CQL, and BRAC improve in performance on benchmarks when combined with our proposed discretization scheme.
arXiv Detail & Related papers (2023-10-18T06:07:10Z) - FIRE: A Failure-Adaptive Reinforcement Learning Framework for Edge Computing Migrations [52.85536740465277]
FIRE is a framework that adapts to rare events by training an RL policy in an edge computing digital twin environment.
We propose ImRE, an importance sampling-based Q-learning algorithm, which samples rare events proportionally to their impact on the value function.
We show that FIRE reduces costs compared to vanilla RL and the greedy baseline in the event of failures.
arXiv Detail & Related papers (2022-09-28T19:49:39Z) - When does return-conditioned supervised learning work for offline
reinforcement learning? [51.899892382786526]
We study the capabilities and limitations of return-conditioned supervised learning (RCSL).
We find that RCSL returns the optimal policy under a set of assumptions stronger than those needed for the more traditional dynamic programming-based algorithms.
arXiv Detail & Related papers (2022-06-02T15:05:42Z) - Deep Q-network using reservoir computing with multi-layered readout [0.0]
Recurrent neural network (RNN) based reinforcement learning (RL) is used for learning context-dependent tasks.
An approach that introduces reservoir computing into replay-memory-based RL has been proposed, which trains an agent without BPTT.
This paper shows that the performance of this method improves by using a multi-layered neural network for the readout layer.
arXiv Detail & Related papers (2022-03-03T00:32:55Z) - Single-Shot Pruning for Offline Reinforcement Learning [47.886329599997474]
Deep Reinforcement Learning (RL) is a powerful framework for solving complex real-world problems.
One way to tackle this problem is to prune neural networks, leaving only the necessary parameters.
We close the gap between RL and single-shot pruning techniques and present a general pruning approach to offline RL.
arXiv Detail & Related papers (2021-12-31T18:10:02Z) - Critic Regularized Regression [70.8487887738354]
We propose a novel offline RL algorithm to learn policies from data using a form of critic-regularized regression (CRR).
We find that CRR performs surprisingly well and scales to tasks with high-dimensional state and action spaces.
arXiv Detail & Related papers (2020-06-26T17:50:26Z)
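As a companion to the Real-Time Recurrent Reinforcement Learning entry above, here is a minimal sketch of true-online TD(lambda), the standard form of temporal-difference learning built on Dutch eligibility traces, for a linear value function. The featurizer, toy dynamics, and reward below are hypothetical placeholders; the full RTRRL method additionally wraps a Meta-RL RNN and RFLO-based gradient computation around such a critic.

```python
# Sketch of true-online TD(lambda) with Dutch eligibility traces for a linear
# value function (Sutton & Barto, 2nd ed., sec. 12.6), run on a toy random-walk
# stream. Only one named component of RTRRL is illustrated here.
import numpy as np

rng = np.random.default_rng(0)
d = 16                     # feature dimension
w = np.zeros(d)            # linear value weights, v(s) ~= w @ x(s)
alpha, gamma, lam = 0.1, 0.99, 0.9

def features(s):
    """Hypothetical one-hot featurizer for a toy integer state (illustration only)."""
    x = np.zeros(d)
    x[s % d] = 1.0
    return x

s = 0
x = features(s)
e = np.zeros(d)            # Dutch eligibility trace
v_old = 0.0
for t in range(1000):
    s_next = int((s + rng.integers(-1, 2)) % 100)  # toy random-walk dynamics
    r = 1.0 if s_next == 0 else 0.0                # toy reward

    x_next = features(s_next)
    v, v_next = w @ x, w @ x_next
    delta = r + gamma * v_next - v

    # Dutch trace: the extra (1 - alpha*gamma*lam*(e @ x)) factor is what makes
    # the fully online update match the offline lambda-return exactly in the
    # linear case, unlike the plain accumulating trace.
    e = gamma * lam * e + (1.0 - alpha * gamma * lam * (e @ x)) * x
    w += alpha * (delta + v - v_old) * e - alpha * (v - v_old) * x

    v_old, x, s = v_next, x_next, s_next
```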