Beyond Tabula Rasa: Reincarnating Reinforcement Learning
- URL: http://arxiv.org/abs/2206.01626v1
- Date: Fri, 3 Jun 2022 15:11:10 GMT
- Title: Beyond Tabula Rasa: Reincarnating Reinforcement Learning
- Authors: Rishabh Agarwal, Max Schwarzer, Pablo Samuel Castro, Aaron Courville,
Marc G. Bellemare
- Abstract summary: Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
- Score: 37.201451908129386
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Learning tabula rasa, that is, without any prior knowledge, is the prevalent
workflow in reinforcement learning (RL) research. However, RL systems, when
applied to large-scale settings, rarely operate tabula rasa. Such large-scale
systems undergo multiple design or algorithmic changes during their development
cycle and use ad hoc approaches for incorporating these changes without
re-training from scratch, which would have been prohibitively expensive.
Additionally, the inefficiency of deep RL typically excludes researchers
without access to industrial-scale resources from tackling
computationally-demanding problems. To address these issues, we present
reincarnating RL as an alternative workflow, where prior computational work
(e.g., learned policies) is reused or transferred between design iterations of
an RL agent, or from one RL agent to another. As a step towards enabling
reincarnating RL from any agent to any other agent, we focus on the specific
setting of efficiently transferring an existing sub-optimal policy to a
standalone value-based RL agent. We find that existing approaches fail in this
setting and propose a simple algorithm to address their limitations. Equipped
with this algorithm, we demonstrate reincarnating RL's gains over tabula rasa
RL on Atari 2600 games, a challenging locomotion task, and the real-world
problem of navigating stratospheric balloons. Overall, this work argues for an
alternative approach to RL research, which we believe could significantly
improve real-world RL adoption and help democratize it further.
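
The policy-to-value setting described in the abstract can be made concrete with a small sketch. The snippet below is a hypothetical illustration rather than the paper's implementation: it combines a standard TD loss for a student Q-network with a distillation term that pulls the student's induced policy toward an existing (sub-optimal) teacher policy. All names, signatures, and hyperparameters are assumptions.

```python
import torch
import torch.nn.functional as F

def reincarnation_loss(q_net, target_q_net, teacher_probs, batch,
                       gamma=0.99, distill_weight=1.0, temperature=1.0):
    """Hypothetical policy-to-value transfer loss: TD error plus a
    distillation term toward an existing (sub-optimal) teacher policy."""
    obs, actions, rewards, next_obs, dones = batch  # dones as 0/1 floats

    # Standard Q-learning (TD) loss for the student value-based agent.
    q_values = q_net(obs)                                   # [B, num_actions]
    q_taken = q_values.gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        next_q = target_q_net(next_obs).max(dim=1).values
        td_target = rewards + gamma * (1.0 - dones) * next_q
    td_loss = F.smooth_l1_loss(q_taken, td_target)

    # Distillation loss: cross-entropy between the teacher's action
    # distribution and a softmax policy induced by the student's Q-values.
    student_log_probs = F.log_softmax(q_values / temperature, dim=1)
    distill_loss = -(teacher_probs * student_log_probs).sum(dim=1).mean()

    # distill_weight would typically be decayed toward 0 so the student
    # eventually learns from its own experience alone.
    return td_loss + distill_weight * distill_loss
```

In a setup like this, the teacher's action probabilities would come from the policy being reincarnated, and the student's replay buffer could initially be seeded with data gathered by that teacher before the distillation weight is annealed away.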
Related papers
- ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL [80.10358123795946]
We develop a framework for building multi-turn RL algorithms for fine-tuning large language models.
Our framework adopts a hierarchical RL approach and runs two RL algorithms in parallel.
Empirically, we find that ArCHer significantly improves efficiency and performance on agent tasks.
arXiv Detail & Related papers (2024-02-29T18:45:56Z)
- Hybrid Inverse Reinforcement Learning [34.793570631021005]
The inverse reinforcement learning approach to imitation learning is a double-edged sword.
We propose using hybrid RL -- training on a mixture of online and expert data -- to curtail unnecessary exploration.
We derive both model-free and model-based hybrid inverse RL algorithms with strong policy performance guarantees.
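
The core idea of hybrid RL, training on a mixture of online and expert data, can be sketched as a sampling routine over two buffers. The snippet below is a hypothetical illustration; the buffer layout and mixing ratio are assumptions, not the paper's algorithm.

```python
import random

def sample_hybrid_batch(online_buffer, expert_buffer,
                        batch_size=256, expert_frac=0.5):
    """Hypothetical hybrid sampling: draw a minibatch that mixes the agent's
    own online experience with expert demonstrations, anchoring updates on
    expert-covered states and curbing unnecessary exploration."""
    n_expert = int(batch_size * expert_frac)
    n_online = batch_size - n_expert
    batch = random.sample(expert_buffer, min(n_expert, len(expert_buffer)))
    batch += random.sample(online_buffer, min(n_online, len(online_buffer)))
    random.shuffle(batch)
    return batch
```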
arXiv Detail & Related papers (2024-02-13T23:29:09Z)
- SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction of the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which enables massively parallelized training.
SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
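
Viewed as a dataflow, distributed RL training decomposes into actor workers that generate transitions and learner workers that consume batches. The sketch below illustrates that generic producer/consumer structure with in-process queues; it is a toy analogy for exposition only, not SRL's actual architecture or API.

```python
import queue
import random
import threading

sample_queue = queue.Queue(maxsize=1024)   # actors -> learner dataflow edge

def actor_worker(actor_id, num_steps=1000):
    """Producer: generates (actor_id, step, reward) placeholder transitions."""
    for step in range(num_steps):
        sample_queue.put((actor_id, step, random.random()))

def learner_worker(batch_size=32, num_updates=100):
    """Consumer: pulls batches off the queue and performs updates."""
    for _ in range(num_updates):
        batch = [sample_queue.get() for _ in range(batch_size)]
        # ... a gradient update on `batch` would go here ...

actors = [threading.Thread(target=actor_worker, args=(i,)) for i in range(4)]
learner = threading.Thread(target=learner_worker)
for t in actors + [learner]:
    t.start()
for t in actors + [learner]:
    t.join()
```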
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Provable Reset-free Reinforcement Learning by No-Regret Reduction [13.800970428473134]
We propose a generic no-regret reduction to systematically design reset-free RL algorithms.
Our reduction turns the reset-free RL problem into a two-player game.
We show that achieving sublinear regret in this two-player game would imply learning a policy that has both sublinear performance regret and sublinear total number of resets in the original RL problem.
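
One way to read that claim in notation (a loose paraphrase of the summary above, not the paper's exact theorem, and assuming the game's regret dominates both quantities):

```latex
% Loose restatement: if the two-player game's regret upper-bounds both the
% performance regret and the reset count, sublinear game regret suffices.
\text{If } \max\bigl(\mathrm{Regret}_{\pi}(T),\; N_{\mathrm{resets}}(T)\bigr)
  \le \mathrm{Regret}_{\mathrm{game}}(T) = o(T),
\text{ then } \mathrm{Regret}_{\pi}(T) = o(T)
  \text{ and } N_{\mathrm{resets}}(T) = o(T).
```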
arXiv Detail & Related papers (2023-01-06T05:51:53Z)
- Entropy Regularized Reinforcement Learning with Cascading Networks [9.973226671536041]
Deep RL uses neural networks as function approximators.
One of the major difficulties of RL is the absence of i.i.d. data.
In this work, we challenge the common practice, inherited from the (un)supervised learning community, of using a fixed neural architecture.
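
A hypothetical sketch of the alternative this entry alludes to: growing the function approximator over training by appending new hidden blocks rather than fixing the architecture up front. This illustrates only the general "growing network" idea under assumed names, not the paper's cascading construction.

```python
import torch.nn as nn

class GrowingMLP(nn.Module):
    """Hypothetical growing network: starts small and appends hidden blocks."""
    def __init__(self, in_dim, hidden_dim, out_dim):
        super().__init__()
        self.blocks = nn.ModuleList(
            [nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())]
        )
        self.head = nn.Linear(hidden_dim, out_dim)

    def grow(self, hidden_dim):
        # Append a new hidden block; earlier blocks keep their weights.
        self.blocks.append(
            nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU())
        )

    def forward(self, x):
        for block in self.blocks:
            x = block(x)
        return self.head(x)
```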
arXiv Detail & Related papers (2022-10-16T10:28:59Z)
- Automated Reinforcement Learning (AutoRL): A Survey and Open Problems [92.73407630874841]
Automated Reinforcement Learning (AutoRL) involves not only standard applications of AutoML but also includes additional challenges unique to RL.
We provide a common taxonomy, discuss each area in detail and pose open problems which would be of interest to researchers going forward.
arXiv Detail & Related papers (2022-01-11T12:41:43Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, and we also verify previous design choices for RL policies.
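
The central DARTS mechanism this entry reuses, a supernet whose edges compute a softmax-weighted mixture of candidate operations with mixture weights learned by gradient descent, can be sketched as follows. The candidate operations and names here are illustrative assumptions, not the paper's search space.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One DARTS-style supernet edge: a softmax-weighted mixture of candidate
    ops. The architecture weights `alpha` are trained jointly with the network;
    after the search, the highest-weight op on each edge is kept."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.Identity(),
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```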
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- Regret Minimization Experience Replay [14.233842517210437]
Prioritized sampling is a promising technique for improving the performance of RL agents.
In this work, we theoretically analyze the optimal prioritization strategy for minimizing the regret of an RL policy.
We propose two practical algorithms, RM-DisCor and RM-TCE.
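
As a rough illustration of the prioritized sampling mechanism this entry builds on (not RM-DisCor or RM-TCE themselves), transitions can be drawn with probability proportional to a priority such as the TD error:

```python
import numpy as np

def sample_prioritized(priorities, batch_size, alpha=0.6):
    """Hypothetical prioritized sampling: transitions with larger priority
    (e.g., TD error) are replayed more often; `alpha` controls how strongly
    priorities skew the sampling distribution."""
    priorities = np.asarray(priorities, dtype=np.float64) ** alpha
    probs = priorities / priorities.sum()
    return np.random.choice(len(probs), size=batch_size, p=probs)
```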
arXiv Detail & Related papers (2021-05-15T16:08:45Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
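
The last entry's starting point, hindsight relabeling, can be sketched in a few lines: a trajectory that failed to reach its commanded goal is relabeled as if the outcome it actually achieved had been the goal, turning it into a success for that goal. The snippet below is a generic HER-style illustration under assumed field names, not the paper's inverse-RL procedure.

```python
def hindsight_relabel(trajectory):
    """Hypothetical hindsight relabeling: treat the goal actually reached at
    the end of the trajectory as the commanded goal and recompute rewards."""
    achieved_goal = trajectory[-1]["achieved_goal"]
    relabeled = []
    for step in trajectory:
        new_step = dict(step)
        new_step["goal"] = achieved_goal
        # Reward of 1 when the achieved state matches the relabeled goal.
        new_step["reward"] = float(step["achieved_goal"] == achieved_goal)
        relabeled.append(new_step)
    return relabeled
```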