All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
- URL: http://arxiv.org/abs/2202.11960v1
- Date: Thu, 24 Feb 2022 08:44:11 GMT
- Title: All You Need Is Supervised Learning: From Imitation Learning to Meta-RL With Upside Down RL
- Authors: Kai Arulkumaran, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh K. Srivastava
- Abstract summary: Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down.
UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors.
- Score: 0.5735035463793008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Upside down reinforcement learning (UDRL) flips the conventional use of the
return in the objective function in RL upside down, by taking returns as input
and predicting actions. UDRL is based purely on supervised learning, and
bypasses some prominent issues in RL: bootstrapping, off-policy corrections,
and discount factors. While previous work with UDRL demonstrated it in a
traditional online RL setting, here we show that this single algorithm can also
work in the imitation learning and offline RL settings, be extended to the
goal-conditioned RL setting, and even the meta-RL setting. With a general agent
architecture, a single UDRL agent can learn across all paradigms.
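As a concrete illustration of the idea in the abstract, the sketch below trains a command-conditioned policy purely by supervised learning: the network receives the current observation together with a command (desired return, desired horizon) and is trained to reproduce the actions that actually led to those outcomes in previously collected episodes. The network sizes, replay format, and segment-sampling rule are illustrative assumptions, not details taken from the paper.

```python
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

class BehaviorFunction(nn.Module):
    """Command-conditioned policy: logits of pi(a | s, desired_return, desired_horizon)."""
    def __init__(self, obs_dim, n_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden), nn.ReLU(),
            nn.Linear(hidden, n_actions),
        )

    def forward(self, obs, desired_return, desired_horizon):
        command = torch.stack([desired_return, desired_horizon], dim=-1)
        return self.net(torch.cat([obs, command], dim=-1))

def udrl_update(policy, optimizer, episodes):
    """One supervised update on segments relabelled with what was actually achieved.

    `episodes` is an assumed replay format: a list of dicts with per-episode
    tensors 'obs' (T, obs_dim), integer 'actions' (T,) and 'rewards' (T,).
    """
    obs, acts, rets, hors = [], [], [], []
    for ep in episodes:
        T = ep['rewards'].shape[0]
        t = random.randrange(T)                    # random segment start
        obs.append(ep['obs'][t])
        acts.append(ep['actions'][t])
        rets.append(ep['rewards'][t:].sum())       # return actually obtained from t
        hors.append(torch.tensor(float(T - t)))    # horizon actually remaining
    logits = policy(torch.stack(obs), torch.stack(rets), torch.stack(hors))
    # Plain supervised learning: no bootstrapping, no off-policy corrections, no discount factor.
    loss = F.cross_entropy(logits, torch.stack(acts))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

At test time the same network is simply conditioned on whatever command one wants fulfilled (for example a high desired return and the remaining episode length), which is also why the imitation, offline, goal-conditioned, and meta-RL settings can share this single training loop.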
Related papers
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
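The U2O RL entry above describes a two-phase recipe: reward-free (unsupervised) offline pre-training, followed by online fine-tuning with the task reward. The summary does not say which unsupervised objective is used, so the skeleton below substitutes a generic latent forward-dynamics loss for the offline phase; the point is the phase structure and the reuse of a single pre-trained encoder across downstream tasks, not the particular loss. The batch format of (obs, action, next_obs) with scalar discrete actions is also an assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Shared representation trained once, without rewards, on the offline data."""
    def __init__(self, obs_dim, z_dim=32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, z_dim))
        self.z_dim = z_dim

    def forward(self, obs):
        return self.net(obs)

def unsupervised_offline_phase(encoder, batches, lr=1e-3):
    """Phase 1: reward-free pre-training. Stand-in objective: predict the next
    latent state from the current latent state and the action."""
    predictor = nn.Linear(encoder.z_dim + 1, encoder.z_dim)
    opt = torch.optim.Adam(list(encoder.parameters()) + list(predictor.parameters()), lr=lr)
    for obs, act, next_obs in batches:             # reward column deliberately unused
        pred = predictor(torch.cat([encoder(obs), act.float().unsqueeze(-1)], dim=-1))
        loss = F.mse_loss(pred, encoder(next_obs).detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return encoder

def make_task_policy(encoder, n_actions):
    """Phase 2: reuse the one pre-trained encoder for any downstream task; only the
    small head (or the whole stack) is then trained online with that task's reward."""
    return nn.Sequential(encoder, nn.Linear(encoder.z_dim, n_actions))
```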
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- Leveraging Reward Consistency for Interpretable Feature Discovery in Reinforcement Learning [69.19840497497503]
It is argued that the commonly used action-matching principle explains the underlying deep neural networks (DNNs) rather than interpreting the RL agent itself.
We propose instead to take rewards, the essential objective of RL agents, as the objective for interpreting them.
We verify and evaluate our method on the Atari 2600 games as well as Duckietown, a challenging self-driving car simulator environment.
arXiv Detail & Related papers (2023-09-04T09:09:54Z)
- SRL: Scaling Distributed Reinforcement Learning to Over Ten Thousand Cores [13.948640763797776]
We present a novel abstraction on the dataflows of RL training, which unifies diverse RL training applications into a general framework.
We develop a scalable, efficient, and distributed RL system called ReaLly Scalable RL (SRL), which allows efficient and massively parallelized training.
SRL is the first in the academic community to perform RL experiments at a large scale with over 15k CPU cores.
arXiv Detail & Related papers (2023-06-29T05:16:25Z)
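The SRL entry above concerns a systems-level abstraction rather than a new learning rule: sample generation and parameter updates are treated as separate nodes in a dataflow connected by data streams, so each side can be scaled independently. The toy sketch below illustrates only that decoupling, with threads standing in for distributed workers and a queue standing in for the data stream; it is not SRL's actual API or architecture.

```python
import queue
import random
import threading

def actor(worker_id, trajectory_queue, stop):
    """Environment worker: keeps generating trajectories and streaming them out."""
    while not stop.is_set():
        traj = [((random.random(), random.random()),  # fake observation
                 random.randrange(2),                  # fake action
                 random.random())                      # fake reward
                for _ in range(16)]
        trajectory_queue.put((worker_id, traj))

def learner(trajectory_queue, stop, total_updates=50, batch_size=4):
    """Trainer: consumes batches from the stream and performs (stand-in) updates."""
    version = 0
    for _ in range(total_updates):
        batch = [trajectory_queue.get() for _ in range(batch_size)]
        version += 1                     # stand-in for a gradient step + weight broadcast
    stop.set()
    return version

if __name__ == "__main__":
    stream = queue.Queue(maxsize=64)     # the dataflow edge between actors and the learner
    stop = threading.Event()
    workers = [threading.Thread(target=actor, args=(i, stream, stop), daemon=True)
               for i in range(8)]        # scale by adding workers; the learner is unchanged
    for w in workers:
        w.start()
    print("trained to policy version", learner(stream, stop))
```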
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
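The contrastive-RL entry above can be made concrete with a goal-conditioned critic trained by an InfoNCE-style objective: a state-action pair should score the goal it actually reached later in its own trajectory higher than the goals reached in the other trajectories of the batch, and the policy is then trained to pick actions the critic rates highly for the commanded goal. Encoder sizes and the batch format below are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ContrastiveCritic(nn.Module):
    """f(s, a, g) = phi(s, a) . psi(g): high when goal g is reachable from (s, a)."""
    def __init__(self, obs_dim, act_dim, goal_dim, z_dim=64):
        super().__init__()
        self.sa_encoder = nn.Sequential(nn.Linear(obs_dim + act_dim, 128), nn.ReLU(),
                                        nn.Linear(128, z_dim))
        self.g_encoder = nn.Sequential(nn.Linear(goal_dim, 128), nn.ReLU(),
                                       nn.Linear(128, z_dim))

    def forward(self, obs, act, goal):
        phi = self.sa_encoder(torch.cat([obs, act], dim=-1))   # (B, z)
        psi = self.g_encoder(goal)                             # (B, z)
        return phi @ psi.t()                                   # (B, B) pairwise logits

def contrastive_critic_loss(critic, obs, act, future_goal):
    """InfoNCE: each (s_i, a_i) should score its own future goal g_i above the
    goals taken from the other trajectories in the batch (the negatives)."""
    logits = critic(obs, act, future_goal)
    labels = torch.arange(obs.shape[0])    # positives sit on the diagonal
    return F.cross_entropy(logits, labels)

# The policy is then trained to choose actions the critic scores highly for the
# commanded goal, e.g. by maximizing critic(obs, pi(obs, goal), goal).
```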
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures which can be highly competitive against manually designed policies, but also verify previous design choices for RL policies.
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
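In the RL-DARTS entry above, the hand-designed image encoder is replaced by a DARTS supernet in which each edge computes a softmax-weighted mixture of candidate operations, and the mixture weights are learned by backpropagation alongside the policy weights. The snippet below shows that core mixed-operation idea on a single edge; the candidate operations and sizes are illustrative, not the ones used in the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedOp(nn.Module):
    """One supernet edge: a softmax-weighted sum over candidate operations.
    The architecture parameters `alpha` are trained together with the network
    weights; discretizing keeps the highest-weighted op on each edge."""
    def __init__(self, channels):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1),   # 3x3 conv
            nn.Conv2d(channels, channels, 5, padding=2),   # 5x5 conv
            nn.Identity(),                                 # skip connection
        ])
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))  # architecture weights

    def forward(self, x):
        weights = F.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))

# A supernet encoder is a stack of such mixed edges; dropping it in place of the
# usual conv encoder leaves the surrounding RL algorithm essentially unchanged.
encoder = nn.Sequential(MixedOp(16), nn.ReLU(), MixedOp(16), nn.ReLU(), nn.Flatten())
features = encoder(torch.randn(1, 16, 8, 8))   # (1, 16*8*8) features for the policy head
```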
- EasyRL: A Simple and Extensible Reinforcement Learning Framework [3.2173369911280023]
EasyRL provides an interactive graphical user interface for users to train and evaluate RL agents.
EasyRL does not require programming knowledge for training and testing simple built-in RL agents.
EasyRL also supports custom RL agents and environments, which can be highly beneficial for RL researchers in evaluating and comparing their RL models.
arXiv Detail & Related papers (2020-08-04T17:02:56Z)
- Learning to Prune Deep Neural Networks via Reinforcement Learning [64.85939668308966]
PuRL is a deep reinforcement learning based algorithm for pruning neural networks.
It achieves sparsity and accuracy comparable to current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-09T13:06:07Z)
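The pruning entry above gives little algorithmic detail, so the sketch below is only a generic illustration of casting pruning as a sequential decision problem, not PuRL's actual formulation: the agent visits the prunable layers in order, each action is a sparsity ratio for the current layer, pruning is done by weight magnitude, and the scalar reward trades off validation accuracy against remaining density.

```python
import torch
import torch.nn as nn

def magnitude_prune_(layer: nn.Linear, sparsity: float) -> None:
    """Zero out the smallest-magnitude weights of one layer, in place."""
    w = layer.weight.data
    k = int(sparsity * w.numel())
    if k > 0:
        threshold = w.abs().flatten().kthvalue(k).values
        w[w.abs() <= threshold] = 0.0

def pruning_episode(model, sparsities, evaluate, acc_weight=1.0, density_weight=0.5):
    """One 'episode' of layer-by-layer pruning.

    `sparsities` plays the role of the agent's actions (one per prunable layer),
    `evaluate` returns validation accuracy of the pruned model, and the scalar
    reward favours accuracy while penalizing the fraction of remaining weights."""
    layers = [m for m in model.modules() if isinstance(m, nn.Linear)]
    for layer, s in zip(layers, sparsities):      # one decision step per layer
        magnitude_prune_(layer, s)
    nonzero = sum(int((m.weight != 0).sum()) for m in layers)
    total = sum(m.weight.numel() for m in layers)
    reward = acc_weight * evaluate(model) - density_weight * (nonzero / total)
    return reward   # fed back to whatever RL algorithm chooses the sparsities
```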