All You Need Is Supervised Learning: From Imitation Learning to Meta-RL
With Upside Down RL
- URL: http://arxiv.org/abs/2202.11960v1
- Date: Thu, 24 Feb 2022 08:44:11 GMT
- Title: All You Need Is Supervised Learning: From Imitation Learning to Meta-RL
With Upside Down RL
- Authors: Kai Arulkumaran, Dylan R. Ashley, Jürgen Schmidhuber, Rupesh K. Srivastava
- Abstract summary: Upside down reinforcement learning (UDRL) flips the conventional use of the return in the objective function in RL upside down.
UDRL is based purely on supervised learning, and bypasses some prominent issues in RL: bootstrapping, off-policy corrections, and discount factors.
- Score: 0.5735035463793008
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Upside down reinforcement learning (UDRL) flips the conventional use of the
return in the objective function in RL upside down, by taking returns as input
and predicting actions. UDRL is based purely on supervised learning, and
bypasses some prominent issues in RL: bootstrapping, off-policy corrections,
and discount factors. While previous work with UDRL demonstrated it in a
traditional online RL setting, here we show that this single algorithm can also
work in the imitation learning and offline RL settings, and be extended to the
goal-conditioned RL and even meta-RL settings. With a general agent
architecture, a single UDRL agent can learn across all paradigms.
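To make the mechanism concrete, here is a minimal sketch of the idea described in the abstract: a command-conditioned policy maps an observation plus a desired return and horizon to an action, and is trained with ordinary supervised learning on logged trajectories. The class and field names below are illustrative assumptions, not the paper's actual code.

```python
# Minimal UDRL-style sketch (illustrative, not the paper's implementation):
# train a policy to predict the logged action given the observation and a
# "command" (desired return, desired horizon) taken from the same trajectory.
import torch
import torch.nn as nn

class CommandPolicy(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim + 2, hidden),  # +2 for (desired_return, desired_horizon)
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, obs, desired_return, desired_horizon):
        command = torch.stack([desired_return, desired_horizon], dim=-1)
        return self.net(torch.cat([obs, command], dim=-1))  # action logits

def udrl_loss(policy, batch):
    """Pure supervised objective: no bootstrapping, no off-policy corrections,
    no discount factor -- just predict the action that achieved the command."""
    logits = policy(batch["obs"], batch["return_to_go"], batch["horizon"])
    return nn.functional.cross_entropy(logits, batch["action"])
```

Under this view, imitation learning, offline RL, goal-conditioned RL, and meta-RL differ mainly in how the commands (or goals) fed to the same network are constructed, which is the sense in which a single agent architecture can learn across all of these paradigms.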
Related papers
- SFT Memorizes, RL Generalizes: A Comparative Study of Foundation Model Post-training [127.47044960572659]
Supervised fine-tuning (SFT) and reinforcement learning (RL) are widely used post-training techniques for foundation models.
This paper studies the difference between SFT and RL on generalization and memorization.
We show that RL, especially when trained with an outcome-based reward, generalizes across both rule-based textual and visual variants.
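As a generic point of reference for the comparison above, the following hedged sketch (not the paper's code) contrasts the two post-training objectives: SFT minimizes token-level cross-entropy on reference responses, while RL with an outcome-based reward reweights sampled responses by a scalar score, shown here with a simple REINFORCE-style estimator.

```python
# Illustrative contrast between SFT and outcome-reward RL objectives
# (generic sketch; the paper's exact training setup may differ).
import torch
import torch.nn.functional as F

def sft_loss(logits, target_tokens):
    # logits: [B, T, V] model outputs; target_tokens: [B, T] reference tokens.
    return F.cross_entropy(logits.reshape(-1, logits.shape[-1]),
                           target_tokens.reshape(-1))

def outcome_rl_loss(response_logprob, outcome_reward):
    # response_logprob: [B] summed log-probability of each sampled response;
    # outcome_reward: [B] scalar outcome score (e.g. 1 if the final answer is correct).
    return -(outcome_reward.detach() * response_logprob).mean()
```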
arXiv Detail & Related papers (2025-01-28T18:59:44Z)
- RLInspect: An Interactive Visual Approach to Assess Reinforcement Learning Algorithm [0.0]
Reinforcement Learning (RL) is a rapidly growing area of machine learning.
Assessing RL models can be challenging, which makes it difficult to interpret their behaviour.
We have developed RLInspect, an interactive visual analytic tool.
It takes into account different components of the RL model - state, action, agent architecture and reward, and provides a more comprehensive view of the RL training.
arXiv Detail & Related papers (2024-11-13T07:24:14Z)
- Unsupervised-to-Online Reinforcement Learning [59.910638327123394]
Unsupervised-to-online RL (U2O RL) replaces domain-specific supervised offline RL with unsupervised offline RL.
U2O RL not only enables reusing a single pre-trained model for multiple downstream tasks, but also learns better representations.
We empirically demonstrate that U2O RL achieves strong performance that matches or even outperforms previous offline-to-online RL approaches.
arXiv Detail & Related papers (2024-08-27T05:23:45Z)
- Is Value Learning Really the Main Bottleneck in Offline RL? [70.54708989409409]
We show that the choice of a policy extraction algorithm significantly affects the performance and scalability of offline RL.
We propose two simple test-time policy improvement methods and show that these methods lead to better performance.
arXiv Detail & Related papers (2024-06-13T17:07:49Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task via traditional RL, into the inputs of meta-RL.
We show that RL$^3$ earns greater long-term cumulative reward than RL$^2$ while drastically reducing meta-training time, and that it generalizes better to out-of-distribution tasks.
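A minimal sketch of how such an input augmentation could look, assuming an RL$^2$-style recurrent meta-policy; module and argument names are illustrative rather than taken from the paper.

```python
# Illustrative RL^2-style recurrent policy whose inputs are augmented with
# per-task action-value estimates produced by an ordinary RL learner.
import torch
import torch.nn as nn

class MetaPolicyWithQ(nn.Module):
    def __init__(self, obs_dim: int, num_actions: int, hidden: int = 128):
        super().__init__()
        # Input: observation + per-action Q estimates + previous reward + done flag.
        self.rnn = nn.GRU(obs_dim + num_actions + 2, hidden, batch_first=True)
        self.head = nn.Linear(hidden, num_actions)

    def forward(self, obs, q_estimates, prev_reward, done, h=None):
        # obs: [B, T, obs_dim], q_estimates: [B, T, num_actions],
        # prev_reward/done: [B, T, 1]; returns per-step action logits.
        x = torch.cat([obs, q_estimates, prev_reward, done], dim=-1)
        out, h = self.rnn(x, h)
        return self.head(out), h
```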
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show (contrastive) representation learning methods can be cast as RL algorithms in their own right.
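A minimal sketch of this correspondence under common assumptions (an InfoNCE-style objective over state-action and future-state embeddings, whose similarity score plays the role of a goal-conditioned critic); encoder names and shapes are illustrative.

```python
# Illustrative contrastive objective for goal-conditioned RL (generic sketch):
# positive pairs are a (state, action) pair and a state reached later in the
# same trajectory; other elements of the batch serve as negatives.
import torch
import torch.nn as nn

def contrastive_critic_loss(sa_encoder: nn.Module, goal_encoder: nn.Module,
                            states, actions, future_states):
    phi = sa_encoder(torch.cat([states, actions], dim=-1))  # [B, D]
    psi = goal_encoder(future_states)                        # [B, D]
    logits = phi @ psi.t()                                   # [B, B] similarity matrix
    labels = torch.arange(states.shape[0], device=states.device)
    return nn.functional.cross_entropy(logits, labels)
```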
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- RL-DARTS: Differentiable Architecture Search for Reinforcement Learning [62.95469460505922]
We introduce RL-DARTS, one of the first applications of Differentiable Architecture Search (DARTS) in reinforcement learning (RL).
By replacing the image encoder with a DARTS supernet, our search method is sample-efficient, requires minimal extra compute resources, and is also compatible with off-policy and on-policy RL algorithms, needing only minor changes in preexisting code.
We show that the supernet gradually learns better cells, leading to alternative architectures that are highly competitive with manually designed policies, and we also verify previous design choices for RL policies.
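For readers unfamiliar with DARTS supernets, here is a generic illustration (not the paper's code) of the mixed operation that makes architecture search differentiable: each edge computes a softmax-weighted mixture of candidate operations, and the architecture weights are learned by gradient descent alongside the ordinary network weights. The candidate operations below are illustrative.

```python
# Illustrative DARTS-style mixed operation (generic sketch).
import torch
import torch.nn as nn

class MixedOp(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.ops = nn.ModuleList([
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.Conv2d(channels, channels, kernel_size=5, padding=2),
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Identity(),
        ])
        # One architecture parameter per candidate operation on this edge.
        self.alpha = nn.Parameter(torch.zeros(len(self.ops)))

    def forward(self, x):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(x) for w, op in zip(weights, self.ops))
```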
arXiv Detail & Related papers (2021-06-04T03:08:43Z)
- EasyRL: A Simple and Extensible Reinforcement Learning Framework [3.2173369911280023]
EasyRL provides an interactive graphical user interface for users to train and evaluate RL agents.
EasyRL does not require programming knowledge for training and testing simple built-in RL agents.
EasyRL also supports custom RL agents and environments, which can be highly beneficial for RL researchers in evaluating and comparing their RL models.
arXiv Detail & Related papers (2020-08-04T17:02:56Z)
- Learning to Prune Deep Neural Networks via Reinforcement Learning [64.85939668308966]
PuRL is a deep reinforcement learning based algorithm for pruning neural networks.
It achieves sparsity and accuracy comparable to current state-of-the-art methods.
arXiv Detail & Related papers (2020-07-09T13:06:07Z)