RvS: What is Essential for Offline RL via Supervised Learning?
- URL: http://arxiv.org/abs/2112.10751v1
- Date: Mon, 20 Dec 2021 18:55:16 GMT
- Title: RvS: What is Essential for Offline RL via Supervised Learning?
- Authors: Scott Emmons, Benjamin Eysenbach, Ilya Kostrikov, Sergey Levine
- Abstract summary: Recent work has shown that supervised learning alone, without temporal difference (TD) learning, can be remarkably effective for offline RL.
In every environment suite we consider, simply maximizing likelihood with a two-layer feedforward MLP is competitive with state-of-the-art results from substantially more complex methods.
These insights also probe the limits of existing RvS methods, which are comparatively weak on random data.
- Score: 77.91045677562802
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Recent work has shown that supervised learning alone, without temporal
difference (TD) learning, can be remarkably effective for offline RL. When does
this hold true, and which algorithmic components are necessary? Through
extensive experiments, we boil supervised learning for offline RL down to its
essential elements. In every environment suite we consider, simply maximizing
likelihood with a two-layer feedforward MLP is competitive with
state-of-the-art results of substantially more complex methods based on TD
learning or sequence modeling with Transformers. Carefully choosing model
capacity (e.g., via regularization or architecture) and choosing which
information to condition on (e.g., goals or rewards) are critical for
performance. These insights serve as a field guide for practitioners doing
Reinforcement Learning via Supervised Learning (which we coin "RvS learning").
They also probe the limits of existing RvS methods, which are comparatively
weak on random data, and suggest a number of open problems.
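As a rough illustration of the recipe the abstract describes, the sketch below implements conditional behavioral cloning with a two-layer MLP: the policy takes a state plus a conditioning variable (a goal, or an outcome such as reward) and is trained by maximizing the likelihood of dataset actions. This is a minimal sketch assuming PyTorch, continuous actions, and a Gaussian policy head; the hidden width, conditioning scheme, and other details are illustrative rather than the authors' exact configuration.

```python
# Minimal sketch of RvS-style conditional behavioral cloning.
# Assumptions (not from the paper's code release): PyTorch, continuous
# actions, a Gaussian policy head, and illustrative hyperparameters.
import torch
import torch.nn as nn
from torch.distributions import Normal


class RvSPolicy(nn.Module):
    """Two-layer feedforward MLP over concatenated (state, conditioning) inputs."""

    def __init__(self, state_dim: int, cond_dim: int, action_dim: int, hidden: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + cond_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.mean = nn.Linear(hidden, action_dim)
        self.log_std = nn.Parameter(torch.zeros(action_dim))

    def forward(self, state: torch.Tensor, cond: torch.Tensor) -> Normal:
        h = self.net(torch.cat([state, cond], dim=-1))
        return Normal(self.mean(h), self.log_std.exp())


def rvs_loss(policy: RvSPolicy, state, cond, action) -> torch.Tensor:
    # Maximize the likelihood of dataset actions, conditioned on the state and
    # the chosen conditioning variable (e.g., a goal state or an outcome/reward).
    return -policy(state, cond).log_prob(action).sum(dim=-1).mean()
```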
Related papers
- Exploration is Harder than Prediction: Cryptographically Separating Reinforcement Learning from Supervised Learning [34.791182995710024]
We show the first cryptographic separation between reinforcement learning and supervised learning.
We also show that there is no computationally efficient algorithm for reward-directed RL in block MDPs.
arXiv Detail & Related papers (2024-04-04T19:35:41Z) - The Generalization Gap in Offline Reinforcement Learning [26.583205544712403]
Offline learning algorithms perform worse on new environments than online learning ones.
Behavioral cloning is a strong baseline, outperforming state-of-the-art offline RL and sequence modeling approaches.
arXiv Detail & Related papers (2023-12-10T03:40:52Z) - Implicit Offline Reinforcement Learning via Supervised Learning [83.8241505499762]
Offline Reinforcement Learning (RL) via Supervised Learning is a simple and effective way to learn robotic skills from a dataset collected by policies of different expertise levels.
We show how implicit models can leverage return information and match or outperform explicit algorithms to acquire robotic skills from fixed datasets.
arXiv Detail & Related papers (2022-10-21T21:59:42Z) - A Workflow for Offline Model-Free Robotic Reinforcement Learning [117.07743713715291]
Offline reinforcement learning (RL) enables learning control policies by utilizing only prior experience, without any online interaction.
We develop a practical workflow for using offline RL, analogous to the relatively well-understood workflow for supervised learning problems.
We demonstrate the efficacy of this workflow in producing effective policies without any online tuning.
arXiv Detail & Related papers (2021-09-22T16:03:29Z) - Learning Dexterous Manipulation from Suboptimal Experts [69.8017067648129]
Relative Entropy Q-Learning (REQ) is a simple policy algorithm that combines ideas from successful offline and conventional RL algorithms.
We show how REQ is also effective for general off-policy RL, offline RL, and RL from demonstrations.
arXiv Detail & Related papers (2020-10-16T18:48:49Z) - RL Unplugged: A Suite of Benchmarks for Offline Reinforcement Learning [108.9599280270704]
We propose a benchmark called RL Unplugged to evaluate and compare offline RL methods.
RL Unplugged includes data from a diverse range of domains including games and simulated motor control problems.
We will release data for all our tasks and open-source all algorithms presented in this paper.
arXiv Detail & Related papers (2020-06-24T17:14:51Z) - AWAC: Accelerating Online Reinforcement Learning with Offline Datasets [84.94748183816547]
We show that our method, advantage weighted actor critic (AWAC), enables rapid learning of skills with a combination of prior demonstration data and online experience.
Our results show that incorporating prior data can reduce the time required to learn a range of robotic skills to practical time-scales.
arXiv Detail & Related papers (2020-06-16T17:54:41Z)
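For context on the AWAC entry above, the following is a hedged sketch of an advantage-weighted policy update in the spirit of the method described: dataset actions are imitated with weights exp(A(s,a)/lambda), where the advantage comes from a learned Q critic. The actor/critic interfaces, the temperature `lam`, and the one-sample value baseline are assumptions for illustration; critic training via TD learning is omitted.

```python
# Hedged sketch of an advantage-weighted policy update in the spirit of AWAC.
# Assumptions (illustrative, not the authors' code): `actor(state)` returns a
# torch.distributions.Distribution, `critic(state, action)` returns Q-values of
# shape (batch, 1), and `lam` is a temperature hyperparameter. Critic training
# (standard TD learning on the offline and online data) is omitted.
import torch


def awac_policy_loss(actor, critic, state, action, lam: float = 1.0) -> torch.Tensor:
    with torch.no_grad():
        q = critic(state, action).squeeze(-1)          # Q(s, a) for dataset actions
        pi_action = actor(state).sample()              # action sampled from the current policy
        v = critic(state, pi_action).squeeze(-1)       # one-sample estimate of V(s)
        weight = torch.exp((q - v) / lam)              # exp(A(s, a) / lambda)
    # Weighted maximum likelihood: imitate dataset actions, upweighting those
    # the critic judges to be better than the policy's own samples.
    log_prob = actor(state).log_prob(action).sum(dim=-1)
    return -(weight * log_prob).mean()
```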
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information and is not responsible for any consequences of its use.