Offline Reinforcement Learning as Anti-Exploration
- URL: http://arxiv.org/abs/2106.06431v1
- Date: Fri, 11 Jun 2021 14:41:30 GMT
- Title: Offline Reinforcement Learning as Anti-Exploration
- Authors: Shideh Rezaeifar, Robert Dadashi, Nino Vieillard, Léonard Hussenot,
Olivier Bachem, Olivier Pietquin, Matthieu Geist
- Abstract summary: We take inspiration from the literature on bonus-based exploration to design a new offline RL agent.
The core idea is to subtract a prediction-based exploration bonus from the reward, instead of adding it for exploration.
We show that our agent is competitive with the state of the art on a set of continuous control locomotion and manipulation tasks.
- Score: 49.72457136766916
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Offline Reinforcement Learning (RL) aims at learning an optimal control from
a fixed dataset, without interactions with the system. An agent in this setting
should avoid selecting actions whose consequences cannot be predicted from the
data. This is the converse of exploration in RL, which favors such actions. We
thus take inspiration from the literature on bonus-based exploration to design
a new offline RL agent. The core idea is to subtract a prediction-based
exploration bonus from the reward, instead of adding it for exploration. This
allows the policy to stay close to the support of the dataset. We connect this
approach to a more common regularization of the learned policy towards the
data. Instantiated with a bonus based on the prediction error of a variational
autoencoder, we show that our agent is competitive with the state of the art on
a set of continuous control locomotion and manipulation tasks.
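A minimal sketch of the reward modification described above, assuming a small VAE trained on (state, action) pairs from the offline dataset (training loop omitted). The class and parameter names (`StateActionVAE`, `bonus_scale`) are illustrative placeholders, not the authors' implementation:

```python
import torch
import torch.nn as nn


class StateActionVAE(nn.Module):
    """Small VAE over concatenated (state, action) pairs from the offline dataset."""

    def __init__(self, state_dim, action_dim, latent_dim=16, hidden=128):
        super().__init__()
        in_dim = state_dim + action_dim
        self.encoder = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.log_std = nn.Linear(hidden, latent_dim)
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ReLU(), nn.Linear(hidden, in_dim)
        )

    def forward(self, state, action):
        x = torch.cat([state, action], dim=-1)
        h = self.encoder(x)
        mu, log_std = self.mu(h), self.log_std(h).clamp(-4.0, 2.0)
        z = mu + log_std.exp() * torch.randn_like(mu)  # reparameterization trick
        return self.decoder(z), mu, log_std

    @torch.no_grad()
    def reconstruction_error(self, state, action):
        """Per-sample squared reconstruction error; large for out-of-distribution pairs."""
        recon, _, _ = self.forward(state, action)
        target = torch.cat([state, action], dim=-1)
        return ((recon - target) ** 2).mean(dim=-1)


def penalized_reward(reward, state, action, vae, bonus_scale=1.0):
    """Subtract the prediction-error bonus from the logged reward.

    (state, action) pairs poorly covered by the dataset reconstruct badly and
    are penalized, keeping the learned policy close to the support of the data.
    """
    return reward - bonus_scale * vae.reconstruction_error(state, action)
```

The penalized reward can then be fed to any standard off-policy actor-critic update; the paper connects this subtraction to the more common regularization of the learned policy towards the data.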
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z) - Align Your Intents: Offline Imitation Learning via Optimal Transport [3.1728695158666396]
We show that an imitating agent can still learn the desired behavior merely from observing the expert.
In our method, AILOT, we use a special representation of states in the form of intents that incorporate pairwise spatial distances within the data.
We report that AILOT outperforms state-of-the-art offline imitation learning algorithms on D4RL benchmarks.
arXiv Detail & Related papers (2024-02-20T14:24:00Z) - Survival Instinct in Offline Reinforcement Learning [28.319886852612672]
Offline RL can produce well-performing and safe policies even when trained with "wrong" reward labels.
We demonstrate that this surprising property is attributable to an interplay between the notion of pessimism in offline RL algorithms and certain implicit biases in common data collection practices.
Our empirical and theoretical results suggest a new paradigm for RL, whereby an agent is nudged to learn a desirable behavior with imperfect reward but purposely biased data coverage.
arXiv Detail & Related papers (2023-06-05T22:15:39Z) - Offline Reinforcement Learning for Human-Guided Human-Machine Interaction with Private Information [110.42866062614912]
We study human-guided human-machine interaction involving private information.
We focus on offline reinforcement learning (RL) in this game.
We develop a novel identification result and use it to propose a new off-policy evaluation method.
arXiv Detail & Related papers (2022-12-23T06:26:44Z) - Let Offline RL Flow: Training Conservative Agents in the Latent Space of Normalizing Flows [58.762959061522736]
Offline reinforcement learning aims to train a policy on a pre-recorded and fixed dataset without any additional environment interactions.
We build upon recent works on learning policies in latent action spaces and use a special form of Normalizing Flows for constructing a generative model.
We evaluate our method on various locomotion and navigation tasks, demonstrating that our approach outperforms recently proposed algorithms.
arXiv Detail & Related papers (2022-11-20T21:57:10Z) - Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning [147.61075994259807]
We propose Exploratory data for Offline RL (ExORL), a data-centric approach to offline RL.
ExORL first generates data with unsupervised reward-free exploration, then relabels this data with a downstream reward before training a policy with offline RL (a rough sketch of this recipe appears after this list).
We find that exploratory data allows vanilla off-policy RL algorithms, without any offline-specific modifications, to outperform or match state-of-the-art offline RL algorithms on downstream tasks.
arXiv Detail & Related papers (2022-01-31T18:39:27Z) - Offline Meta-Reinforcement Learning with Online Self-Supervision [66.42016534065276]
We propose a hybrid offline meta-RL algorithm, which uses offline data with rewards to meta-train an adaptive policy.
Our method uses the offline data to learn the distribution of reward functions, which is then sampled to self-supervise reward labels for the additional online data.
We find that using additional data and self-generated rewards significantly improves an agent's ability to generalize.
arXiv Detail & Related papers (2021-07-08T17:01:32Z)
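As referenced in the ExORL entry above, the data-centric recipe it describes (reward-free exploration, reward relabeling, then standard offline RL) can be sketched as a three-stage pipeline. Everything below is a hypothetical illustration: `exploration_agent`, `reward_fn`, and `offline_rl_algo` are placeholder interfaces rather than the paper's code, and a classic Gym-style four-value `env.step` return is assumed.

```python
def exorl_style_pipeline(env, exploration_agent, reward_fn, offline_rl_algo, num_steps):
    """Sketch of the ExORL recipe: explore without rewards, relabel, train offline.

    `exploration_agent` stands in for any unsupervised (reward-free) exploration
    method, `reward_fn` for the downstream task reward, and `offline_rl_algo` for
    an off-the-shelf offline (or vanilla off-policy) RL algorithm.
    """
    # 1) Collect transitions with unsupervised, reward-free exploration.
    dataset = []
    obs = env.reset()
    for _ in range(num_steps):
        action = exploration_agent.act(obs)      # driven by intrinsic objectives only
        next_obs, _, done, _ = env.step(action)  # environment reward is ignored here
        dataset.append((obs, action, next_obs))
        obs = env.reset() if done else next_obs

    # 2) Relabel the collected transitions with the downstream task reward.
    relabeled = [(o, a, reward_fn(o, a, o2), o2) for (o, a, o2) in dataset]

    # 3) Train a policy on the relabeled data with a standard offline RL algorithm.
    return offline_rl_algo.train(relabeled)
```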
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.