Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
- URL: http://arxiv.org/abs/2002.11089v1
- Date: Tue, 25 Feb 2020 18:36:31 GMT
- Title: Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement
- Authors: Benjamin Eysenbach, Xinyang Geng, Sergey Levine, and Ruslan Salakhutdinov
- Abstract summary: We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
- Score: 137.29281352505245
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multi-task reinforcement learning (RL) aims to simultaneously learn policies
for solving many tasks. Several prior works have found that relabeling past
experience with different reward functions can improve sample efficiency.
Relabeling methods typically ask: if, in hindsight, we assume that our
experience was optimal for some task, for what task was it optimal? In this
paper, we show that hindsight relabeling is inverse RL, an observation that
suggests that we can use inverse RL in tandem with RL algorithms to efficiently
solve many tasks. We use this idea to generalize goal-relabeling techniques
from prior work to arbitrary classes of tasks. Our experiments confirm that
relabeling data using inverse RL accelerates learning in general multi-task
settings, including goal-reaching, domains with discrete sets of rewards, and
those with linear reward functions.
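To make the relabeling idea concrete, below is a minimal Python sketch of hindsight relabeling viewed as (MaxEnt) inverse RL over a discrete set of candidate tasks: score a trajectory by its return under each candidate reward function, then sample the relabeled task from the resulting softmax. This is an illustration under stated assumptions, not the authors' implementation; the per-task partition-function correction discussed in the paper is omitted for brevity, and all names (`relabel_with_inverse_rl`, `reward_fns`, `temperature`) are hypothetical.

```python
import numpy as np

def relabel_with_inverse_rl(trajectory, reward_fns, temperature=1.0, rng=None):
    """Relabel a trajectory with the task for which it looks most optimal.

    trajectory: list of (state, action) pairs.
    reward_fns: discrete set of candidate reward functions r(s, a), one per task.
    Returns the index of the sampled task and its reward function.
    """
    rng = rng or np.random.default_rng()
    # Return of the trajectory under each candidate task.
    returns = np.array([sum(r_fn(s, a) for s, a in trajectory)
                        for r_fn in reward_fns])
    # MaxEnt IRL view: relabel in proportion to exp(return / temperature).
    # (The per-task log-partition-function term is dropped in this sketch.)
    logits = returns / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    task = rng.choice(len(reward_fns), p=probs)
    return task, reward_fns[task]
```

Transitions relabeled this way would then be written back to the replay buffer with rewards recomputed under the sampled task, so that an off-policy multi-task RL algorithm can learn from them.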
Related papers
- Leveraging Skills from Unlabeled Prior Data for Efficient Online Exploration [54.8229698058649]
We study how unlabeled prior trajectory data can be leveraged to learn efficient exploration strategies.
Our method SUPE (Skills from Unlabeled Prior data for Exploration) demonstrates that a careful combination of these ideas compounds their benefits.
We empirically show that SUPE reliably outperforms prior strategies, successfully solving a suite of long-horizon, sparse-reward tasks.
arXiv Detail & Related papers (2024-10-23T17:58:45Z)
- Learning and reusing primitive behaviours to improve Hindsight Experience Replay sample efficiency [7.806014635635933]
We propose a method that uses primitive behaviours that have been previously learned to solve simple tasks.
This guidance is provided not by a manually designed curriculum, but by a critic network that decides at each timestep whether or not to use the actions proposed by the previously learned primitives.
We demonstrate that agents can learn a successful policy faster with our proposed method, in terms of both sample efficiency and computation time.
arXiv Detail & Related papers (2023-10-03T06:49:57Z)
- RL$^3$: Boosting Meta Reinforcement Learning via RL inside RL$^2$ [12.111848705677142]
We propose RL$^3$, a hybrid approach that incorporates action-values, learned per task through traditional RL, into the inputs to meta-RL.
We show that RL$^3$ earns greater cumulative reward in the long term than RL$^2$, while maintaining data-efficiency in the short term, and generalizes better to out-of-distribution tasks.
arXiv Detail & Related papers (2023-06-28T04:16:16Z)
- A Survey of Meta-Reinforcement Learning [69.76165430793571]
We cast the development of better RL algorithms as a machine learning problem itself in a process called meta-RL.
We discuss how, at a high level, meta-RL research can be clustered based on the presence of a task distribution and the learning budget available for each individual task.
We conclude by presenting the open problems on the path to making meta-RL part of the standard toolbox for a deep RL practitioner.
arXiv Detail & Related papers (2023-01-19T12:01:41Z)
- Contrastive Learning as Goal-Conditioned Reinforcement Learning [147.28638631734486]
In reinforcement learning (RL), it is easier to solve a task if given a good representation.
While deep RL should automatically acquire such good representations, prior work often finds that learning representations in an end-to-end fashion is unstable.
We show that (contrastive) representation learning methods can be cast as RL algorithms in their own right.
arXiv Detail & Related papers (2022-06-15T14:34:15Z)
- Beyond Tabula Rasa: Reincarnating Reinforcement Learning [37.201451908129386]
Learning tabula rasa, that is, without any prior knowledge, is the prevalent workflow in reinforcement learning (RL) research.
We present reincarnating RL as an alternative workflow, where prior computational work is reused or transferred between design iterations of an RL agent.
We find that existing approaches fail in this setting and propose a simple algorithm to address their limitations.
arXiv Detail & Related papers (2022-06-03T15:11:10Z)
- Reward Uncertainty for Exploration in Preference-based Reinforcement Learning [88.34958680436552]
We present an exploration method specifically for preference-based reinforcement learning algorithms.
Our main idea is to design an intrinsic reward that measures novelty based on the learned reward.
Our experiments show that an exploration bonus derived from uncertainty in the learned reward improves both the feedback- and sample-efficiency of preference-based RL algorithms; a minimal sketch of such a bonus appears after this list.
arXiv Detail & Related papers (2022-05-24T23:22:10Z)
- Hindsight Foresight Relabeling for Meta-Reinforcement Learning [20.755104281986757]
Meta-reinforcement learning (meta-RL) algorithms allow agents to learn new behaviors from small amounts of experience.
While meta-RL agents can adapt quickly to new tasks at test time after experiencing only a few trajectories, the meta-training process is still sample-inefficient.
We devise a new relabeling method called Hindsight Foresight Relabeling (HFR).
HFR improves performance when compared to other relabeling methods on a variety of meta-RL tasks.
arXiv Detail & Related papers (2021-09-18T23:49:14Z)
- Generalized Hindsight for Reinforcement Learning [154.0545226284078]
We argue that low-reward data collected while trying to solve one task provides little to no signal for solving that particular task.
We present Generalized Hindsight: an approximate inverse reinforcement learning technique for relabeling behaviors with the right tasks.
arXiv Detail & Related papers (2020-02-26T18:57:05Z)
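Regarding the "Reward Uncertainty for Exploration" entry above: one common way to instantiate an exploration bonus from uncertainty in a learned reward is ensemble disagreement. The sketch below is a hedged illustration of that idea, not the paper's exact formulation; `reward_ensemble` and `beta` are assumed names.

```python
import numpy as np

def shaped_reward(state, action, reward_ensemble, beta=0.05):
    """Learned reward plus an intrinsic bonus from ensemble disagreement.

    reward_ensemble: list of independently trained reward models r(s, a).
    beta: weight of the exploration bonus (assumed hyperparameter).
    """
    preds = np.array([r(state, action) for r in reward_ensemble])
    # High disagreement among the learned reward models marks novel
    # (state, action) pairs, which receive a larger exploration bonus.
    return preds.mean() + beta * preds.std()
```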
This list is automatically generated from the titles and abstracts of the papers in this site.