Hindsight Foresight Relabeling for Meta-Reinforcement Learning
- URL: http://arxiv.org/abs/2109.09031v1
- Date: Sat, 18 Sep 2021 23:49:14 GMT
- Title: Hindsight Foresight Relabeling for Meta-Reinforcement Learning
- Authors: Michael Wan, Jian Peng, Tanmay Gangwani
- Abstract summary: Meta-reinforcement learning (meta-RL) algorithms allow for agents to learn new behaviors from small amounts of experience.
While meta-RL agents can adapt quickly to new tasks at test time after experiencing only a few trajectories, the meta-training process is still sample-inefficient.
We devise a new relabeling method called Hindsight Foresight Relabeling (HFR).
HFR improves performance when compared to other relabeling methods on a variety of meta-RL tasks.
- Score: 20.755104281986757
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Meta-reinforcement learning (meta-RL) algorithms allow for agents to learn
new behaviors from small amounts of experience, mitigating the sample
inefficiency problem in RL. However, while meta-RL agents can adapt quickly to
new tasks at test time after experiencing only a few trajectories, the
meta-training process is still sample-inefficient. Prior works have found that
in the multi-task RL setting, relabeling past transitions and thus sharing
experience among tasks can improve sample efficiency and asymptotic
performance. We apply this idea to the meta-RL setting and devise a new
relabeling method called Hindsight Foresight Relabeling (HFR). We construct a
relabeling distribution using the combination of "hindsight", which is used to
relabel trajectories using reward functions from the training task
distribution, and "foresight", which takes the relabeled trajectories and
computes the utility of each trajectory for each task. HFR is easy to implement
and readily compatible with existing meta-RL algorithms. We find that HFR
improves performance when compared to other relabeling methods on a variety of
meta-RL tasks.
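To make the hindsight/foresight split concrete, the following is a minimal sketch of one relabeling step, assuming access to the training tasks' reward functions and a buffer of collected trajectories. The function names (trajectory_utility, relabel_batch), the return-based utility, and the softmax relabeling distribution are illustrative assumptions rather than the paper's exact implementation.

```python
# Hedged sketch of hindsight-foresight relabeling; names and the return-based
# utility are illustrative assumptions, not the authors' reference code.
import numpy as np

def trajectory_utility(trajectory, reward_fn, gamma=0.99):
    """Foresight: score how useful a trajectory is for a task, approximated
    here by its discounted return under that task's reward function."""
    return sum(reward_fn(s, a) * gamma**t for t, (s, a) in enumerate(trajectory))

def relabel_batch(trajectories, task_reward_fns, temperature=1.0, rng=None):
    """Hindsight + foresight: evaluate each trajectory under every training
    task's reward, turn the utilities into a relabeling distribution, sample
    a task per trajectory, and relabel its rewards accordingly."""
    rng = rng if rng is not None else np.random.default_rng()
    relabeled = []
    for traj in trajectories:
        # Hindsight: re-evaluate the trajectory under each task's reward function.
        utilities = np.array([trajectory_utility(traj, r) for r in task_reward_fns])
        # Foresight: a softmax over utilities gives the relabeling distribution.
        logits = utilities / temperature
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        task_idx = int(rng.choice(len(task_reward_fns), p=probs))
        # Relabel the trajectory's rewards with the sampled task's reward function.
        new_traj = [(s, a, task_reward_fns[task_idx](s, a)) for s, a in traj]
        relabeled.append((task_idx, new_traj))
    return relabeled
```

In a full pipeline, the relabeled trajectories would be written back into the replay buffers of the tasks they were assigned to before the usual off-policy meta-RL updates, which is consistent with the abstract's claim that the method is readily compatible with existing meta-RL algorithms.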
Related papers
- Train Hard, Fight Easy: Robust Meta Reinforcement Learning [78.16589993684698]
A major challenge of reinforcement learning (RL) in real-world applications is the variation between environments, tasks or clients.
Standard MRL methods optimize the average return over tasks, but often suffer from poor results in tasks of high risk or difficulty.
In this work, we define a robust MRL objective with a controlled robustness level.
The data inefficiency is addressed via the novel Robust Meta RL algorithm (RoML); a sketch of this kind of risk-controlled objective appears after this list.
arXiv Detail & Related papers (2023-01-26T14:54:39Z)
- Efficient Meta Reinforcement Learning for Preference-based Fast Adaptation [17.165083095799712]
We study the problem of few-shot adaptation in the context of human-in-the-loop reinforcement learning.
We develop a meta-RL algorithm that enables fast policy adaptation with preference-based feedback.
arXiv Detail & Related papers (2022-11-20T03:55:09Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but also adapts quickly to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
arXiv Detail & Related papers (2021-12-02T00:51:17Z)
- Rewriting History with Inverse RL: Hindsight Inference for Policy Improvement [137.29281352505245]
We show that hindsight relabeling is inverse RL, an observation that suggests that we can use inverse RL in tandem with RL algorithms to efficiently solve many tasks.
Our experiments confirm that relabeling data using inverse RL accelerates learning in general multi-task settings.
arXiv Detail & Related papers (2020-02-25T18:36:31Z)
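The "Train Hard, Fight Easy" entry above contrasts optimizing the average return over tasks with a risk-controlled objective; the sketch below shows one common instantiation of such an objective, a CVaR-style average over the worst-performing fraction of sampled tasks. The CVaR choice, the function names, and the example numbers are assumptions for illustration, not necessarily that paper's exact formulation.

```python
# Illustrative comparison of a mean objective vs. a risk-controlled (CVaR-style)
# objective over per-task returns; hypothetical names, not the RoML implementation.
import numpy as np

def mean_objective(task_returns):
    """Standard meta-RL objective: average return over the sampled tasks."""
    return float(np.mean(task_returns))

def cvar_objective(task_returns, alpha=0.2):
    """Risk-controlled objective: average return over the worst alpha-fraction
    of tasks, so difficult or high-risk tasks dominate the training signal."""
    returns = np.sort(np.asarray(task_returns, dtype=float))
    k = max(1, int(np.ceil(alpha * len(returns))))
    return float(returns[:k].mean())

# Hypothetical per-task returns from one meta-training iteration.
returns = [12.0, 9.5, 11.2, 3.1, 2.4, 10.8]
print(mean_objective(returns))         # average over all tasks
print(cvar_objective(returns, 0.33))   # average over the worst third of tasks
```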