Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
- URL: http://arxiv.org/abs/2112.00901v1
- Date: Thu, 2 Dec 2021 00:51:17 GMT
- Title: Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL
- Authors: Charles Packer, Pieter Abbeel, Joseph E. Gonzalez
- Abstract summary: We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
- Score: 91.26538493552817
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Meta-reinforcement learning (meta-RL) has proven to be a successful framework
for leveraging experience from prior tasks to rapidly learn new related tasks;
however, current meta-RL approaches struggle to learn in sparse reward
environments. Although existing meta-RL algorithms can learn strategies for
adapting to new sparse reward tasks, the actual adaptation strategies are
learned using hand-shaped reward functions, or require simple environments
where random exploration is sufficient to encounter sparse reward. In this
paper, we present a formulation of hindsight relabeling for meta-RL, which
relabels experience during meta-training to enable learning to learn entirely
using sparse reward. We demonstrate the effectiveness of our approach on a
suite of challenging sparse reward goal-reaching environments that previously
required dense reward during meta-training to solve. Our approach solves these
environments using the true sparse reward function, with performance comparable
to training with a proxy dense reward function.
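The core idea described in the abstract is to relabel experience collected during meta-training so that sparse-reward tasks still yield learning signal: a trajectory that failed its original task can be treated, in hindsight, as a success for some alternative task it did achieve. The sketch below is a minimal illustration of that task-level relabeling idea in Python; the `Transition` container, the 0/1 goal-reaching reward, and the `candidate_tasks` selection are assumptions for illustration, not the paper's actual implementation.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical container for one transition; field names are illustrative,
# not taken from the paper's implementation.
@dataclass
class Transition:
    state: object
    action: object
    next_state: object
    task: object      # e.g. a goal location
    reward: float

def sparse_reward(next_state, task, success: Callable[[object, object], bool]) -> float:
    # Sparse goal-reaching reward: 1 on success, 0 otherwise (assumed convention).
    return 1.0 if success(next_state, task) else 0.0

def hindsight_task_relabel(
    trajectory: List[Transition],
    candidate_tasks: Callable[[List[Transition]], List[object]],
    success: Callable[[object, object], bool],
) -> List[Transition]:
    """Relabel a meta-training trajectory with a task it actually solved.

    A sketch of task-level hindsight relabeling in the spirit of the paper,
    not the authors' exact algorithm: pick an alternative task consistent
    with what the trajectory achieved (e.g. a state it visited, treated as
    the goal) and recompute the sparse reward under that task, so
    meta-training sees non-zero reward signal.
    """
    tasks = candidate_tasks(trajectory)
    if not tasks:
        return trajectory  # nothing achieved; keep the original labels
    new_task = random.choice(tasks)
    return [
        Transition(
            state=t.state,
            action=t.action,
            next_state=t.next_state,
            task=new_task,
            reward=sparse_reward(t.next_state, new_task, success),
        )
        for t in trajectory
    ]
```

In a context-based meta-RL setup, relabelled trajectories of this kind would typically be mixed into the replay buffer used for meta-training, while meta-test adaptation continues to use only the true sparse reward, which matches the setting the abstract describes.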
Related papers
- Black box meta-learning intrinsic rewards for sparse-reward environments [0.0]
This work investigates how meta-learning can improve the training signal received by RL agents.
We analyze and compare this approach to the use of extrinsic rewards and a meta-learned advantage function.
The developed algorithms are evaluated on distributions of continuous control tasks with both parametric and non-parametric variations.
arXiv Detail & Related papers (2024-07-31T12:09:33Z)
- Meta Reinforcement Learning with Successor Feature Based Context [51.35452583759734]
We propose a novel meta-RL approach that achieves competitive performance compared to existing meta-RL algorithms.
Our method not only learns high-quality policies for multiple tasks simultaneously but also adapts quickly to new tasks with a small amount of training.
arXiv Detail & Related papers (2022-07-29T14:52:47Z)
- Learning Action Translator for Meta Reinforcement Learning on Sparse-Reward Tasks [56.63855534940827]
This work introduces a novel objective function to learn an action translator among training tasks.
We theoretically verify that the value of the transferred policy with the action translator can be close to the value of the source policy.
We propose to combine the action translator with context-based meta-RL algorithms for better data collection and more efficient exploration during meta-training.
arXiv Detail & Related papers (2022-07-19T04:58:06Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs as well as, or better than, meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Hindsight Foresight Relabeling for Meta-Reinforcement Learning [20.755104281986757]
Meta-reinforcement learning (meta-RL) algorithms allow for agents to learn new behaviors from small amounts of experience.
While meta-RL agents can adapt quickly to new tasks at test time after experiencing only a few trajectories, the meta-training process is still sample-inefficient.
We devise a new relabeling method called Hindsight Foresight Relabeling (HFR).
HFR improves performance compared to other relabeling methods on a variety of meta-RL tasks.
arXiv Detail & Related papers (2021-09-18T23:49:14Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
We model an exploration policy learning problem for meta-RL that is separated from exploitation policy learning.
We develop a new off-policy meta-RL framework which efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)
- HMRL: Hyper-Meta Learning for Sparse Reward Reinforcement Learning Problem [107.52043871875898]
We develop a novel meta reinforcement learning framework called Hyper-Meta RL (HMRL) for sparse reward RL problems.
It consists of three modules, including a cross-environment meta-state embedding module that constructs a common meta state space to adapt to different environments.
Experiments in sparse-reward environments show the superiority of HMRL in both transferability and policy learning efficiency.
arXiv Detail & Related papers (2020-02-11T07:31:11Z)