Exploration in Approximate Hyper-State Space for Meta Reinforcement
Learning
- URL: http://arxiv.org/abs/2010.01062v3
- Date: Wed, 9 Jun 2021 21:43:46 GMT
- Title: Exploration in Approximate Hyper-State Space for Meta Reinforcement
Learning
- Authors: Luisa Zintgraf, Leo Feng, Cong Lu, Maximilian Igl, Kristian
Hartikainen, Katja Hofmann, Shimon Whiteson
- Abstract summary: We propose HyperX, which uses novel reward bonuses for meta-training to explore in approximate hyper-state space.
We show empirically that HyperX meta-learns better task-exploration and adapts more successfully to new tasks than existing methods.
- Score: 60.1292055717823
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: To rapidly learn a new task, it is often essential for agents to explore
efficiently -- especially when performance matters from the first timestep. One
way to learn such behaviour is via meta-learning. Many existing methods however
rely on dense rewards for meta-training, and can fail catastrophically if the
rewards are sparse. Without a suitable reward signal, the need for exploration
during meta-training is exacerbated. To address this, we propose HyperX, which
uses novel reward bonuses for meta-training to explore in approximate
hyper-state space (where hyper-states represent the environment state and the
agent's task belief). We show empirically that HyperX meta-learns better
task-exploration and adapts more successfully to new tasks than existing
methods.
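As a rough illustration of the idea (not the authors' implementation), the sketch below shows one plausible way to compute an exploration bonus over approximate hyper-states, i.e. over the concatenation of the environment state and a task-belief embedding, using a random-network-distillation-style novelty signal. The choice of RND, the network sizes, and the names (HyperStateBonus, bonus, update) are assumptions made purely for illustration.

    # Hedged sketch: an RND-style novelty bonus over (state, belief) hyper-states.
    # All architectural details below are assumptions, not taken from the paper.
    import torch
    import torch.nn as nn

    class HyperStateBonus(nn.Module):
        def __init__(self, state_dim, belief_dim, feat_dim=64):
            super().__init__()
            hyper_dim = state_dim + belief_dim   # hyper-state = (state, task belief)
            self.target = nn.Sequential(nn.Linear(hyper_dim, 128), nn.ReLU(),
                                        nn.Linear(128, feat_dim))
            self.predictor = nn.Sequential(nn.Linear(hyper_dim, 128), nn.ReLU(),
                                           nn.Linear(128, feat_dim))
            for p in self.target.parameters():   # the target network stays fixed
                p.requires_grad = False
            self.opt = torch.optim.Adam(self.predictor.parameters(), lr=1e-4)

        def bonus(self, state, belief):
            # Prediction error is high for rarely visited (state, belief) pairs,
            # so adding it to the reward encourages exploring hyper-state space.
            h = torch.cat([state, belief], dim=-1)
            with torch.no_grad():
                target_feat = self.target(h)
            return (self.predictor(h) - target_feat).pow(2).mean(dim=-1).detach()

        def update(self, state, belief):
            # Train the predictor on visited hyper-states so their novelty decays.
            h = torch.cat([state, belief], dim=-1).detach()
            loss = (self.predictor(h) - self.target(h)).pow(2).mean()
            self.opt.zero_grad()
            loss.backward()
            self.opt.step()
            return loss.item()

In use, such a bonus would be added to the environment reward during meta-training only, and the belief vector would come from whatever task-inference module the agent maintains (for example, the posterior of a variational task encoder); that pairing is likewise an assumption here.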
Related papers
- First-Explore, then Exploit: Meta-Learning to Solve Hard Exploration-Exploitation Trade-Offs [2.0690113422225997]
Our method, First-Explore, addresses this trade-off by learning two policies: one to solely explore, and one to solely exploit.
First-Explore represents a significant step towards developing meta-RL algorithms capable of human-like exploration on a broader range of domains.
arXiv Detail & Related papers (2023-07-05T13:20:21Z)
- On the Effectiveness of Fine-tuning Versus Meta-reinforcement Learning [71.55412580325743]
We show that multi-task pretraining with fine-tuning on new tasks performs equally as well, or better, than meta-pretraining with meta test-time adaptation.
This is encouraging for future research, as multi-task pretraining tends to be simpler and computationally cheaper than meta-RL.
arXiv Detail & Related papers (2022-06-07T13:24:00Z)
- Follow your Nose: Using General Value Functions for Directed Exploration in Reinforcement Learning [5.40729975786985]
This paper explores the idea of combining exploration with auxiliary task learning using General Value Functions (GVFs) and a directed exploration strategy.
We provide a simple way to learn options (sequences of actions) instead of handcrafting them, and demonstrate a performance advantage on three navigation tasks.
arXiv Detail & Related papers (2022-03-02T05:14:11Z)
- Hindsight Task Relabelling: Experience Replay for Sparse Reward Meta-RL [91.26538493552817]
We present a formulation of hindsight relabeling for meta-RL, which relabels experience during meta-training to enable learning to learn entirely using sparse reward.
We demonstrate the effectiveness of our approach on a suite of challenging sparse reward goal-reaching environments.
arXiv Detail & Related papers (2021-12-02T00:51:17Z)
- Long-Term Exploration in Persistent MDPs [68.8204255655161]
In this paper, we propose an exploration method called Rollback-Explore (RbExplore), which utilizes the concept of the persistent Markov decision process.
We test our algorithm in the hard-exploration Prince of Persia game, without rewards and domain knowledge.
arXiv Detail & Related papers (2021-09-21T13:47:04Z)
- MetaCURE: Meta Reinforcement Learning with Empowerment-Driven Exploration [52.48362697163477]
We model an exploration policy learning problem for meta-RL, which is separated from exploitation policy learning.
We develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)
- Planning to Explore via Self-Supervised World Models [120.31359262226758]
Plan2Explore is a self-supervised reinforcement learning agent that offers a new approach to self-supervised exploration and fast adaptation to new tasks.
Without any training supervision or task-specific interaction, Plan2Explore outperforms prior self-supervised exploration methods.
arXiv Detail & Related papers (2020-05-12T17:59:45Z)
- Long-Term Visitation Value for Deep Exploration in Sparse Reward Reinforcement Learning [34.38011902445557]
Reinforcement learning with sparse rewards is still an open challenge.
We present a novel approach that plans exploration actions far into the future by using a long-term visitation count.
Contrary to existing methods which use models of reward and dynamics, our approach is off-policy and model-free.
arXiv Detail & Related papers (2020-01-01T01:01:15Z)
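As a rough, hedged sketch of the long-term visitation idea in the entry directly above (not the paper's implementation), the code below propagates a count-based novelty signal through an off-policy, model-free Bellman backup, so that exploratory action selection looks far beyond the immediate next step. The tabular setting, the 1/sqrt(count) bonus, and all names are illustrative assumptions.

    # Hedged sketch: a tabular "visitation value" W(s, a) that backs up a
    # count-based bonus off-policy and model-free. Assumes discrete, hashable
    # states and actions; all details are illustrative, not from the paper.
    from collections import defaultdict
    import math

    class VisitationValue:
        def __init__(self, n_actions, gamma=0.99, lr=0.5):
            self.N = defaultdict(int)        # visit counts per (state, action)
            self.W = defaultdict(float)      # long-term visitation value
            self.n_actions, self.gamma, self.lr = n_actions, gamma, lr

        def intrinsic(self, s, a):
            # Novelty bonus shrinks as a state-action pair is visited more often.
            return 1.0 / math.sqrt(1 + self.N[(s, a)])

        def update(self, s, a, s_next):
            # Q-learning-style backup of the novelty signal: W estimates the
            # discounted novelty reachable from (s, a), not just the local count.
            self.N[(s, a)] += 1
            target = self.intrinsic(s, a) + self.gamma * max(
                self.W[(s_next, b)] for b in range(self.n_actions))
            self.W[(s, a)] += self.lr * (target - self.W[(s, a)])

        def explore_action(self, s):
            # Greedy with respect to the long-term visitation value: head toward
            # regions that have been visited least over the long run.
            return max(range(self.n_actions), key=lambda a: self.W[(s, a)])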
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.