Maximum Entropy Gain Exploration for Long Horizon Multi-goal
Reinforcement Learning
- URL: http://arxiv.org/abs/2007.02832v1
- Date: Mon, 6 Jul 2020 15:36:05 GMT
- Title: Maximum Entropy Gain Exploration for Long Horizon Multi-goal
Reinforcement Learning
- Authors: Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba
- Abstract summary: We argue that a learning agent should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution.
We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks.
- Score: 35.44552072132894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What goals should a multi-goal reinforcement learning agent pursue during
training in long-horizon tasks? When the desired (test time) goal distribution
is too distant to offer a useful learning signal, we argue that the agent
should not pursue unobtainable goals. Instead, it should set its own intrinsic
goals that maximize the entropy of the historical achieved goal distribution.
We propose to optimize this objective by having the agent pursue past achieved
goals in sparsely explored areas of the goal space, which focuses exploration
on the frontier of the achievable goal set. We show that our strategy achieves
an order of magnitude better sample efficiency than the prior state of the art
on long-horizon multi-goal tasks including maze navigation and block stacking.
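As a rough illustration of the goal-selection idea in the abstract, the sketch below picks intrinsic goals from sparsely explored regions of the achieved-goal space by fitting a density model to past achieved goals and choosing a low-density candidate. This is a minimal sketch, not the authors' implementation: the kernel density estimator, bandwidth, and candidate-sampling scheme are assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def select_intrinsic_goal(achieved_goals, num_candidates=100, bandwidth=0.1, rng=None):
    """Pick a past achieved goal from a sparsely explored region of goal space.

    Fits a KDE to the historical achieved-goal distribution and returns the
    candidate with the lowest estimated density, approximating "pursue past
    achieved goals on the frontier of the achievable goal set".
    """
    rng = rng or np.random.default_rng()
    goals = np.asarray(achieved_goals)  # shape: (N, goal_dim)

    # Density model of the historical achieved-goal distribution.
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(goals)

    # Draw candidate goals from the buffer of past achievements.
    idx = rng.choice(len(goals), size=min(num_candidates, len(goals)), replace=False)
    candidates = goals[idx]

    # The lowest log-density candidate lies in the most sparsely explored region;
    # pursuing it should increase the entropy of the achieved-goal distribution.
    log_density = kde.score_samples(candidates)
    return candidates[np.argmin(log_density)]
```

During training, a goal chosen this way would stand in for the (too distant) test-time goal as the agent's target for the next episode.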
Related papers
- Temporally Extended Goal Recognition in Fully Observable Non-Deterministic Domain Models [43.460098744623416]
Existing approaches assume that goal hypotheses comprise a single conjunctive formula over a single final state.
We focus on temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models.
Empirical results show that our approach is accurate in recognizing temporally extended goals in different recognition settings.
arXiv Detail & Related papers (2023-06-14T18:02:00Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum over the goals the agent needs to solve, based on value disagreement (a hedged code sketch follows this entry).
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
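As a rough sketch of the curriculum idea in the last entry above, the snippet below samples training goals in proportion to disagreement among an ensemble of goal-conditioned value functions. This is an illustrative approximation, not the authors' code: the ensemble interface, candidate goal pool, and softmax temperature are assumptions.

```python
import numpy as np

def sample_goal_by_value_disagreement(candidate_goals, state, value_ensemble, temperature=1.0, rng=None):
    """Sample a training goal in proportion to ensemble value disagreement.

    `value_ensemble` is assumed to be a list of callables v(state, goal) -> float.
    Goals on which the value estimates disagree most are treated as lying at the
    agent's current frontier of competence and are sampled more often.
    """
    rng = rng or np.random.default_rng()

    # Value predictions: shape (ensemble_size, num_goals).
    preds = np.array([[v(state, g) for g in candidate_goals] for v in value_ensemble])
    disagreement = preds.std(axis=0)

    # Softmax over disagreement scores gives the goal-sampling distribution.
    logits = disagreement / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return candidate_goals[rng.choice(len(candidate_goals), p=probs)]
```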
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.