Maximum Entropy Gain Exploration for Long Horizon Multi-goal
Reinforcement Learning
- URL: http://arxiv.org/abs/2007.02832v1
- Date: Mon, 6 Jul 2020 15:36:05 GMT
- Title: Maximum Entropy Gain Exploration for Long Horizon Multi-goal
Reinforcement Learning
- Authors: Silviu Pitis, Harris Chan, Stephen Zhao, Bradly Stadie, Jimmy Ba
- Abstract summary: We argue that a learning agent should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution.
We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks.
- Score: 35.44552072132894
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: What goals should a multi-goal reinforcement learning agent pursue during
training in long-horizon tasks? When the desired (test time) goal distribution
is too distant to offer a useful learning signal, we argue that the agent
should not pursue unobtainable goals. Instead, it should set its own intrinsic
goals that maximize the entropy of the historical achieved goal distribution.
We propose to optimize this objective by having the agent pursue past achieved
goals in sparsely explored areas of the goal space, which focuses exploration
on the frontier of the achievable goal set. We show that our strategy achieves
an order of magnitude better sample efficiency than the prior state of the art
on long-horizon multi-goal tasks including maze navigation and block stacking.
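As a rough illustration of the goal-selection idea in the abstract, the sketch below picks intrinsic goals from sparsely explored regions of the achieved-goal space by fitting a density model to past achieved goals and choosing a low-density candidate. This is a minimal sketch, not the authors' implementation: the kernel density estimator, bandwidth, and candidate-sampling scheme are assumptions.

```python
import numpy as np
from sklearn.neighbors import KernelDensity

def select_intrinsic_goal(achieved_goals, num_candidates=100, bandwidth=0.1, rng=None):
    """Pick a past achieved goal from a sparsely explored region of goal space.

    Fits a KDE to the historical achieved-goal distribution and returns the
    candidate with the lowest estimated density, approximating "pursue past
    achieved goals on the frontier of the achievable goal set".
    """
    rng = rng or np.random.default_rng()
    goals = np.asarray(achieved_goals)  # shape: (N, goal_dim)

    # Density model of the historical achieved-goal distribution.
    kde = KernelDensity(kernel="gaussian", bandwidth=bandwidth).fit(goals)

    # Draw candidate goals from the buffer of past achievements.
    idx = rng.choice(len(goals), size=min(num_candidates, len(goals)), replace=False)
    candidates = goals[idx]

    # The lowest log-density candidate lies in the most sparsely explored region;
    # pursuing it should increase the entropy of the achieved-goal distribution.
    log_density = kde.score_samples(candidates)
    return candidates[np.argmin(log_density)]
```

During training, a goal chosen this way would stand in for the (too distant) test-time goal as the agent's target for the next episode.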
Related papers
- Temporally Extended Goal Recognition in Fully Observable Non-Deterministic Domain Models [43.460098744623416]
Existing approaches assume that goal hypotheses comprise a single conjunctive formula over a single final state.
We focus on temporally extended goals in Fully Observable Non-Deterministic (FOND) planning domain models.
Empirical results show that our approach is accurate in recognizing temporally extended goals in different recognition settings.
arXiv Detail & Related papers (2023-06-14T18:02:00Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally demonstrate improved expected return on out-of-distribution goals, while still allowing goals with expressive structure to be specified.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Learning to Plan Optimistically: Uncertainty-Guided Deep Exploration via Latent Model Ensembles [73.15950858151594]
This paper presents Latent Optimistic Value Exploration (LOVE), a strategy that enables deep exploration through optimism in the face of uncertain long-term rewards.
We combine latent world models with value function estimation to predict infinite-horizon returns and recover associated uncertainty via ensembling.
We apply LOVE to visual robot control tasks in continuous action spaces and demonstrate on average more than 20% improved sample efficiency in comparison to state-of-the-art and other exploration objectives.
arXiv Detail & Related papers (2020-10-27T22:06:57Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum over the goals the agent needs to solve, based on value disagreement (a hedged code sketch follows this entry).
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
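As a rough sketch of the curriculum idea in the last entry above, the snippet below samples training goals in proportion to disagreement among an ensemble of goal-conditioned value functions. This is an illustrative approximation, not the authors' code: the ensemble interface, candidate goal pool, and softmax temperature are assumptions.

```python
import numpy as np

def sample_goal_by_value_disagreement(candidate_goals, state, value_ensemble, temperature=1.0, rng=None):
    """Sample a training goal in proportion to ensemble value disagreement.

    `value_ensemble` is assumed to be a list of callables v(state, goal) -> float.
    Goals on which the value estimates disagree most are treated as lying at the
    agent's current frontier of competence and are sampled more often.
    """
    rng = rng or np.random.default_rng()

    # Value predictions: shape (ensemble_size, num_goals).
    preds = np.array([[v(state, g) for g in candidate_goals] for v in value_ensemble])
    disagreement = preds.std(axis=0)

    # Softmax over disagreement scores gives the goal-sampling distribution.
    logits = disagreement / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return candidate_goals[rng.choice(len(candidate_goals), p=probs)]
```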
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences of its use.