Scaling Goal-based Exploration via Pruning Proto-goals
- URL: http://arxiv.org/abs/2302.04693v1
- Date: Thu, 9 Feb 2023 15:22:09 GMT
- Title: Scaling Goal-based Exploration via Pruning Proto-goals
- Authors: Akhil Bagaria, Ray Jiang, Ramana Kumar, Tom Schaul
- Abstract summary: One of the gnarliest challenges in reinforcement learning is exploration that scales to vast domains.
Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space.
Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space.
- Score: 10.976262029859424
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: One of the gnarliest challenges in reinforcement learning (RL) is exploration
that scales to vast domains, where novelty-, or coverage-seeking behaviour
falls short. Goal-directed, purposeful behaviours are able to overcome this,
but rely on a good goal space. The core challenge in goal discovery is finding
the right balance between generality (not hand-crafted) and tractability
(useful, not too many). Our approach explicitly seeks the middle ground,
enabling the human designer to specify a vast but meaningful proto-goal space,
and an autonomous discovery process to refine this to a narrower space of
controllable, reachable, novel, and relevant goals. The effectiveness of
goal-conditioned exploration with the latter is then demonstrated in three
challenging environments.
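To make the discovery step concrete, here is a minimal, hypothetical sketch of pruning a large proto-goal set down to goals that score well on the four criteria named in the abstract (controllability, reachability, novelty, relevance). The scoring functions are placeholders, not the paper's learned estimators.

```python
# Minimal sketch of proto-goal pruning: score a large candidate goal set on the
# four criteria named in the abstract and keep only goals that pass all of them.
# The scoring functions below are hypothetical stand-ins for learned estimators.
import numpy as np

def prune_proto_goals(proto_goals, controllability, reachability, novelty, relevance,
                      thresholds=(0.5, 0.5, 0.5, 0.5)):
    """Return the subset of proto-goals whose scores clear every threshold."""
    kept = []
    for g in proto_goals:
        scores = (controllability(g), reachability(g), novelty(g), relevance(g))
        if all(s >= t for s, t in zip(scores, thresholds)):
            kept.append(g)
    return kept

# Toy usage: goals are 2-D points; the score functions are placeholders.
rng = np.random.default_rng(0)
candidates = [tuple(rng.uniform(0, 1, size=2)) for _ in range(1000)]
pruned = prune_proto_goals(
    candidates,
    controllability=lambda g: 1.0,                   # e.g. fraction of attempts that reached g
    reachability=lambda g: 1.0 - abs(g[0] - g[1]),   # e.g. estimated success probability
    novelty=lambda g: 1.0 / (1.0 + 10 * g[0]),       # e.g. inverse visit count
    relevance=lambda g: g[1],                        # e.g. correlation with extrinsic reward
)
print(f"kept {len(pruned)} of {len(candidates)} proto-goals")
```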
Related papers
- Exploring the Edges of Latent State Clusters for Goal-Conditioned Reinforcement Learning [6.266160051617362]
"Cluster Edge Exploration" ($CE2$) is a new goal-directed exploration algorithm that gives priority to goal states that remain accessible to the agent.
In challenging robotics environments, $CE2$ demonstrates superior efficiency in exploration compared to baseline methods and ablations.
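A loose illustration of the cluster-edge idea (not $CE2$'s actual algorithm): among candidate states, prefer those far from their assigned latent-cluster centroid while still estimated to be reachable. The centroids and the reachability estimate are assumed given here; $CE2$ learns both.

```python
# Loose sketch of cluster-edge goal selection: score each candidate state by how far
# it lies from its nearest cluster centroid (edge-ness) weighted by an assumed
# reachability estimate, and propose the highest-scoring state as the next goal.
import numpy as np

def cluster_edge_goal(states, centroids, reachability):
    states, centroids = np.asarray(states, float), np.asarray(centroids, float)
    # Distance from every state to every centroid, shape (n_states, n_centroids).
    d = np.linalg.norm(states[:, None, :] - centroids[None, :, :], axis=-1)
    edge_score = d.min(axis=1)  # distance to the assigned centroid; large = near a cluster edge
    scores = edge_score * np.array([reachability(s) for s in states])
    return states[int(np.argmax(scores))]

rng = np.random.default_rng(0)
candidates = rng.uniform(0, 5, size=(100, 2))
centres = np.array([[1.0, 1.0], [4.0, 4.0]])
print(cluster_edge_goal(candidates, centres, reachability=lambda s: 1.0 / (1.0 + s.sum())))
```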
arXiv Detail & Related papers (2024-11-03T01:21:43Z)
- Towards Measuring Goal-Directedness in AI Systems [0.0]
A key prerequisite for AI systems pursuing unintended goals is whether they will behave in a coherent and goal-directed manner.
We propose a new family of definitions of the goal-directedness of a policy that analyze whether it is well-modeled as near-optimal for many reward functions.
Our contribution is a definition of goal-directedness that is simpler and more easily computable in order to approach the question of whether AI systems could pursue dangerous goals.
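One illustrative way to operationalize this definition in a one-step decision problem (not the paper's exact construction): sample many reward functions and record how close the policy's expected return comes to optimal under the best-fitting one.

```python
# Illustrative goal-directedness score: how near-optimal the policy is under the
# best-fitting of many sampled reward functions, for a one-step decision problem.
import numpy as np

def goal_directedness(policy_probs, n_reward_samples=1000, seed=0):
    rng = np.random.default_rng(seed)
    n_actions = len(policy_probs)
    best_ratio = 0.0
    for _ in range(n_reward_samples):
        rewards = rng.uniform(0.0, 1.0, size=n_actions)  # a sampled reward function
        achieved = float(policy_probs @ rewards)         # policy's expected return
        optimal = float(rewards.max())                   # best achievable return
        best_ratio = max(best_ratio, achieved / optimal)
    return best_ratio

uniform = np.ones(4) / 4                       # undirected policy: never near-optimal
greedy = np.array([0.97, 0.01, 0.01, 0.01])    # sharply directed policy
print(goal_directedness(uniform), goal_directedness(greedy))
```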
arXiv Detail & Related papers (2024-10-07T01:34:42Z)
- Hierarchical reinforcement learning with natural language subgoals [26.725710518119044]
We use data from humans solving tasks to softly supervise the goal space for a set of long range tasks in a 3D embodied environment.
This has two advantages: first, it is easy to generate this data from naive human participants; second, it is flexible enough to represent a vast range of sub-goals in human-relevant tasks.
Our approach outperforms agents that clone expert behavior on these tasks, as well as HRL from scratch without this supervised sub-goal space.
arXiv Detail & Related papers (2023-09-20T18:03:04Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
Experiments show improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
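A single-codebook simplification of such a discretizing bottleneck might look as follows; the codebook is random here, whereas the paper learns discrete factorial representations end to end.

```python
# Minimal vector-quantization-style bottleneck, sketching the "discretizing
# bottleneck" idea: continuous goal embeddings are snapped to their nearest
# codebook entries before being handed to the goal-conditioned policy.
# A single random codebook is used here purely for illustration.
import numpy as np

class DiscretizingBottleneck:
    def __init__(self, num_codes=16, dim=8, seed=0):
        rng = np.random.default_rng(seed)
        self.codebook = rng.normal(size=(num_codes, dim))

    def __call__(self, goal_embeddings):
        # Distances from every embedding to every code, shape (batch, num_codes).
        d = np.linalg.norm(goal_embeddings[:, None, :] - self.codebook[None, :, :], axis=-1)
        codes = d.argmin(axis=1)              # discrete goal indices
        return codes, self.codebook[codes]    # indices and their quantized embeddings

bottleneck = DiscretizingBottleneck()
goals = np.random.default_rng(1).normal(size=(5, 8))
codes, quantized = bottleneck(goals)
print(codes)  # the discrete indices the policy would condition on
```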
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
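A rough, assumption-laden sketch of a non-parametric landmark graph (omitting SFL's successor-feature machinery): states far from all existing landmarks become new landmarks, nearby landmarks are linked, and the least-visited landmark is proposed as the next exploration goal.

```python
# Rough sketch of a non-parametric landmark graph: states become landmarks when they
# are far from every existing landmark, edges link landmarks within an assumed
# reachability radius, and exploration targets the least-visited landmark.
import numpy as np

class LandmarkGraph:
    def __init__(self, add_radius=1.0, edge_radius=2.0):
        self.landmarks, self.visits, self.edges = [], [], set()
        self.add_radius, self.edge_radius = add_radius, edge_radius

    def observe(self, state):
        state = np.asarray(state, dtype=float)
        dists = [np.linalg.norm(state - l) for l in self.landmarks]
        if not dists or min(dists) > self.add_radius:   # novel enough: new landmark
            idx = len(self.landmarks)
            self.landmarks.append(state)
            self.visits.append(0)
            for j, d in enumerate(dists):
                if d <= self.edge_radius:               # assume nearby landmarks are reachable
                    self.edges.add((j, idx))
        else:
            self.visits[int(np.argmin(dists))] += 1     # revisit the nearest landmark

    def exploration_goal(self):
        return self.landmarks[int(np.argmin(self.visits))]  # least-visited landmark

graph = LandmarkGraph()
for s in np.random.default_rng(0).uniform(0, 10, size=(200, 2)):
    graph.observe(s)
print(len(graph.landmarks), graph.exploration_goal())
```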
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- Batch Exploration with Examples for Scalable Robotic Reinforcement Learning [63.552788688544254]
Batch Exploration with Examples (BEE) explores relevant regions of the state-space guided by a modest number of human provided images of important states.
BEE is able to tackle challenging vision-based manipulation tasks both in simulation and on a real Franka robot.
arXiv Detail & Related papers (2020-10-22T17:49:25Z)
- Maximum Entropy Gain Exploration for Long Horizon Multi-goal Reinforcement Learning [35.44552072132894]
We argue that a learning agent should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution.
We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks.
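The entropy-maximizing goal-setting idea can be caricatured as proposing achieved goals from the lowest-density region of the historical achieved-goal distribution; the crude kernel density estimate below is illustrative rather than the paper's model.

```python
# Sketch of the "maximize entropy of achieved goals" idea: estimate the density of
# previously achieved goals with a simple Gaussian-kernel estimate and propose the
# achieved goal lying in the lowest-density region as the next intrinsic goal.
import numpy as np

def lowest_density_goal(achieved_goals, bandwidth=0.5):
    goals = np.asarray(achieved_goals, dtype=float)
    # Pairwise squared distances between achieved goals, shape (n, n).
    sq = ((goals[:, None, :] - goals[None, :, :]) ** 2).sum(-1)
    density = np.exp(-sq / (2 * bandwidth ** 2)).mean(axis=1)  # kernel density per goal
    return goals[int(np.argmin(density))]                       # goal in the sparsest region

history = np.random.default_rng(0).normal(size=(500, 2))
print(lowest_density_goal(history))  # a goal on the fringe of what has been achieved
```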
arXiv Detail & Related papers (2020-07-06T15:36:05Z)
- Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
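A minimal sketch of curriculum selection by value disagreement, with the ensemble predictions stubbed out as a random array: goals on which goal-conditioned value estimates disagree most are proposed first.

```python
# Sketch of value-disagreement goal selection: pick the candidate goals on which an
# ensemble of goal-conditioned value estimates disagrees most, i.e. goals that are
# neither trivially solved nor hopeless. The ensemble here is a stand-in array
# rather than learned value networks.
import numpy as np

def pick_curriculum_goals(candidate_goals, ensemble_values, k=5):
    """ensemble_values: array of shape (n_members, n_goals) with predicted values."""
    disagreement = ensemble_values.std(axis=0)   # per-goal spread across the ensemble
    order = np.argsort(-disagreement)            # highest disagreement first
    return [candidate_goals[i] for i in order[:k]]

rng = np.random.default_rng(0)
goals = [f"goal_{i}" for i in range(20)]
values = rng.uniform(0, 1, size=(5, 20))         # 5 ensemble members, 20 candidate goals
print(pick_curriculum_goals(goals, values))
```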
arXiv Detail & Related papers (2020-06-17T03:58:25Z)
- LEAF: Latent Exploration Along the Frontier [47.304858727365094]
Self-supervised goal proposal and reaching is a key component for exploration and efficient policy learning algorithms.
We propose an exploration framework, which learns a dynamics-aware manifold of reachable states.
We demonstrate that the proposed self-supervised exploration algorithm achieves superior performance compared to existing baselines on a set of challenging robotic environments.
arXiv Detail & Related papers (2020-05-21T22:46:31Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the information presented (including all of the above) and is not responsible for any consequences arising from its use.