Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward
Long-Horizon Goal-Conditioned Reinforcement Learning
- URL: http://arxiv.org/abs/2210.16058v2
- Date: Tue, 19 Dec 2023 11:00:18 GMT
- Title: Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward
Long-Horizon Goal-Conditioned Reinforcement Learning
- Authors: Lisheng Wu and Ke Chen
- Abstract summary: Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
- Score: 6.540225358657128
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Reinforcement learning (RL) often struggles to accomplish a sparse-reward
long-horizon task in a complex environment. Goal-conditioned reinforcement
learning (GCRL) has been employed to tackle this difficult problem via a
curriculum of easy-to-reach sub-goals. In GCRL, exploring novel sub-goals is
essential for the agent to ultimately find the pathway to the desired goal. How
to explore novel sub-goals efficiently is one of the most challenging issues in
GCRL. Several goal exploration methods have been proposed to address this issue
but still struggle to find the desired goals efficiently. In this paper, we
propose a novel learning objective by optimizing the entropy of both achieved
and new goals to be explored for more efficient goal exploration in sub-goal
selection based GCRL. To optimize this objective, we first explore and exploit
the frequently occurring goal-transition patterns mined in environments similar
to the current task, composing them into skills via skill learning. Then, the
pre-trained skills are applied in goal exploration. Evaluation on a variety of
sparse-reward long-horizon benchmark tasks suggests that incorporating our
method into several state-of-the-art GCRL baselines significantly boosts their
exploration efficiency while improving or maintaining their performance. The
source code is available at: https://github.com/GEAPS/GEAPS.
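To make the entropy-driven sub-goal selection described above concrete, the following is a minimal sketch, not the authors' GEAPS implementation (see the linked repository for that); the kernel-density entropy estimate and the names goal_entropy and select_subgoal are illustrative assumptions.

# Hedged sketch of entropy-driven sub-goal selection: prefer the candidate
# goal whose addition most increases the entropy of the achieved-goal
# distribution (a proxy for exploring novel regions of the goal space).
import numpy as np
from scipy.stats import gaussian_kde

def goal_entropy(goals: np.ndarray) -> float:
    """Monte-Carlo estimate of differential entropy under a Gaussian KDE
    fitted to the goal set (shape: n_goals x goal_dim)."""
    kde = gaussian_kde(goals.T)
    return float(-np.mean(kde.logpdf(goals.T)))

def select_subgoal(achieved: np.ndarray, candidates: np.ndarray) -> np.ndarray:
    """Return the candidate sub-goal with the largest entropy gain when
    appended to the buffer of previously achieved goals."""
    base = goal_entropy(achieved)
    gains = [goal_entropy(np.vstack([achieved, g[None, :]])) - base
             for g in candidates]
    return candidates[int(np.argmax(gains))]

# Toy usage: 2-D goals, e.g. (x, y) positions in a maze-like task.
rng = np.random.default_rng(0)
achieved = rng.normal(0.0, 0.3, size=(200, 2))     # goals reached so far
candidates = rng.uniform(-2.0, 2.0, size=(50, 2))  # proposed new sub-goals
print(select_subgoal(achieved, candidates))        # tends to pick a novel, far-out goal

In the full method, the pre-trained skills learned from goal-transition patterns would then be applied while exploring toward the selected sub-goal, as described in the abstract.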
Related papers
- Scaling Goal-based Exploration via Pruning Proto-goals [10.976262029859424]
One of the gnarliest challenges in reinforcement learning is exploration that scales to vast domains.
Goal-directed, purposeful behaviours are able to overcome this, but rely on a good goal space.
Our approach explicitly seeks the middle ground, enabling the human designer to specify a vast but meaningful proto-goal space.
arXiv Detail & Related papers (2023-02-09T15:22:09Z) - Discrete Factorial Representations as an Abstraction for Goal
Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We experimentally show improved expected returns on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z) - Successor Feature Landmarks for Long-Horizon Goal-Conditioned
Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z) - C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose solving distant goal-reaching tasks by using search at training time to automatically generate a curriculum of intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step learns a goal-conditioned policy to reach those waypoints (a minimal sketch of this alternation appears after this list).
arXiv Detail & Related papers (2021-10-22T22:05:31Z) - Decoupling Exploration and Exploitation for Meta-Reinforcement Learning
without Sacrifices [132.49849640628727]
Meta-reinforcement learning (meta-RL) builds agents that can quickly learn new tasks by leveraging prior experience on related tasks.
In principle, optimal exploration and exploitation can be learned end-to-end by simply maximizing task performance.
We present DREAM, which avoids local optima in end-to-end training, without sacrificing optimal exploration.
arXiv Detail & Related papers (2020-08-06T17:57:36Z) - Maximum Entropy Gain Exploration for Long Horizon Multi-goal
Reinforcement Learning [35.44552072132894]
We argue that a learning agent should set its own intrinsic goals that maximize the entropy of the historical achieved goal distribution.
We show that our strategy achieves an order of magnitude better sample efficiency than the prior state of the art on long-horizon multi-goal tasks.
arXiv Detail & Related papers (2020-07-06T15:36:05Z) - Automatic Curriculum Learning through Value Disagreement [95.19299356298876]
Continually solving new, unsolved tasks is the key to learning diverse behaviors.
In the multi-task domain, where an agent needs to reach multiple goals, the choice of training goals can largely affect sample efficiency.
We propose setting up an automatic curriculum for goals that the agent needs to solve.
We evaluate our method across 13 multi-goal robotic tasks and 5 navigation tasks, and demonstrate performance gains over current state-of-the-art methods.
arXiv Detail & Related papers (2020-06-17T03:58:25Z) - MetaCURE: Meta Reinforcement Learning with Empowerment-Driven
Exploration [52.48362697163477]
We model exploration policy learning for meta-RL as a problem separate from exploitation policy learning.
We develop a new off-policy meta-RL framework, which efficiently learns separate context-aware exploration and exploitation policies.
Experimental evaluation shows that our meta-RL method significantly outperforms state-of-the-art baselines on sparse-reward tasks.
arXiv Detail & Related papers (2020-06-15T06:56:18Z)
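As a companion to the C-Planning summary above, here is a hedged sketch (not the C-Planning authors' code) of the E-step / M-step alternation it describes: graph search over previously visited states yields waypoints, and a goal-conditioned policy is trained to reach them. plan_waypoints, m_step, and the policy.update call are hypothetical stand-ins.

# Hedged sketch of an E-step (graph-search waypoint planning) and M-step
# (goal-conditioned policy update on those waypoints), per the summary above.
import heapq

def plan_waypoints(adjacency: dict, start, goal) -> list:
    """E-step: Dijkstra shortest path over a graph whose nodes are visited
    states and whose edge weights approximate transition costs."""
    dist, prev, frontier = {start: 0.0}, {}, [(0.0, start)]
    while frontier:
        d, node = heapq.heappop(frontier)
        if node == goal:
            break
        for nbr, w in adjacency.get(node, {}).items():
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr], prev[nbr] = nd, node
                heapq.heappush(frontier, (nd, nbr))
    path, node = [goal], goal
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1]

def m_step(policy, waypoints):
    """M-step: relabel each consecutive waypoint as the goal and update the
    goal-conditioned policy (update rule left abstract here)."""
    for subgoal in waypoints[1:]:
        policy.update(goal=subgoal)   # hypothetical policy API

# Toy usage: a 4-state chain; waypoints break a distant goal into nearby ones.
graph = {"s0": {"s1": 1.0}, "s1": {"s2": 1.0}, "s2": {"s3": 1.0}}
print(plan_waypoints(graph, "s0", "s3"))   # ['s0', 's1', 's2', 's3']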
This list is automatically generated from the titles and abstracts of the papers in this site.