Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2110.13625v2
- Date: Wed, 27 Oct 2021 13:52:38 GMT
- Title: Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning
- Authors: Junsu Kim, Younggyo Seo, Jinwoo Shin
- Abstract summary: We present HIerarchical reinforcement learning Guided by Landmarks (HIGL).
HIGL is a novel framework for training a high-level policy with a reduced action space guided by landmarks.
Our experiments demonstrate that our framework outperforms prior methods across a variety of control tasks.
- Score: 64.97599673479678
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) has shown
promising results for solving complex, long-horizon RL tasks. However, the
action space of the high-level policy in goal-conditioned HRL is often large,
which leads to poor exploration and inefficient training. In this paper, we
present HIerarchical reinforcement learning Guided by Landmarks (HIGL), a novel
framework for training a high-level policy with a reduced action space guided
by landmarks, i.e., promising states to explore. HIGL has two key components:
(a) sampling landmarks that are informative for exploration and (b) encouraging
the high-level policy to generate subgoals toward a selected landmark. For (a),
we consider two criteria: coverage of the entire visited state space (i.e.,
dispersion of states) and novelty of states (i.e., prediction error on a
state). For (b), we select the first landmark on the shortest path in a graph
whose nodes are landmarks. Our experiments demonstrate that HIGL outperforms
prior methods across a variety of control tasks, thanks to efficient
exploration guided by landmarks.
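To make the two components concrete, here is a minimal sketch in Python. It illustrates only what the abstract describes, not the authors' implementation: the farthest-point-sampling choice for coverage, the predictor-error interface for novelty, and the `edge_cost` function (e.g., an estimated number of steps between two states) are all assumptions for illustration.

```python
import heapq

import numpy as np


def sample_coverage_landmarks(states, k, rng=np.random):
    """Coverage criterion: farthest-point sampling picks k states that
    disperse over the visited state space."""
    chosen = [rng.randint(len(states))]
    d = np.linalg.norm(states - states[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(d))  # state farthest from everything chosen so far
        chosen.append(nxt)
        d = np.minimum(d, np.linalg.norm(states - states[nxt], axis=1))
    return states[chosen]


def sample_novelty_landmarks(states, prediction_error, k):
    """Novelty criterion: keep the k states on which a learned predictor
    (e.g., an RND-style network) has the largest error."""
    return states[np.argsort(prediction_error(states))[-k:]]


def select_landmark(landmarks, start, goal, edge_cost):
    """Build a graph over {start} + landmarks + {goal}, run Dijkstra, and
    return the first landmark on the shortest start-to-goal path.
    edge_cost(u, v) returns a cost, or None if u and v are not connected."""
    nodes = [start] + list(landmarks) + [goal]
    n = len(nodes)
    dist = np.full(n, np.inf)
    prev = [None] * n
    dist[0] = 0.0
    heap = [(0.0, 0)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue
        for v in range(n):
            c = edge_cost(nodes[u], nodes[v]) if v != u else None
            if c is not None and d + c < dist[v]:
                dist[v], prev[v] = d + c, u
                heapq.heappush(heap, (d + c, v))
    v = n - 1  # goal node; if unreachable, the goal itself is returned
    while prev[v] is not None and prev[v] != 0:
        v = prev[v]  # walk back to the first hop after the start
    return nodes[v]
```

Per the abstract, the landmark returned this way is then used to encourage, rather than replace, the subgoal produced by the high-level policy.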
Related papers
- GOMAA-Geo: GOal Modality Agnostic Active Geo-localization [49.599465495973654]
We consider the task of active geo-localization (AGL) in which an agent uses a sequence of visual cues observed during aerial navigation to find a target specified through multiple possible modalities.
GOMAA-Geo is a goal-modality-agnostic active geo-localization agent for zero-shot generalization between different goal modalities.
arXiv Detail & Related papers (2024-06-04T02:59:36Z) - Balancing Exploration and Exploitation in Hierarchical Reinforcement
Learning via Latent Landmark Graphs [31.147969569517286]
Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning.
The effectiveness of GCHRL heavily relies on subgoal representation functions and subgoal selection strategy.
This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs.
arXiv Detail & Related papers (2023-07-22T12:10:23Z) - HIQL: Offline Goal-Conditioned RL with Latent States as Actions [81.67963770528753]
We propose a hierarchical algorithm for goal-conditioned RL from offline data.
We show how this hierarchical decomposition makes our method robust to noise in the estimated value function.
Our method can solve long-horizon tasks that stymie prior methods, can scale to high-dimensional image observations, and can readily make use of action-free data.
arXiv Detail & Related papers (2023-07-22T00:17:36Z) - Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
- Curricular Subgoals for Inverse Reinforcement Learning [21.038691420095525]
Inverse Reinforcement Learning (IRL) aims to reconstruct the reward function from expert demonstrations to facilitate policy learning.
Existing IRL methods mainly focus on learning global reward functions to minimize the trajectory difference between the imitator and the expert.
We propose a novel Curricular Subgoal-based Inverse Reinforcement Learning framework that explicitly disentangles one task into several local subgoals to guide agent imitation.
arXiv Detail & Related papers (2023-06-14T04:06:41Z) - On the Importance of Exploration for Generalization in Reinforcement
Learning [89.63074327328765]
We propose EDE: Exploration via Distributional Ensemble, a method that encourages exploration of states with high uncertainty.
Our algorithm is the first value-based approach to achieve state-of-the-art on both Procgen and Crafter.
arXiv Detail & Related papers (2023-06-08T18:07:02Z) - Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward
- Goal Exploration Augmentation via Pre-trained Skills for Sparse-Reward Long-Horizon Goal-Conditioned Reinforcement Learning [6.540225358657128]
Reinforcement learning (RL) often struggles to accomplish a sparse-reward long-horizon task in a complex environment.
Goal-conditioned reinforcement learning (GCRL) has been employed to tackle this difficult problem via a curriculum of easy-to-reach sub-goals.
In GCRL, exploring novel sub-goals is essential for the agent to ultimately find the pathway to the desired goal.
arXiv Detail & Related papers (2022-10-28T11:11:04Z) - Long-HOT: A Modular Hierarchical Approach for Long-Horizon Object
Transport [83.06265788137443]
We address key challenges in long-horizon embodied exploration and navigation by proposing a new object transport task and a novel modular framework for temporally extended navigation.
Our first contribution is the design of a novel Long-HOT environment focused on deep exploration and long-horizon planning.
We propose a modular hierarchical transport policy (HTP) that builds a topological graph of the scene to perform exploration with the help of weighted frontiers.
arXiv Detail & Related papers (2022-10-28T05:30:49Z) - Successor Feature Landmarks for Long-Horizon Goal-Conditioned
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)