Balancing Exploration and Exploitation in Hierarchical Reinforcement
Learning via Latent Landmark Graphs
- URL: http://arxiv.org/abs/2307.12063v1
- Date: Sat, 22 Jul 2023 12:10:23 GMT
- Title: Balancing Exploration and Exploitation in Hierarchical Reinforcement
Learning via Latent Landmark Graphs
- Authors: Qingyang Zhang, Yiming Yang, Jingqing Ruan, Xuantang Xiong, Dengpeng
Xing, Bo Xu
- Abstract summary: Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning.
The effectiveness of GCHRL heavily relies on the subgoal representation function and the subgoal selection strategy.
This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL).
- Score: 31.147969569517286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising
paradigm to address the exploration-exploitation dilemma in reinforcement
learning. It decomposes the source task into subgoal-conditioned subtasks and
conducts exploration and exploitation in the subgoal space. The effectiveness
of GCHRL heavily relies on the subgoal representation function and the subgoal
selection strategy. However, existing works often overlook temporal
coherence in GCHRL when learning latent subgoal representations and lack an
efficient subgoal selection strategy that balances exploration and
exploitation. This paper proposes HIerarchical reinforcement learning via
dynamically building Latent Landmark graphs (HILL) to overcome these
limitations. HILL learns latent subgoal representations that satisfy temporal
coherence using a contrastive representation learning objective. Based on these
representations, HILL dynamically builds latent landmark graphs and employs a
novelty measure on nodes and a utility measure on edges. Finally, HILL develops
a subgoal selection strategy that balances exploration and exploitation by
jointly considering both measures. Experimental results demonstrate that HILL
outperforms state-of-the-art baselines on continuous control tasks with sparse
rewards in terms of both sample efficiency and asymptotic performance. Our code is available
at https://github.com/papercode2022/HILL.
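As a rough illustration of the selection step described in the abstract, the sketch below scores candidate landmark nodes on a latent graph by mixing a node novelty measure with an edge utility measure. The merge radius, the count-based novelty, the value-gain utility, and the mixing weight `alpha` are illustrative assumptions, not HILL's exact definitions, which come from the learned contrastive representation and the hierarchical value estimates.

```python
# Illustrative sketch only: a landmark graph in a learned latent space, with a
# novelty score on nodes and a utility score on edges, combined to pick the
# next subgoal. Names and formulas are assumptions, not HILL's exact quantities.
import numpy as np


class LatentLandmarkGraph:
    def __init__(self, merge_radius=0.5):
        self.merge_radius = merge_radius
        self.landmarks = []        # latent coordinates of landmark nodes
        self.visit_counts = []     # how often each landmark was reached
        self.edge_value = {}       # (i, j) -> estimated value gain of the transition

    def add_latent_state(self, z):
        """Merge a latent state into an existing landmark or create a new node."""
        z = np.asarray(z, dtype=float)
        for i, lm in enumerate(self.landmarks):
            if np.linalg.norm(z - lm) < self.merge_radius:
                self.visit_counts[i] += 1
                return i
        self.landmarks.append(z)
        self.visit_counts.append(1)
        return len(self.landmarks) - 1

    def add_edge(self, i, j, value_gain):
        """Record the utility (here: an estimated value gain) of moving i -> j."""
        self.edge_value[(i, j)] = value_gain

    def novelty(self, i):
        """Count-based novelty: rarely visited landmarks score higher."""
        return 1.0 / np.sqrt(self.visit_counts[i])

    def utility(self, i, j):
        """Utility of the edge i -> j; 0 if the edge has never been traversed."""
        return self.edge_value.get((i, j), 0.0)

    def select_subgoal(self, current_node, alpha=0.5):
        """Pick the neighbour that best trades off exploration and exploitation."""
        best_node, best_score = None, -np.inf
        for (i, j) in self.edge_value:
            if i != current_node:
                continue
            score = alpha * self.novelty(j) + (1.0 - alpha) * self.utility(i, j)
            if score > best_score:
                best_node, best_score = j, score
        return best_node


# Toy usage: three landmarks in a 2-D latent space.
graph = LatentLandmarkGraph()
a = graph.add_latent_state([0.0, 0.0])
b = graph.add_latent_state([2.0, 0.0])
c = graph.add_latent_state([0.0, 2.0])
graph.add_edge(a, b, value_gain=1.0)
graph.add_edge(a, c, value_gain=0.2)
print("next subgoal:", graph.select_subgoal(a, alpha=0.5))
```

In the actual method the graph is rebuilt dynamically as the temporally coherent representation learned by the contrastive objective evolves; this toy version only fixes the mechanics of combining the two measures.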
Related papers
- Probabilistic Subgoal Representations for Hierarchical Reinforcement learning [16.756888009396462]
In goal-conditioned hierarchical reinforcement learning, a high-level policy specifies a subgoal for the low-level policy to reach.
Existing methods adopt a subgoal representation that provides a deterministic mapping from state space to latent subgoal space.
This paper places a Gaussian process (GP) prior on the latent subgoal space to learn a posterior distribution over subgoal representation functions. (A minimal sketch of such a GP posterior appears after this list.)
arXiv Detail & Related papers (2024-06-24T15:09:22Z)
- Learning Rational Subgoals from Demonstrations and Instructions [71.86713748450363]
We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.
At the core of our framework is a collection of rational subgoals (RSGs), which are essentially binary classifiers over the environmental states.
Given a goal description, the learned subgoals and the derived dependencies facilitate off-the-shelf planning algorithms, such as A* and RRT.
arXiv Detail & Related papers (2023-03-09T18:39:22Z)
- Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning [17.58129740811116]
We propose a reward learning approach, Graph-based Equivalence Mappings (GEM).
GEM represents a spatial goal specification by a reward function conditioned on i) a graph indicating important spatial relationships between objects and ii) state equivalence mappings for each edge in the graph.
We show that GEM can drastically improve the generalizability of the learned goal representations over strong baselines.
arXiv Detail & Related papers (2022-11-24T18:59:06Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We show experimentally that this improves the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure. (A minimal sketch of such a discretizing bottleneck appears after this list.)
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning [64.97599673479678]
We present HIerarchical reinforcement learning Guided by Landmarks (HIGL).
HIGL is a novel framework for training a high-level policy with a reduced action space guided by landmarks.
Our experiments demonstrate that our framework outperforms prior methods across a variety of control tasks.
arXiv Detail & Related papers (2021-10-26T12:16:19Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in the optimal control literature, to the image-based setting by utilizing learned latent state-space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Efficient Hierarchical Exploration with Stable Subgoal Representation Learning [26.537055962523162]
We propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas.
We develop an efficient hierarchical exploration strategy that actively seeks out new promising subgoals and states.
arXiv Detail & Related papers (2021-05-31T07:28:59Z)
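For the Probabilistic Subgoal Representations entry above, here is a minimal sketch of what a Gaussian-process posterior over a subgoal representation function looks like. The RBF kernel, the noise level, the toy state features, and the single latent coordinate are all assumptions for illustration; the cited paper's actual formulation is more involved.

```python
# Illustrative sketch: a GP posterior over a 1-D subgoal representation phi(state).
# Kernel, noise level, and the toy "states" are assumptions for illustration only.
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of state features."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(train_states, train_latents, query_states, noise=1e-2):
    """Posterior mean and variance of the latent subgoal coordinate at query states."""
    K = rbf_kernel(train_states, train_states) + noise * np.eye(len(train_states))
    K_s = rbf_kernel(query_states, train_states)
    K_ss = rbf_kernel(query_states, query_states)
    mean = K_s @ np.linalg.solve(K, train_latents)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)


rng = np.random.default_rng(0)
states = rng.normal(size=(20, 3))      # visited states (3-D features)
latents = np.sin(states[:, 0])         # stand-in "subgoal coordinate" labels
queries = rng.normal(size=(5, 3))
mean, var = gp_posterior(states, latents, queries)
print("posterior mean:", mean)
print("posterior variance (epistemic uncertainty):", var)
```

The posterior variance is what distinguishes this probabilistic view from a deterministic subgoal mapping: it quantifies how uncertain the representation is in rarely visited regions.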
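For the Discrete Factorial Representations entry referenced above, a discretizing bottleneck over goal embeddings can be sketched as a nearest-neighbour vector quantizer. The codebook size, embedding dimension, and random toy inputs are assumptions for illustration only, not the cited paper's architecture.

```python
# Illustrative vector-quantization bottleneck for goal embeddings (assumed
# codebook size and dimension; not the exact architecture of the cited paper).
import numpy as np


def quantize_goal(goal_embedding, codebook):
    """Map a continuous goal embedding to its nearest codebook vector.

    Returns the discrete code index and the quantized embedding that a
    goal-conditioned policy would consume instead of the raw embedding.
    """
    distances = np.linalg.norm(codebook - goal_embedding, axis=1)
    code = int(np.argmin(distances))
    return code, codebook[code]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 discrete codes, 4-dim embeddings
goal = rng.normal(size=4)             # continuous goal embedding from an encoder
code, quantized = quantize_goal(goal, codebook)
print(f"goal mapped to code {code}: {quantized}")
```

The intuition from the entry above is that forcing goals through a finite codebook yields more structured representations, which the cited work reports improves returns on out-of-distribution goals.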