Balancing Exploration and Exploitation in Hierarchical Reinforcement
Learning via Latent Landmark Graphs
- URL: http://arxiv.org/abs/2307.12063v1
- Date: Sat, 22 Jul 2023 12:10:23 GMT
- Title: Balancing Exploration and Exploitation in Hierarchical Reinforcement
Learning via Latent Landmark Graphs
- Authors: Qingyang Zhang, Yiming Yang, Jingqing Ruan, Xuantang Xiong, Dengpeng
Xing, Bo Xu
- Abstract summary: Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising paradigm to address the exploration-exploitation dilemma in reinforcement learning.
The effectiveness of GCHRL heavily relies on the subgoal representation function and the subgoal selection strategy.
This paper proposes HIerarchical reinforcement learning via dynamically building Latent Landmark graphs (HILL).
- Score: 31.147969569517286
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-Conditioned Hierarchical Reinforcement Learning (GCHRL) is a promising
paradigm to address the exploration-exploitation dilemma in reinforcement
learning. It decomposes the source task into subgoal-conditioned subtasks and
conducts exploration and exploitation in the subgoal space. The effectiveness
of GCHRL heavily relies on the subgoal representation function and the subgoal
selection strategy. However, existing works often overlook temporal
coherence in GCHRL when learning latent subgoal representations and lack an
efficient subgoal selection strategy that balances exploration and
exploitation. This paper proposes HIerarchical reinforcement learning via
dynamically building Latent Landmark graphs (HILL) to overcome these
limitations. HILL learns latent subgoal representations that satisfy temporal
coherence using a contrastive representation learning objective. Based on these
representations, HILL dynamically builds latent landmark graphs and employs a
novelty measure on nodes and a utility measure on edges. Finally, HILL develops
a subgoal selection strategy that balances exploration and exploitation by
jointly considering both measures. Experimental results demonstrate that HILL
outperforms state-of-the-art baselines on continuous control tasks with sparse
rewards in terms of both sample efficiency and asymptotic performance. Our code is available
at https://github.com/papercode2022/HILL.
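As a rough illustration of the selection step described in the abstract, the sketch below scores candidate landmark nodes on a latent graph by mixing a node novelty measure with an edge utility measure. The merge radius, the count-based novelty, the value-gain utility, and the mixing weight `alpha` are illustrative assumptions, not HILL's exact definitions, which come from the learned contrastive representation and the hierarchical value estimates.

```python
# Illustrative sketch only: a landmark graph in a learned latent space, with a
# novelty score on nodes and a utility score on edges, combined to pick the
# next subgoal. Names and formulas are assumptions, not HILL's exact quantities.
import numpy as np


class LatentLandmarkGraph:
    def __init__(self, merge_radius=0.5):
        self.merge_radius = merge_radius
        self.landmarks = []        # latent coordinates of landmark nodes
        self.visit_counts = []     # how often each landmark was reached
        self.edge_value = {}       # (i, j) -> estimated value gain of the transition

    def add_latent_state(self, z):
        """Merge a latent state into an existing landmark or create a new node."""
        z = np.asarray(z, dtype=float)
        for i, lm in enumerate(self.landmarks):
            if np.linalg.norm(z - lm) < self.merge_radius:
                self.visit_counts[i] += 1
                return i
        self.landmarks.append(z)
        self.visit_counts.append(1)
        return len(self.landmarks) - 1

    def add_edge(self, i, j, value_gain):
        """Record the utility (here: an estimated value gain) of moving i -> j."""
        self.edge_value[(i, j)] = value_gain

    def novelty(self, i):
        """Count-based novelty: rarely visited landmarks score higher."""
        return 1.0 / np.sqrt(self.visit_counts[i])

    def utility(self, i, j):
        """Utility of the edge i -> j; 0 if the edge has never been traversed."""
        return self.edge_value.get((i, j), 0.0)

    def select_subgoal(self, current_node, alpha=0.5):
        """Pick the neighbour that best trades off exploration and exploitation."""
        best_node, best_score = None, -np.inf
        for (i, j) in self.edge_value:
            if i != current_node:
                continue
            score = alpha * self.novelty(j) + (1.0 - alpha) * self.utility(i, j)
            if score > best_score:
                best_node, best_score = j, score
        return best_node


# Toy usage: three landmarks in a 2-D latent space.
graph = LatentLandmarkGraph()
a = graph.add_latent_state([0.0, 0.0])
b = graph.add_latent_state([2.0, 0.0])
c = graph.add_latent_state([0.0, 2.0])
graph.add_edge(a, b, value_gain=1.0)
graph.add_edge(a, c, value_gain=0.2)
print("next subgoal:", graph.select_subgoal(a, alpha=0.5))
```

In the actual method the graph is rebuilt dynamically as the temporally coherent representation learned by the contrastive objective evolves; this toy version only fixes the mechanics of combining the two measures.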
Related papers
- Probabilistic Subgoal Representations for Hierarchical Reinforcement learning [16.756888009396462]
In goal-conditioned hierarchical reinforcement learning, a high-level policy specifies a subgoal for the low-level policy to reach.
Existing methods adopt a subgoal representation that provides a deterministic mapping from state space to latent subgoal space.
This paper places a Gaussian process (GP) prior on the latent subgoal space to learn a posterior distribution over subgoal representation functions. (A minimal sketch of such a GP posterior appears after this list.)
arXiv Detail & Related papers (2024-06-24T15:09:22Z)
- Learning Rational Subgoals from Demonstrations and Instructions [71.86713748450363]
We present a framework for learning useful subgoals that support efficient long-term planning to achieve novel goals.
At the core of our framework is a collection of rational subgoals (RSGs), which are essentially binary classifiers over the environmental states.
Given a goal description, the learned subgoals and the derived dependencies facilitate off-the-shelf planning algorithms, such as A* and RRT.
arXiv Detail & Related papers (2023-03-09T18:39:22Z)
- Discovering Generalizable Spatial Goal Representations via Graph-based Active Reward Learning [17.58129740811116]
We propose a reward learning approach, Graph-based Equivalence Mappings (GEM).
GEM represents a spatial goal specification by a reward function conditioned on i) a graph indicating important spatial relationships between objects and ii) state equivalence mappings for each edge in the graph.
We show that GEM can drastically improve the generalizability of the learned goal representations over strong baselines.
arXiv Detail & Related papers (2022-11-24T18:59:06Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
We show experimentally that this improves the expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure. (A minimal sketch of such a discretizing bottleneck appears after this list.)
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
- Deep Hierarchical Planning from Pixels [86.14687388689204]
Director is a method for learning hierarchical behaviors directly from pixels by planning inside the latent space of a learned world model.
Despite operating in latent space, the decisions are interpretable because the world model can decode goals into images for visualization.
Director also learns successful behaviors across a wide range of environments, including visual control, Atari games, and DMLab levels.
arXiv Detail & Related papers (2022-06-08T18:20:15Z)
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
- Landmark-Guided Subgoal Generation in Hierarchical Reinforcement Learning [64.97599673479678]
We present HIerarchical reinforcement learning Guided by Landmarks (HIGL).
HIGL is a novel framework for training a high-level policy with a reduced action space guided by landmarks.
Our experiments demonstrate that our framework outperforms prior methods across a variety of control tasks.
arXiv Detail & Related papers (2021-10-26T12:16:19Z)
- Model-Based Reinforcement Learning via Latent-Space Collocation [110.04005442935828]
We argue that it is easier to solve long-horizon tasks by planning sequences of states rather than just actions.
We adapt the idea of collocation, which has shown good results on long-horizon tasks in the optimal control literature, to the image-based setting by utilizing learned latent state-space models.
arXiv Detail & Related papers (2021-06-24T17:59:18Z)
- Efficient Hierarchical Exploration with Stable Subgoal Representation Learning [26.537055962523162]
We propose a state-specific regularization that stabilizes subgoal embeddings in well-explored areas.
We develop an efficient hierarchical exploration strategy that actively seeks out new promising subgoals and states.
arXiv Detail & Related papers (2021-05-31T07:28:59Z)
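For the Probabilistic Subgoal Representations entry above, here is a minimal sketch of what a Gaussian-process posterior over a subgoal representation function looks like. The RBF kernel, the noise level, the toy state features, and the single latent coordinate are all assumptions for illustration; the cited paper's actual formulation is more involved.

```python
# Illustrative sketch: a GP posterior over a 1-D subgoal representation phi(state).
# Kernel, noise level, and the toy "states" are assumptions for illustration only.
import numpy as np


def rbf_kernel(A, B, length_scale=1.0):
    """Squared-exponential kernel between two sets of state features."""
    d2 = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-0.5 * d2 / length_scale**2)


def gp_posterior(train_states, train_latents, query_states, noise=1e-2):
    """Posterior mean and variance of the latent subgoal coordinate at query states."""
    K = rbf_kernel(train_states, train_states) + noise * np.eye(len(train_states))
    K_s = rbf_kernel(query_states, train_states)
    K_ss = rbf_kernel(query_states, query_states)
    mean = K_s @ np.linalg.solve(K, train_latents)
    cov = K_ss - K_s @ np.linalg.solve(K, K_s.T)
    return mean, np.diag(cov)


rng = np.random.default_rng(0)
states = rng.normal(size=(20, 3))      # visited states (3-D features)
latents = np.sin(states[:, 0])         # stand-in "subgoal coordinate" labels
queries = rng.normal(size=(5, 3))
mean, var = gp_posterior(states, latents, queries)
print("posterior mean:", mean)
print("posterior variance (epistemic uncertainty):", var)
```

The posterior variance is what distinguishes this probabilistic view from a deterministic subgoal mapping: it quantifies how uncertain the representation is in rarely visited regions.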
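For the Discrete Factorial Representations entry referenced above, a discretizing bottleneck over goal embeddings can be sketched as a nearest-neighbour vector quantizer. The codebook size, embedding dimension, and random toy inputs are assumptions for illustration only, not the cited paper's architecture.

```python
# Illustrative vector-quantization bottleneck for goal embeddings (assumed
# codebook size and dimension; not the exact architecture of the cited paper).
import numpy as np


def quantize_goal(goal_embedding, codebook):
    """Map a continuous goal embedding to its nearest codebook vector.

    Returns the discrete code index and the quantized embedding that a
    goal-conditioned policy would consume instead of the raw embedding.
    """
    distances = np.linalg.norm(codebook - goal_embedding, axis=1)
    code = int(np.argmin(distances))
    return code, codebook[code]


rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))   # 16 discrete codes, 4-dim embeddings
goal = rng.normal(size=4)             # continuous goal embedding from an encoder
code, quantized = quantize_goal(goal, codebook)
print(f"goal mapped to code {code}: {quantized}")
```

The intuition from the entry above is that forcing goals through a finite codebook yields more structured representations, which the cited work reports improves returns on out-of-distribution goals.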