Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement
Learning
- URL: http://arxiv.org/abs/2006.11485v4
- Date: Thu, 18 Mar 2021 09:48:05 GMT
- Title: Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement
Learning
- Authors: Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen
- Abstract summary: Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning techniques.
HRL often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is often large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
- Score: 22.319208517053816
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Goal-conditioned hierarchical reinforcement learning (HRL) is a promising
approach for scaling up reinforcement learning (RL) techniques. However, it
often suffers from training inefficiency as the action space of the high-level,
i.e., the goal space, is often large. Searching in a large goal space poses
difficulties for both high-level subgoal generation and low-level policy
learning. In this paper, we show that this problem can be effectively
alleviated by restricting the high-level action space from the whole goal space
to a $k$-step adjacent region of the current state using an adjacency
constraint. We theoretically prove that the proposed adjacency constraint
preserves the optimal hierarchical policy in deterministic MDPs, and show that
this constraint can be practically implemented by training an adjacency network
that can discriminate between adjacent and non-adjacent subgoals. Experimental
results on discrete and continuous control tasks show that incorporating the
adjacency constraint improves the performance of state-of-the-art HRL
approaches in both deterministic and stochastic environments.
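The abstract describes implementing the adjacency constraint by training an adjacency network that discriminates adjacent from non-adjacent subgoals. Below is a minimal PyTorch sketch of that idea, assuming a binary classifier over (state, subgoal) pairs; the architecture, the 0.5 decision threshold, and the penalty form are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn

class AdjacencyNet(nn.Module):
    """Scores whether a candidate subgoal lies within the k-step adjacent
    region of the current state (illustrative stand-in, not the paper's code)."""

    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),  # logit: adjacent (1) vs. non-adjacent (0)
        )

    def forward(self, state: torch.Tensor, subgoal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, subgoal], dim=-1)).squeeze(-1)


def adjacency_penalty(adj_net: AdjacencyNet, state: torch.Tensor,
                      subgoal: torch.Tensor) -> torch.Tensor:
    """Penalty for the high-level policy loss: zero when the subgoal is
    classified as k-step adjacent, growing as it drifts outside that region."""
    prob_adjacent = torch.sigmoid(adj_net(state, subgoal))
    return torch.clamp(0.5 - prob_adjacent, min=0.0).mean()
```

In this sketch the penalty would be added to the high-level actor loss so that generated subgoals are pushed back into the $k$-step adjacent region of the current state.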
Related papers
- Hierarchical Preference Optimization: Learning to achieve goals via feasible subgoals prediction [71.81851971324187]
This work introduces Hierarchical Preference Optimization (HPO), a novel approach to hierarchical reinforcement learning (HRL).
HPO addresses non-stationarity and infeasible subgoal generation issues when solving complex robotic control tasks.
Experiments on challenging robotic navigation and manipulation tasks demonstrate the strong performance of HPO, which improves over the baselines by up to 35%.
arXiv Detail & Related papers (2024-11-01T04:58:40Z)
- Bidirectional-Reachable Hierarchical Reinforcement Learning with Mutually Responsive Policies [26.915223518488016]
Hierarchical reinforcement learning (HRL) addresses complex long-horizon tasks by decomposing them into subgoals.
We propose Bidirectional-reachable Hierarchical Policy Optimization (BrHPO), a simple yet effective algorithm that is also computationally efficient.
Experimental results on a variety of long-horizon tasks show that BrHPO outperforms other state-of-the-art HRL baselines, with significantly higher exploration efficiency and robustness.
arXiv Detail & Related papers (2024-06-26T04:05:04Z)
- A Provably Efficient Option-Based Algorithm for both High-Level and Low-Level Learning [54.20447310988282]
We present a meta-algorithm that alternates between regret minimization algorithms instantiated at different (high and low) levels of temporal abstraction.
At the higher level, we treat the problem as a Semi-Markov Decision Process (SMDP), with fixed low-level policies, while at a lower level, inner option policies are learned with a fixed high-level policy.
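The alternating scheme in this summary can be pictured as a simple control loop: freeze the inner option policies while the high-level learner treats the task as an SMDP over those options, then freeze the high-level policy and update each option. The `Learner` interface and method names below are assumptions made for illustration, not the paper's algorithm.

```python
from typing import Protocol, Sequence

class Learner(Protocol):
    """Minimal interface assumed for both the high-level and low-level learners."""
    def run_episode(self, env, **fixed) -> None: ...
    def frozen_policy(self): ...

def alternating_meta_training(high: Learner, lows: Sequence[Learner], env,
                              phases: int, episodes: int):
    """Alternate regret minimization at two temporal abstractions (sketch only).

    Phase A: option (low-level) policies are held fixed; the high-level learner
    treats the task as a Semi-Markov Decision Process over those options.
    Phase B: the high-level policy is held fixed; each inner option is updated.
    """
    for _ in range(phases):
        frozen_options = [low.frozen_policy() for low in lows]
        for _ in range(episodes):
            high.run_episode(env, options=frozen_options)   # SMDP over fixed options
        frozen_high = high.frozen_policy()
        for low in lows:
            for _ in range(episodes):
                low.run_episode(env, high_policy=frozen_high)
    return high, lows
```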
arXiv Detail & Related papers (2024-06-21T13:17:33Z)
- MENTOR: Guiding Hierarchical Reinforcement Learning with Human Feedback and Dynamic Distance Constraint [40.3872201560003]
Hierarchical reinforcement learning (HRL) uses a hierarchical framework that divides tasks into subgoals and completes them sequentially.
Current methods struggle to find suitable subgoals for ensuring a stable learning process.
We propose a general hierarchical reinforcement learning framework incorporating human feedback and dynamic distance constraints.
arXiv Detail & Related papers (2024-02-22T03:11:09Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought to suggest potential goals, provide suitable goal decomposition and subgoal allocation, and perform self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
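A rough sketch of the prompting loop this summary suggests: a pretrained language model is asked, with chain-of-thought, to decompose a task and allocate subgoals to agents. The `query_language_model` callable, the prompt wording, and the line-based parsing are placeholders; SAMA's actual prompts, parsing, and self-reflection-based replanning are not reproduced here.

```python
from typing import Callable, Dict, List

def decompose_task(task_description: str, agent_names: List[str],
                   query_language_model: Callable[[str], str]) -> Dict[str, List[str]]:
    """Ask a caller-supplied language model to propose subgoals per agent using
    chain-of-thought, then parse lines of the form 'agent: subgoal'."""
    prompt = (
        "Think step by step about how to solve the following multi-agent task, "
        "then list one subgoal per line in the form 'agent: subgoal'.\n"
        f"Task: {task_description}\nAgents: {', '.join(agent_names)}"
    )
    reply = query_language_model(prompt)
    allocation: Dict[str, List[str]] = {name: [] for name in agent_names}
    for line in reply.splitlines():
        if ":" in line:
            agent, subgoal = line.split(":", 1)
            if agent.strip() in allocation:
                allocation[agent.strip()].append(subgoal.strip())
    return allocation
```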
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Reinforcement Learning with Stepwise Fairness Constraints [50.538878453547966]
We introduce the study of reinforcement learning with stepwise fairness constraints.
We provide learning algorithms with strong theoretical guarantees in regard to policy optimality and fairness violation.
arXiv Detail & Related papers (2022-11-08T04:06:23Z)
- Adversarially Guided Subgoal Generation for Hierarchical Reinforcement Learning [5.514236598436977]
We propose a novel HRL approach for mitigating the non-stationarity by adversarially enforcing the high-level policy to generate subgoals compatible with the current instantiation of the low-level policy.
Experiments with state-of-the-art algorithms show that our approach significantly improves learning efficiency and overall performance of HRL in various challenging continuous control tasks.
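One way to picture the adversarial idea in this summary: a discriminator learns to separate subgoals the current low-level policy actually reaches from those it does not, and the high-level policy is penalized for proposing subgoals the discriminator rejects. This is an assumed mechanism sketched for illustration; the names and loss shapes are not taken from the paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SubgoalDiscriminator(nn.Module):
    """Classifies (state, subgoal) pairs as compatible with the current
    low-level policy (reachable) vs. incompatible. Illustrative only."""

    def __init__(self, state_dim: int, goal_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, state: torch.Tensor, subgoal: torch.Tensor) -> torch.Tensor:
        return self.net(torch.cat([state, subgoal], dim=-1)).squeeze(-1)


def discriminator_loss(disc, state, reached_subgoal, proposed_subgoal):
    """Subgoals the low-level policy reached are positives; freshly proposed,
    unverified subgoals are negatives."""
    ones = torch.ones(state.shape[0])
    zeros = torch.zeros(state.shape[0])
    return (F.binary_cross_entropy_with_logits(disc(state, reached_subgoal), ones)
            + F.binary_cross_entropy_with_logits(disc(state, proposed_subgoal), zeros))


def compatibility_penalty(disc, state, proposed_subgoal):
    """Term added to the high-level loss so proposed subgoals look compatible."""
    return F.softplus(-disc(state, proposed_subgoal)).mean()  # = -log sigmoid(logit)
```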
arXiv Detail & Related papers (2022-01-24T12:30:38Z)
- Hierarchical Reinforcement Learning with Timed Subgoals [11.758625350317274]
We introduce Hierarchical Reinforcement Learning with Timed Subgoals (HiTS).
HiTS enables the agent to adapt its timing to a dynamic environment by specifying what goal state is to be reached and also when.
Experiments show that our method is capable of sample-efficient learning where an existing state-of-the-art subgoal-based HRL method fails to learn stable solutions.
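The "what goal state and also when" framing in this summary can be captured with a small data structure pairing a subgoal with a deadline. The field names and the tolerance check below are illustrative assumptions, not the HiTS implementation.

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class TimedSubgoal:
    """A desired goal together with the time step by which it should be reached."""
    goal: np.ndarray   # target point in goal space
    deadline: int      # environment time step at which the goal should hold

def timed_subgoal_achieved(achieved_goal: np.ndarray, subgoal: TimedSubgoal,
                           t: int, tol: float = 0.05) -> bool:
    """True only if the agent is within `tol` of the goal exactly at the deadline,
    i.e. the subgoal specifies both what is to be reached and when."""
    return t == subgoal.deadline and float(np.linalg.norm(achieved_goal - subgoal.goal)) <= tol
```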
arXiv Detail & Related papers (2021-12-06T15:11:19Z)
- Adjacency constraint for efficient hierarchical reinforcement learning [25.15808501708926]
Goal-conditioned hierarchical reinforcement learning (HRL) is a promising approach for scaling up reinforcement learning (RL) techniques.
HRL often suffers from training inefficiency as the action space of the high-level, i.e., the goal space, is large.
We show that this problem can be effectively alleviated by restricting the high-level action space to a $k$-step adjacent region of the current state.
arXiv Detail & Related papers (2021-10-30T09:26:45Z)
- Provable Hierarchy-Based Meta-Reinforcement Learning [50.17896588738377]
We analyze HRL in the meta-RL setting, where the learner learns a latent hierarchical structure during meta-training for use in a downstream task.
We provide "diversity conditions" which, together with a tractable optimism-based algorithm, guarantee sample-efficient recovery of this natural hierarchy.
Our bounds incorporate common notions in HRL literature such as temporal and state/action abstractions, suggesting that our setting and analysis capture important features of HRL in practice.
arXiv Detail & Related papers (2021-10-18T17:56:02Z)
- Discrete Action On-Policy Learning with Action-Value Critic [72.20609919995086]
Reinforcement learning (RL) in discrete action space is ubiquitous in real-world applications, but its complexity grows exponentially with the action-space dimension.
We construct a critic to estimate action-value functions, apply it to correlated actions, and combine these critic-estimated action values to control the variance of gradient estimation.
These efforts result in a new discrete action on-policy RL algorithm that empirically outperforms related on-policy algorithms relying on variance control techniques.
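A compact illustration of the variance-control idea in this summary: with a critic that estimates a value for every discrete action, the policy gradient can subtract the policy-weighted mean of those action values as a baseline. This is a standard all-action baseline written as a sketch under that reading, not the paper's exact estimator.

```python
import torch

def baselined_policy_gradient_loss(logits: torch.Tensor, q_values: torch.Tensor,
                                   actions: torch.Tensor) -> torch.Tensor:
    """Policy-gradient surrogate for a discrete action space that uses critic
    action-value estimates with a policy-weighted baseline to reduce variance.

    logits:   [batch, n_actions] policy logits
    q_values: [batch, n_actions] critic estimates of Q(s, a), treated as fixed targets
    actions:  [batch] indices of the sampled actions
    """
    log_probs = torch.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    baseline = (probs * q_values).sum(dim=-1, keepdim=True)   # E_{a~pi}[Q(s, a)]
    advantage = (q_values - baseline).detach()                # no gradient through critic
    idx = actions.long().unsqueeze(-1)
    chosen_log_prob = log_probs.gather(1, idx).squeeze(-1)
    chosen_advantage = advantage.gather(1, idx).squeeze(-1)
    return -(chosen_log_prob * chosen_advantage).mean()
```

Minimizing this surrogate alongside a separate regression loss for the critic would give a basic on-policy update of the kind the summary alludes to.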
arXiv Detail & Related papers (2020-02-10T04:23:09Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.