Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
- URL: http://arxiv.org/abs/2506.21039v1
- Date: Thu, 26 Jun 2025 06:35:42 GMT
- Title: Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning
- Authors: Jaebak Hwang, Sanghyeon Lee, Jeongmo Kim, Seungyul Han
- Abstract summary: Strict Subgoal Execution (SSE) is a graph-based hierarchical RL framework that enforces single-step subgoal reachability. We show that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
- Score: 5.274804664403783
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Long-horizon goal-conditioned tasks pose fundamental challenges for reinforcement learning (RL), particularly when goals are distant and rewards are sparse. While hierarchical and graph-based methods offer partial solutions, they often suffer from subgoal infeasibility and inefficient planning. We introduce Strict Subgoal Execution (SSE), a graph-based hierarchical RL framework that enforces single-step subgoal reachability by structurally constraining high-level decision-making. To enhance exploration, SSE employs a decoupled exploration policy that systematically traverses underexplored regions of the goal space. Furthermore, a failure-aware path refinement improves subgoal reliability by dynamically adjusting edge costs in the planning graph according to observed low-level success rates. Experimental results across diverse long-horizon benchmarks demonstrate that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
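As a concrete illustration of the failure-aware path refinement described in the abstract, the sketch below plans over a subgoal graph whose edge costs are inflated when the low-level policy's observed success rate on that edge is low. This is a minimal sketch: the smoothing prior, penalty weight, and all function names are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: Dijkstra planning over a subgoal graph with edge costs
# inflated by empirical low-level failure rates. Not the paper's code.
import heapq
from collections import defaultdict

def refined_cost(base_cost: float, successes: int, attempts: int,
                 prior: float = 0.5, penalty: float = 10.0) -> float:
    """Inflate an edge cost as the empirical low-level success rate drops."""
    rate = (successes + prior) / (attempts + 1.0)  # smoothed success estimate
    return base_cost + penalty * (1.0 - rate)

def plan(graph, start, goal):
    """Dijkstra over refined costs; graph[u] = [(v, base, successes, attempts)]."""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue
        for v, base, succ, att in graph[u]:
            nd = d + refined_cost(base, succ, att)
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    path, node = [], goal
    while node != start:
        path.append(node)
        node = prev[node]
    return [start] + path[::-1]

# Toy graph: the direct edge A->C is short but unreliable (1/10 successes),
# so the refined planner routes through B instead.
g = defaultdict(list)
g["A"] = [("C", 1.0, 1, 10), ("B", 1.0, 9, 10)]
g["B"] = [("C", 1.0, 9, 10)]
print(plan(g, "A", "C"))  # ['A', 'B', 'C']
```

In this toy graph the direct edge is cheapest by base cost but unreliable, so the refined planner detours through the intermediate subgoal, which is the qualitative behavior the abstract describes.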
Related papers
- Hierarchical Reinforcement Learning with Targeted Causal Interventions [24.93050534953955]
Hierarchical reinforcement learning (HRL) improves the efficiency of long-horizon reinforcement-learning tasks with sparse rewards by decomposing the task into a hierarchy of subgoals. We model the subgoal structure as a causal graph and propose a causal discovery algorithm to learn it. We harness the discovered causal model to prioritize subgoal interventions based on their importance in attaining the final goal.
arXiv Detail & Related papers (2025-07-06T12:38:42Z)
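To make the causal-prioritization idea above concrete, here is a toy sketch that ranks subgoals of a hypothetical causal DAG by how many directed paths lead from each subgoal to the final goal. The DAG, the path-count criterion, and all names are placeholders; the paper's actual discovery algorithm and importance measure are more involved.

```python
# Toy illustration only: rank subgoals in an assumed causal DAG by the number
# of directed paths to the final goal. Stand-in for the paper's criterion.
from functools import lru_cache

EDGES = {  # hypothetical causal DAG: subgoal -> subgoals it enables
    "pick_key": ["open_door", "unlock_chest"],
    "open_door": ["reach_room"],
    "unlock_chest": ["final_goal"],
    "push_box": ["reach_room"],
    "reach_room": ["final_goal"],
}

@lru_cache(maxsize=None)
def n_paths(node: str, goal: str) -> int:
    """Count directed paths from node to goal in the causal DAG."""
    if node == goal:
        return 1
    return sum(n_paths(child, goal) for child in EDGES.get(node, []))

# Subgoals with more causal routes to the goal are intervened on first.
ranking = sorted(EDGES, key=lambda s: n_paths(s, "final_goal"), reverse=True)
print(ranking)  # 'pick_key' ranks first: it sits on two routes to the goal
```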
- Divide and Conquer: Grounding LLMs as Efficient Decision-Making Agents via Offline Hierarchical Reinforcement Learning [32.260964481673085]
Large language models (LLMs) struggle with long-horizon decision-making tasks due to deficient exploration and long-term credit assignment. We propose an innovative framework that introduces a parameter-efficient and generally applicable hierarchy to LLM policies. We develop a scheme where the low-level controller is supervised with abstract, step-by-step plans that are learned and instructed by the high-level policy.
arXiv Detail & Related papers (2025-05-26T09:43:40Z)
- Flattening Hierarchies with Policy Bootstrapping [2.3940819037450987]
We introduce an algorithm to train a flat (non-hierarchical) goal-conditioned policy by bootstrapping on subgoal-conditioned policies with advantage-weighted importance sampling. Our approach eliminates the need for a generative model over the (sub)goal space, which we find is key for scaling to high-dimensional control in large state spaces.
arXiv Detail & Related papers (2025-05-20T23:31:30Z)
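The bootstrapping scheme above can be sketched as an advantage-weighted regression loss: the flat, goal-conditioned policy is regressed onto actions from subgoal-conditioned policies, weighted by exponentiated advantages (in the spirit of AWR/AWAC). Network shapes, the weight clamp, and the advantage inputs below are our assumptions, not the paper's exact formulation.

```python
# Hedged sketch of advantage-weighted distillation into a flat policy.
import torch
import torch.nn as nn

def awr_distill_loss(flat_policy: nn.Module,
                     obs: torch.Tensor,        # (B, obs_dim)
                     final_goal: torch.Tensor, # (B, goal_dim)
                     action: torch.Tensor,     # (B, act_dim), from subgoal policy
                     advantage: torch.Tensor,  # (B,), of acting toward the subgoal
                     beta: float = 1.0) -> torch.Tensor:
    """Regress onto subgoal-policy actions, conditioned on the final goal."""
    weight = torch.clamp(torch.exp(advantage / beta), max=20.0)  # stabilized
    pred = flat_policy(torch.cat([obs, final_goal], dim=-1))
    return (weight * ((pred - action) ** 2).sum(-1)).mean()

# Tiny usage example with random tensors and an MLP policy head.
policy = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 2))
loss = awr_distill_loss(policy,
                        torch.randn(8, 4), torch.randn(8, 2),
                        torch.randn(8, 2), torch.randn(8))
loss.backward()
```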
- PlanDQ: Hierarchical Plan Orchestration via D-Conductor and Q-Performer [47.924941959320996]
We propose a hierarchical planner designed for offline RL called PlanDQ.
PlanDQ incorporates a diffusion-based planner at the high level, named D-Conductor, which guides the low-level policy through sub-goals.
At the low level, we use a Q-learning-based approach called the Q-Performer to accomplish these sub-goals.
arXiv Detail & Related papers (2024-06-10T20:59:53Z)
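PlanDQ's two-level control flow can be illustrated generically: a high-level planner proposes a sub-goal sequence and a low-level goal-conditioned policy pursues each sub-goal for a bounded number of steps. The callables below are placeholders standing in for the D-Conductor and Q-Performer, and the simplified `env.step` interface returning `(obs, done)` is also an assumption.

```python
# Generic hierarchical dispatch loop; all interfaces are illustrative stubs.
from typing import Callable, Sequence

def run_episode(plan_subgoals: Callable,   # placeholder for a D-Conductor
                low_level_act: Callable,   # placeholder for a Q-Performer
                env, final_goal, reached: Callable,
                max_steps_per_subgoal: int = 50):
    """High level proposes sub-goals; low level pursues each on a step budget."""
    obs, done = env.reset(), False
    for subgoal in plan_subgoals(obs, final_goal):
        for _ in range(max_steps_per_subgoal):
            obs, done = env.step(low_level_act(obs, subgoal))
            if done or reached(obs, subgoal):  # sub-goal achieved or episode over
                break
        if done:
            break
    return obs
```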
- Stitching Sub-Trajectories with Conditional Diffusion Model for Goal-Conditioned Offline RL [18.31263353823447]
We propose a model-based offline Goal-Conditioned Reinforcement Learning (Offline GCRL) method to acquire diverse goal-oriented skills.
In this paper, we use a diffusion model that generates future plans conditioned on the target goal and value, with the target value estimated from the goal-relabeled offline dataset.
We report state-of-the-art performance in the standard benchmark set of GCRL tasks, and demonstrate the capability to successfully stitch the segments of suboptimal trajectories in the offline data to generate high-quality plans.
arXiv Detail & Related papers (2024-02-11T15:23:13Z)
- Semantically Aligned Task Decomposition in Multi-Agent Reinforcement Learning [56.26889258704261]
We propose a novel "disentangled" decision-making method, Semantically Aligned task decomposition in MARL (SAMA).
SAMA prompts pretrained language models with chain-of-thought that can suggest potential goals, provide suitable goal decomposition and subgoal allocation as well as self-reflection-based replanning.
SAMA demonstrates considerable advantages in sample efficiency compared to state-of-the-art ASG methods.
arXiv Detail & Related papers (2023-05-18T10:37:54Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
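The self-imitation scheme above amounts to a distillation loss; a minimal sketch, assuming discrete actions and a planner-provided subgoal, pushes the target-goal-conditioned policy toward the subgoal-conditioned policy's action distribution. All shapes and names are illustrative.

```python
# Hedged sketch: distill a subgoal-conditioned teacher into a
# target-goal-conditioned student via cross-entropy on action distributions.
import torch
import torch.nn.functional as F

def self_imitation_loss(goal_policy, subgoal_policy,
                        obs, final_goal, planned_subgoal):
    """Cross-entropy from the subgoal-conditioned teacher to the student."""
    with torch.no_grad():  # teacher acts under the planner's subgoal
        teacher = F.softmax(subgoal_policy(torch.cat([obs, planned_subgoal], -1)), -1)
    student_logp = F.log_softmax(goal_policy(torch.cat([obs, final_goal], -1)), -1)
    return -(teacher * student_logp).sum(-1).mean()
```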
- Efficient Learning of High Level Plans from Play [57.29562823883257]
We present Efficient Learning of High-Level Plans from Play (ELF-P), a framework for robotic learning that bridges motion planning and deep RL.
We demonstrate that ELF-P has significantly better sample efficiency than relevant baselines over multiple realistic manipulation tasks.
arXiv Detail & Related papers (2023-03-16T20:09:47Z)
- Swapped goal-conditioned offline reinforcement learning [8.284193221280216]
We present a general offline reinforcement learning method called deterministic Q-advantage policy gradient (DQAPG).
In the experiments, DQAPG outperforms state-of-the-art goal-conditioned offline RL methods in a wide range of benchmark tasks.
arXiv Detail & Related papers (2023-02-17T13:22:40Z)
- Discrete Factorial Representations as an Abstraction for Goal Conditioned Reinforcement Learning [99.38163119531745]
We show that applying a discretizing bottleneck can improve performance in goal-conditioned RL setups.
Experiments demonstrate improved expected return on out-of-distribution goals, while still allowing goals to be specified with expressive structure.
arXiv Detail & Related papers (2022-11-01T03:31:43Z)
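A common way to realize the discretizing bottleneck above is vector quantization over goal encodings; the sketch below snaps each continuous goal embedding to its nearest entry in a learned codebook with a straight-through gradient. The codebook size and dimensions are arbitrary choices, not taken from the paper.

```python
# Illustrative VQ-style bottleneck over goal representations.
import torch

class GoalBottleneck(torch.nn.Module):
    def __init__(self, n_codes: int = 64, dim: int = 16):
        super().__init__()
        self.codebook = torch.nn.Parameter(torch.randn(n_codes, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Nearest codebook entry per goal encoding: (B, dim) -> (B, dim).
        d = torch.cdist(z, self.codebook)     # (B, n_codes) distances
        q = self.codebook[d.argmin(dim=-1)]   # quantized goal codes
        return z + (q - z).detach()           # straight-through estimator

bottleneck = GoalBottleneck()
discrete_goal = bottleneck(torch.randn(8, 16))
```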
- Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning [54.378444600773875]
We introduce Successor Feature Landmarks (SFL), a framework for exploring large, high-dimensional environments.
SFL drives exploration by estimating state-novelty and enables high-level planning by abstracting the state-space as a non-parametric landmark-based graph.
We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces.
arXiv Detail & Related papers (2021-11-18T18:36:05Z)
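A toy version of SFL's landmark mechanism: states are embedded by a (here assumed pre-trained) successor-feature map, novelty is the distance to the nearest stored landmark, and sufficiently novel states are added as new landmarks in the non-parametric graph. The identity embedding and threshold below are placeholders.

```python
# Toy landmark graph driven by novelty in an assumed successor-feature space.
import numpy as np

class LandmarkGraph:
    def __init__(self, embed, novelty_threshold: float = 1.0):
        self.embed = embed                  # state -> successor-feature vector
        self.landmarks: list[np.ndarray] = []
        self.tau = novelty_threshold

    def novelty(self, state) -> float:
        """Distance from a state to its nearest stored landmark."""
        psi = self.embed(state)
        if not self.landmarks:
            return float("inf")
        return min(np.linalg.norm(psi - lm) for lm in self.landmarks)

    def maybe_add(self, state) -> bool:
        if self.novelty(state) > self.tau:  # far from all landmarks: new node
            self.landmarks.append(self.embed(state))
            return True
        return False

# Identity embedding stands in for learned successor features.
graph = LandmarkGraph(embed=lambda s: np.asarray(s, dtype=float))
for s in [(0, 0), (0.1, 0), (3, 4)]:
    graph.maybe_add(s)
print(len(graph.landmarks))  # 2: (0.1, 0) was not novel enough
```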