DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2502.01956v2
- Date: Tue, 27 May 2025 15:30:20 GMT
- Title: DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
- Authors: Shashank Sharma, Janina Hoffmann, Vinay Namboodiri
- Abstract summary: We propose a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility.
Experiments in 25-room navigation environments demonstrate $100\%$ success rate.
The method also generalizes to momentum-based control tasks and requires only $\log N$ steps for replanning.
- Score: 2.1438108757511958
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Hierarchical Reinforcement Learning (HRL) agents often struggle with long-horizon visual planning due to their reliance on error-prone distance metrics. We propose Discrete Hierarchical Planning (DHP), a method that replaces continuous distance estimates with discrete reachability checks to evaluate subgoal feasibility. DHP recursively constructs tree-structured plans by decomposing long-term goals into sequences of simpler subtasks, using a novel advantage estimation strategy that inherently rewards shorter plans and generalizes beyond training depths. In addition, to address the data efficiency challenge, we introduce an exploration strategy that generates targeted training examples for the planning modules without needing expert data. Experiments in 25-room navigation environments demonstrate $100\%$ success rate (vs $82\%$ baseline) and $73$-step average episode length (vs $158$-step baseline). The method also generalizes to momentum-based control tasks and requires only $\log N$ steps for replanning. Theoretical analysis and ablations validate our design choices.
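The recursive construction described in the abstract can be illustrated with a minimal sketch. It is an assumption-laden illustration, not the authors' code: `reachable` stands in for the learned discrete reachability check and `propose_subgoal` for the learned subgoal proposal module, and the binary-split structure is inferred from the tree-structured plans described above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class PlanNode:
    """Node of the binary plan tree: reach `goal` starting from `start`."""
    start: object
    goal: object
    left: Optional["PlanNode"] = None   # plan for start -> subgoal
    right: Optional["PlanNode"] = None  # plan for subgoal -> goal


def plan(start, goal,
         reachable: Callable[[object, object], bool],
         propose_subgoal: Callable[[object, object], object],
         depth: int) -> PlanNode:
    """Recursively split (start, goal) until each leaf passes the discrete
    reachability check or the depth budget is exhausted."""
    node = PlanNode(start, goal)
    if depth == 0 or reachable(start, goal):
        return node  # leaf: hand this short segment to the low-level policy
    mid = propose_subgoal(start, goal)  # learned subgoal proposal (assumed interface)
    node.left = plan(start, mid, reachable, propose_subgoal, depth - 1)
    node.right = plan(mid, goal, reachable, propose_subgoal, depth - 1)
    return node


def leaf_subtasks(node: PlanNode) -> List[PlanNode]:
    """In-order traversal of the leaves yields the executable subtask sequence."""
    if node.left is None and node.right is None:
        return [node]
    return leaf_subtasks(node.left) + leaf_subtasks(node.right)
```

Because each level of the tree halves the remaining horizon, replanning from a new state only touches one root-to-leaf path, which is consistent with the $\log N$ replanning cost quoted in the abstract.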
Related papers
- PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving [66.42260489147617]
We introduce PLAN-TUNING, a framework that distills synthetic task decompositions from large-scale language models.
PLAN-TUNING fine-tunes smaller models via supervised and reinforcement-learning objectives to improve complex reasoning.
Our analysis demonstrates how planning trajectories improve complex reasoning capabilities.
arXiv Detail & Related papers (2025-07-10T07:30:44Z)
- Strict Subgoal Execution: Reliable Long-Horizon Planning in Hierarchical Reinforcement Learning [5.274804664403783]
Strict Subgoal Execution (SSE) is a graph-based hierarchical RL framework that enforces single-step subgoal reachability.
We show that SSE consistently outperforms existing goal-conditioned RL and hierarchical RL approaches in both efficiency and success rate.
arXiv Detail & Related papers (2025-06-26T06:35:42Z)
- Efficient Robotic Policy Learning via Latent Space Backward Planning [17.770562202624962]
Current robotic planning methods often rely on predicting multi-frame images with full pixel details.
We propose a Latent Space Backward Planning scheme (LBP), which begins by grounding the task into final latent goals.
We show that LBP outperforms existing fine-grained and forward planning methods, achieving SOTA performance.
arXiv Detail & Related papers (2025-05-11T06:13:51Z)
- HyperTree Planning: Enhancing LLM Reasoning via Hierarchical Thinking [109.09735490692202]
We propose HyperTree Planning (HTP), a novel reasoning paradigm that constructs hypertree-structured planning outlines for effective planning.
Experiments demonstrate the effectiveness of HTP, achieving state-of-the-art accuracy on the TravelPlanner benchmark with Gemini-1.5-Pro, resulting in a 3.6 times performance improvement over o1-preview.
arXiv Detail & Related papers (2025-05-05T02:38:58Z)
- Extendable Long-Horizon Planning via Hierarchical Multiscale Diffusion [62.91968752955649]
This paper tackles a novel problem, extendable long-horizon planning: enabling agents to plan trajectories longer than those in the training data without compounding errors.
We propose an augmentation method that iteratively generates longer trajectories by stitching shorter ones.
HM-Diffuser trains on these extended trajectories using a hierarchical structure, efficiently handling tasks across multiple temporal scales.
arXiv Detail & Related papers (2025-03-25T22:52:46Z)
- Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting [4.500734889060007]
This paper addresses the synergy between large language models (LLMs) and task planning systems.
We propose Vote-Tree-Planner to minimize redundancy while enhancing planning effectiveness.
arXiv Detail & Related papers (2025-02-13T20:08:06Z)
- Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning [0.0]
We propose a method combining reinforcement learning and automated planning.
Our approach uses short goal-conditioned policies organized hierarchically, with Monte Carlo Tree Search (MCTS) planning using high-level actions (HLAs).
A single plan-tree, maintained during the agent's lifetime, holds knowledge about goal achievement.
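As a rough illustration of what such a lifetime plan-tree over HLAs might look like, the sketch below keeps per-node visit and success counts and uses UCB1 for the MCTS selection phase. The class, its fields, and the statistics tracked are illustrative assumptions, not the paper's actual data structure.

```python
import math


class PlanTreeNode:
    """One node of a lifetime plan-tree: each child edge is a high-level
    action (HLA), i.e. a call to a short goal-conditioned policy."""

    def __init__(self, goal):
        self.goal = goal
        self.children = []       # list of (hla, PlanTreeNode) pairs
        self.visits = 0
        self.successes = 0

    def ucb_child(self, c: float = 1.4):
        """UCB1 selection over child HLAs for the MCTS selection phase."""
        def score(edge):
            _, child = edge
            if child.visits == 0:
                return float("inf")          # try unvisited HLAs first
            exploit = child.successes / child.visits
            explore = c * math.sqrt(math.log(self.visits + 1) / child.visits)
            return exploit + explore
        return max(self.children, key=score)

    def record(self, succeeded: bool):
        """Backup step: remember whether the goal was eventually reached."""
        self.visits += 1
        self.successes += int(succeeded)
```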
arXiv Detail & Related papers (2025-01-03T09:37:54Z)
- Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs).
We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption.
Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical Diffuser, a fast, yet surprisingly effective planning method combining the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost.
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme which distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample-efficiency of the existing goal-conditioned RL methods.
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming state-of-the-art methods.
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- C-Planning: An Automatic Curriculum for Learning Goal-Reaching Tasks [133.40619754674066]
Goal-conditioned reinforcement learning can solve tasks in a wide range of domains, including navigation and manipulation.
We propose to solve the distant goal-reaching task by using search at training time to automatically generate intermediate states.
The E-step corresponds to planning an optimal sequence of waypoints using graph search, while the M-step aims to learn a goal-conditioned policy to reach those waypoints.
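A rough sketch of this alternation, assuming a state graph built from replay data and hypothetical `policy.rollout`, `policy.update`, and `replay_buffer.add` interfaces (none of which come from the paper), might look like:

```python
import networkx as nx  # graph search used for the E-step


def c_planning_iteration(state_graph: nx.Graph, start, goal,
                         policy, replay_buffer):
    """One EM-style iteration in the spirit of C-Planning.
    E-step: graph search proposes a waypoint curriculum toward the distant goal.
    M-step: the goal-conditioned policy is trained to reach those waypoints."""
    # E-step: plan an optimal waypoint sequence over a graph of visited states.
    waypoints = nx.shortest_path(state_graph, source=start, target=goal)

    # M-step: gather experience toward each (easier) intermediate waypoint and
    # update the goal-conditioned policy on the relabeled transitions.
    for waypoint in waypoints[1:]:
        trajectory = policy.rollout(goal=waypoint)               # hypothetical API
        replay_buffer.add(trajectory, relabeled_goal=waypoint)   # hypothetical API
    policy.update(replay_buffer)                                 # hypothetical API
    return waypoints
```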
arXiv Detail & Related papers (2021-10-22T22:05:31Z)
- Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach then incorporates reinforcement learning to learn neural network policies for each edge (sub-task) within a Dijkstra-style planning algorithm to compute a high-level plan in the graph.
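Abstracting away DiRL's specification graph and its learned per-edge policies, the Dijkstra-style high-level planning step reduces to shortest-path search over a sub-task graph. The sketch below is a generic version with hypothetical edge costs standing in for learned sub-task difficulty estimates.

```python
import heapq
from typing import Dict, List, Tuple


def dijkstra_plan(edges: Dict[str, List[Tuple[str, float]]],
                  start: str, goal: str) -> List[str]:
    """Dijkstra-style high-level planning over a sub-task graph.
    `edges[u]` lists (neighbor, cost) pairs; each edge corresponds to a
    sub-task handled by a learned policy, and `cost` is a hypothetical
    estimate of how hard that sub-task is."""
    frontier = [(0.0, start, [start])]
    visited = set()
    while frontier:
        cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return path                       # sequence of sub-goals to execute
        if node in visited:
            continue
        visited.add(node)
        for neighbor, edge_cost in edges.get(node, []):
            if neighbor not in visited:
                heapq.heappush(frontier,
                               (cost + edge_cost, neighbor, path + [neighbor]))
    return []                                 # no feasible high-level plan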
arXiv Detail & Related papers (2021-06-25T22:54:28Z)
- Robust Hierarchical Planning with Policy Delegation [6.1678491628787455]
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation.
We show this planning approach is experimentally very competitive with classic planning and reinforcement learning techniques on a variety of domains.
arXiv Detail & Related papers (2020-10-25T04:36:20Z)
- PlanGAN: Model-based Planning With Sparse Rewards and Multiple Goals [14.315501760755609]
PlanGAN is a model-based algorithm for solving multi-goal tasks in environments with sparse rewards.
Our studies indicate that PlanGAN can achieve comparable performance whilst being around 4-8 times more sample efficient.
arXiv Detail & Related papers (2020-06-01T12:53:09Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
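The flexibility over planning order comes from splitting a start-goal pair around an intermediate subgoal rather than planning left to right. The sketch below is a much-simplified divide-and-conquer skeleton without the Monte Carlo search itself; `candidate_subgoals` and `solvable` are hypothetical stand-ins for DC-MCTS's learned proposal and value components.

```python
from typing import Callable, List, Optional


def dc_plan(start, goal,
            candidate_subgoals: Callable[[object, object], list],
            solvable: Callable[[object, object], bool],
            depth: int) -> Optional[List]:
    """Divide-and-conquer planning skeleton: rather than planning left-to-right,
    pick an intermediate subgoal s and solve (start, s) and (s, goal) recursively.
    DC-MCTS chooses s with Monte Carlo search guided by learned priors; here the
    choice is a plain loop over hypothetical candidates."""
    if solvable(start, goal):
        return [start, goal]          # base case: a low-level policy can do this
    if depth == 0:
        return None                   # out of budget, no plan found
    for s in candidate_subgoals(start, goal):
        left = dc_plan(start, s, candidate_subgoals, solvable, depth - 1)
        right = dc_plan(s, goal, candidate_subgoals, solvable, depth - 1)
        if left is not None and right is not None:
            return left + right[1:]   # drop the duplicated subgoal s
    return None
```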
arXiv Detail & Related papers (2020-04-23T18:08:58Z)