DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
- URL: http://arxiv.org/abs/2502.01956v1
- Date: Tue, 04 Feb 2025 03:05:55 GMT
- Title: DHP: Discrete Hierarchical Planning for Hierarchical Reinforcement Learning Agents
- Authors: Shashank Sharma, Janina Hoffmann, Vinay Namboodiri
- Abstract summary: Our key contribution is a Discrete Hierarchical Planning (DHP) method, an alternative to traditional distance-based approaches.
We provide theoretical foundations for the method and demonstrate its effectiveness through extensive empirical evaluations.
We evaluate our method on long-horizon visual planning tasks in a 25-room environment, where it significantly outperforms previous benchmarks in success rate and average episode length.
- Score: 2.1438108757511958
- Abstract: In this paper, we address the challenge of long-horizon visual planning tasks using Hierarchical Reinforcement Learning (HRL). Our key contribution is a Discrete Hierarchical Planning (DHP) method, an alternative to traditional distance-based approaches. We provide theoretical foundations for the method and demonstrate its effectiveness through extensive empirical evaluations. Our agent recursively predicts subgoals in the context of a long-term goal and receives discrete rewards for constructing plans as compositions of abstract actions. The method introduces a novel advantage estimation strategy for tree trajectories, which inherently encourages shorter plans and enables generalization beyond the maximum tree depth. The learned policy function allows the agent to plan efficiently, requiring only $\log N$ computational steps, making re-planning highly efficient. The agent, based on a Soft Actor-Critic (SAC) framework, is trained using on-policy imagination data. Additionally, we propose a novel exploration strategy that enables the agent to generate relevant training examples for the planning modules. We evaluate our method on long-horizon visual planning tasks in a 25-room environment, where it significantly outperforms previous benchmarks in success rate and average episode length. Furthermore, an ablation study highlights the individual contributions of key modules to the overall performance.
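The recursive plan construction is the core of the abstract and is easiest to see in code. The following is a minimal sketch, not the paper's implementation: `subgoal_policy`, `is_primitive`, the 1-D toy states, and `max_depth` are all assumptions standing in for the learned SAC subgoal policy, the learned reachability estimate, and the latent world-model states.

```python
def subgoal_policy(state, goal):
    """Stand-in for the learned policy: propose a midpoint subgoal
    between the current state and the goal (here, the mean)."""
    return (state + goal) / 2.0

def is_primitive(state, goal, tol=1.0):
    """Stand-in for the check that the goal is directly reachable
    by the low-level policy from the current state."""
    return abs(goal - state) <= tol

def plan(state, goal, depth=0, max_depth=6):
    """Recursively split (state, goal) into a binary tree of subgoals
    and return the leaves in execution order."""
    if is_primitive(state, goal) or depth == max_depth:
        return [goal]  # leaf: hand (state -> goal) to the low-level policy
    mid = subgoal_policy(state, goal)
    left = plan(state, mid, depth + 1, max_depth)
    right = plan(mid, goal, depth + 1, max_depth)
    return left + right

print(plan(0.0, 16.0))  # ordered subgoals, e.g. [1.0, 2.0, ..., 16.0]
```

Because each recursion halves the remaining problem, a plan spanning N primitive segments needs only about $\log N$ policy evaluations along any root-to-leaf path, which is the source of the cheap re-planning claim in the abstract.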
Related papers
- Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting [4.500734889060007]
This paper addresses the synergy between large language models (LLMs) and task planning systems.
We propose Vote-Tree-Planner to minimize redundancy while enhancing planning effectiveness.
arXiv Detail & Related papers (2025-02-13T20:08:06Z)
- Proposing Hierarchical Goal-Conditioned Policy Planning in Multi-Goal Reinforcement Learning [0.0]
We propose a method combining reinforcement learning and automated planning.
Our approach uses short goal-conditioned policies organized hierarchically, with Monte Carlo Tree Search (MCTS) planning using high-level actions (HLAs).
A single plan-tree, maintained during the agent's lifetime, holds knowledge about goal achievement.
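A toy sketch of the persistent plan-tree with UCT-style selection over HLAs may help; the class, the method names, and the statistics below are illustrative assumptions, not the paper's actual data structures.

```python
import math

class PlanNode:
    """One node of the lifelong plan-tree: a goal plus statistics for
    each high-level action (HLA) tried from it."""
    def __init__(self, goal):
        self.goal = goal
        self.children = {}   # hla -> PlanNode, expanded lazily by the search
        self.visits = {}     # hla -> number of times the HLA was tried
        self.wins = {}       # hla -> number of times the goal was achieved

    def select_hla(self, hlas, c=1.4):
        """UCT-style selection over high-level actions."""
        total = sum(self.visits.get(h, 0) for h in hlas) + 1
        def uct(h):
            n = self.visits.get(h, 0)
            if n == 0:
                return float("inf")  # try untested HLAs first
            return self.wins[h] / n + c * math.sqrt(math.log(total) / n)
        return max(hlas, key=uct)

    def update(self, hla, achieved):
        """Back up the outcome of one attempt."""
        self.visits[hla] = self.visits.get(hla, 0) + 1
        self.wins[hla] = self.wins.get(hla, 0) + int(achieved)

root = PlanNode(goal="reach_room_25")
hlas = ["go_to_door", "go_to_key", "go_to_corridor"]
for _ in range(10):
    h = root.select_hla(hlas)
    root.update(h, achieved=(h == "go_to_door"))  # fake outcomes
print(root.visits, root.wins)
```

Because the tree persists across episodes, the visit and success statistics accumulate over the agent's lifetime rather than being rebuilt for each search.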
arXiv Detail & Related papers (2025-01-03T09:37:54Z)
- Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs).
We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption.
Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Simple Hierarchical Planning with Diffusion [54.48129192534653]
Diffusion-based generative methods have proven effective in modeling trajectories with offline datasets.
We introduce the Hierarchical Diffuser, a fast yet surprisingly effective planning method that combines the advantages of hierarchical and diffusion-based planning.
Our model adopts a "jumpy" planning strategy at the higher level, which allows it to have a larger receptive field but at a lower computational cost.
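The "jumpy" decomposition can be illustrated without the diffusion machinery itself. In the sketch below, both planner stubs, the stride `K`, and the 1-D states are assumptions; in the actual method both levels are diffusion models.

```python
import numpy as np

K = 4  # jump stride: the high level only plans every K-th state

def high_level_plan(start, goal, num_jumps):
    """Placeholder for the sparse high-level planner (a diffusion model
    in the paper): evenly spaced waypoints, one every K steps."""
    return np.linspace(start, goal, num_jumps + 1)

def low_level_fill(a, b):
    """Placeholder for the dense low-level planner: fill in the K
    intermediate states of one segment by interpolation."""
    return np.linspace(a, b, K + 1)[1:]

def plan(start, goal, horizon):
    waypoints = high_level_plan(start, goal, horizon // K)
    traj = [start]
    for a, b in zip(waypoints[:-1], waypoints[1:]):
        traj.extend(low_level_fill(a, b))
    return np.array(traj)

print(plan(0.0, 8.0, horizon=16))  # dense 17-state trajectory
```

The high level covers the whole horizon with only horizon/K waypoints, which is the larger receptive field at lower cost mentioned above, while the low level conditions on one short segment at a time.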
arXiv Detail & Related papers (2024-01-05T05:28:40Z)
- Imitating Graph-Based Planning with Goal-Conditioned Policies [72.61631088613048]
We present a self-imitation scheme that distills a subgoal-conditioned policy into the target-goal-conditioned policy.
We empirically show that our method can significantly boost the sample efficiency of existing goal-conditioned RL methods.
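A schematic of the distillation step, under heavy simplifying assumptions: the teacher `subgoal_policy`, the midpoint `pick_subgoal`, and the linear student below are placeholders for the learned networks and the graph-based subgoal selection.

```python
import numpy as np

rng = np.random.default_rng(0)

def subgoal_policy(state, subgoal):
    """Teacher: a policy conditioned on a nearby subgoal.
    Placeholder: move halfway toward the subgoal."""
    return 0.5 * (subgoal - state)

def pick_subgoal(state, goal):
    """Placeholder for graph-based subgoal selection: the midpoint."""
    return 0.5 * (state + goal)

# Student: a linear target-goal-conditioned policy, a = w0*s + w1*g
W = np.zeros(2)

def student(state, goal):
    return W[0] * state + W[1] * goal

# Self-imitation distillation: regress student actions onto the
# actions the teacher takes when given the selected subgoal.
lr = 0.1
for _ in range(500):
    s, g = rng.uniform(-1.0, 1.0, size=2)
    a_teacher = subgoal_policy(s, pick_subgoal(s, g))
    err = student(s, g) - a_teacher
    W -= lr * err * np.array([s, g])  # gradient step on squared error

print(W)  # converges near [-0.25, 0.25] for these placeholder policies
```

The student never sees the subgoal at test time: it learns to reach distant target goals directly by imitating what the subgoal-guided teacher does.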
arXiv Detail & Related papers (2023-03-20T14:51:10Z)
- Hierarchical Imitation Learning with Vector Quantized Models [77.67190661002691]
We propose to use reinforcement learning to identify subgoals in expert trajectories.
We build a vector-quantized generative model for the identified subgoals to perform subgoal-level planning.
In experiments, the algorithm excels at solving complex, long-horizon decision-making problems, outperforming the state-of-the-art.
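The quantization step itself is compact enough to sketch; the codebook size, embedding dimension, and random inputs below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
codebook = rng.normal(size=(16, 4))  # 16 discrete subgoal codes, dim 4

def quantize(z):
    """Snap a continuous subgoal embedding to its nearest code."""
    dists = np.linalg.norm(codebook - z, axis=1)
    k = int(np.argmin(dists))
    return k, codebook[k]

z = rng.normal(size=4)  # e.g. an encoder's embedding of a subgoal state
k, z_q = quantize(z)
print(k, z_q)  # discrete index to plan over + its quantized embedding
```

In training, the non-differentiable `argmin` is typically handled with a straight-through gradient estimator, as in standard VQ-VAE practice; planning then operates over the discrete code indices.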
arXiv Detail & Related papers (2023-01-30T15:04:39Z)
- Compositional Reinforcement Learning from Logical Specifications [21.193231846438895]
Recent approaches automatically generate a reward function from a given specification and use a suitable reinforcement learning algorithm to learn a policy.
We develop a compositional learning approach, called DiRL, that interleaves high-level planning and reinforcement learning.
Our approach uses a Dijkstra-style planning algorithm to compute a high-level plan in the graph, then applies reinforcement learning to learn a neural network policy for each edge (sub-task).
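The high-level half of this loop is ordinary shortest-path search. Below is a self-contained sketch; the abstract graph, its costs, and the node names are made up for illustration, and in the actual method the plan and the per-edge policies are refined together.

```python
import heapq

def dijkstra(graph, start, goal):
    """Shortest high-level plan as a sequence of edges (sub-tasks).
    graph: {node: [(neighbor, cost), ...]}"""
    dist, prev = {start: 0.0}, {}
    pq = [(0.0, start)]
    while pq:
        d, u = heapq.heappop(pq)
        if u == goal:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, c in graph.get(u, []):
            nd = d + c
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    plan, node = [], goal
    while node != start:
        plan.append((prev[node], node))  # each edge = one sub-task for RL
        node = prev[node]
    return plan[::-1]

# Illustrative abstract graph over subgoal regions
graph = {"s": [("a", 1.0), ("b", 2.5)], "a": [("g", 2.0)], "b": [("g", 1.5)]}
print(dijkstra(graph, "s", "g"))  # [('s', 'a'), ('a', 'g')]
```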
arXiv Detail & Related papers (2021-06-25T22:54:28Z)
- Robust Hierarchical Planning with Policy Delegation [6.1678491628787455]
We propose a novel framework and algorithm for hierarchical planning based on the principle of delegation.
We show this planning approach is experimentally very competitive to classic planning and reinforcement learning techniques on a variety of domains.
arXiv Detail & Related papers (2020-10-25T04:36:20Z)
- Divide-and-Conquer Monte Carlo Tree Search For Goal-Directed Planning [78.65083326918351]
We consider alternatives to an implicit sequential planning assumption.
We propose Divide-and-Conquer Monte Carlo Tree Search (DC-MCTS) for approximating the optimal plan.
We show that this algorithmic flexibility over planning order leads to improved results in navigation tasks in grid-worlds.
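A toy sketch of the divide-and-conquer recursion: rather than appending actions left to right, the planner searches for a midpoint subgoal that splits (start, goal) into two easier subproblems. The candidate set, the `feasible` predicate, and the `score` heuristic below are placeholders for the learned components and the full MCTS machinery.

```python
def feasible(a, b):
    """Placeholder for the learned 'directly solvable' predicate."""
    return abs(b - a) <= 1

def score(a, m, b):
    """Placeholder value estimate: prefer balanced splits."""
    return -abs(abs(m - a) - abs(b - m))

def dc_plan(a, b, candidates, depth=0, max_depth=8):
    """Divide and conquer: pick the most promising midpoint subgoal,
    then recurse on both halves. Returns the states to visit in order."""
    if feasible(a, b) or depth == max_depth:
        return [a, b]
    mids = [m for m in candidates if m not in (a, b)]
    if not mids:
        return [a, b]
    m = max(mids, key=lambda m: score(a, m, b))
    left = dc_plan(a, m, candidates, depth + 1, max_depth)
    right = dc_plan(m, b, candidates, depth + 1, max_depth)
    return left + right[1:]  # drop the duplicated midpoint

print(dc_plan(0, 7, candidates=list(range(8))))  # [0, 1, ..., 7]
```

The freedom to choose which split to commit to first is the "planning order" flexibility that the paper credits for its improved navigation results.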
arXiv Detail & Related papers (2020-04-23T18:08:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.