Make Planning Research Rigorous Again!
- URL: http://arxiv.org/abs/2505.21674v1
- Date: Tue, 27 May 2025 18:51:06 GMT
- Title: Make Planning Research Rigorous Again!
- Authors: Michael Katz, Harsha Kokel, Christian Muise, Shirin Sohrabi, Sarath Sreedharan
- Abstract summary: We argue that rigor should be applied to the current trend of work on planning with large language models. The experience and expertise of the planning community are not just important from a historical perspective. We believe that avoiding such known pitfalls will contribute greatly to the progress in building LLM-based planners.
- Score: 32.54078334699621
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: In over sixty years since its inception, the field of planning has made significant contributions to both the theory and practice of building planning software that can solve a never-before-seen planning problem. This was done through established practices of rigorous design and evaluation of planning systems. It is our position that this rigor should be applied to the current trend of work on planning with large language models. One way to do so is by correctly incorporating the insights, tools, and data from the automated planning community into the design and evaluation of LLM-based planners. The experience and expertise of the planning community are not just important from a historical perspective; the lessons learned could play a crucial role in accelerating the development of LLM-based planners. This position is particularly important in light of the abundance of recent works that replicate and propagate the same pitfalls that the planning community has encountered and learned from. We believe that avoiding such known pitfalls will contribute greatly to the progress in building LLM-based planners and to planning in general.
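To make the advocated kind of rigor concrete, here is a minimal sketch (using a toy STRIPS-style representation invented for illustration, not the authors' own tooling) of the sort of check the planning community applies: validating a candidate plan against a symbolic model of the domain, rather than judging it by surface plausibility.

```python
def validate_plan(init, goal, actions, plan):
    """Return True iff `plan` is executable from `init` and reaches `goal`.

    `init` and `goal` are sets of ground facts; `actions` maps an action
    name to a (preconditions, add_effects, delete_effects) triple of sets.
    """
    state = set(init)
    for step in plan:
        pre, add, delete = actions[step]
        if not pre <= state:  # a precondition does not hold: invalid plan
            return False
        state = (state - delete) | add  # apply delete then add effects
    return goal <= state  # plan is valid only if it achieves the goal

# Hypothetical two-block Blocksworld instance: stack block A on block B.
actions = {
    "pickup-A":  ({"clear-A", "on-table-A", "hand-empty"},
                  {"holding-A"},
                  {"clear-A", "on-table-A", "hand-empty"}),
    "stack-A-B": ({"holding-A", "clear-B"},
                  {"on-A-B", "clear-A", "hand-empty"},
                  {"holding-A", "clear-B"}),
}
init = {"clear-A", "clear-B", "on-table-A", "on-table-B", "hand-empty"}
goal = {"on-A-B"}

assert validate_plan(init, goal, actions, ["pickup-A", "stack-A-B"])
assert not validate_plan(init, goal, actions, ["stack-A-B"])  # precondition fails
```

An LLM-proposed action sequence passed through such a validator is either provably correct or provably not; this is the style of evaluation (exemplified by established tools like the VAL plan validator) that the position paper urges over informal inspection.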
Related papers
- Plan Your Travel and Travel with Your Plan: Wide-Horizon Planning and Evaluation via LLM [58.50687282180444]
Travel planning is a complex task requiring the integration of diverse real-world information and user preferences. We formulate this as an $L3$ planning problem, emphasizing long context, long instruction, and long output. We introduce Multiple Aspects of Planning (MAoP), enabling LLMs to conduct wide-horizon thinking to solve complex planning problems.
arXiv Detail & Related papers (2025-06-14T09:37:59Z)
- Plan-R1: Safe and Feasible Trajectory Planning as Language Modeling [75.83583076519311]
Plan-R1 is a novel two-stage trajectory planning framework that formulates trajectory planning as a sequential prediction task. In the first stage, we train an autoregressive trajectory predictor via next motion token prediction on expert data. In the second stage, we design rule-based rewards (e.g., collision avoidance, speed limits) and fine-tune the model using Group Relative Policy Optimization.
arXiv Detail & Related papers (2025-05-23T09:22:19Z)
- Hindsight Planner: A Closed-Loop Few-Shot Planner for Embodied Instruction Following [62.10809033451526]
This work focuses on building a task planner for Embodied Instruction Following (EIF) using Large Language Models (LLMs). We frame the task as a Partially Observable Markov Decision Process (POMDP) and aim to develop a robust planner under a few-shot assumption. Our experiments on the ALFRED dataset indicate that our planner achieves competitive performance under a few-shot assumption.
arXiv Detail & Related papers (2024-12-27T10:05:45Z)
- LHPF: Look back the History and Plan for the Future in Autonomous Driving [10.855426442780516]
This paper introduces LHPF, an imitation learning planner that integrates historical planning information.
Our approach employs a historical intention aggregation module that pools historical planning intentions.
Experiments using both real-world and synthetic data demonstrate that LHPF not only surpasses existing advanced learning-based planners in planning performance but also marks the first instance of a purely learning-based planner outperforming the expert.
arXiv Detail & Related papers (2024-11-26T09:30:26Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- Open Grounded Planning: Challenges and Benchmark Construction [44.86307213996181]
We propose a new planning task: open grounded planning.
The primary objective of open grounded planning is to ask the model to generate an executable plan based on a variable action set.
Then we test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains.
arXiv Detail & Related papers (2024-06-05T03:46:52Z)
- Large Language Models are Learnable Planners for Long-Term Recommendation [59.167795967630305]
Planning for both immediate and long-term benefits becomes increasingly important in recommendation.
Existing methods apply Reinforcement Learning to learn planning capacity by maximizing cumulative reward for long-term recommendation.
We propose to leverage the remarkable planning capabilities over sparse data of Large Language Models for long-term recommendation.
arXiv Detail & Related papers (2024-02-29T13:49:56Z)
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs [12.326862964753694]
We study the planning capability of large language models (LLMs) within off-the-shelf planning frameworks.
We propose a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs.
We empirically exhibit the effectiveness of our proposed framework in various planning domains.
arXiv Detail & Related papers (2024-02-18T15:53:32Z)
- What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models [7.216683826556268]
Large language models (LLMs) are increasingly used for applications that require planning capabilities.
We introduce SimPlan, a novel hybrid-method, and evaluate its performance in a new challenging setup.
arXiv Detail & Related papers (2024-02-18T07:42:49Z)
- LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with, producing well-reasoned outputs while remaining grounded through its collaboration with the rule-based planner.
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
- PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [34.93870615625937]
PlanBench is a benchmark suite based on the kinds of domains used in the automated planning community.
PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities.
arXiv Detail & Related papers (2022-06-21T16:15:27Z)
- A Look at Value-Based Decision-Time vs. Background Planning Methods Across Different Settings [41.606112019744174]
We study how the value-based versions of decision-time and background planning methods will compare against each other across different settings.
Overall, our findings suggest that although the value-based versions of the two planning methods perform on par in their simplest instantiations, modern instantiations of value-based decision-time planning methods can perform on par with or better than modern instantiations of value-based background planning methods.
arXiv Detail & Related papers (2022-06-16T20:48:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.