How Far Are LLMs from Symbolic Planners? An NLP-Based Perspective
- URL: http://arxiv.org/abs/2508.01300v1
- Date: Sat, 02 Aug 2025 10:20:52 GMT
- Title: How Far Are LLMs from Symbolic Planners? An NLP-Based Perspective
- Authors: Ma'ayan Armony, Albert Meroño-Peñuela, Gerard Canal,
- Abstract summary: We propose a recovery pipeline consisting of an NLP-based evaluation of the generated plans, along with three stages to recover the plans through NLP manipulation.<n>Our findings reveal no clear evidence of underlying reasoning during plan generation, and that a pipeline comprising an NLP-based analysis of the plans, followed by a recovery mechanism, still falls short of the quality and reliability of classical planners.
- Score: 2.580765958706854
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The reasoning and planning abilities of Large Language Models (LLMs) have been a frequent topic of discussion in recent years. Their ability to take unstructured planning problems as input has made LLMs' integration into AI planning an area of interest. Nevertheless, LLMs are still not reliable as planners, with the generated plans often containing mistaken or hallucinated actions. Existing benchmarking and evaluation methods investigate planning with LLMs, focusing primarily on success rate as a quality indicator in various planning tasks, such as validating plans or planning in relaxed conditions. In this paper, we approach planning with LLMs as a natural language processing (NLP) task, given that LLMs are NLP models themselves. We propose a recovery pipeline consisting of an NLP-based evaluation of the generated plans, along with three stages to recover the plans through NLP manipulation of the LLM-generated plans, and eventually complete the plan using a symbolic planner. This pipeline provides a holistic analysis of LLM capabilities in the context of AI task planning, enabling a broader understanding of the quality of invalid plans. Our findings reveal no clear evidence of underlying reasoning during plan generation, and that a pipeline comprising an NLP-based analysis of the plans, followed by a recovery mechanism, still falls short of the quality and reliability of classical planners. On average, only the first 2.65 actions of the plan are executable, with the average length of symbolically generated plans being 8.4 actions. The pipeline still improves action quality and increases the overall success rate from 21.9% to 27.5%.
Related papers
- PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving [66.42260489147617]
We introduce PLAN-TUNING, a framework that distills synthetic task decompositions from large-scale language models.<n>Plan-TUNING fine-tunes smaller models via supervised and reinforcement-learning objectives to improve complex reasoning.<n>Our analysis demonstrates how planning trajectories improves complex reasoning capabilities.
arXiv Detail & Related papers (2025-07-10T07:30:44Z) - Plan-and-Act: Improving Planning of Agents for Long-Horizon Tasks [36.63527489464188]
Plan-and-Act is a framework that incorporates explicit planning into large language models (LLMs)<n>Plan-and-Act consists of a Planner model which generates structured, high-level plans to achieve user goals, and an Executor model that translates these plans into environment-specific actions.<n>We present a state-of-the-art 57.58% success rate on the WebArena-Lite benchmark as well as a text-only state-of-the-art 81.36% success rate on WebVoyager.
arXiv Detail & Related papers (2025-03-12T17:40:52Z) - Vote-Tree-Planner: Optimizing Execution Order in LLM-based Task Planning Pipeline via Voting [4.500734889060007]
This paper addresses the synergy between large language models (LLMs) and task planning systems.<n>We propose Vote-Tree-Planner to minimize redundancy while enhancing planning effectiveness.
arXiv Detail & Related papers (2025-02-13T20:08:06Z) - Zero-shot Robotic Manipulation with Language-guided Instruction and Formal Task Planning [16.89900521727246]
We propose an innovative language-guided symbolic task planning (LM-SymOpt) framework with optimization.<n>It is the first expert-free planning framework since we combine the world knowledge from Large Language Models with formal reasoning.<n>Our experimental results show that LM-SymOpt outperforms existing LLM-based planning approaches.
arXiv Detail & Related papers (2025-01-25T13:33:22Z) - Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation [34.636688162807836]
This study reassesses recent strategies by developing an end-to-end LLM planner.<n>We find that fine-tuning LLMs on a corpus of planning instances does not lead to robust planning skills.<n>Various strategies, including Chain-of-Thought, do enhance the probability of a plan being executable.
arXiv Detail & Related papers (2024-12-14T04:23:14Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs)
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - Open Grounded Planning: Challenges and Benchmark Construction [44.86307213996181]
We propose a new planning task--open grounded planning.
The primary objective of open grounded planning is to ask the model to generate an executable plan based on a variable action set.
Then we test current state-of-the-art LLMs along with five planning approaches, revealing that existing LLMs and methods still struggle to address the challenges posed by grounded planning in open domains.
arXiv Detail & Related papers (2024-06-05T03:46:52Z) - PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models.
We propose PDDLEGO that iteratively construct a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z) - On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs [12.326862964753694]
We study the insight of the planning capability of large language models (LLMs) in off-the-shelf planning frameworks.
We propose a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs.
We empirically exhibit the effectiveness of our proposed framework in various planning domains.
arXiv Detail & Related papers (2024-02-18T15:53:32Z) - Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning [59.543858889996024]
Large language models (LLMs) have revolutionized a large variety of NLP tasks.<n>We show how to leverage an LLM to automatically generate NL prompts from PDDL input.<n>Our NL prompts yield better performance than PDDL prompts and simple template-based NL prompts.
arXiv Detail & Related papers (2023-11-16T11:55:27Z) - Tree-Planner: Efficient Close-loop Task Planning with Large Language Models [63.06270302774049]
Tree-Planner reframes task planning with Large Language Models into three distinct phases.
Tree-Planner achieves state-of-the-art performance while maintaining high efficiency.
arXiv Detail & Related papers (2023-10-12T17:59:50Z) - AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z) - Learning to Plan with Natural Language [111.76828049344839]
Large Language Models (LLMs) have shown remarkable performance in various basic natural language tasks.
For completing the complex task, we still need a plan for the task to guide LLMs to generate the specific solutions step by step.
We propose the Learning to Plan method, which involves two phases: (1) In the first learning task plan phase, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive from training error feedback.
arXiv Detail & Related papers (2023-04-20T17:09:12Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.