Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation
- URL: http://arxiv.org/abs/2412.10675v1
- Date: Sat, 14 Dec 2024 04:23:14 GMT
- Title: Chasing Progress, Not Perfection: Revisiting Strategies for End-to-End LLM Plan Generation
- Authors: Sukai Huang, Trevor Cohn, Nir Lipovetzky
- Abstract summary: This study reassesses recent strategies by developing an end-to-end LLM planner. We find that fine-tuning LLMs on a corpus of planning instances does not lead to robust planning skills. Various strategies, including Chain-of-Thought, do enhance the probability of a plan being executable.
- Score: 34.636688162807836
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The capability of Large Language Models (LLMs) to plan remains a topic of debate. Some critics argue that strategies to boost LLMs' reasoning skills are ineffective in planning tasks, while others report strong outcomes merely from training models on a planning corpus. This study reassesses recent strategies by developing an end-to-end LLM planner and employing diverse metrics for a thorough evaluation. We find that merely fine-tuning LLMs on a corpus of planning instances does not lead to robust planning skills, as indicated by poor performance on out-of-distribution test sets. At the same time, we find that various strategies, including Chain-of-Thought, do enhance the probability of a plan being executable. This indicates progress towards better plan quality, despite not directly enhancing the final validity rate. Among the strategies we evaluated, reinforcement learning with our novel "Longest Contiguous Common Subsequence" reward emerged as the most effective, contributing to both plan validity and executability. Overall, our research addresses key misconceptions in the LLM-planning literature; we validate incremental progress in plan executability, although plan validity remains a challenge. Hence, future strategies should focus on both these aspects, drawing insights from our findings.
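The abstract does not spell out how the LCCS reward is computed. Below is a minimal sketch, assuming the reward is the length of the longest contiguous run of actions shared by the generated and reference plans, normalised by the reference plan length; the normalisation scheme and all names are illustrative assumptions, not the authors' implementation.

```python
from typing import Sequence

def lccs_length(pred: Sequence[str], ref: Sequence[str]) -> int:
    """Length of the longest contiguous run of actions common to both plans
    (i.e. the longest common substring over action sequences)."""
    best = 0
    prev = [0] * (len(ref) + 1)  # prev[j]: length of common run ending at pred[i-1], ref[j-1]
    for i in range(1, len(pred) + 1):
        curr = [0] * (len(ref) + 1)
        for j in range(1, len(ref) + 1):
            if pred[i - 1] == ref[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best

def lccs_reward(pred: Sequence[str], ref: Sequence[str]) -> float:
    """Reward in [0, 1]: longest shared contiguous action run, normalised
    by the reference plan length (the normalisation is an assumption)."""
    return lccs_length(pred, ref) / len(ref) if ref else 0.0

# Toy example: the first two actions match contiguously, so LCCS = 2.
gold = ["pick(a)", "stack(a,b)", "pick(c)", "stack(c,a)"]
generated = ["pick(a)", "stack(a,b)", "stack(c,a)"]
print(lccs_reward(generated, gold))  # 0.5
```

A dense reward of this kind credits partially correct plans in proportion to their longest correct contiguous segment, which plausibly explains why it helps executability as well as validity.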
Related papers
- How Far Are LLMs from Symbolic Planners? An NLP-Based Perspective [2.580765958706854]
We propose a recovery pipeline consisting of an NLP-based evaluation of the generated plans, along with three stages to recover the plans through NLP manipulation. Our findings reveal no clear evidence of underlying reasoning during plan generation, and that a pipeline comprising an NLP-based analysis of the plans, followed by a recovery mechanism, still falls short of the quality and reliability of classical planners.
arXiv Detail & Related papers (2025-08-02T10:20:52Z)
- PLAN-TUNING: Post-Training Language Models to Learn Step-by-Step Planning for Complex Problem Solving [66.42260489147617]
We introduce PLAN-TUNING, a framework that distills synthetic task decompositions from large-scale language models. PLAN-TUNING fine-tunes smaller models via supervised and reinforcement-learning objectives to improve complex reasoning. Our analysis demonstrates how planning trajectories improve complex reasoning capabilities.
arXiv Detail & Related papers (2025-07-10T07:30:44Z)
- PGPO: Enhancing Agent Reasoning via Pseudocode-style Planning Guided Preference Optimization [58.465778756331574]
We propose a pseudocode-style Planning Guided Preference Optimization method called PGPO for effective agent learning. With two planning-oriented rewards, PGPO further enhances LLM agents' ability to generate high-quality P-code Plans. Experiments show that PGPO achieves superior performance on representative agent benchmarks and outperforms the current leading baselines.
arXiv Detail & Related papers (2025-06-02T09:35:07Z)
- LASP: Surveying the State-of-the-Art in Large Language Model-Assisted AI Planning [7.36760703426119]
This survey aims to highlight the existing challenges in planning with language models.
It focuses on key areas such as embodied environments, optimal scheduling, competitive and cooperative games, task decomposition, reasoning, and planning.
arXiv Detail & Related papers (2024-09-03T11:39:52Z)
- Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing large language models' (LLMs) planning capabilities by using planning data derived from knowledge graphs (KGs).
LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
arXiv Detail & Related papers (2024-06-20T13:07:38Z)
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
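As a rough illustration of the many-shot setup, here is a minimal sketch that concatenates solved planning instances ahead of the query problem; the prompt format and names are illustrative assumptions, not the paper's protocol.

```python
def build_many_shot_prompt(examples, query, n_shots=50):
    """Concatenate n_shots solved (problem, plan) pairs before the query.

    examples: list of (problem_text, plan_text) tuples
    query: the unsolved problem the LLM should plan for
    """
    parts = [f"Problem:\n{p}\nPlan:\n{s}\n" for p, s in examples[:n_shots]]
    parts.append(f"Problem:\n{query}\nPlan:\n")
    return "\n".join(parts)
```

Increasing n_shots grows the context length, which is precisely the trade-off between context size and planning performance the paper studies.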
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models [15.874604623294427]
Multi-phase planning problems involve multiple interconnected stages, such as outlining, information gathering, and planning.
Existing reasoning approaches have struggled to effectively address this complex task.
Our research aims to address this challenge by developing a human-like planning framework for LLM agents.
arXiv Detail & Related papers (2024-05-28T14:13:32Z)
- Large Language Models are Learnable Planners for Long-Term Recommendation [59.167795967630305]
Planning for both immediate and long-term benefits becomes increasingly important in recommendation.
Existing methods apply Reinforcement Learning to learn planning capacity by maximizing cumulative reward for long-term recommendation.
We propose to leverage the remarkable planning capabilities of Large Language Models over sparse data for long-term recommendation.
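For reference, the "cumulative reward" maximized above is the standard discounted return from reinforcement learning (a textbook definition, not a detail specific to this paper):

$$G_t = \sum_{k=0}^{\infty} \gamma^k\, r_{t+k+1}, \qquad \gamma \in [0, 1),$$

where $r_t$ is the per-step reward (e.g. user engagement) and $\gamma$ trades off immediate against long-term benefit.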
arXiv Detail & Related papers (2024-02-29T13:49:56Z)
- What's the Plan? Evaluating and Developing Planning-Aware Techniques for Language Models [7.216683826556268]
Large language models (LLMs) are increasingly used for applications that require planning capabilities.
We introduce SimPlan, a novel hybrid method, and evaluate its performance in a new, challenging setup.
arXiv Detail & Related papers (2024-02-18T07:42:49Z)
- Understanding the planning of LLM agents: A survey [98.82513390811148]
This survey provides the first systematic view of planning by LLM-based agents, covering recent works that aim to improve planning ability.
Comprehensive analyses are conducted for each direction, and open challenges in this research area are discussed.
arXiv Detail & Related papers (2024-02-05T04:25:24Z)
- Learning Planning-based Reasoning by Trajectories Collection and Process Reward Synthesizing [61.98556945939045]
We propose a framework to learn planning-based reasoning through Direct Preference Optimization (DPO) on collected trajectories.
Our results on challenging logical reasoning benchmarks demonstrate the effectiveness of our learning framework.
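For reference, the standard DPO objective that such trajectory pairs are trained under (this is the published DPO loss, not a detail unique to this paper) is

$$\mathcal{L}_{\mathrm{DPO}} = -\,\mathbb{E}_{(x,\,\tau_w,\,\tau_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(\tau_w \mid x)}{\pi_{\mathrm{ref}}(\tau_w \mid x)} - \beta \log \frac{\pi_\theta(\tau_l \mid x)}{\pi_{\mathrm{ref}}(\tau_l \mid x)}\right)\right],$$

where $\tau_w$ and $\tau_l$ are the preferred and dispreferred trajectories for prompt $x$, $\pi_{\mathrm{ref}}$ is a frozen reference policy, and $\beta$ scales the implicit KL penalty.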
arXiv Detail & Related papers (2024-02-01T15:18:33Z)
- On the Planning Abilities of Large Language Models: A Critical Investigation [34.262740442260515]
We evaluate the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks.
In the LLM-Modulo setting, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners.
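In outline, the LLM-Modulo setting pairs the LLM with an external sound verifier and loops on its critique. A minimal sketch of such a generate-and-verify loop follows; the interfaces (generate_plan, validate) are illustrative assumptions, not the paper's code.

```python
def llm_modulo_plan(llm, verifier, problem, max_iters=10):
    """Ask the LLM for a candidate plan, check it with a sound verifier
    (e.g. VAL), and back-prompt with the errors until it validates."""
    feedback = ""
    for _ in range(max_iters):
        candidate = llm.generate_plan(problem, feedback)
        ok, errors = verifier.validate(problem, candidate)
        if ok:
            return candidate
        feedback = errors  # verifier critique guides the next attempt
    return None  # no valid plan within the iteration budget
```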
arXiv Detail & Related papers (2023-05-25T06:32:23Z)
- PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [34.93870615625937]
PlanBench is a benchmark suite based on the kinds of domains used in the automated planning community.
PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities.
arXiv Detail & Related papers (2022-06-21T16:15:27Z)