Related papers: PlanGenLLMs: A Modern Survey of LLM Planning Capabilities

PlanGenLLMs: A Modern Survey of LLM Planning Capabilities

URL: http://arxiv.org/abs/2502.11221v1
Date: Sun, 16 Feb 2025 17:54:57 GMT
Title: PlanGenLLMs: A Modern Survey of LLM Planning Capabilities
Authors: Hui Wei, Zihao Zhang, Shenghua He, Tian Xia, Shijia Pan, Fei Liu,
Abstract summary: LLMs have immense potential for generating plans, transforming an initial world state into a desired goal state.<n>Many of these systems are tailored to specific problems, making it challenging to compare them or determine the best approach for new tasks.<n>Our survey aims to offer a comprehensive overview of current LLM planners to fill this gap.<n>It builds on foundational work by Kartam and Wilkins (1990) and examines six key performance criteria: completeness, executability, optimality, representation, generalization, and efficiency.
Score: 12.322175348741435
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: LLMs have immense potential for generating plans, transforming an initial world state into a desired goal state. A large body of research has explored the use of LLMs for various planning tasks, from web navigation to travel planning and database querying. However, many of these systems are tailored to specific problems, making it challenging to compare them or determine the best approach for new tasks. There is also a lack of clear and consistent evaluation criteria. Our survey aims to offer a comprehensive overview of current LLM planners to fill this gap. It builds on foundational work by Kartam and Wilkins (1990) and examines six key performance criteria: completeness, executability, optimality, representation, generalization, and efficiency. For each, we provide a thorough analysis of representative works and highlight their strengths and weaknesses. Our paper also identifies crucial future directions, making it a valuable resource for both practitioners and newcomers interested in leveraging LLM planning to support agentic workflows.

Related papers

WorkArena++: Towards Compositional Planning and Reasoning-based Common Knowledge Work Tasks [85.95607119635102]
Large language models (LLMs) can mimic human-like intelligence.<n>WorkArena++ is designed to evaluate the planning, problem-solving, logical/arithmetic reasoning, retrieval, and contextual understanding abilities of web agents.
arXiv Detail & Related papers (2024-07-07T07:15:49Z)
Learning to Plan for Retrieval-Augmented Large Language Models from Knowledge Graphs [59.76268575344119]
We introduce a novel framework for enhancing large language models' (LLMs) planning capabilities by using planning data derived from knowledge graphs (KGs) LLMs fine-tuned with KG data have improved planning capabilities, better equipping them to handle complex QA tasks that involve retrieval.
arXiv Detail & Related papers (2024-06-20T13:07:38Z)
Can Long-Context Language Models Subsume Retrieval, RAG, SQL, and More? [54.667202878390526]
Long-context language models (LCLMs) have the potential to revolutionize our approach to tasks traditionally reliant on external tools like retrieval systems or databases. We introduce LOFT, a benchmark of real-world tasks requiring context up to millions of tokens designed to evaluate LCLMs' performance on in-context retrieval and reasoning. Our findings reveal LCLMs' surprising ability to rival state-of-the-art retrieval and RAG systems, despite never having been explicitly trained for these tasks.
arXiv Detail & Related papers (2024-06-19T00:28:58Z)
Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs) We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
A Human-Like Reasoning Framework for Multi-Phases Planning Task with Large Language Models [15.874604623294427]
Multi-Phases planning problem involves multiple interconnected stages, such as outlining, information gathering, and planning. Existing reasoning approaches have struggled to effectively address this complex task. Our research aims to address this challenge by developing a human-like planning framework for LLM agents.
arXiv Detail & Related papers (2024-05-28T14:13:32Z)
Understanding the planning of LLM agents: A survey [98.82513390811148]
This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. Comprehensive analyses are conducted for each direction, and further challenges in the field of research are discussed.
arXiv Detail & Related papers (2024-02-05T04:25:24Z)
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios. We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning. We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z)
Understanding the Capabilities of Large Language Models for Automated Planning [24.37599752610625]
The study seeks to shed light on the capabilities of LLMs in solving complex planning problems. It provides insights into the most effective approaches for using LLMs in this context.
arXiv Detail & Related papers (2023-05-25T15:21:09Z)
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark) [30.223130782579336]
We develop a benchmark suite based on the kinds of domains employed in the International Planning Competition. We evaluate LLMs in three modes: autonomous, human-in-the-loop and human-in-the-loop. Our results show that LLM's ability to autonomously generate executable plans is quite meager, averaging only about 3% success rate.
arXiv Detail & Related papers (2023-02-13T21:37:41Z)
PlanBench: An Extensible Benchmark for Evaluating Large Language Models on Planning and Reasoning about Change [34.93870615625937]
PlanBench is a benchmark suite based on the kinds of domains used in the automated planning community. PlanBench provides sufficient diversity in both the task domains and the specific planning capabilities.
arXiv Detail & Related papers (2022-06-21T16:15:27Z)

This list is automatically generated from the titles and abstracts of the papers in this site.