LLMs Can Plan Only If We Tell Them
- URL: http://arxiv.org/abs/2501.13545v1
- Date: Thu, 23 Jan 2025 10:46:14 GMT
- Title: LLMs Can Plan Only If We Tell Them
- Authors: Bilgehan Sel, Ruoxi Jia, Ming Jin
- Abstract summary: Large language models (LLMs) have demonstrated significant capabilities in natural language processing and reasoning.
This paper investigates whether LLMs can independently generate long-horizon plans that rival human baselines.
- Score: 16.593590353705697
- Abstract: Large language models (LLMs) have demonstrated significant capabilities in natural language processing and reasoning, yet their effectiveness in autonomous planning has been under debate. While existing studies have utilized LLMs with external feedback mechanisms or in controlled environments for planning, these approaches often involve substantial computational and development resources due to the requirement for careful design and iterative backprompting. Moreover, even the most advanced LLMs like GPT-4 struggle to match human performance on standard planning benchmarks, such as Blocksworld, without additional support. This paper investigates whether LLMs can independently generate long-horizon plans that rival human baselines. Our novel enhancements to Algorithm-of-Thoughts (AoT), which we dub AoT+, help achieve state-of-the-art results on planning benchmarks, outcompeting prior methods and human baselines, all autonomously.
Related papers
- A Survey on Large Language Models for Automated Planning [15.767084100431115]
We critically investigate existing research on the use of Large Language Models in automated planning.
We illustrate that although LLMs are not well-suited to serve as standalone planners because of their limitations, they nonetheless present an enormous opportunity to enhance planning applications when combined with other approaches.
arXiv Detail & Related papers (2025-02-18T02:11:03Z) - LLM-Generated Heuristics for AI Planning: Do We Even Need Domain-Independence Anymore? [87.71321254733384]
Large language models (LLMs) can generate planning approaches tailored to specific planning problems.
LLMs can achieve state-of-the-art performance on some standard IPC domains.
We discuss whether these results signify a paradigm shift and how they can complement existing planning approaches.
arXiv Detail & Related papers (2025-01-30T22:21:12Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing [56.75702900542643]
We introduce AlphaLLM for the self-improvements of Large Language Models.
It integrates Monte Carlo Tree Search (MCTS) with LLMs to establish a self-improving loop.
Our experimental results show that AlphaLLM significantly enhances the performance of LLMs without additional annotations.
arXiv Detail & Related papers (2024-04-18T15:21:34Z) - LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios that existing planners struggle with and produces well-reasoned outputs, while remaining grounded by working alongside the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z) - EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios.
We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning.
We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z) - Understanding the Capabilities of Large Language Models for Automated Planning [24.37599752610625]
The study seeks to shed light on the capabilities of LLMs in solving complex planning problems.
It provides insights into the most effective approaches for using LLMs in this context.
arXiv Detail & Related papers (2023-05-25T15:21:09Z) - On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark) [30.223130782579336]
We develop a benchmark suite based on the kinds of domains employed in the International Planning Competition.
We evaluate LLMs in three modes: autonomous, heuristic, and human-in-the-loop.
Our results show that LLMs' ability to autonomously generate executable plans is quite meager, averaging a success rate of only about 3%.
arXiv Detail & Related papers (2023-02-13T21:37:41Z) - Plansformer: Generating Symbolic Plans using Transformers [24.375997526106246]
Large Language Models (LLMs) have been the subject of active research, significantly advancing the field of Natural Language Processing (NLP).
We introduce Plansformer, an LLM fine-tuned on planning problems and capable of generating plans with favorable behavior in terms of correctness and length with reduced knowledge-engineering efforts.
For one configuration of Plansformer, we achieve 97% valid plans, out of which 95% are optimal for Towers of Hanoi - a puzzle-solving domain.
arXiv Detail & Related papers (2022-12-16T19:06:49Z)
This list is automatically generated from the titles and abstracts of the papers on this site.