Related papers: On the Planning Abilities of Large Language Models : A Critical Investigation

On the Planning Abilities of Large Language Models : A Critical Investigation

URL: http://arxiv.org/abs/2305.15771v2
Date: Mon, 6 Nov 2023 07:00:12 GMT
Title: On the Planning Abilities of Large Language Models : A Critical Investigation
Authors: Karthik Valmeekam, Matthew Marquez, Sarath Sreedharan, Subbarao Kambhampati
Abstract summary: We evaluate the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks. In the LLM-Modulo setting, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners.
Score: 34.262740442260515
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Intrigued by the claims of emergent reasoning capabilities in LLMs trained on general web corpora, in this paper, we set out to investigate their planning capabilities. We aim to evaluate (1) the effectiveness of LLMs in generating plans autonomously in commonsense planning tasks and (2) the potential of LLMs in LLM-Modulo settings where they act as a source of heuristic guidance for external planners and verifiers. We conduct a systematic study by generating a suite of instances on domains similar to the ones employed in the International Planning Competition and evaluate LLMs in two distinct modes: autonomous and heuristic. Our findings reveal that LLMs' ability to generate executable plans autonomously is rather limited, with the best model (GPT-4) having an average success rate of ~12% across the domains. However, the results in the LLM-Modulo setting show more promise. In the LLM-Modulo setting, we demonstrate that LLM-generated plans can improve the search process for underlying sound planners and additionally show that external verifiers can help provide feedback on the generated plans and back-prompt the LLM for better plan generation.

Related papers

Query-Efficient Planning with Language Models [8.136901056728945]
Planning in complex environments requires an agent to efficiently query a world model to find a sequence of actions from start to goal. Recent work has shown that Large Language Models (LLMs) can potentially help with planning by searching over promising states and adapting to feedback from the world. We show that while both approaches improve upon comparable baselines, using an LLM as a generative planner results in significantly fewer interactions.
arXiv Detail & Related papers (2024-12-09T02:51:21Z)
Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks. LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning. We introduce Q*, a framework for guiding LLMs decoding process with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z)
Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs) We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios. We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
Efficient Prompting for LLM-based Generative Internet of Things [88.84327500311464]
Large language models (LLMs) have demonstrated remarkable capacities on various tasks, and integrating the capacities of LLMs into the Internet of Things (IoT) applications has drawn much research attention recently. Due to security concerns, many institutions avoid accessing state-of-the-art commercial LLM services, requiring the deployment and utilization of open-source LLMs in a local network setting. We propose a LLM-based Generative IoT (GIoT) system deployed in the local network setting in this study.
arXiv Detail & Related papers (2024-06-14T19:24:00Z)
From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world. Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting. We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z)
On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs [12.326862964753694]
We study the insight of the planning capability of large language models (LLMs) in off-the-shelf planning frameworks. We propose a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs. We empirically exhibit the effectiveness of our proposed framework in various planning domains.
arXiv Detail & Related papers (2024-02-18T15:53:32Z)
Understanding the planning of LLM agents: A survey [98.82513390811148]
This survey provides the first systematic view of LLM-based agents planning, covering recent works aiming to improve planning ability. Comprehensive analyses are conducted for each direction, and further challenges in the field of research are discussed.
arXiv Detail & Related papers (2024-02-05T04:25:24Z)
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity [61.54815512469125]
This survey addresses the crucial issue of factuality in Large Language Models (LLMs) As LLMs find applications across diverse domains, the reliability and accuracy of their outputs become vital.
arXiv Detail & Related papers (2023-10-11T14:18:03Z)
On the Planning Abilities of Large Language Models (A Critical Investigation with a Proposed Benchmark) [30.223130782579336]
We develop a benchmark suite based on the kinds of domains employed in the International Planning Competition. We evaluate LLMs in three modes: autonomous, human-in-the-loop and human-in-the-loop. Our results show that LLM's ability to autonomously generate executable plans is quite meager, averaging only about 3% success rate.
arXiv Detail & Related papers (2023-02-13T21:37:41Z)

This list is automatically generated from the titles and abstracts of the papers in this site.