An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
- URL: http://arxiv.org/abs/2502.20175v1
- Date: Thu, 27 Feb 2025 15:13:07 GMT
- Title: An Extensive Evaluation of PDDL Capabilities in off-the-shelf LLMs
- Authors: Kaustubh Vyas, Damien Graux, Sébastien Montella, Pavlos Vougiouklis, Ruofei Lai, Keshuang Li, Yang Ren, Jeff Z. Pan
- Abstract summary: Large language models (LLMs) have exhibited proficiency in code generation and chain-of-thought reasoning. This study evaluates the potential of LLMs to understand and generate Planning Domain Definition Language (PDDL).
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In recent advancements, large language models (LLMs) have exhibited proficiency in code generation and chain-of-thought reasoning, laying the groundwork for tackling automatic formal planning tasks. This study evaluates the potential of LLMs to understand and generate Planning Domain Definition Language (PDDL), an essential representation in artificial intelligence planning. We conduct an extensive analysis across 20 distinct models spanning 7 major LLM families, both commercial and open-source. Our comprehensive evaluation sheds light on the zero-shot LLM capabilities of parsing, generating, and reasoning with PDDL. Our findings indicate that while some models handle PDDL with notable effectiveness, others show clear limitations in more complex scenarios requiring nuanced planning knowledge. These results highlight the promise and current limitations of LLMs in formal planning tasks, offering insights into their application and guiding future efforts in AI-driven planning paradigms.
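For readers unfamiliar with the formalism, the sketch below illustrates what "parsing PDDL" can mean in practice. It is not the paper's evaluation harness: it embeds a toy blocks-world domain (an illustrative example, not one of the paper's benchmarks) and uses a minimal s-expression parser to check that the domain is syntactically well-formed and declares the standard top-level sections.

```python
# Minimal, illustrative sketch: parse a PDDL domain as nested s-expressions
# and check for the standard top-level sections. Not a full PDDL validator.

def tokenize(pddl: str):
    """Split a PDDL string into parenthesis and symbol tokens."""
    return pddl.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Consume tokens from the front, returning one nested list per s-expression."""
    if not tokens:
        raise ValueError("unexpected end of input")
    tok = tokens.pop(0)
    if tok == "(":
        expr = []
        while tokens and tokens[0] != ")":
            expr.append(parse(tokens))
        if not tokens:
            raise ValueError("unbalanced parentheses")
        tokens.pop(0)  # consume the closing ")"
        return expr
    return tok

# Toy blocks-world domain (illustrative only).
DOMAIN = """
(define (domain blocksworld)
  (:requirements :strips)
  (:predicates (on ?x ?y) (clear ?x) (holding ?x))
  (:action pick-up
    :parameters (?x)
    :precondition (and (clear ?x))
    :effect (and (holding ?x) (not (clear ?x)))))
"""

tree = parse(tokenize(DOMAIN))
# Collect the leading keyword of each top-level sub-expression.
sections = {item[0] for item in tree if isinstance(item, list)}
assert tree[0] == "define" and {":predicates", ":action"} <= sections
```

A syntactic check like this is only the first rung of the capabilities the paper probes; generation and reasoning additionally require semantic consistency between predicates, action schemas, and goals.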
Related papers
- Latent Diffusion Planning for Imitation Learning [78.56207566743154]
Latent Diffusion Planning (LDP) is a modular approach consisting of a planner and inverse dynamics model.
By separating planning from action prediction, LDP can benefit from the denser supervision signals of suboptimal and action-free data.
On simulated visual robotic manipulation tasks, LDP outperforms state-of-the-art imitation learning approaches.
arXiv Detail & Related papers (2025-04-23T17:53:34Z)
- A Survey on Large Language Models for Automated Planning [15.767084100431115]
We critically investigate existing research on the use of Large Language Models in automated planning. We illustrate that although LLMs are not well-suited to serve as standalone planners because of their inherent limitations, they nonetheless present an enormous opportunity to enhance planning applications when combined with other approaches.
arXiv Detail & Related papers (2025-02-18T02:11:03Z)
- Embodied CoT Distillation From LLM To Off-the-shelf Agents [6.318203525449058]
DeDer is a framework for decomposing and distilling the embodied reasoning capabilities of large language models (LLMs).
Our experiments with the ALFRED benchmark demonstrate that DeDer surpasses leading language planning and distillation approaches.
arXiv Detail & Related papers (2024-12-16T07:18:02Z) - On the Limit of Language Models as Planning Formalizers [4.145422873316857]
Large Language Models fail to create verifiable plans in grounded environments. An emerging line of work shows success in using LLMs as formalizers to generate a formal representation of the planning domain. We observe that sufficiently large models can effectively formalize descriptions as PDDL, outperforming those directly generating plans.
arXiv Detail & Related papers (2024-12-13T05:50:22Z) - Q*: Improving Multi-step Reasoning for LLMs with Deliberative Planning [53.6472920229013]
Large Language Models (LLMs) have demonstrated impressive capability in many natural language tasks.
LLMs are prone to produce errors, hallucinations and inconsistent statements when performing multi-step reasoning.
We introduce Q*, a framework for guiding the decoding process of LLMs with deliberative planning.
arXiv Detail & Related papers (2024-06-20T13:08:09Z) - Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving the planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z) - From Words to Actions: Unveiling the Theoretical Underpinnings of LLM-Driven Autonomous Systems [59.40480894948944]
Large language model (LLM) empowered agents are able to solve decision-making problems in the physical world.
Under this model, the LLM Planner navigates a partially observable Markov decision process (POMDP) by iteratively generating language-based subgoals via prompting.
We prove that the pretrained LLM Planner effectively performs Bayesian aggregated imitation learning (BAIL) through in-context learning.
arXiv Detail & Related papers (2024-05-30T09:42:54Z) - On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs [12.326862964753694]
We study the planning capability of large language models (LLMs) within off-the-shelf planning frameworks.
We propose a novel LLM-based planning framework with LLMs embedded at two levels of planning graphs.
We empirically demonstrate the effectiveness of our proposed framework in various planning domains.
arXiv Detail & Related papers (2024-02-18T15:53:32Z) - Understanding the planning of LLM agents: A survey [98.82513390811148]
This survey provides the first systematic view of planning by LLM-based agents, covering recent works aiming to improve planning ability.
Comprehensive analyses are conducted for each direction, and open challenges in this research area are discussed.
arXiv Detail & Related papers (2024-02-05T04:25:24Z)
- EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning [84.6451394629312]
We introduce EgoPlan-Bench, a benchmark to evaluate the planning abilities of MLLMs in real-world scenarios.
We show that EgoPlan-Bench poses significant challenges, highlighting a substantial scope for improvement in MLLMs to achieve human-level task planning.
We also present EgoPlan-IT, a specialized instruction-tuning dataset that effectively enhances model performance on EgoPlan-Bench.
arXiv Detail & Related papers (2023-12-11T03:35:58Z)