Language Models For Generalised PDDL Planning: Synthesising Sound and Programmatic Policies
- URL: http://arxiv.org/abs/2508.18507v1
- Date: Mon, 25 Aug 2025 21:28:14 GMT
- Title: Language Models For Generalised PDDL Planning: Synthesising Sound and Programmatic Policies
- Authors: Dillon Z. Chen, Johannes Zenn, Tristan Cinquin, Sheila A. McIlraith
- Abstract summary: We study the usage of language models (LMs) for planning over world models specified in the Planning Domain Definition Language (PDDL). We prompt LMs to generate Python programs that serve as generalised policies for solving PDDL problems from a given domain. We conduct experiments on competition benchmarks which show that our policies can solve more PDDL problems than PDDL planners and recent LM approaches within fixed time and memory constraints.
- Score: 14.156642420488168
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We study the usage of language models (LMs) for planning over world models specified in the Planning Domain Definition Language (PDDL). We prompt LMs to generate Python programs that serve as generalised policies for solving PDDL problems from a given domain. Notably, our approach synthesises policies that are provably sound relative to the PDDL domain, without reliance on external verifiers. We conduct experiments on competition benchmarks which show that our policies can solve more PDDL problems than PDDL planners and recent LM approaches within fixed time and memory constraints. Our approach manifests in the LMPlan planner, which can solve planning problems with several hundred relevant objects. Surprisingly, we observe that LMs used in our framework sometimes plan more effectively over PDDL problems written with meaningless symbols in place of natural language; e.g. rewriting (at dog kitchen) as (p2 o1 o3). This finding challenges the hypothesis that LMs reason over word semantics and memorise solutions from their training corpora, and is worth further exploration.
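The symbol-renaming experiment described in the abstract can be sketched as a small Python routine. This is an illustrative sketch only; the function name and token-numbering scheme are assumptions, not the paper's code. Predicates and objects are replaced, in order of first appearance, with meaningless tokens:

```python
from itertools import count

def anonymise(facts):
    """Replace natural-language PDDL symbols with meaningless tokens.

    Hypothetical sketch of the renaming idea from the abstract: predicates
    become p1, p2, ... and objects become o1, o2, ... in order of first
    appearance across the given facts.
    """
    pred_ids, obj_ids = {}, {}
    pred_counter, obj_counter = count(1), count(1)
    renamed = []
    for fact in facts:
        pred, *args = fact.strip("()").split()
        p = pred_ids.setdefault(pred, f"p{next(pred_counter)}")
        new_args = [obj_ids.setdefault(a, f"o{next(obj_counter)}") for a in args]
        renamed.append("(" + " ".join([p, *new_args]) + ")")
    return renamed
```

Under this scheme, the first fact (at dog kitchen) maps to (p1 o1 o2); the exact numbering in the abstract's example (p2 o1 o3) would depend on the rest of the problem's predicates and objects.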
Related papers
- An End-to-end Planning Framework with Agentic LLMs and PDDL [9.718390674899771]
We present an end-to-end framework for planning supported by verifiers. An orchestrator receives a human specification written in natural language and converts it into a PDDL model. The validated domain and problem are then passed to an external planning engine to generate a plan.
arXiv Detail & Related papers (2025-12-10T13:17:08Z)
- Improved Generalized Planning with LLMs through Strategy Refinement and Reflection [58.79806530685551]
We introduce an approach that generates the strategy in the form of pseudocode. We extend the Python debug phase with a reflection step, prompting the LLM to pinpoint the reason for the observed plan failure. Running experiments on 17 benchmark domains, we show that these extensions substantially improve the quality of the generalized plans.
arXiv Detail & Related papers (2025-08-19T14:42:18Z)
- Generating Symbolic World Models via Test-time Scaling of Large Language Models [28.258707611580643]
Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions. We introduce a simple yet effective algorithm, which first employs a Best-of-N sampling approach to improve the quality of the initial solution. Our method outperforms o1-mini by a considerable margin in the generation of PDDL domains.
arXiv Detail & Related papers (2025-02-07T07:52:25Z)
- Planning with Vision-Language Models and a Use Case in Robot-Assisted Teaching [0.9217021281095907]
This paper introduces Image2PDDL, a novel framework that leverages Vision-Language Models (VLMs) to automatically convert images of initial states and descriptions of goal states into PDDL problems. We evaluate the framework on various domains, including standard planning domains like blocksworld and sliding tile puzzles, using datasets with multiple difficulty levels. We will discuss a potential use case in robot-assisted teaching of students with Autism Spectrum Disorder.
arXiv Detail & Related papers (2025-01-29T14:04:54Z)
- Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models [7.3238629831871735]
Large Language Models (LLMs) have shown remarkable performance in various natural language tasks.
Translating planning problems into the Planning Domain Definition Language (PDDL) has been proposed as a potential solution.
We propose a novel approach that leverages LLMs and environment feedback to automatically generate PDDL domain and problem description files.
arXiv Detail & Related papers (2024-07-17T19:50:51Z)
- PROC2PDDL: Open-Domain Planning Representations from Texts [56.627183903841164]
Proc2PDDL is the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations.
We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%.
arXiv Detail & Related papers (2024-02-29T19:40:25Z)
- Real-World Planning with PDDL+ and Beyond [55.73913765642435]
We present Nyx, a novel PDDL+ planner built to emphasize lightness, simplicity, and, most importantly, adaptability.
Nyx can be tailored to virtually any potential real-world application requiring some form of AI Planning, paving the way for wider adoption of planning methods for solving real-world problems.
arXiv Detail & Related papers (2024-02-19T07:35:49Z)
- Automating the Generation of Prompts for LLM-based Action Choice in PDDL Planning [59.543858889996024]
Large language models (LLMs) have revolutionized a large variety of NLP tasks. We show how to leverage an LLM to automatically generate NL prompts from PDDL input. Our NL prompts yield better performance than PDDL prompts and simple template-based NL prompts.
arXiv Detail & Related papers (2023-11-16T11:55:27Z)
- LM-Polygraph: Uncertainty Estimation for Language Models [71.21409522341482]
Uncertainty estimation (UE) methods are one path to safer, more responsible, and more effective use of large language models (LLMs).
We introduce LM-Polygraph, a framework with implementations of a battery of state-of-the-art UE methods for LLMs in text generation tasks, with unified program interfaces in Python.
It introduces an extendable benchmark for consistent evaluation of UE techniques by researchers, and a demo web application that enriches the standard chat dialog with confidence scores.
arXiv Detail & Related papers (2023-11-13T15:08:59Z)
- HDDL 2.1: Towards Defining a Formalism and a Semantics for Temporal HTN Planning [64.07762708909846]
Real-world applications require modelling rich and diverse automated planning problems. The hierarchical task network (HTN) formalism does not allow representing planning problems with numerical and temporal constraints.
We propose to fill the gap between HDDL and these operational needs and to extend HDDL by taking inspiration from PDDL 2.1.
arXiv Detail & Related papers (2023-06-12T18:21:23Z)
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning [39.29964085305846]
Methods that use pre-trained large language models directly as planners are currently impractical due to limited correctness of plans.
In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners.
arXiv Detail & Related papers (2023-05-24T08:59:15Z)
- LLM+P: Empowering Large Language Models with Optimal Planning Proficiency [46.20085545432116]
Large language models (LLMs) have demonstrated remarkable zero-shot generalization abilities.
Classical planners, once a problem is given in a formatted way, can use efficient search algorithms to quickly identify correct, or even optimal, plans.
This paper introduces LLM+P, the first framework that incorporates the strengths of classical planners into LLMs.
arXiv Detail & Related papers (2023-04-22T20:34:03Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.