PROC2PDDL: Open-Domain Planning Representations from Texts
- URL: http://arxiv.org/abs/2403.00092v2
- Date: Tue, 2 Jul 2024 04:50:36 GMT
- Title: PROC2PDDL: Open-Domain Planning Representations from Texts
- Authors: Tianyi Zhang, Li Zhang, Zhaoyi Hou, Ziyu Wang, Yuling Gu, Peter Clark, Chris Callison-Burch, Niket Tandon
- Abstract summary: Proc2PDDL is the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations.
We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%.
- Score: 56.627183903841164
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Planning in a text-based environment continues to be a major challenge for AI systems. Recent approaches have used language models to predict a planning domain definition (e.g., PDDL) but have only been evaluated in closed-domain simulated environments. To address this, we present Proc2PDDL, the first dataset containing open-domain procedural texts paired with expert-annotated PDDL representations. Using this dataset, we evaluate state-of-the-art models on defining the preconditions and effects of actions. We show that Proc2PDDL is highly challenging, with GPT-3.5's success rate close to 0% and GPT-4's around 35%. Our analysis shows both syntactic and semantic errors, indicating LMs' deficiency in both generating domain-specific programs and reasoning about events. We hope this analysis and dataset help future progress towards integrating the best of LMs and formal planning.
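To make the annotation task concrete: a PDDL domain pairs each action with declared preconditions and effects, which is exactly what the models are asked to predict from procedural text. Below is a minimal illustrative sketch; the domain, predicate, and action names are hypothetical and not taken from the dataset:

    ; A toy survival-style domain; every name below is invented for illustration.
    (define (domain survival-sketch)
      (:requirements :strips :typing)
      (:types agent item location)
      (:predicates
        (at ?a - agent ?l - location)
        (has ?a - agent ?i - item)
        (boiled ?i - item))
      ; From a sentence like "boil the water before drinking it", a model must
      ; infer that boiling requires holding the water (precondition) and
      ; results in it being boiled (effect).
      (:action boil-water
        :parameters (?a - agent ?w - item ?l - location)
        :precondition (and (at ?a ?l) (has ?a ?w))
        :effect (boiled ?w)))

The two error classes reported above map onto this structure: syntactic errors break the s-expression format itself, while semantic errors yield well-formed but incorrect preconditions or effects.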
Related papers
- Generating Symbolic World Models via Test-time Scaling of Large Language Models [28.258707611580643]
Planning Domain Definition Language (PDDL) is leveraged as a planning abstraction that enables precise and formal state descriptions.
We introduce a simple yet effective algorithm, which first employs a Best-of-N sampling approach to improve the quality of the initial solution and then refines the solution in a fine-grained manner with verbalized machine learning.
Our method outperforms o1-mini by a considerable margin in PDDL domain generation, achieving over a 50% success rate on two tasks.
arXiv Detail & Related papers (2025-02-07T07:52:25Z)
- On the Limit of Language Models as Planning Formalizers [4.145422873316857]
Large Language Models fail to create verifiable plans in grounded environments.
An emerging line of work shows success in using LLMs as formalizers to generate a formal representation of the planning domain.
We observe that large enough models can effectively formalize descriptions as PDDL, outperforming those directly generating plans.
arXiv Detail & Related papers (2024-12-13T05:50:22Z)
- Leveraging Environment Interaction for Automated PDDL Translation and Planning with Large Language Models [7.3238629831871735]
Large Language Models (LLMs) have shown remarkable performance in various natural language tasks.
Translating planning problems into the Planning Domain Definition Language (PDDL) has been proposed as a potential solution.
We propose a novel approach that leverages LLMs and environment feedback to automatically generate PDDL domain and problem description files.
arXiv Detail & Related papers (2024-07-17T19:50:51Z)
- Planetarium: A Rigorous Benchmark for Translating Text to Structured Planning Languages [20.62336315814875]
We introduce Planetarium, a benchmark designed to evaluate language models' ability to generate PDDL code from natural language descriptions of planning tasks.
Planetarium features a novel PDDL equivalence algorithm that flexibly evaluates the correctness of generated PDDL, along with a dataset of 145,918 text-to-PDDL pairs (a schematic example of such a pair is sketched after this list).
arXiv Detail & Related papers (2024-07-03T17:59:53Z)
- PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models.
We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- Leveraging Pre-trained Large Language Models to Construct and Utilize World Models for Model-based Task Planning [39.29964085305846]
Methods that use pre-trained large language models directly as planners are currently impractical due to limited correctness of plans.
In this work, we introduce a novel alternative paradigm that constructs an explicit world (domain) model in planning domain definition language (PDDL) and then uses it to plan with sound domain-independent planners.
arXiv Detail & Related papers (2023-05-24T08:59:15Z)
- Generalized Planning in PDDL Domains with Pretrained Large Language Models [82.24479434984426]
We consider PDDL domains and use GPT-4 to synthesize Python programs.
We evaluate this approach in seven PDDL domains and compare it to four ablations and four baselines.
arXiv Detail & Related papers (2023-05-18T14:48:20Z)
- A Picture is Worth a Thousand Words: Language Models Plan from Pixels [53.85753597586226]
Planning is an important capability of artificial agents that perform long-horizon tasks in real-world environments.
In this work, we explore the use of pre-trained language models (PLMs) to reason about plan sequences from text instructions in embodied visual environments.
arXiv Detail & Related papers (2023-03-16T02:02:18Z)
- Tackling Long-Tailed Category Distribution Under Domain Shifts [50.21255304847395]
Existing approaches cannot handle the scenario where long-tailed category distribution and domain shift both exist.
We design three novel core functional blocks: a Distribution Calibrated Classification Loss, Visual-Semantic Mapping, and Semantic-Similarity Guided Augmentation.
Two new datasets, AWA2-LTS and ImageNet-LTS, are proposed for this problem.
arXiv Detail & Related papers (2022-07-20T19:07:46Z)
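As referenced in the Planetarium entry above, a text-to-PDDL pair couples a natural-language task description with a PDDL problem file. The sketch below is schematic rather than an actual dataset entry, and uses one common blocksworld encoding:

    ; Text: "Block b is stacked on block a, which rests on the table.
    ;        The goal is to have block a on top of block b."
    (define (problem swap-two-blocks)
      (:domain blocksworld)
      (:objects a b)
      (:init (on-table a) (on b a) (clear b) (arm-empty))
      (:goal (on a b)))

An equivalence check (rather than string matching) matters for grading such output, since many syntactically different problem files, e.g. with reordered conjuncts or renamed objects, denote the same task.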
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences.