AutoPlanBench: Automatically generating benchmarks for LLM planners from PDDL
- URL: http://arxiv.org/abs/2311.09830v2
- Date: Fri, 9 Feb 2024 09:48:41 GMT
- Title: AutoPlanBench: Automatically generating benchmarks for LLM planners from PDDL
- Authors: Katharina Stein, Daniel Fišer, Jörg Hoffmann and Alexander Koller
- Abstract summary: We present AutoPlanBench, a novel method for automatically converting planning benchmarks written in PDDL into textual descriptions.
We show that while the best LLM planners do well on some planning tasks, others remain out of reach of current methods.
- Score: 52.005042190810116
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: LLMs are being increasingly used for planning-style tasks, but their
capabilities for planning and reasoning are poorly understood. We present
AutoPlanBench, a novel method for automatically converting planning benchmarks
written in PDDL into textual descriptions and offer a benchmark dataset created
with our method. We show that while the best LLM planners do well on some
planning tasks, others remain out of reach of current methods.
Related papers
- Exploring and Benchmarking the Planning Capabilities of Large Language Models [57.23454975238014]
This work lays the foundations for improving planning capabilities of large language models (LLMs).
We construct a comprehensive benchmark suite encompassing both classical planning benchmarks and natural language scenarios.
We investigate the use of many-shot in-context learning to enhance LLM planning, exploring the relationship between increased context length and improved planning performance.
arXiv Detail & Related papers (2024-06-18T22:57:06Z)
- NATURAL PLAN: Benchmarking LLMs on Natural Language Planning [109.73382347588417]
We introduce NATURAL PLAN, a realistic planning benchmark in natural language containing 3 key tasks: Trip Planning, Meeting Planning, and Calendar Scheduling.
We focus our evaluation on the planning capabilities of LLMs with full information on the task, by providing outputs from tools such as Google Flights, Google Maps, and Google Calendar as contexts to the models.
arXiv Detail & Related papers (2024-06-06T21:27:35Z)
- NL2Plan: Robust LLM-Driven Planning from Minimal Text Descriptions [8.004470925893957]
We present NL2Plan, the first domain-agnostic offline LLM-driven planning system.
We evaluate NL2Plan on four planning domains and find that it solves 10 out of 15 tasks.
In addition to using NL2Plan in end-to-end mode, users can inspect and correct all of its intermediate results.
arXiv Detail & Related papers (2024-05-07T11:27:13Z)
- On the Roles of LLMs in Planning: Embedding LLMs into Planning Graphs [12.326862964753694]
We study the planning capability of large language models (LLMs) in off-the-shelf planning frameworks.
We propose a novel LLMs-based planning framework with LLMs embedded in two levels of planning graphs.
We empirically exhibit the effectiveness of our proposed framework in various planning domains.
arXiv Detail & Related papers (2024-02-18T15:53:32Z)
- TIC: Translate-Infer-Compile for accurate "text to plan" using LLMs and Logical Representations [0.0]
We study the problem of generating plans for given natural language planning task requests.
Our approach comprises (a) translate: using an LLM only for generating an interpretable intermediate representation of the natural language task description.
We observe that using an LLM to only output the intermediate representation significantly reduces LLM errors.
arXiv Detail & Related papers (2024-02-09T18:39:13Z)
- LLM-Assist: Enhancing Closed-Loop Planning with Language-Based Reasoning [65.86754998249224]
We develop a novel hybrid planner that leverages a conventional rule-based planner in conjunction with an LLM-based planner.
Our approach navigates complex scenarios which existing planners struggle with, produces well-reasoned outputs while also remaining grounded through working alongside the rule-based approach.
arXiv Detail & Related papers (2023-12-30T02:53:45Z)
- AdaPlanner: Adaptive Planning from Feedback with Language Models [56.367020818139665]
Large language models (LLMs) have recently demonstrated the potential in acting as autonomous agents for sequential decision-making tasks.
We propose a closed-loop approach, AdaPlanner, which allows the LLM agent to refine its self-generated plan adaptively in response to environmental feedback.
To mitigate hallucination, we develop a code-style LLM prompt structure that facilitates plan generation across a variety of tasks, environments, and agent capabilities.
arXiv Detail & Related papers (2023-05-26T05:52:27Z)
- AutoPlan: Automatic Planning of Interactive Decision-Making Tasks With Large Language Models [11.895111124804503]
AutoPlan is an approach to guide LLM-based agents to accomplish interactive decision-making tasks.
Our experiments show that AutoPlan achieves success rates on par with the baselines.
arXiv Detail & Related papers (2023-05-24T11:52:23Z)
- Learning to Plan with Natural Language [111.76828049344839]
Large Language Models (LLMs) have shown remarkable performance in various basic natural language tasks.
For completing the complex task, we still need a plan for the task to guide LLMs to generate the specific solutions step by step.
We propose the Learning to Plan method, which involves two phases: (1) In the first learning task plan phase, it iteratively updates the task plan with new step-by-step solutions and behavioral instructions, which are obtained by prompting LLMs to derive from training error feedback.
arXiv Detail & Related papers (2023-04-20T17:09:12Z)