AutoGPT+P: Affordance-based Task Planning with Large Language Models
- URL: http://arxiv.org/abs/2402.10778v2
- Date: Tue, 23 Jul 2024 14:56:03 GMT
- Title: AutoGPT+P: Affordance-based Task Planning with Large Language Models
- Authors: Timo Birr, Christoph Pohl, Abdelrahman Younes, Tamim Asfour
- Abstract summary: AutoGPT+P is a system that combines an affordance-based scene representation with a planning system.
Our approach achieves a success rate of 98%, surpassing the 81% success rate of the current state-of-the-art LLM-based planning method SayCan.
- Score: 6.848986296339031
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in task planning leverage Large Language Models (LLMs) to improve generalizability by combining such models with classical planning algorithms to address their inherent limitations in reasoning capabilities. However, these approaches face the challenge of dynamically capturing the initial state of the task planning problem. To alleviate this issue, we propose AutoGPT+P, a system that combines an affordance-based scene representation with a planning system. Affordances encompass the action possibilities of an agent on the environment and objects present in it. Thus, deriving the planning domain from an affordance-based scene representation allows symbolic planning with arbitrary objects. AutoGPT+P leverages this representation to derive and execute a plan for a task specified by the user in natural language. In addition to solving planning tasks under a closed-world assumption, AutoGPT+P can also handle planning with incomplete information, e.g., tasks with missing objects, by exploring the scene, suggesting alternatives, or providing a partial plan. The affordance-based scene representation combines object detection with an automatically generated object-affordance-mapping using ChatGPT. The core planning tool extends existing work by automatically correcting semantic and syntactic errors. Our approach achieves a success rate of 98%, surpassing the 81% success rate of the current state-of-the-art LLM-based planning method SayCan on the SayCan instruction set. Furthermore, we evaluated our approach on our newly created dataset with 150 scenarios covering a wide range of complex tasks with missing objects, achieving a success rate of 79%. The dataset and the code are publicly available at https://git.h2t.iar.kit.edu/birr/autogpt-p-standalone.
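The abstract describes deriving a symbolic planning domain from an affordance-based scene representation built by pairing detected objects with an automatically generated object-affordance-mapping. The snippet below is a minimal sketch of that idea only: the object classes, affordance labels, and predicate templates are illustrative assumptions, not the authors' implementation (their code is at the repository linked above).

```python
# Minimal sketch, assuming hypothetical objects, affordance labels, and PDDL
# predicate templates; NOT the AutoGPT+P implementation.
from dataclasses import dataclass, field


@dataclass
class SceneObject:
    name: str
    affordances: set = field(default_factory=set)


# Hypothetical object-affordance-mapping; in AutoGPT+P this pairing is produced
# automatically (object detection plus an LLM-derived affordance list per class).
SCENE = [
    SceneObject("cup", {"graspable", "containable"}),
    SceneObject("bottle", {"graspable", "pourable"}),
    SceneObject("table", {"supportable"}),
]

# Assumed affordance -> PDDL predicate templates (illustrative only).
AFFORDANCE_PREDICATES = {
    "graspable": "(graspable {o})",
    "pourable": "(pourable {o})",
    "containable": "(containable {o})",
    "supportable": "(supportable {o})",
}


def scene_to_pddl_problem_fragment(scene):
    """Build the :objects and :init blocks of a PDDL problem from the scene."""
    objects = " ".join(obj.name for obj in scene)
    init_facts = [
        AFFORDANCE_PREDICATES[aff].format(o=obj.name)
        for obj in scene
        for aff in sorted(obj.affordances)
        if aff in AFFORDANCE_PREDICATES
    ]
    init = "\n    ".join(init_facts)
    return f"(:objects {objects})\n  (:init\n    {init}\n  )"


if __name__ == "__main__":
    print(scene_to_pddl_problem_fragment(SCENE))
```

The sketch only shows how per-object affordances can become initial-state facts for a symbolic planner; the paper's full pipeline additionally derives actions, corrects semantic and syntactic planner errors, and handles missing objects.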
Related papers
- Dynamic Planning for LLM-based Graphical User Interface Automation [48.31532014795368]
We propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents.
D-PoT involves the dynamic adjustment of planning based on the environmental feedback and execution history.
Experimental results reveal that the proposed D-PoT significantly surpassed the strong GPT-4V baseline by +12.7%.
arXiv Detail & Related papers (2024-10-01T07:49:24Z) - Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos [48.15438373870542]
VidAssist is an integrated framework designed for zero/few-shot goal-oriented planning in instructional videos.
It employs a breadth-first search algorithm for optimal plan generation.
Experiments demonstrate that VidAssist offers a unified framework for different goal-oriented planning setups.
arXiv Detail & Related papers (2024-09-30T17:57:28Z) - PDDLEGO: Iterative Planning in Textual Environments [56.12148805913657]
Planning in textual environments has been shown to be a long-standing challenge even for current models.
We propose PDDLEGO, which iteratively constructs a planning representation that can lead to a partial plan for a given sub-goal.
We show that plans produced by few-shot PDDLEGO are 43% more efficient than generating plans end-to-end on the Coin Collector simulation.
arXiv Detail & Related papers (2024-05-30T08:01:20Z) - Consolidating Trees of Robotic Plans Generated Using Large Language Models to Improve Reliability [6.4111574364474215]
The inherent probabilistic nature of Large Language Models (LLMs) introduces an element of unpredictability.
This paper introduces an innovative approach that aims to generate correct and optimal robotic task plans for diverse real-world demands and scenarios.
arXiv Detail & Related papers (2024-01-15T18:01:59Z) - Learning adaptive planning representations with natural language guidance [90.24449752926866]
This paper describes Ada, a framework for automatically constructing task-specific planning representations.
Ada interactively learns a library of planner-compatible high-level action abstractions and low-level controllers adapted to a particular domain of planning tasks.
arXiv Detail & Related papers (2023-12-13T23:35:31Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance in various embodied AI tasks.
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planing Agent (TaPA) in embodied tasks for grounded planning with physical scene constraint.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the generated plan from our TaPA framework can achieve a higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - A Framework for Neurosymbolic Robot Action Planning using Large Language Models [3.0501524254444767]
We present a framework aimed at bridging the gap between symbolic task planning and machine learning approaches.
The rationale is training Large Language Models (LLMs) into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL).
Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce average overall waiting times for plan availability by up to 61.4%.
arXiv Detail & Related papers (2023-03-01T11:54:22Z)