SayPlan: Grounding Large Language Models using 3D Scene Graphs for
Scalable Robot Task Planning
- URL: http://arxiv.org/abs/2307.06135v2
- Date: Wed, 27 Sep 2023 23:17:28 GMT
- Title: SayPlan: Grounding Large Language Models using 3D Scene Graphs for
Scalable Robot Task Planning
- Authors: Krishan Rana, Jesse Haviland, Sourav Garg, Jad Abou-Chakra, Ian Reid
and Niko Suenderhauf
- Abstract summary: We introduce SayPlan, a scalable approach to large-scale task planning for robotics using 3D scene graph (3DSG) representations.
We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects.
- Score: 15.346150968195015
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Large language models (LLMs) have demonstrated impressive results in
developing generalist planning agents for diverse tasks. However, grounding
these plans in expansive, multi-floor, and multi-room environments presents a
significant challenge for robotics. We introduce SayPlan, a scalable approach
to LLM-based, large-scale task planning for robotics using 3D scene graph
(3DSG) representations. To ensure the scalability of our approach, we: (1)
exploit the hierarchical nature of 3DSGs to allow LLMs to conduct a 'semantic
search' for task-relevant subgraphs from a smaller, collapsed representation of
the full graph; (2) reduce the planning horizon for the LLM by integrating a
classical path planner; and (3) introduce an 'iterative replanning' pipeline
that refines the initial plan using feedback from a scene graph simulator,
correcting infeasible actions and avoiding planning failures. We evaluate our
approach on two large-scale environments spanning up to 3 floors and 36 rooms
with 140 assets and objects and show that our approach is capable of grounding
large-scale, long-horizon task plans from abstract, natural-language
instructions for a mobile manipulator robot to execute. We provide real robot
video demonstrations on our project page https://sayplan.github.io.
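To make the three scalability ingredients above concrete, the following is a minimal, self-contained Python sketch of such a loop: a toy hierarchical scene graph, an LLM-driven expand step standing in for the 'semantic search', and an iterative replanning loop driven by a toy verifier in place of the paper's scene graph simulator. The data layout, function names, and prompt formats are illustrative assumptions, not SayPlan's actual interface.
```python
# Minimal sketch of the pipeline described above: an LLM first performs a
# semantic search over a collapsed scene graph (only floors and rooms visible),
# then iteratively refines a plan using feedback from a toy scene-graph
# verifier. The graph, the llm() stub, and the feedback format are illustrative.

SCENE_GRAPH = {
    "floor_1": {"kitchen": ["fridge", "apple"], "office": ["desk", "stapler"]},
    "floor_2": {"lab": ["bench", "screwdriver"]},
}

def collapsed_view(graph):
    """Only floor and room nodes are exposed until a room is expanded."""
    return {floor: list(rooms) for floor, rooms in graph.items()}

def expand(graph, room):
    """Reveal the assets/objects contained in a single room node."""
    for rooms in graph.values():
        if room in rooms:
            return {room: rooms[room]}
    return {}

def llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns plain text."""
    raise NotImplementedError("plug in an actual model here")

def simulate(plan, graph):
    """Toy verifier: every step must end with a known room or object name.
    A real scene-graph simulator would also track state and preconditions."""
    rooms = {r for floor in graph.values() for r in floor}
    objects = {o for floor in graph.values() for objs in floor.values() for o in objs}
    for step in plan:
        target = step.split()[-1] if step.split() else ""
        if target not in rooms | objects:
            return f"infeasible step '{step}': unknown target '{target}'"
    return "ok"

def plan_task(task, graph, max_iters=5):
    # Phase 1: semantic search over the collapsed graph for a relevant subgraph.
    view = collapsed_view(graph)
    room = llm(f"Task: {task}\nVisible graph: {view}\nRoom to expand:").strip()
    subgraph = expand(graph, room)

    # Phase 2: iterative replanning, feeding verifier feedback back to the LLM.
    feedback = "none"
    for _ in range(max_iters):
        text = llm(f"Task: {task}\nSubgraph: {subgraph}\n"
                   f"Previous feedback: {feedback}\nPlan (one action per line):")
        plan = [line for line in text.splitlines() if line.strip()]
        feedback = simulate(plan, graph)
        if feedback == "ok":
            return plan
    return None
```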
Related papers
- VeriGraph: Scene Graphs for Execution Verifiable Robot Planning [33.8868315479384]
We propose VeriGraph, a framework that integrates vision-language models (VLMs) for robotic planning while verifying action feasibility.
VeriGraph employs scene graphs as an intermediate representation, capturing key objects and spatial relationships to improve plan verification and refinement.
Our approach significantly enhances task completion rates across diverse manipulation scenarios, outperforming baseline methods by 58% for language-based tasks and 30% for image-based tasks.
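The entry above describes checking action feasibility against a scene graph; the toy sketch below illustrates one way such a precondition check could look. The action schema, predicates, and graph layout are assumptions for illustration, not VeriGraph's actual interface.
```python
# Toy illustration of verifying a plan against a scene-graph state: each action
# has preconditions over the graph, and the first violated one is reported.
# The predicates and action schema here are illustrative only.

SCENE = {
    "objects": {"apple": {"on": "table", "graspable": True},
                "table": {"on": "floor", "graspable": False}},
    "holding": None,
}

def check(action, arg, scene):
    objs = scene["objects"]
    if action == "pick":
        if arg not in objs or not objs[arg]["graspable"]:
            return f"cannot pick '{arg}'"
        if scene["holding"] is not None:
            return "gripper already full"
    elif action == "place":
        if scene["holding"] != arg:
            return f"not holding '{arg}'"
    return None  # preconditions satisfied

def verify(plan, scene):
    """Return the first infeasible step, or None if the whole plan checks out."""
    for action, arg in plan:
        error = check(action, arg, scene)
        if error:
            return (action, arg, error)
        # apply a minimal effect model so later steps see the updated state
        scene["holding"] = arg if action == "pick" else None
    return None

print(verify([("pick", "table"), ("place", "table")], SCENE))
# -> ('pick', 'table', "cannot pick 'table'")
```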
arXiv Detail & Related papers (2024-11-15T18:59:51Z)
- Plan-Seq-Learn: Language Model Guided RL for Solving Long Horizon Robotics Tasks [50.27313829438866]
Plan-Seq-Learn (PSL) is a modular approach that uses motion planning to bridge the gap between abstract language and learned low-level control.
PSL achieves success rates of over 85%, out-performing language-based, classical, and end-to-end approaches.
arXiv Detail & Related papers (2024-05-02T17:59:31Z)
- Interactive Planning Using Large Language Models for Partially Observable Robotics Tasks [54.60571399091711]
Large Language Models (LLMs) have achieved impressive results in creating robotic agents for performing open vocabulary tasks.
We present an interactive planning technique for partially observable tasks using LLMs.
arXiv Detail & Related papers (2023-12-11T22:54:44Z)
- Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning [32.045840007623276]
We introduce Robotic Vision-Language Planning (ViLa), a novel approach for long-horizon robotic planning.
ViLa directly integrates perceptual data into its reasoning and planning process.
Our evaluation, conducted in both real-robot and simulated environments, demonstrates ViLa's superiority over existing LLM-based planners.
arXiv Detail & Related papers (2023-11-29T17:46:25Z)
- ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their output to 3D by multi-view association.
We demonstrate the utility of this representation through a number of downstream planning tasks.
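A simplified sketch of the multi-view association idea mentioned above: per-view detections are lifted to 3D and merged into object nodes when their centroids and feature vectors are close. The thresholds and data layout are illustrative assumptions, not the paper's implementation.
```python
# Simplified sketch of fusing per-view 2D detections into 3D object nodes by
# multi-view association: detections whose 3D centroids and features are close
# are merged into one node. Thresholds and data structures are illustrative.
import numpy as np

def associate(detections, dist_thresh=0.3, sim_thresh=0.8):
    """detections: list of dicts with a 3D 'centroid' (from depth + camera pose)
    and a unit-norm 'feature' vector (e.g. from a 2D foundation model)."""
    nodes = []  # each node: {"centroid": ..., "feature": ..., "count": ...}
    for det in detections:
        best, best_sim = None, sim_thresh
        for node in nodes:
            dist = np.linalg.norm(det["centroid"] - node["centroid"])
            sim = float(det["feature"] @ node["feature"])
            if dist < dist_thresh and sim > best_sim:
                best, best_sim = node, sim
        if best is None:
            nodes.append({**det, "count": 1})
        else:  # running average keeps the merged node's centroid and feature current
            n = best["count"]
            best["centroid"] = (best["centroid"] * n + det["centroid"]) / (n + 1)
            f = best["feature"] * n + det["feature"]
            best["feature"] = f / np.linalg.norm(f)
            best["count"] = n + 1
    return nodes
```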
arXiv Detail & Related papers (2023-09-28T17:53:38Z)
- Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected at different reachable locations.
Experimental results show that the generated plan from our TaPA framework can achieve higher success rate than LLaVA and GPT-3.5 by a sizable margin.
arXiv Detail & Related papers (2023-07-04T17:58:25Z)
- Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2 into a Robot Language Model for Grounded Task Planning [45.51792981370957]
We investigate the applicability of a smaller class of large language models (LLMs) in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially.
Our method grounds the input of the LLM on the domain that is represented as a scene graph, enabling it to translate human requests into executable robot plans.
Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.
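As a rough illustration of grounding an LM's input in a scene graph, the sketch below serializes a toy graph into text and prompts a stock GPT-2 via Hugging Face transformers to emit subgoals. The serialization and prompt format are assumptions; the paper fine-tunes the model on such pairs rather than prompting an off-the-shelf one.
```python
# Rough sketch: serialize a scene graph into text and prompt a small LM to
# decompose a task into subgoals. The serialization and prompt format are
# illustrative; the cited work fine-tunes GPT-2 on graph/plan pairs rather
# than prompting a stock model as done here.
from transformers import pipeline

def serialize(graph):
    """graph: dict mapping each object to the node it is contained in."""
    return ". ".join(f"{obj} is in {loc}" for obj, loc in graph.items())

graph = {"apple": "kitchen", "knife": "drawer", "plate": "cabinet"}
task = "serve a sliced apple on a plate"

prompt = (f"Scene: {serialize(graph)}.\n"
          f"Task: {task}.\n"
          "Subgoals:\n1.")

generator = pipeline("text-generation", model="gpt2")
print(generator(prompt, max_new_tokens=64)[0]["generated_text"])
```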
arXiv Detail & Related papers (2023-05-12T18:14:32Z)
- ProgPrompt: Generating Situated Robot Task Plans using Large Language Models [68.57918965060787]
Large language models (LLMs) can be used to score potential next actions during task planning.
We present a programmatic LLM prompt structure that enables plan generation functional across situated environments.
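A minimal sketch of what a programmatic planning prompt of this kind might look like: available actions appear as imports, available objects as a list, and the task as the header of a function the LLM is asked to complete. The action names, objects, and exact format are illustrative assumptions and differ from ProgPrompt's actual prompt in detail.
```python
# Hedged illustration of a "programmatic" planning prompt: actions are shown
# as function imports, objects as a list, and the task as a function header
# for the LLM to complete. All names here are illustrative.

ACTIONS = ["grab", "put_on", "open", "close", "walk_to"]
OBJECTS = ["apple", "fridge", "kitchen_table", "plate"]

def build_prompt(task: str) -> str:
    lines = [
        "from actions import " + ", ".join(ACTIONS),
        "",
        f"objects = {OBJECTS}",
        "",
        f"def {task}():",
        "    # complete this function using only the actions and objects above",
    ]
    return "\n".join(lines)

print(build_prompt("put_the_apple_in_the_fridge"))
# The LLM's completion (e.g. walk_to('fridge'); open('fridge'); ...) can then
# be parsed and executed step by step, with assertions guarding preconditions.
```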
arXiv Detail & Related papers (2022-09-22T20:29:49Z)
- TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs [33.25317860393983]
TASKOGRAPHY is the first large-scale robotic task planning benchmark over 3DSGs.
We propose SCRUB, a task-conditioned 3DSG sparsification method.
We also propose SEEK, a procedure enabling learning-based planners to exploit 3DSG structure.
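A toy sketch of task-conditioned sparsification in the spirit of the entry above: keep only nodes whose names are mentioned in the task, plus their ancestors up to the root, and prune everything else. The real SCRUB procedure is more involved; this parent-map representation and matching rule are illustrative assumptions.
```python
# Toy sketch of task-conditioned scene-graph sparsification: keep only nodes
# mentioned in the task, plus their ancestors up to the root, and prune the
# rest. The actual SCRUB method is more sophisticated; this is illustrative.

def scrub_like(parent, task):
    """parent: dict mapping each node to its parent (the root maps to None).
    Returns the set of nodes to keep for planning on this task."""
    mentioned = {node for node in parent if node.lower() in task.lower()}
    keep = set()
    for node in mentioned:
        while node is not None:  # walk up to the root
            keep.add(node)
            node = parent[node]
    return keep

PARENT = {
    "building": None,
    "floor_1": "building", "kitchen": "floor_1", "apple": "kitchen",
    "floor_2": "building", "lab": "floor_2", "screwdriver": "lab",
}
print(sorted(scrub_like(PARENT, "bring the apple from the kitchen")))
# -> ['apple', 'building', 'floor_1', 'kitchen']
```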
arXiv Detail & Related papers (2022-07-11T16:51:44Z)