VeriGraph: Scene Graphs for Execution Verifiable Robot Planning
- URL: http://arxiv.org/abs/2411.10446v2
- Date: Thu, 21 Nov 2024 15:56:48 GMT
- Title: VeriGraph: Scene Graphs for Execution Verifiable Robot Planning
- Authors: Daniel Ekpo, Mara Levy, Saksham Suri, Chuong Huynh, Abhinav Shrivastava
- Abstract summary: We propose VeriGraph, a framework that integrates vision-language models (VLMs) for robotic planning while verifying action feasibility.
VeriGraph employs scene graphs as an intermediate representation, capturing key objects and spatial relationships to improve plan verification and refinement.
Our approach significantly enhances task completion rates across diverse manipulation scenarios, outperforming baseline methods by 58% for language-based tasks and 30% for image-based tasks.
- Abstract: Recent advancements in vision-language models (VLMs) offer potential for robot task planning, but challenges remain due to VLMs' tendency to generate incorrect action sequences. To address these limitations, we propose VeriGraph, a novel framework that integrates VLMs for robotic planning while verifying action feasibility. VeriGraph employs scene graphs as an intermediate representation, capturing key objects and spatial relationships to improve plan verification and refinement. The system generates a scene graph from input images and uses it to iteratively check and correct action sequences generated by an LLM-based task planner, ensuring constraints are respected and actions are executable. Our approach significantly enhances task completion rates across diverse manipulation scenarios, outperforming baseline methods by 58% for language-based tasks and 30% for image-based tasks.
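The check-and-correct loop described in the abstract is easy to picture in code. Below is a minimal sketch, not the authors' implementation: the scene graph is a set of (subject, relation, object) triples, and `Action`, `feasible`, and `verify` are hypothetical stand-ins for VeriGraph's components.

```python
# A minimal sketch (not VeriGraph's code) of verifying an LLM-proposed plan
# against a scene graph. All helper names here are hypothetical.

from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    kind: str      # e.g. "pick" or "place"
    obj: str
    dest: str = ""

# Scene graph as directed (subject, relation, object) triples.
scene = {("mug", "on", "table"), ("plate", "on", "table"), ("fork", "on", "plate")}

def feasible(action: Action, graph: set, holding: str | None) -> bool:
    """Check an action's preconditions against the current scene graph."""
    if action.kind == "pick":
        # Cannot pick an object that has something stacked on it, or while holding.
        blocked = any(rel == "on" and o == action.obj for (_, rel, o) in graph)
        return holding is None and not blocked
    if action.kind == "place":
        return holding == action.obj
    return False

def apply(action: Action, graph: set, holding: str | None):
    """Update the graph and gripper state with the action's effects."""
    if action.kind == "pick":
        return {e for e in graph if e[0] != action.obj}, action.obj
    return graph | {(action.obj, "on", action.dest)}, None

def verify(plan: list[Action], graph: set) -> int:
    """Return the index of the first infeasible step, or -1 if the plan passes."""
    holding = None
    for i, a in enumerate(plan):
        if not feasible(a, graph, holding):
            return i
        graph, holding = apply(a, graph, holding)
    return -1

# An LLM planner might propose picking the plate before clearing the fork;
# verification flags step 0 so the planner can be re-prompted to fix it.
bad_plan = [Action("pick", "plate"), Action("place", "plate", dest="shelf")]
print(verify(bad_plan, scene))  # -> 0 (the fork is still on the plate)
```

In VeriGraph proper, a flagged step would be fed back to the LLM planner for iterative refinement; here the failing index stands in for that feedback signal.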
Related papers
- Dynamic Planning for LLM-based Graphical User Interface Automation [48.31532014795368]
We propose a novel approach called Dynamic Planning of Thoughts (D-PoT) for LLM-based GUI agents.
D-PoT dynamically adjusts the plan based on environmental feedback and execution history.
Experimental results reveal that the proposed D-PoT significantly surpassed the strong GPT-4V baseline by +12.7%.
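A minimal sketch of that dynamic-adjustment loop as described: replan after every step, conditioning on the growing execution history and the latest feedback. `query_llm` and `execute` are hypothetical stand-ins, not the paper's API.

```python
# Toy dynamic replanning loop in the spirit of D-PoT (not the paper's code).

def query_llm(goal: str, history: list, feedback: str) -> list:
    """Stand-in for an LLM call that returns the remaining action list."""
    return ["tap:search_box", "type:weather", "tap:submit"][len(history):]

def execute(action: str) -> str:
    """Stand-in for acting on a GUI and observing the new screen state."""
    return f"screen_after_{action}"

def run(goal: str, max_steps: int = 10) -> list:
    history, feedback = [], "initial_screen"
    for _ in range(max_steps):
        plan = query_llm(goal, history, feedback)   # replan at every step
        if not plan:
            break
        action = plan[0]
        feedback = execute(action)                  # fresh environmental feedback
        history.append((action, feedback))          # growing execution history
    return history

print(run("check the weather"))
```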
arXiv Detail & Related papers (2024-10-01T07:49:24Z)
- Can Graph Learning Improve Planning in LLM-based Agents? [61.47027387839096]
Task planning in language agents is emerging as an important research topic alongside the development of large language models (LLMs).
In this paper, we explore graph learning-based methods for task planning, a direction orthogonal to the prevalent focus on prompt design.
Our interest in graph learning stems from a theoretical discovery: the biases of attention and auto-regressive loss impede LLMs' ability to effectively navigate decision-making on graphs.
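As a toy contrast to token-by-token decoding, the sketch below navigates an explicit task graph directly. It is my illustration only: the keyword matcher stands in for the paper's learned graph model, and the task graph is invented.

```python
# Task planning as explicit graph navigation: match a request to a goal node,
# then recover its prerequisites in dependency order.

import networkx as nx

# Hypothetical task graph: edges point from prerequisite to dependent subtask.
G = nx.DiGraph()
G.add_edges_from([
    ("gather_data", "clean_data"),
    ("clean_data", "train_model"),
    ("train_model", "evaluate_model"),
])

def match_goal(request: str) -> str:
    """Stand-in for the learned matcher (a graph model in the paper)."""
    return max(G.nodes, key=lambda n: len(set(request.split()) & set(n.split("_"))))

def plan(request: str) -> list:
    goal = match_goal(request)
    needed = nx.ancestors(G, goal) | {goal}   # all prerequisites of the goal
    return [n for n in nx.topological_sort(G) if n in needed]

print(plan("evaluate the model"))
# -> ['gather_data', 'clean_data', 'train_model', 'evaluate_model']
```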
arXiv Detail & Related papers (2024-05-29T14:26:24Z)
- From Pixels to Graphs: Open-Vocabulary Scene Graph Generation with Vision-Language Models [81.92098140232638]
Scene graph generation (SGG) aims to parse a visual scene into an intermediate graph representation for downstream reasoning tasks.
Existing methods struggle to generate scene graphs with novel visual relation concepts.
We introduce a new open-vocabulary SGG framework based on sequence generation.
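Casting SGG as sequence generation implies a parsing step from generated text back into a graph. A hedged sketch of that step follows; the (subject, relation, object) serialization here is my assumption and may differ from the paper's actual format.

```python
# Parse a generated scene-graph sequence into (subject, relation, object)
# triplets. The serialization format is assumed for illustration.

import re

def parse_scene_graph(generated: str) -> list[tuple[str, str, str]]:
    """Extract '(subject, relation, object)' triplets from generated text."""
    triplets = []
    for chunk in generated.split(";"):
        match = re.match(r"\s*\((.+?),(.+?),(.+?)\)\s*$", chunk)
        if match:  # skip malformed chunks rather than failing the whole graph
            triplets.append(tuple(part.strip() for part in match.groups()))
    return triplets

# Example output a VLM might produce for an image of a desk.
sequence = "(laptop, on, desk); (mug, next to, laptop); (person, using, laptop)"
print(parse_scene_graph(sequence))
# -> [('laptop', 'on', 'desk'), ('mug', 'next to', 'laptop'), ('person', 'using', 'laptop')]
```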
arXiv Detail & Related papers (2024-04-01T04:21:01Z)
- MuseGraph: Graph-oriented Instruction Tuning of Large Language Models for Generic Graph Mining [41.19687587548107]
Graph Neural Networks (GNNs) must be re-trained whenever they are applied to different graph tasks and datasets.
We propose MuseGraph, a novel framework that seamlessly integrates the strengths of GNNs and Large Language Models (LLMs).
Our experimental results demonstrate significant improvements across different graph tasks.
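Serving many graph tasks with one LLM presupposes turning graphs into text. A minimal serialization sketch follows (my illustration; MuseGraph's compact graph descriptions and instruction packages are more elaborate).

```python
# Serialize a node's neighborhood into a compact textual description that can
# be prepended to a task instruction for instruction tuning.

import networkx as nx

def describe_graph(G: nx.Graph, center: str, max_neighbors: int = 3) -> str:
    """Describe a node's neighborhood within a rough token budget."""
    # Keep the highest-degree neighbors to stay compact.
    neighbors = sorted(G.neighbors(center), key=G.degree, reverse=True)
    lines = [f"Node: {center}"]
    lines += [f"- connected to {n} (degree {G.degree(n)})"
              for n in neighbors[:max_neighbors]]
    return "\n".join(lines)

G = nx.Graph()
G.add_edges_from([("alice", "bob"), ("alice", "carol"),
                  ("bob", "carol"), ("carol", "dave")])
print(describe_graph(G, center="carol"))
# This description could precede a prompt such as
# "Which community does this node belong to?"
```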
arXiv Detail & Related papers (2024-03-02T09:27:32Z)
- Planning as In-Painting: A Diffusion-Based Embodied Task Planning Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance across various embodied AI tasks.
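The in-painting framing can be pictured as fixing the observed parts of a plan and repeatedly re-filling the masked parts. The toy below is assumption-heavy: the paper uses a learned denoising diffusion model, whereas `propose` here is a trivial stand-in.

```python
# Toy "plan in-painting": observed steps stay fixed while masked steps are
# iteratively re-predicted. A learned diffusion model would replace `propose`.

import random

ACTIONS = ["move_to_shelf", "grasp_cup", "move_to_table", "release_cup"]
MASK = None

def propose(plan: list, i: int) -> str:
    """Stand-in denoiser: guess step i from its visible neighbor."""
    prev_step = plan[i - 1] if i > 0 and plan[i - 1] is not MASK else None
    if prev_step in ACTIONS:  # naive "next action" heuristic
        return ACTIONS[(ACTIONS.index(prev_step) + 1) % len(ACTIONS)]
    return random.choice(ACTIONS)

def inpaint(plan: list, rounds: int = 5) -> list:
    """Iteratively refine masked slots while keeping observed slots fixed."""
    fixed = [step is not MASK for step in plan]
    plan = plan[:]
    for _ in range(rounds):
        for i, is_fixed in enumerate(fixed):
            if not is_fixed:
                plan[i] = propose(plan, i)
    return plan

# Start and goal are observed; the middle of the plan is "in-painted".
partial = ["move_to_shelf", MASK, MASK, "release_cup"]
print(inpaint(partial))
# -> ['move_to_shelf', 'grasp_cup', 'move_to_table', 'release_cup']
```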
arXiv Detail & Related papers (2023-12-02T10:07:17Z)
- Unsupervised Task Graph Generation from Instructional Video Transcripts [53.54435048879365]
We consider a setting where text transcripts of instructional videos demonstrating a real-world activity are provided.
The goal is to identify the key steps relevant to the task as well as the dependency relationship between these key steps.
We propose a novel task graph generation approach that combines the reasoning capabilities of instruction-tuned language models with clustering and ranking components.
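A hedged sketch of the non-LM stages follows: clustering near-duplicate steps across transcripts, then ordering them into a (linear-chain) dependency graph. Token overlap stands in for learned step similarity, and the transcripts are invented.

```python
# Cluster steps from multiple transcripts into key steps, then order them by
# average position and link consecutive steps as dependencies.

transcripts = [
    ["boil water", "add pasta", "drain pasta", "add sauce"],
    ["boil the water", "add the pasta", "stir in sauce"],
]

def similar(a: str, b: str) -> bool:
    wa, wb = set(a.split()) - {"the"}, set(b.split()) - {"the"}
    return len(wa & wb) / len(wa | wb) >= 0.5  # Jaccard over content words

# Greedy, order-preserving clustering of steps across transcripts.
clusters: list[list[tuple[int, int, str]]] = []
for t, steps in enumerate(transcripts):
    for pos, step in enumerate(steps):
        for cluster in clusters:
            if similar(step, cluster[0][2]):
                cluster.append((t, pos, step))
                break
        else:
            clusters.append([(t, pos, step)])

def avg_pos(cluster):
    return sum(pos for _, pos, _ in cluster) / len(cluster)

# Rank key steps by average position; chain them as a dependency graph.
key_steps = [c[0][2] for c in sorted(clusters, key=avg_pos)]
print(list(zip(key_steps, key_steps[1:])))
```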
arXiv Detail & Related papers (2023-02-17T22:50:08Z)
- Sequential Manipulation Planning on Scene Graph [90.28117916077073]
We devise a 3D scene graph representation, contact graph+ (cg+), for efficient sequential task planning.
Goal configurations, naturally specified on contact graphs, can be produced by a genetic algorithm coupled with an optimization method.
A task plan is then succinctly generated by computing the Graph Editing Distance (GED) between the initial and goal contact graphs, yielding graph edit operations that correspond to possible robot actions.
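The GED step can be sketched with networkx's generic edit-path machinery. Note the paper's cg+ graphs and edit costs are domain-specific; the toy graphs and the pick/place mapping below are my stand-ins.

```python
# Derive candidate robot actions from graph edit operations between an initial
# and a goal contact graph (toy version of the GED idea).

import networkx as nx

def contact_graph(supports):
    """Build a toy contact graph; edge (a, b) means 'a rests on b'."""
    g = nx.DiGraph()
    for obj, base in supports:
        g.add_node(obj, name=obj)
        g.add_node(base, name=base)
        g.add_edge(obj, base)
    return g

initial = contact_graph([("mug", "table"), ("book", "table")])
goal = contact_graph([("mug", "shelf"), ("book", "table")])

# Match nodes by their name attribute so edits track object identity.
paths, cost = nx.optimal_edit_paths(
    initial, goal, node_match=lambda a, b: a["name"] == b["name"]
)
_, edge_edits = paths[0]

for before, after in edge_edits:
    if before and not after:
        print(f"pick {before[0]} up from {before[1]}")  # edge deleted
    elif after and not before:
        print(f"place {after[0]} on {after[1]}")        # edge inserted
print("edit cost:", cost)
```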
arXiv Detail & Related papers (2022-07-10T02:01:33Z)
- A Task-Motion Planning Framework Using Iteratively Deepened AND/OR Graph Networks [1.3535770763481902]
We present an approach for Task-Motion Planning (TMP) using Iteratively Deepened AND/OR Graph Networks (TMP-IDAN).
TMP-IDAN uses a novel abstraction based on AND/OR graph networks to compactly represent task-level states and actions.
We validate our approach and evaluate its capabilities using a Baxter robot and a state-of-the-art robotics simulator.
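A minimal sketch of the AND/OR abstraction itself follows (my illustration; TMP-IDAN's graphs encode task-level states and are deepened iteratively rather than fixed in advance). An OR node needs one solvable child; an AND node needs all children solvable.

```python
# Solve a toy AND/OR task graph by recursive descent, returning a primitive
# action sequence when one exists.

from dataclasses import dataclass, field

@dataclass
class Node:
    name: str
    kind: str = "leaf"                 # "and", "or", or "leaf" (primitive action)
    feasible: bool = True              # leaves may be infeasible
    children: list["Node"] = field(default_factory=list)

def solve(node: Node) -> list[str] | None:
    """Return a list of primitive actions that solves the node, or None."""
    if node.kind == "leaf":
        return [node.name] if node.feasible else None
    if node.kind == "and":             # every subgoal must be achieved
        plan = []
        for child in node.children:
            sub = solve(child)
            if sub is None:
                return None
            plan += sub
        return plan
    for child in node.children:        # "or": any one alternative suffices
        sub = solve(child)
        if sub is not None:
            return sub
    return None

task = Node("serve_coffee", "and", children=[
    Node("get_mug", "or", children=[
        Node("grasp_mug_left", feasible=False),   # this arm is blocked
        Node("grasp_mug_right"),
    ]),
    Node("pour_coffee"),
])
print(solve(task))  # -> ['grasp_mug_right', 'pour_coffee']
```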
arXiv Detail & Related papers (2021-04-04T07:06:52Z)