TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs
- URL: http://arxiv.org/abs/2207.05006v1
- Date: Mon, 11 Jul 2022 16:51:44 GMT
- Title: TASKOGRAPHY: Evaluating robot task planning over large 3D scene graphs
- Authors: Christopher Agia, Krishna Murthy Jatavallabhula, Mohamed Khodeir,
Ondrej Miksik, Vibhav Vineet, Mustafa Mukadam, Liam Paull, Florian Shkurti
- Abstract summary: TASKOGRAPHY is the first large-scale robotic task planning benchmark over 3DSGs.
We propose SCRUB, a task-conditioned 3DSG sparsification method.
We also propose SEEK, a procedure enabling learning-based planners to exploit 3DSG structure.
- Score: 33.25317860393983
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: 3D scene graphs (3DSGs) are an emerging description unifying symbolic,
topological, and metric scene representations. However, typical 3DSGs contain
hundreds of objects and symbols even for small environments, rendering task
planning on the full graph impractical. We construct TASKOGRAPHY, the first
large-scale robotic task planning benchmark over 3DSGs. While most benchmarking
efforts in this area focus on vision-based planning, we systematically study
symbolic planning, to decouple planning performance from visual representation
learning. We observe that, among existing methods, neither classical nor
learning-based planners are capable of real-time planning over full 3DSGs.
Enabling real-time planning demands progress on both (a) sparsifying 3DSGs for
tractable planning and (b) designing planners that better exploit 3DSG
hierarchies. Towards the former goal, we propose SCRUB, a task-conditioned 3DSG
sparsification method, enabling classical planners to match, and in some cases
surpass state-of-the-art learning-based planners. Towards the latter goal, we
propose SEEK, a procedure enabling learning-based planners to exploit 3DSG
structure, reducing the number of replanning queries required by current best
approaches by an order of magnitude. We will open-source all code and baselines
to spur further research at the intersection of robot task planning,
learning, and 3DSGs.
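To make the sparsification idea concrete, below is a minimal Python sketch of task-conditioned scene-graph pruning in the spirit of SCRUB; the hierarchy layout, node names, and the keep-ancestors rule are illustrative assumptions, not the paper's actual algorithm.

```python
# A minimal sketch of task-conditioned 3DSG sparsification (SCRUB-like).
# The graph layout and the relevance rule are illustrative assumptions.
import networkx as nx

def sparsify(scene_graph: nx.DiGraph, goal_objects: set) -> nx.DiGraph:
    """Keep goal-relevant objects plus the hierarchy above them."""
    keep = set()
    for obj in goal_objects:
        keep.add(obj)
        # Retain every ancestor (room, floor, building) so the plan
        # still has the topological context needed to reach the object.
        keep.update(nx.ancestors(scene_graph, obj))
    return scene_graph.subgraph(keep).copy()

# Toy hierarchy: building -> floor -> rooms -> objects.
g = nx.DiGraph()
g.add_edges_from([
    ("building", "floor1"), ("floor1", "kitchen"), ("floor1", "hall"),
    ("kitchen", "mug"), ("kitchen", "sink"), ("hall", "umbrella"),
])
print(sorted(sparsify(g, {"mug", "sink"}).nodes))
# ['building', 'floor1', 'kitchen', 'mug', 'sink']
```

Pruning to goal-relevant subtrees like this is what lets a classical planner operate on a graph of tens of nodes rather than hundreds.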
Related papers
- Socratic Planner: Inquiry-Based Zero-Shot Planning for Embodied Instruction Following [17.608330952846075]
Embodied Instruction Following (EIF) is the task of executing natural language instructions by navigating and interacting with objects in 3D environments.
One of the primary challenges in EIF is compositional task planning, which is often addressed with supervised or in-context learning with labeled data.
We introduce the Socratic Planner, the first zero-shot planning method that infers plans without the need for any training data.
arXiv Detail & Related papers (2024-04-21T08:10:20Z) - Planning as In-Painting: A Diffusion-Based Embodied Task Planning
Framework for Environments under Uncertainty [56.30846158280031]
Task planning for embodied AI has been one of the most challenging problems.
We propose a task-agnostic method named 'planning as in-painting'.
The proposed framework achieves promising performance across various embodied AI tasks.
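As a rough illustration of the 'planning as in-painting' idea, the sketch below treats a trajectory like a partially observed image: the known start and goal states are re-clamped at every reverse-diffusion step while the middle is denoised. The toy denoiser and step schedule are stand-in assumptions, not the paper's model.

```python
# A minimal numpy sketch of trajectory in-painting with diffusion.
# The denoiser here is a stand-in for a trained model.
import numpy as np

def inpaint_plan(denoise, start, goal, horizon, dim, steps=50, rng=None):
    rng = rng or np.random.default_rng(0)
    traj = rng.normal(size=(horizon, dim))   # start from pure noise
    for t in reversed(range(steps)):
        traj = denoise(traj, t)              # one reverse-diffusion step
        traj[0], traj[-1] = start, goal      # re-clamp the known states
    return traj

# Stand-in denoiser: shrink toward the straight line between endpoints.
def toy_denoiser(traj, t):
    line = np.linspace(traj[0], traj[-1], len(traj))
    return 0.9 * traj + 0.1 * line

plan = inpaint_plan(toy_denoiser, np.zeros(2), np.ones(2), horizon=8, dim=2)
print(plan.round(2))
```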
arXiv Detail & Related papers (2023-12-02T10:07:17Z) - ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and
Planning [125.90002884194838]
ConceptGraphs is an open-vocabulary graph-structured representation for 3D scenes.
It is built by leveraging 2D foundation models and fusing their outputs into 3D via multi-view association (sketched below).
We demonstrate the utility of this representation through a number of downstream planning tasks.
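A minimal sketch of the multi-view association step as the summary describes it: per-view detections lifted to 3D are merged into one object when they share a label and overlap geometrically. The centroid-based matching rule and the distance threshold are illustrative assumptions.

```python
# A minimal sketch of multi-view association for building a 3D object map.
import numpy as np

def associate(detections, merge_dist=0.25):
    """detections: list of (label, xyz centroid) lifted from single views."""
    objects = []  # fused map: list of (label, centroid, support count)
    for label, xyz in detections:
        for i, (lbl, c, n) in enumerate(objects):
            if lbl == label and np.linalg.norm(c - xyz) < merge_dist:
                # Same physical object seen from another view: average it in.
                objects[i] = (lbl, (c * n + xyz) / (n + 1), n + 1)
                break
        else:
            objects.append((label, np.asarray(xyz, float), 1))
    return objects

views = [("mug", (1.0, 0.5, 0.8)), ("mug", (1.1, 0.5, 0.8)), ("sink", (2.0, 0.1, 0.9))]
print(associate(views))  # two fused objects: one mug, one sink
```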
arXiv Detail & Related papers (2023-09-28T17:53:38Z) - SayPlan: Grounding Large Language Models using 3D Scene Graphs for
Scalable Robot Task Planning [15.346150968195015]
We introduce SayPlan, a scalable approach to large-scale task planning for robotics using 3D scene graph (3DSG) representations.
We evaluate our approach on two large-scale environments spanning up to 3 floors and 36 rooms with 140 assets and objects.
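A minimal sketch of the grounding step such a system needs: serializing a 3DSG into compact JSON that can be placed in an LLM prompt. The schema and prompt wording are assumptions; SayPlan's full pipeline is not reproduced here.

```python
# A minimal sketch of serializing a 3D scene graph for an LLM prompt.
import json

scene_graph = {
    "floors": [{
        "id": "floor1",
        "rooms": [
            {"id": "kitchen", "assets": ["fridge"], "objects": ["mug"]},
            {"id": "office",  "assets": ["desk"],   "objects": ["stapler"]},
        ],
    }]
}

def build_prompt(graph: dict, instruction: str) -> str:
    return (
        "Scene graph (JSON):\n" + json.dumps(graph, indent=2) +
        f"\n\nInstruction: {instruction}\n"
        "Reply with a numbered plan using only rooms and objects above."
    )

print(build_prompt(scene_graph, "Bring the mug to the office desk."))
```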
arXiv Detail & Related papers (2023-07-12T12:37:55Z) - Embodied Task Planning with Large Language Models [86.63533340293361]
We propose a TAsk Planning Agent (TaPA) for grounded planning in embodied tasks under physical scene constraints.
During inference, we discover the objects in the scene by extending open-vocabulary object detectors to multi-view RGB images collected in different achievable locations.
Experimental results show that the plans generated by our TaPA framework achieve a higher success rate than LLaVA and GPT-3.5 by a sizable margin.
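A minimal sketch of the object-discovery step the summary describes: run an open-vocabulary detector on RGB frames from several reachable viewpoints and merge the detections into one scene object list. The detector stub, confidence threshold, and multi-view voting rule are assumptions.

```python
# A minimal sketch of multi-view open-vocabulary object discovery.
from collections import Counter

def collect_scene_objects(images, detect, min_conf=0.5, min_views=2):
    """detect(image) -> list of (label, confidence) from one viewpoint."""
    votes = Counter()
    for img in images:
        seen = {lbl for lbl, conf in detect(img) if conf >= min_conf}
        votes.update(seen)                 # count each label once per view
    # Keep labels confirmed from multiple viewpoints to suppress flicker.
    return sorted(lbl for lbl, n in votes.items() if n >= min_views)

fake_views = ["view1", "view2", "view3"]
fake_detect = lambda img: [("mug", 0.9), ("sink", 0.4 if img == "view1" else 0.8)]
print(collect_scene_objects(fake_views, fake_detect))  # ['mug', 'sink']
```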
arXiv Detail & Related papers (2023-07-04T17:58:25Z) - Learning to Reason over Scene Graphs: A Case Study of Finetuning GPT-2
into a Robot Language Model for Grounded Task Planning [45.51792981370957]
We investigate the applicability of a smaller class of large language models (LLMs) in robotic task planning by learning to decompose tasks into subgoal specifications for a planner to execute sequentially.
Our method grounds the LLM's input in the domain, represented as a scene graph, enabling it to translate human requests into executable robot plans.
Our findings suggest that the knowledge stored in an LLM can be effectively grounded to perform long-horizon task planning, demonstrating the promising potential for the future application of neuro-symbolic planning methods in robotics.
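To illustrate what grounding an LLM's input on a scene graph can look like as training data, here is a hypothetical prompt/completion pair: the graph is linearized into triples and the supervision target is a subgoal sequence. The serialization format and subgoal syntax are invented for illustration, not the paper's exact encoding.

```python
# A minimal sketch of one supervised example for a scene-graph-grounded
# LLM planner: linearized graph + request -> subgoal sequence.
def linearize(edges):
    """edges: (subject, relation, object) triples from the scene graph."""
    return " ; ".join(f"{s} {r} {o}" for s, r, o in edges)

scene = [("mug", "on", "table"), ("table", "in", "kitchen")]
request = "Put the mug in the sink."
target_subgoals = ["goto(kitchen)", "pick(mug)", "goto(sink)", "place(mug, sink)"]

prompt = f"<scene> {linearize(scene)} </scene> <task> {request} </task>"
completion = " ".join(target_subgoals)
print(prompt)
print(completion)
```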
arXiv Detail & Related papers (2023-05-12T18:14:32Z) - A Framework for Neurosymbolic Robot Action Planning using Large Language Models [3.0501524254444767]
We present a framework aimed at bridging the gap between symbolic task planning and machine learning approaches.
The rationale is to train Large Language Models (LLMs) into a neurosymbolic task planner compatible with the Planning Domain Definition Language (PDDL).
Preliminary results in selected domains show that our method can: (i) solve 95.5% of problems in a test data set of 1,000 samples; (ii) produce plans up to 13.5% shorter than a traditional symbolic planner; (iii) reduce the average waiting time for plan availability by up to 61.4%.
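A minimal sketch of the symbolic side of such a pipeline: emitting a PDDL problem that either an LLM-based or a classical planner can consume. The domain name, predicates, and objects are illustrative assumptions.

```python
# A minimal sketch of generating a PDDL problem string.
def pddl_problem(name, domain, objects, init, goal):
    fmt = lambda facts: "\n    ".join(f"({f})" for f in facts)
    return (
        f"(define (problem {name}) (:domain {domain})\n"
        f"  (:objects {' '.join(objects)})\n"
        f"  (:init\n    {fmt(init)})\n"
        f"  (:goal (and\n    {fmt(goal)})))"
    )

print(pddl_problem(
    "serve-mug", "household",
    ["mug", "table", "sink"],
    ["on mug table", "clear mug"],
    ["in mug sink"],
))
```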
arXiv Detail & Related papers (2023-03-01T11:54:22Z) - Sequential Manipulation Planning on Scene Graph [90.28117916077073]
We devise a 3D scene graph representation, contact graph+ (cg+), for efficient sequential task planning.
Goal configurations, naturally specified on contact graphs, can be produced by a genetic algorithm with an optimization method.
A task plan is then generated by computing the Graph Editing Distance (GED) between the initial contact graphs and the goal configurations, which yields graph edit operations corresponding to possible robot actions.
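The GED step can be illustrated with networkx: compute an optimal edit path between the current and goal contact graphs and read edge insertions/deletions off as candidate robot actions. Real contact graphs carry richer geometry; this toy uses bare labels.

```python
# A minimal sketch of GED-based plan extraction on toy contact graphs.
import networkx as nx

def labeled(edges):
    g = nx.Graph(edges)
    nx.set_node_attributes(g, {n: n for n in g}, "name")
    return g

current = labeled([("mug", "table"), ("table", "floor")])
goal = labeled([("mug", "sink"), ("sink", "floor"), ("table", "floor")])

# Match nodes only when their names agree, so edits are semantically real.
same = lambda a, b: a["name"] == b["name"]
paths, cost = nx.optimal_edit_paths(current, goal, node_match=same)
node_edits, edge_edits = paths[0]
print("edit cost:", cost)
for before, after in edge_edits:
    if before is None or after is None:
        # e.g. deleting (mug, table) and inserting (mug, sink) maps to a
        # pick-and-place of the mug from the table into the sink.
        print("edge edit:", before, "->", after)
```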
arXiv Detail & Related papers (2022-07-10T02:01:33Z) - Enabling Visual Action Planning for Object Manipulation through Latent
Space Roadmap [72.01609575400498]
We present a framework for visual action planning of complex manipulation tasks with high-dimensional state spaces.
We propose a Latent Space Roadmap (LSR) for task planning, a graph-based structure that globally captures the system dynamics in a low-dimensional latent space.
We present a thorough investigation of our framework on two simulated box stacking tasks and a folding task executed on a real robot.
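A minimal sketch of a Latent-Space-Roadmap-style structure: latent states are clustered into nodes, observed action transitions become edges, and visual planning reduces to a shortest-path query. The encoder stub and the clustering rule are stand-in assumptions.

```python
# A minimal sketch of building and querying a latent-space roadmap.
import networkx as nx
import numpy as np

def build_roadmap(transitions, encode, merge_dist=0.5):
    nodes, roadmap = [], nx.Graph()
    def node_for(z):
        for i, c in enumerate(nodes):
            if np.linalg.norm(c - z) < merge_dist:
                return i                      # reuse an existing cluster
        nodes.append(z)
        return len(nodes) - 1
    for obs, action, next_obs in transitions:
        u, v = node_for(encode(obs)), node_for(encode(next_obs))
        roadmap.add_edge(u, v, action=action) # edge labeled with the action
    return roadmap

encode = lambda obs: np.asarray(obs, float)   # stand-in for a learned encoder
data = [((0, 0), "push", (1, 0)), ((1, 0), "stack", (1, 1))]
rm = build_roadmap(data, encode)
path = nx.shortest_path(rm, 0, 2)
print([rm.edges[a, b]["action"] for a, b in zip(path, path[1:])])
```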
arXiv Detail & Related papers (2021-03-03T17:48:26Z) - Planning with Learned Object Importance in Large Problem Instances using
Graph Neural Networks [28.488201307961624]
Real-world planning problems often involve hundreds or even thousands of objects.
We propose a graph neural network architecture for predicting object importance in a single inference pass.
Our approach treats the planner and transition model as black boxes, and can be used with any off-the-shelf planner.
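A minimal numpy sketch of object-importance scoring in this spirit: one round of message passing over the object graph yields a score per object, and low-scoring objects are dropped before invoking an off-the-shelf planner. The random weights stand in for a trained network.

```python
# A minimal sketch of GNN-style object-importance scoring with numpy.
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 4                               # 5 objects, 4 features each
X = rng.normal(size=(n, d))               # object feature vectors
A = np.eye(n) + np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
W1, w2 = rng.normal(size=(d, d)), rng.normal(size=d)

H = np.maximum(A @ X @ W1, 0)             # aggregate neighbors, ReLU
scores = 1 / (1 + np.exp(-(H @ w2)))      # per-object importance in (0, 1)
keep = np.flatnonzero(scores > 0.5)       # prune before calling the planner
print("keep objects:", keep, "scores:", scores.round(2))
```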
arXiv Detail & Related papers (2020-09-11T18:55:08Z)