StructDiffusion: Language-Guided Creation of Physically-Valid Structures
using Unseen Objects
- URL: http://arxiv.org/abs/2211.04604v2
- Date: Tue, 25 Apr 2023 15:59:47 GMT
- Authors: Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton
- Abstract summary: We propose StructDiffusion to build physically-valid structures without step-by-step instructions.
Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks.
We show experiments on held-out objects in both simulation and real-world tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots operating in human environments must be able to rearrange objects into
semantically-meaningful configurations, even if these objects are previously
unseen. In this work, we focus on the problem of building physically-valid
structures without step-by-step instructions. We propose StructDiffusion, which
combines a diffusion model and an object-centric transformer to construct
structures given partial-view point clouds and high-level language goals, such
as "set the table". Our method can perform multiple challenging
language-conditioned multi-step 3D planning tasks using one model.
StructDiffusion even improves the success rate of assembling physically-valid
structures out of unseen objects by on average 16% over an existing multi-modal
transformer model trained on specific structures. We show experiments on
held-out objects in both simulation and real-world rearrangement tasks.
Importantly, we show how integrating both a diffusion model and a
collision-discriminator model allows for improved generalization over other
methods when rearranging previously-unseen objects. For videos and additional
results, see our website: https://structdiffusion.github.io/.
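The abstract describes combining a generative model with a collision discriminator: sample many candidate goal arrangements, then keep the candidate the discriminator judges most physically valid. The sketch below illustrates that sample-then-filter pattern only; it is not the authors' implementation. The "sampler" and "discriminator" here are hypothetical stand-ins (random 2D poses and a disc-overlap count) for the paper's diffusion model and learned collision discriminator.

```python
import random

def sample_arrangements(num_samples, num_objects, rng):
    # Stand-in for the diffusion model: each "arrangement" is a list of
    # 2D object positions in the unit square. The real sampler would
    # iteratively denoise 6-DoF poses conditioned on partial-view point
    # clouds and the language goal.
    return [
        [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(num_objects)]
        for _ in range(num_samples)
    ]

def collision_score(arrangement, radius=0.1):
    # Stand-in for the learned collision discriminator: count pairwise
    # overlaps between objects approximated as discs of equal radius.
    # Lower is better; 0 means no detected collisions.
    overlaps = 0
    for i in range(len(arrangement)):
        for j in range(i + 1, len(arrangement)):
            (x1, y1), (x2, y2) = arrangement[i], arrangement[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 < (2 * radius) ** 2:
                overlaps += 1
    return overlaps

def best_arrangement(num_samples=64, num_objects=4, seed=0):
    # Sample many candidate goal arrangements, then keep the one the
    # discriminator scores as least collision-prone.
    rng = random.Random(seed)
    candidates = sample_arrangements(num_samples, num_objects, rng)
    return min(candidates, key=collision_score)
```

The key design point carried over from the paper's framing is that the generator proposes diverse candidates while a separate discriminator rejects physically invalid ones, which is what drives generalization to unseen objects.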
Related papers
- StructRe: Rewriting for Structured Shape Modeling (2023-11-29)
  We present StructRe, a structure rewriting system, as a novel approach to structured shape modeling. Given a 3D object represented by points and components, StructRe can rewrite it upward into more concise structures, or downward into more detailed structures.
- 6-DoF Stability Field via Diffusion Models (2023-10-26)
  We present 6-DoFusion, a generative model capable of generating 3D poses of an object that produce a stable configuration of a given scene. We evaluate our model on different object placement and stacking tasks, demonstrating its ability to construct stable scenes.
- Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning (2023-09-05)
  We propose a deep learning approach to multi-level object rearrangement planning for scenes with structural dependency hierarchies. It is trained on a self-generated simulation data set with intuitive structures and works for unseen scenes with an arbitrary number of objects. We compare our method with a range of classical and model-based baselines to show that it leverages its scene understanding to achieve better performance, flexibility, and efficiency.
- Grokking of Hierarchical Structure in Vanilla Transformers (2023-05-30)
  We show that transformer language models can learn to generalize hierarchically after training for extremely long periods. Intermediate-depth models generalize better than both very deep and very shallow transformers.
- StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects (2021-10-19)
  Assistive robots would greatly benefit from the ability to recognize and rearrange objects into semantically meaningful structures. We propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement. We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures.
- Predicting Stable Configurations for Semantic Placement of Novel Objects (2021-08-26)
  Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects. Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing.
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures (2021-06-10)
  We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions. We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
- Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks (2020-12-06)
  Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. We develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures. We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for learning robotic manipulation.
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition (2020-03-31)
  We propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision. We show the recognition backbone can be substantially enhanced for more robust representation learning. Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.