StructDiffusion: Language-Guided Creation of Physically-Valid Structures
using Unseen Objects
- URL: http://arxiv.org/abs/2211.04604v2
- Date: Tue, 25 Apr 2023 15:59:47 GMT
- Authors: Weiyu Liu, Yilun Du, Tucker Hermans, Sonia Chernova, Chris Paxton
- Abstract summary: We propose StructDiffusion to build physically-valid structures without step-by-step instructions.
Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks.
We show experiments on held-out objects in both simulation and real-world tasks.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Robots operating in human environments must be able to rearrange objects into
semantically-meaningful configurations, even if these objects are previously
unseen. In this work, we focus on the problem of building physically-valid
structures without step-by-step instructions. We propose StructDiffusion, which
combines a diffusion model and an object-centric transformer to construct
structures given partial-view point clouds and high-level language goals, such
as "set the table". Our method can perform multiple challenging
language-conditioned multi-step 3D planning tasks using one model.
StructDiffusion even improves the success rate of assembling physically-valid
structures out of unseen objects by on average 16% over an existing multi-modal
transformer model trained on specific structures. We show experiments on
held-out objects in both simulation and real-world rearrangement tasks.
Importantly, we show how integrating both a diffusion model and a
collision-discriminator model allows for improved generalization over other
methods when rearranging previously-unseen objects. For videos and additional
results, see our website: https://structdiffusion.github.io/.
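The abstract describes combining a generative model with a collision discriminator: sample many candidate goal arrangements, then keep the candidate the discriminator judges most physically valid. The sketch below illustrates that sample-then-filter pattern only; it is not the authors' implementation. The "sampler" and "discriminator" here are hypothetical stand-ins (random 2D poses and a disc-overlap count) for the paper's diffusion model and learned collision discriminator.

```python
import random

def sample_arrangements(num_samples, num_objects, rng):
    # Stand-in for the diffusion model: each "arrangement" is a list of
    # 2D object positions in the unit square. The real sampler would
    # iteratively denoise 6-DoF poses conditioned on partial-view point
    # clouds and the language goal.
    return [
        [(rng.uniform(0, 1), rng.uniform(0, 1)) for _ in range(num_objects)]
        for _ in range(num_samples)
    ]

def collision_score(arrangement, radius=0.1):
    # Stand-in for the learned collision discriminator: count pairwise
    # overlaps between objects approximated as discs of equal radius.
    # Lower is better; 0 means no detected collisions.
    overlaps = 0
    for i in range(len(arrangement)):
        for j in range(i + 1, len(arrangement)):
            (x1, y1), (x2, y2) = arrangement[i], arrangement[j]
            if (x1 - x2) ** 2 + (y1 - y2) ** 2 < (2 * radius) ** 2:
                overlaps += 1
    return overlaps

def best_arrangement(num_samples=64, num_objects=4, seed=0):
    # Sample many candidate goal arrangements, then keep the one the
    # discriminator scores as least collision-prone.
    rng = random.Random(seed)
    candidates = sample_arrangements(num_samples, num_objects, rng)
    return min(candidates, key=collision_score)
```

The key design point carried over from the paper's framing is that the generator proposes diverse candidates while a separate discriminator rejects physically invalid ones, which is what drives generalization to unseen objects.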
Related papers
- StructRe: Rewriting for Structured Shape Modeling (2023-11-29)
  We present StructRe, a structure rewriting system, as a novel approach to structured shape modeling. Given a 3D object represented by points and components, StructRe can rewrite it upward into more concise structures, or downward into more detailed structures.
- 6-DoF Stability Field via Diffusion Models (2023-10-26)
  We present 6-DoFusion, a generative model capable of generating 3D poses of an object that produce a stable configuration of a given scene. We evaluate our model on different object placement and stacking tasks, demonstrating its ability to construct stable scenes.
- Structural Concept Learning via Graph Attention for Multi-Level Rearrangement Planning (2023-09-05)
  We propose a deep learning approach to multi-level object rearrangement planning for scenes with structural dependency hierarchies. It is trained on a self-generated simulation data set with intuitive structures and works for unseen scenes with an arbitrary number of objects. We compare our method with a range of classical and model-based baselines to show that it leverages its scene understanding to achieve better performance, flexibility, and efficiency.
- Grokking of Hierarchical Structure in Vanilla Transformers (2023-05-30)
  We show that transformer language models can learn to generalize hierarchically after training for extremely long periods. Intermediate-depth models generalize better than both very deep and very shallow transformers.
- StructFormer: Learning Spatial Structure for Language-Guided Semantic Rearrangement of Novel Objects (2021-10-19)
  Assistive robots would greatly benefit from the ability to recognize and rearrange objects into semantically meaningful structures. We propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement. We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures.
- Predicting Stable Configurations for Semantic Placement of Novel Objects (2021-08-26)
  Our goal is to enable robots to repose previously unseen objects according to learned semantic relationships in novel environments. We build our models and training from the ground up to be tightly integrated with our proposed planning algorithm for semantic placement of unknown objects. Our approach enables motion planning for semantic rearrangement of unknown objects in scenes with varying geometry from only RGB-D sensing.
- GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures (2021-06-10)
  We demonstrate a set of modifications to the structure of a Transformer layer, producing a more efficient architecture. We add a convolutional module to complement the self-attention module, decoupling the learning of local and global interactions. We apply the resulting architecture to language representation learning and demonstrate its superior performance compared to BERT models of different scales.
- Learning to Rearrange Deformable Cables, Fabrics, and Bags with Goal-Conditioned Transporter Networks (2020-12-06)
  Rearranging and manipulating deformable objects such as cables, fabrics, and bags is a long-standing challenge in robotic manipulation. We develop a suite of simulated benchmarks with 1D, 2D, and 3D deformable structures. We propose embedding goal-conditioning into Transporter Networks, a recently proposed model architecture for learning robotic manipulation.
- Look-into-Object: Self-supervised Structure Modeling for Object Recognition (2020-03-31)
  We propose to "look into object" (explicitly yet intrinsically model the object structure) by incorporating self-supervision. We show the recognition backbone can be substantially enhanced for more robust representation learning. Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.