StructFormer: Learning Spatial Structure for Language-Guided Semantic
Rearrangement of Novel Objects
- URL: http://arxiv.org/abs/2110.10189v1
- Date: Tue, 19 Oct 2021 18:13:01 GMT
- Title: StructFormer: Learning Spatial Structure for Language-Guided Semantic
Rearrangement of Novel Objects
- Authors: Weiyu Liu, Chris Paxton, Tucker Hermans, Dieter Fox
- Abstract summary: Assistive robots would greatly benefit from the ability to recognize and rearrange objects into semantically meaningful structures.
We propose a novel transformer-based neural network, StructFormer, which takes as input a partial-view point cloud of the current object arrangement.
We show through rigorous experiments that StructFormer enables a physical robot to rearrange novel objects into semantically meaningful structures.
- Score: 44.4579949153234
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Geometric organization of objects into semantically meaningful arrangements
pervades the built world. As such, assistive robots operating in warehouses,
offices, and homes would greatly benefit from the ability to recognize and
rearrange objects into these semantically meaningful structures. To be useful,
these robots must contend with previously unseen objects and receive
instructions without significant programming. While previous works have
examined recognizing pairwise semantic relations and sequential manipulation to
change these simple relations, none have shown the ability to arrange objects
into complex structures such as circles or table settings. To address this
problem, we propose a novel transformer-based neural network, StructFormer,
which takes as input a partial-view point cloud of the current object
arrangement and a structured language command encoding the desired object
configuration. We show through rigorous experiments that StructFormer enables a
physical robot to rearrange novel objects into semantically meaningful
structures with multi-object relational constraints inferred from the language
command.
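As a concrete illustration of the architecture the abstract describes, here is a minimal sketch of a StructFormer-style model in PyTorch. The module names, dimensions, max-pooled point encoder, and pose head are all illustrative assumptions, not the authors' released implementation.
```python
# A minimal sketch of a StructFormer-style model, assuming PyTorch.
# All names and dimensions here are illustrative assumptions.
import torch
import torch.nn as nn

class StructFormerSketch(nn.Module):
    def __init__(self, d_model=256, n_heads=8, n_layers=6, vocab_size=1000):
        super().__init__()
        # Stand-in per-object point-cloud encoder: maps each segmented
        # object's points to a single token via a shared MLP + max pool.
        self.point_mlp = nn.Sequential(
            nn.Linear(3, 128), nn.ReLU(), nn.Linear(128, d_model))
        # Embeds the structured language command (word ids -> tokens).
        self.word_embedding = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, n_layers)
        # Predicts a placement pose per object: 3-D position + quaternion.
        self.pose_head = nn.Linear(d_model, 7)

    def forward(self, object_points, command_ids):
        # object_points: (B, n_objects, n_points, 3) partial-view clouds
        # command_ids:   (B, n_words) token ids of the language command
        obj_tokens = self.point_mlp(object_points).max(dim=2).values
        lang_tokens = self.word_embedding(command_ids)
        # Objects and words share one sequence, so pose predictions can
        # attend to the relational constraints encoded in the command.
        encoded = self.transformer(
            torch.cat([lang_tokens, obj_tokens], dim=1))
        return self.pose_head(encoded[:, lang_tokens.shape[1]:])

model = StructFormerSketch()
points = torch.randn(2, 5, 128, 3)         # 2 scenes, 5 objects, 128 points
command = torch.randint(0, 1000, (2, 12))  # 12-word structured commands
poses = model(points, command)             # (2, 5, 7) predicted target poses
```
The design point mirrored here is that segmented objects and language tokens share one transformer sequence, which is what lets placement predictions respect multi-object relational constraints from the command.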
Related papers
- Knolling Bot: Learning Robotic Object Arrangement from Tidy Demonstrations [11.873522421121173]
This paper introduces a self-supervised learning framework that allows robots to understand and replicate the concept of tidiness.
We leverage a transformer neural network to predict the placement of subsequent objects.
Our method not only learns a generalizable concept of tidiness but can also incorporate human preferences to generate customized tidy tables.
arXiv Detail & Related papers (2023-10-06T20:13:07Z) - Structural Concept Learning via Graph Attention for Multi-Level
Rearrangement Planning [2.7195102129095003]
We propose a deep learning approach to perform multi-level object rearrangement planning for scenes with structural dependency hierarchies.
It is trained on a self-generated simulation data set with intuitive structures and works for unseen scenes with an arbitrary number of objects.
We compare our method with a range of classical and model-based baselines to show that it leverages its scene understanding to achieve better performance, flexibility, and efficiency (a minimal graph-attention sketch appears after this list).
arXiv Detail & Related papers (2023-09-05T19:35:44Z) - Grokking of Hierarchical Structure in Vanilla Transformers [72.45375959893218]
We show that transformer language models can learn to generalize hierarchically after training for extremely long periods.
Intermediate-depth models generalize better than both very deep and very shallow transformers.
arXiv Detail & Related papers (2023-05-30T04:34:13Z) - Neural Constraint Satisfaction: Hierarchical Abstraction for
Combinatorial Generalization in Object Rearrangement [75.9289887536165]
We present a hierarchical abstraction approach to uncover underlying entities.
We show how to learn a correspondence between intervening on states of entities in the agent's model and acting on objects in the environment.
We use this correspondence to develop a method for control that generalizes to different numbers and configurations of objects.
arXiv Detail & Related papers (2023-03-20T18:19:36Z) - StructDiffusion: Language-Guided Creation of Physically-Valid Structures
using Unseen Objects [35.855172217856726]
We propose StructDiffusion to build physically-valid structures without step-by-step instructions.
Our method can perform multiple challenging language-conditioned multi-step 3D planning tasks.
We show experiments on held-out objects both in simulation and on real-world tasks.
arXiv Detail & Related papers (2022-11-08T23:04:49Z) - Forming Trees with Treeformers [3.8073142980733]
Many state-of-the-art neural network models, such as Transformers, have no explicit hierarchical structure in their architecture.
We introduce Treeformer, a general-purpose encoder module inspired by the CKY algorithm (a minimal CKY-style composition sketch appears after this list).
Our experiments demonstrate the benefits of incorporating hierarchical structure into the Transformer.
arXiv Detail & Related papers (2022-07-14T14:39:30Z) - Identifying concept libraries from language about object structure [56.83719358616503]
We leverage natural language descriptions for a diverse set of 2K procedurally generated objects to identify the parts people use.
We formalize our problem as search over a space of program libraries that contain different part concepts.
By combining naturalistic language at scale with structured program representations, we discover a fundamental information-theoretic tradeoff governing the part concepts people name.
arXiv Detail & Related papers (2022-05-11T17:49:25Z) - Unsupervised Distillation of Syntactic Information from Contextualized
Word Representations [62.230491683411536]
We tackle the task of unsupervised disentanglement between semantics and structure in neural language representations.
To this end, we automatically generate groups of sentences which are structurally similar but semantically different.
We demonstrate that our transformation clusters vectors in space by structural properties, rather than by lexical semantics.
arXiv Detail & Related papers (2020-10-11T15:13:18Z) - Look-into-Object: Self-supervised Structure Modeling for Object
Recognition [71.68524003173219]
We propose to "look into object" (explicitly yet intrinsically model the object structure) through incorporating self-supervisions.
We show the recognition backbone can be substantially enhanced for more robust representation learning.
Our approach achieves large performance gains on a number of benchmarks, including generic object recognition (ImageNet) and fine-grained object recognition tasks (CUB, Cars, Aircraft).
arXiv Detail & Related papers (2020-03-31T12:22:51Z) - Deep compositional robotic planners that follow natural language
commands [21.481360281719006]
We show how a sampling-based robotic planner can be augmented to learn to understand a sequence of natural language commands.
Our approach combines this planner with a deep network structured according to the parse of a complex command that includes objects, verbs, spatial relations, and attributes.
arXiv Detail & Related papers (2020-02-12T19:56:58Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.