Break and Make: Interactive Structural Understanding Using LEGO Bricks
- URL: http://arxiv.org/abs/2207.13738v1
- Date: Wed, 27 Jul 2022 18:33:09 GMT
- Title: Break and Make: Interactive Structural Understanding Using LEGO Bricks
- Authors: Aaron Walsman, Muru Zhang, Klemen Kotar, Karthik Desingh, Ali Farhadi,
Dieter Fox
- Abstract summary: We build a fully interactive 3D simulator that allows learning agents to assemble, disassemble and manipulate LEGO models.
We take a first step towards solving this problem using sequence-to-sequence models.
- Score: 61.01136603613139
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Visual understanding of geometric structures with complex spatial
relationships is a fundamental component of human intelligence. As children, we
learn how to reason about structure not only from observation, but also by
interacting with the world around us -- by taking things apart and putting them
back together again. The ability to reason about structure and compositionality
allows us to not only build things, but also understand and reverse-engineer
complex systems. In order to advance research in interactive reasoning for
part-based geometric understanding, we propose a challenging new assembly
problem using LEGO bricks that we call Break and Make. In this problem an agent
is given a LEGO model and attempts to understand its structure by interactively
inspecting and disassembling it. After this inspection period, the agent must
then prove its understanding by rebuilding the model from scratch using
low-level action primitives. In order to facilitate research on this problem we
have built LTRON, a fully interactive 3D simulator that allows learning agents
to assemble, disassemble and manipulate LEGO models. We pair this simulator
with a new dataset of fan-made LEGO creations that have been uploaded to the
internet in order to provide complex scenes containing over a thousand unique
brick shapes. We take a first step towards solving this problem using
sequence-to-sequence models that provide guidance for how to make progress on
this challenging problem. Our simulator and data are available at
github.com/aaronwalsman/ltron. Additional training code and PyTorch examples
are available at github.com/aaronwalsman/ltron-torch-eccv22.
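The two-phase episode structure described above (inspect-and-disassemble, then rebuild from scratch) can be illustrated with a minimal toy sketch. The class and method names below are illustrative assumptions for exposition only, NOT the actual LTRON API; see github.com/aaronwalsman/ltron for the real simulator interface.

```python
# Toy sketch of a "Break and Make" style episode: a break phase where the
# agent removes and inspects bricks, then a make phase where it must rebuild
# the model in an empty scene. All names here are hypothetical.

class BreakAndMakeEpisode:
    def __init__(self, target_bricks):
        # target_bricks: list of (shape, position) tuples defining the model
        self.target = list(target_bricks)
        self.scene = list(target_bricks)   # break phase starts fully assembled
        self.phase = "break"

    def remove_brick(self, index):
        # Break phase: detach one brick and return it for inspection.
        assert self.phase == "break"
        return self.scene.pop(index)

    def start_make_phase(self):
        # Switch to the make phase: the scene is cleared and the agent
        # must reconstruct the model using low-level placement actions.
        self.phase = "make"
        self.scene = []

    def place_brick(self, brick):
        assert self.phase == "make"
        self.scene.append(brick)

    def score(self):
        # Fraction of target bricks correctly reproduced (order-insensitive).
        matched = sum(1 for brick in self.scene if brick in self.target)
        return matched / len(self.target)

# Usage: a perfect agent disassembles everything, then rebuilds exactly.
episode = BreakAndMakeEpisode([("2x4", (0, 0)), ("1x2", (0, 1))])
inspected = []
while episode.scene:
    inspected.append(episode.remove_brick(0))
episode.start_make_phase()
for brick in inspected:
    episode.place_brick(brick)
print(episode.score())  # 1.0 for a perfect rebuild
```

The real task is much harder than this sketch suggests: the agent observes rendered images rather than ground-truth brick lists, and must act through low-level primitives rather than symbolic pop/append operations.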
Related papers
- TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly [51.29305265324916]
We propose a class-agnostic tree-transformer framework to predict the sequential assembly actions from input multi-view images.
A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice.
We mitigate this problem by leveraging synthetic-to-real transfer learning.
arXiv Detail & Related papers (2024-07-22T14:05:27Z)
- Planning for Complex Non-prehensile Manipulation Among Movable Objects by Interleaving Multi-Agent Pathfinding and Physics-Based Simulation [23.62057790524675]
Real-world manipulation problems in heavy clutter require robots to reason about potential contacts with objects in the environment.
We focus on pick-and-place style tasks to retrieve a target object from a shelf where some 'movable' objects must be rearranged in order to solve the task.
In particular, our motivation is to allow the robot to reason over and consider non-prehensile rearrangement actions that lead to complex robot-object and object-object interactions.
arXiv Detail & Related papers (2023-03-23T15:29:27Z)
- Self-Supervised Object Goal Navigation with In-Situ Finetuning [110.6053241629366]
This work develops an agent that builds self-supervised models of the world via exploration.
We identify a strong source of self-supervision that can train all components of an ObjectNav agent.
We show that our agent can perform competitively in the real world and simulation.
arXiv Detail & Related papers (2022-12-09T03:41:40Z)
- Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning [52.85981207514049]
We introduce a novel formulation, combinatorial construction, which requires a building agent to assemble unit primitives sequentially.
To construct a target object, we provide incomplete knowledge about the desired target (i.e., 2D images) instead of exact and explicit information to the agent.
We demonstrate that the proposed method successfully learns to construct an unseen object conditioned on a single image or multiple views of a target object.
arXiv Detail & Related papers (2021-10-29T01:09:51Z)
- Image2Lego: Customized LEGO Set Generation from Images [50.87935634904456]
We implement a system that generates a LEGO brick model from 2D images.
Models are obtained by algorithmic conversion of the 3D voxelized model to bricks.
We generate step-by-step building instructions and animations for LEGO models of objects and human faces.
arXiv Detail & Related papers (2021-08-19T03:42:58Z)
- LegoFormer: Transformers for Block-by-Block Multi-view 3D Reconstruction [45.16128577837725]
Most modern deep learning-based multi-view 3D reconstruction techniques use RNNs or fusion modules to combine information from multiple images after encoding them.
We propose LegoFormer, a transformer-based model that unifies object reconstruction under a single framework and parametrizes the reconstructed occupancy grid by its decomposition factors.
arXiv Detail & Related papers (2021-06-23T00:15:08Z)
- Text Modular Networks: Learning to Decompose Tasks in the Language of Existing Models [61.480085460269514]
We propose a framework for building interpretable systems that learn to solve complex tasks by decomposing them into simpler ones solvable by existing models.
We use this framework to build ModularQA, a system that can answer multi-hop reasoning questions by decomposing them into sub-questions answerable by a neural factoid single-span QA model and a symbolic calculator.
arXiv Detail & Related papers (2020-09-01T23:45:42Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.