Learning to Build by Building Your Own Instructions
- URL: http://arxiv.org/abs/2410.01111v1
- Date: Tue, 1 Oct 2024 22:39:58 GMT
- Title: Learning to Build by Building Your Own Instructions
- Authors: Aaron Walsman, Muru Zhang, Adam Fishman, Ali Farhadi, Dieter Fox,
- Abstract summary: We develop a new technique for the recently proposed Break-and-Make problem in LTRON.
An agent must learn to build a previously unseen LEGO assembly using a single interactive session.
We train these models using online imitation learning which allows the model to learn from its own mistakes.
- Score: 56.734927320020496
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Structural understanding of complex visual objects is an important unsolved component of artificial intelligence. To study this, we develop a new technique for the recently proposed Break-and-Make problem in LTRON where an agent must learn to build a previously unseen LEGO assembly using a single interactive session to gather information about its components and their structure. We attack this problem by building an agent that we call \textbf{\ours} that is able to make its own visual instruction book. By disassembling an unseen assembly and periodically saving images of it, the agent is able to create a set of instructions so that it has the information necessary to rebuild it. These instructions form an explicit memory that allows the model to reason about the assembly process one step at a time, avoiding the need for long-term implicit memory. This in turn allows us to train on much larger LEGO assemblies than has been possible in the past. To demonstrate the power of this model, we release a new dataset of procedurally built LEGO vehicles that contain an average of 31 bricks each and require over one hundred steps to disassemble and reassemble. We train these models using online imitation learning which allows the model to learn from its own mistakes. Finally, we also provide some small improvements to LTRON and the Break-and-Make problem that simplify the learning environment and improve usability.
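The abstract's central idea can be illustrated with a toy sketch: disassemble an unseen assembly while saving snapshots, then read those snapshots back as step-by-step instructions. This is not the authors' implementation, and it rests on several assumptions. Assemblies are plain sets of hypothetical brick IDs rather than images. A page is saved before every removal, whereas the abstract says only that images are saved periodically. The learned policy is replaced by an exact set difference. All names (`InstructionBook`, `disassemble`, `reassemble`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set

# Toy stand-in (hypothetical): an assembly is a set of brick IDs. The real
# agent observes rendered images in LTRON, not symbolic sets like these.
Assembly = FrozenSet[str]


@dataclass
class InstructionBook:
    """Explicit memory: one saved snapshot ("page") per disassembly step."""
    pages: List[Assembly] = field(default_factory=list)

    def save(self, state: Assembly) -> None:
        self.pages.append(state)


def disassemble(assembly: Assembly, removal_order: List[str]) -> InstructionBook:
    """Remove bricks one at a time, saving a page before each removal.

    Assumes removal_order lists every brick in the assembly exactly once.
    """
    book = InstructionBook()
    state: Set[str] = set(assembly)
    for brick in removal_order:
        book.save(frozenset(state))  # snapshot of the state before removal
        state.remove(brick)
    return book


def reassemble(book: InstructionBook) -> Assembly:
    """Rebuild by reading the book backwards, one page per step.

    Each step compares only the current state with the next page, so no
    memory of earlier steps is needed beyond the book itself.
    """
    state: Set[str] = set()
    for target in reversed(book.pages):
        missing = target - state  # the brick this page adds
        assert len(missing) == 1, "toy assumption: one brick per step"
        state |= missing
    return frozenset(state)


car = frozenset({"chassis", "wheel_l", "wheel_r", "roof"})
book = disassemble(car, ["roof", "wheel_r", "wheel_l", "chassis"])
assert reassemble(book) == car
```

Each reassembly step looks only at the current state and the next page. The book therefore plays the role of the explicit memory the abstract describes. In the real setting, the trained model must infer from images which brick to add and where to place it.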
Related papers
- TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly [51.29305265324916]
We propose a class-agnostic tree-transformer framework to predict the sequential assembly actions from input multi-view images.
A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice.
We mitigate this problem by leveraging synthetic-to-real transfer learning.
Date: 2024-07-22T14:05:27Z
- FollowIR: Evaluating and Teaching Information Retrieval Models to Follow Instructions [71.5977045423177]
We study the use of instructions in Information Retrieval systems.
We introduce our dataset FollowIR, which contains a rigorous instruction evaluation benchmark.
We show that it is possible for IR models to learn to follow complex instructions.
Date: 2024-03-22T14:42:29Z
- Break and Make: Interactive Structural Understanding Using LEGO Bricks [61.01136603613139]
We build a fully interactive 3D simulator that allows learning agents to assemble, disassemble and manipulate LEGO models.
We take a first step towards solving this problem using sequence-to-sequence models.
Date: 2022-07-27T18:33:09Z
- Unveiling Transformers with LEGO: a synthetic reasoning task [23.535488809197787]
We study how the transformer architecture learns to follow a chain of reasoning.
In some data regimes, the trained transformer finds "shortcut" solutions for following the chain of reasoning.
We find that such shortcuts can be prevented with appropriate architectural modifications or careful data preparation.
Date: 2022-06-09T06:30:17Z
- Model LEGO: Creating Models Like Disassembling and Assembling Building Blocks [53.09649785009528]
In this paper, we explore a paradigm that does not require training to obtain new models.
Just as CNNs were inspired by receptive fields in the biological visual system, we propose Model Disassembling and Assembling.
For model assembling, we present an alignment padding strategy and a parameter scaling strategy to construct a new model tailored to a specific task.
Date: 2022-03-25T05:27:28Z
- Brick-by-Brick: Combinatorial Construction with Deep Reinforcement Learning [52.85981207514049]
We introduce a novel formulation, combinatorial construction, which requires a building agent to assemble unit primitives sequentially.
To construct a target object, the agent is given incomplete knowledge of the desired target (i.e., 2D images) rather than exact, explicit information.
We demonstrate that the proposed method successfully learns to construct an unseen object conditioned on a single image or multiple views of a target object.
Date: 2021-10-29T01:09:51Z
This list is automatically generated from the titles and abstracts of the papers on this site.