Generating Physically Stable and Buildable Brick Structures from Text
- URL: http://arxiv.org/abs/2505.05469v2
- Date: Mon, 30 Jun 2025 18:14:58 GMT
- Title: Generating Physically Stable and Buildable Brick Structures from Text
- Authors: Ava Pun, Kangle Deng, Ruixuan Liu, Deva Ramanan, Changliu Liu, Jun-Yan Zhu
- Abstract summary: BrickGPT is the first approach for generating physically stable brick assembly models from text prompts. We release our dataset, StableText2Brick, containing over 47,000 3D textured brick structures.
- Score: 63.75381708299733
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We introduce BrickGPT, the first approach for generating physically stable interconnecting brick assembly models from text prompts. To achieve this, we construct a large-scale, physically stable dataset of brick structures, along with their associated captions, and train an autoregressive large language model to predict the next brick to add via next-token prediction. To improve the stability of the resulting designs, we employ an efficient validity check and physics-aware rollback during autoregressive inference, which prunes infeasible token predictions using physics laws and assembly constraints. Our experiments show that BrickGPT produces stable, diverse, and aesthetically pleasing brick structures that align closely with the input text prompts. We also develop a text-based brick texturing method to generate colored and textured designs. We show that our designs can be assembled manually by humans and automatically by robotic arms. We release our new dataset, StableText2Brick, containing over 47,000 brick structures of over 28,000 unique 3D objects accompanied by detailed captions, along with our code and models at the project website: https://avalovelace1.github.io/BrickGPT/.
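The inference loop described above (sample a next-brick token, prune infeasible candidates with a validity check, and roll back when generation gets stuck) can be illustrated with a toy sketch. Everything here is a hedged, minimal stand-in: `propose` is a stub for the language model's top-k next-brick predictions, and the brick representation, grid sizes, and interlocking rule are illustrative assumptions, not the paper's actual implementation.

```python
import random

# Toy stand-in for BrickGPT-style inference. A brick is an axis-aligned
# placement (x, y, z, w, d) on a grid; all names here are hypothetical.

def cells(brick):
    """Grid cells occupied by a brick."""
    x, y, z, w, d = brick
    return {(x + i, y + j, z) for i in range(w) for j in range(d)}

def is_valid(brick, placed):
    """Validity check: no overlap, and the brick must interlock."""
    occupied = set().union(*(cells(b) for b in placed)) if placed else set()
    new = cells(brick)
    if new & occupied:                 # overlap: physically impossible
        return False
    if not placed:
        return brick[2] == 0           # first brick must rest on the ground
    # must share a stud with a brick one layer below or above, or sit on the ground
    support = {(cx, cy, cz + 1) for (cx, cy, cz) in occupied} | \
              {(cx, cy, cz - 1) for (cx, cy, cz) in occupied}
    return bool(new & support) or brick[2] == 0

def propose(rng, n=8):
    """Stub for the LLM's top-k next-brick predictions."""
    return [(rng.randrange(4), rng.randrange(4), rng.randrange(3), 2, 1)
            for _ in range(n)]

def generate(num_bricks=6, seed=0):
    rng, placed = random.Random(seed), []
    while len(placed) < num_bricks:
        candidates = [b for b in propose(rng) if is_valid(b, placed)]
        if candidates:
            placed.append(candidates[0])
        elif placed:
            placed.pop()               # physics-aware rollback
        # else: no brick placed yet and no valid candidate -> resample
    return placed

structure = generate()
assert len(structure) == 6
```

The key design point mirrored here is that infeasibility is handled *during* decoding rather than by post-hoc filtering: invalid candidates are pruned before they are committed, and a dead end triggers rollback of the most recent brick instead of restarting generation from scratch.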
Related papers
- Cube: A Roblox View of 3D Intelligence [67.43543266278154]
Foundation models trained on vast amounts of data have demonstrated remarkable reasoning and generation capabilities. We show how our tokenization scheme can be used in applications for text-to-shape generation, shape-to-text generation and text-to-scene generation. We conclude with a discussion outlining our path to building a fully unified foundation model for 3D intelligence.
arXiv Detail & Related papers (2025-03-19T17:52:17Z)
- TreeSBA: Tree-Transformer for Self-Supervised Sequential Brick Assembly [51.29305265324916]
We propose a class-agnostic tree-transformer framework to predict the sequential assembly actions from input multi-view images.
A major challenge of the sequential brick assembly task is that the step-wise action labels are costly and tedious to obtain in practice.
We mitigate this problem by leveraging synthetic-to-real transfer learning.
arXiv Detail & Related papers (2024-07-22T14:05:27Z)
- DressCode: Autoregressively Sewing and Generating Garments from Text Guidance [61.48120090970027]
DressCode aims to democratize design for novices and offer immense potential in fashion design, virtual try-on, and digital human creation.
We first introduce SewingGPT, a GPT-based architecture integrating cross-attention with text-conditioned embedding to generate sewing patterns.
We then tailor a pre-trained Stable Diffusion to generate tile-based Physically-based Rendering (PBR) textures for the garments.
arXiv Detail & Related papers (2024-01-29T16:24:21Z)
- Instruct-SCTG: Guiding Sequential Controlled Text Generation through Instructions [42.67608830386934]
Instruct-SCTG is a sequential framework that harnesses instruction-tuned language models to generate structurally coherent text.
Our framework generates articles in a section-by-section manner, aligned with the desired human structure using natural language instructions.
arXiv Detail & Related papers (2023-12-19T16:20:49Z)
- TextDiffuser-2: Unleashing the Power of Language Models for Text Rendering [118.30923824681642]
TextDiffuser-2 aims to unleash the power of language models for text rendering.
We utilize the language model within the diffusion model to encode the position and texts at the line level.
We conduct extensive experiments and incorporate user studies involving human participants as well as GPT-4V.
arXiv Detail & Related papers (2023-11-28T04:02:40Z)
- A Lightweight and Transferable Design for Robust LEGO Manipulation [10.982854061044339]
This paper investigates safe and efficient robotic Lego manipulation.
An end-of-arm tool (EOAT) is designed, which reduces the problem dimension and allows large industrial robots to manipulate small Lego bricks.
Experiments demonstrate that the EOAT can reliably manipulate Lego bricks and the learning framework can effectively and safely improve the manipulation performance to a 100% success rate.
arXiv Detail & Related papers (2023-09-05T16:11:37Z)
- Generating Faithful Text From a Knowledge Graph with Noisy Reference Text [26.6775578332187]
We develop a KG-to-text generation model that can generate faithful natural-language text from a given graph.
Our framework incorporates two core ideas: Firstly, we utilize contrastive learning to enhance the model's ability to differentiate between faithful and hallucinated information in the text.
Secondly, we empower the decoder to control the level of hallucination in the generated text by employing a controllable text generation technique.
arXiv Detail & Related papers (2023-08-12T07:12:45Z)
- Budget-Aware Sequential Brick Assembly with Efficient Constraint Satisfaction [63.672314717599285]
We tackle the problem of sequential brick assembly with LEGO bricks to create 3D structures.
In particular, the number of assemblable structures increases exponentially as the number of bricks used increases.
We propose a new method to predict the scores of the next brick position by employing a U-shaped sparse 3D convolutional neural network.
arXiv Detail & Related papers (2022-10-03T15:35:08Z)
- Break and Make: Interactive Structural Understanding Using LEGO Bricks [61.01136603613139]
We build a fully interactive 3D simulator that allows learning agents to assemble, disassemble and manipulate LEGO models.
We take a first step towards solving this problem using sequence-to-sequence models.
arXiv Detail & Related papers (2022-07-27T18:33:09Z)
- Image2Lego: Customized LEGO Set Generation from Images [50.87935634904456]
We implement a system that generates a LEGO brick model from 2D images.
Brick models are obtained by algorithmically converting the 3D voxelized model into bricks.
We generate step-by-step building instructions and animations for LEGO models of objects and human faces.
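The voxel-to-brick conversion that Image2Lego describes at a high level can be sketched as a greedy covering pass over each voxel layer. This is an illustrative assumption about how such a conversion might work, not the paper's actual algorithm; the brick-size palette and the largest-first scan order are hypothetical choices.

```python
# Hedged sketch of a greedy voxel-to-brick "legolization" pass.
# Brick sizes and the greedy strategy are illustrative assumptions.

BRICK_SIZES = [(2, 4), (2, 2), (1, 2), (1, 1)]  # (depth, width), largest first

def legolize_layer(filled):
    """Greedily cover a set of filled (x, y) cells with rectangular bricks."""
    remaining, bricks = set(filled), []
    while remaining:
        x, y = min(remaining)                    # deterministic scan order
        for d, w in BRICK_SIZES:
            brick_cells = {(x + i, y + j) for i in range(d) for j in range(w)}
            if brick_cells <= remaining:         # brick fits entirely in the layer
                bricks.append((x, y, d, w))
                remaining -= brick_cells
                break                            # (1, 1) always fits, so this terminates
    return bricks

# a fully filled 2x4 region collapses into a single 2x4 brick
layer = {(i, j) for i in range(2) for j in range(4)}
assert legolize_layer(layer) == [(0, 0, 2, 4)]
```

Preferring larger bricks first keeps the part count low; a production converter would also need to stagger brick seams across layers so the result holds together, which this per-layer sketch deliberately ignores.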
arXiv Detail & Related papers (2021-08-19T03:42:58Z)
- Lattice-BERT: Leveraging Multi-Granularity Representations in Chinese Pre-trained Language Models [62.41139712595334]
We propose a novel pre-training paradigm for Chinese -- Lattice-BERT.
We construct a lattice graph from the characters and words in a sentence and feed all these text units into transformers.
We show that our model can bring an average increase of 1.5% under the 12-layer setting.
arXiv Detail & Related papers (2021-04-15T02:36:49Z)
- Building LEGO Using Deep Generative Models of Graphs [22.926487008829668]
We advocate LEGO as a platform for developing generative models of sequential assembly.
We develop a generative model based on graph-structured neural networks that can learn from human-built structures and produce visually compelling designs.
arXiv Detail & Related papers (2020-12-21T18:24:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this list (including all information) and is not responsible for any consequences.