Tokenizing Buildings: A Transformer for Layout Synthesis
- URL: http://arxiv.org/abs/2512.04832v1
- Date: Thu, 04 Dec 2025 14:16:09 GMT
- Title: Tokenizing Buildings: A Transformer for Layout Synthesis
- Authors: Manuel Ladron de Guevara, Jinmo Rhee, Ardavan Bidgoli, Vaidas Razgaitis, Michael Bergin
- Abstract summary: Small Building Model (SBM) is a Transformer-based architecture for layout synthesis in Building Information Modeling scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences. We train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of room entities.
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: We introduce Small Building Model (SBM), a Transformer-based architecture for layout synthesis in Building Information Modeling (BIM) scenes. We address the question of how to tokenize buildings by unifying heterogeneous feature sets of architectural elements into sequences while preserving compositional structure. Such feature sets are represented as a sparse attribute-feature matrix that captures room properties. We then design a unified embedding module that learns joint representations of categorical and possibly correlated continuous feature groups. Lastly, we train a single Transformer backbone in two modes: an encoder-only pathway that yields high-fidelity room embeddings, and an encoder-decoder pipeline for autoregressive prediction of room entities, referred to as Data-Driven Entity Prediction (DDEP). Experiments across retrieval and generative layout synthesis show that SBM learns compact room embeddings that reliably cluster by type and topology, enabling strong semantic retrieval. In DDEP mode, SBM produces functionally sound layouts, with fewer collisions and boundary violations and improved navigability.
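The abstract's core idea (each room becomes one sequence token whose categorical type and possibly correlated continuous features are fused into a joint embedding) can be sketched in a few lines. The code below is a hedged illustration, not SBM's actual module: the room schema (`ROOM_TYPES`, `area_m2`, `perimeter_m`), the dimension `D_MODEL`, and the fixed random tables standing in for learned parameters are all assumptions made for the sketch.

```python
import random

# Hypothetical schema: a room has one categorical type plus continuous features.
ROOM_TYPES = ["bedroom", "kitchen", "bathroom", "living"]
CONTINUOUS = ["area_m2", "perimeter_m"]
D_MODEL = 8

random.seed(0)
# Fixed random tables stand in for the learned embedding table and projection.
E_cat = {t: [random.gauss(0, 1) for _ in range(D_MODEL)] for t in ROOM_TYPES}
W_cont = [[random.gauss(0, 1) for _ in range(D_MODEL)] for _ in CONTINUOUS]

def room_to_token(room):
    """One room dict -> one D_MODEL-dim token:
    categorical lookup plus a linear projection of the continuous features."""
    tok = list(E_cat[room["type"]])
    for i, key in enumerate(CONTINUOUS):
        for j in range(D_MODEL):
            tok[j] += room[key] * W_cont[i][j]
    return tok

def building_to_sequence(rooms):
    """A building becomes an ordered sequence of room tokens,
    ready to feed a Transformer encoder."""
    return [room_to_token(r) for r in rooms]

seq = building_to_sequence([
    {"type": "kitchen", "area_m2": 12.0, "perimeter_m": 14.0},
    {"type": "bedroom", "area_m2": 15.5, "perimeter_m": 16.0},
])
print(len(seq), len(seq[0]))  # 2 8
```

In the encoder-only mode described above, such a sequence would be pooled into a single room embedding for retrieval; in DDEP mode, the decoder would predict the next room token autoregressively.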
Related papers
- StdGEN++: A Comprehensive System for Semantic-Decomposed 3D Character Generation [57.06461272772509]
StdGEN++ is a novel and comprehensive system for generating high-fidelity, semantically decomposed 3D characters from diverse inputs. It achieves state-of-the-art performance, significantly outperforming existing methods in geometric accuracy and semantic disentanglement. The resulting structural independence unlocks advanced downstream capabilities, including non-destructive editing, physics-compliant animation, and gaze tracking.
arXiv Detail & Related papers (2026-01-12T15:41:27Z) - VSA: Visual-Structural Alignment for UI-to-Code [29.15071743239679]
We propose VSA, a multi-stage paradigm designed to synthesize organized assets through visual-text alignment. Our framework yields a substantial improvement in code modularity and architectural consistency over state-of-the-art benchmarks.
arXiv Detail & Related papers (2025-12-23T03:55:45Z) - RoomEditor++: A Parameter-Sharing Diffusion Architecture for High-Fidelity Furniture Synthesis [89.26382925677301]
Virtual furniture synthesis holds substantial promise for home design and e-commerce applications. RoomEditor++ is a versatile diffusion-based architecture featuring a parameter-sharing dual diffusion backbone. RoomEditor++ is superior to state-of-the-art approaches in terms of quantitative metrics, qualitative assessments, and human preference studies.
arXiv Detail & Related papers (2025-12-19T13:39:43Z) - From Feature Interaction to Feature Generation: A Generative Paradigm of CTR Prediction Models [81.43473418572567]
Click-Through Rate (CTR) prediction is a core task in recommendation systems. We propose a novel generative framework to address embedding dimensional collapse and information redundancy. We show that SFG consistently mitigates embedding collapse and reduces information redundancy, while yielding substantial performance gains.
arXiv Detail & Related papers (2025-12-16T03:17:18Z) - CountFormer: A Transformer Framework for Learning Visual Repetition and Structure in Class-Agnostic Object Counting [0.0]
Humans can effortlessly count diverse objects by perceiving visual repetition and structural relationships rather than relying on class identity. In this work, we introduce CountFormer, a transformer-based framework that learns to recognize repetition and structural coherence for class-agnostic object counting.
arXiv Detail & Related papers (2025-10-27T19:16:02Z) - Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework. We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition. Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
arXiv Detail & Related papers (2025-09-29T16:18:32Z) - UCS: A Universal Model for Curvilinear Structure Segmentation [11.10994320036562]
Curvilinear structure segmentation (CSS) is vital in various domains, including medical imaging, landscape analysis, industrial surface inspection, and plant analysis. We propose the Universal Curvilinear Structure (UCS) model, which adapts SAM to CSS tasks while enhancing its generalization. UCS demonstrates state-of-the-art generalization and open-set segmentation performance across medical, engineering, natural, and plant imagery.
arXiv Detail & Related papers (2025-04-05T03:05:04Z) - Learning and Evaluating Hierarchical Feature Representations [3.770103075126785]
We propose a novel framework, Hierarchical Composition of Orthogonal Subspaces (Hier-COS). Hier-COS learns to map deep feature embeddings into a vector space that is, by design, consistent with the structure of a given taxonomy tree. We demonstrate that Hier-COS achieves state-of-the-art hierarchical performance across all the datasets while simultaneously beating top-1 accuracy in all but one case.
arXiv Detail & Related papers (2025-03-10T20:59:41Z) - P-MSDiff: Parallel Multi-Scale Diffusion for Remote Sensing Image Segmentation [8.46409964236009]
Diffusion models and multi-scale features are essential components in semantic segmentation tasks.
We propose a new model for semantic segmentation known as the diffusion model with parallel multi-scale branches.
Our model demonstrates superior performance based on the J1 metric on both the UAVid and Vaihingen Building datasets.
arXiv Detail & Related papers (2024-05-30T19:40:08Z) - Unite-Divide-Unite: Joint Boosting Trunk and Structure for High-accuracy Dichotomous Image Segmentation [48.995367430746086]
High-accuracy Dichotomous Image Segmentation (DIS) aims to pinpoint category-agnostic foreground objects from natural scenes.
We introduce a novel Unite-Divide-Unite Network (UDUN) that restructures and bipartitely arranges complementary features to boost the effectiveness of trunk and structure identification.
Using 1024×1024 input, our model enables real-time inference at 65.3 fps with ResNet-18.
arXiv Detail & Related papers (2023-07-26T09:04:35Z) - Disentanglement via Latent Quantization [60.37109712033694]
In this work, we construct an inductive bias towards encoding to and decoding from an organized latent space.
We demonstrate the broad applicability of this approach by adding it to both basic data-reconstructing (vanilla autoencoder) and latent-reconstructing (InfoGAN) generative models.
arXiv Detail & Related papers (2023-05-28T06:30:29Z) - Inducing Transformer's Compositional Generalization Ability via Auxiliary Sequence Prediction Tasks [86.10875837475783]
Systematic compositionality is an essential mechanism in human language, allowing the recombination of known parts to create novel expressions.
Existing neural models have been shown to lack this basic ability in learning symbolic structures.
We propose two auxiliary sequence prediction tasks that track the progress of function and argument semantics.
arXiv Detail & Related papers (2021-09-30T16:41:19Z)
This list is automatically generated from the titles and abstracts of the papers in this site.