FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
- URL: http://arxiv.org/abs/2602.06507v1
- Date: Fri, 06 Feb 2026 08:57:52 GMT
- Title: FloorplanVLM: A Vision-Language Model for Floorplan Vectorization
- Authors: Yuanqing Liu, Ziming Yang, Yulong Li, Yue Yang,
- Abstract summary: We present FloorplanVLM, a unified framework that reformulates vectorization as an image-conditioned sequence modeling task. This 'pixels-to-sequence' paradigm enables the precise and holistic constraint satisfaction of complex geometries, such as slanted walls and curved arcs.
- Score: 15.691267151619442
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Converting raster floorplans into engineering-grade vector graphics is challenging due to complex topology and strict geometric constraints. To address this, we present FloorplanVLM, a unified framework that reformulates floorplan vectorization as an image-conditioned sequence modeling task. Unlike pixel-based methods that rely on fragile heuristics or query-based transformers that generate fragmented rooms, our model directly outputs structured JSON sequences representing the global topology. This 'pixels-to-sequence' paradigm enables the precise and holistic constraint satisfaction of complex geometries, such as slanted walls and curved arcs. To support this data-hungry approach, we introduce a scalable data engine: we construct a large-scale dataset (Floorplan-2M) and a high-fidelity subset (Floorplan-HQ-300K) to balance geometric diversity and pixel-level precision. We then employ a progressive training strategy, using Supervised Fine-Tuning (SFT) for structural grounding and quality annealing, followed by Group Relative Policy Optimization (GRPO) for strict geometric alignment. To standardize evaluation on complex layouts, we establish and open-source FPBench-2K. Evaluated on this rigorous benchmark, FloorplanVLM demonstrates exceptional structural validity, achieving 92.52% external-wall IoU and robust generalization across non-Manhattan architectures.
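The abstract names three technical pieces: a structured JSON sequence as output, external-wall IoU as the headline metric, and GRPO for geometric alignment. The following sketch ties them together under loudly stated assumptions. The `external_wall` key and the pixel-coordinate vertex lists are a hypothetical schema (the abstract does not give one); IoU is approximated by rasterizing polygons on a pixel grid (the exact FPBench-2K protocol is not described here); and using IoU directly as the GRPO reward is our assumption, not the authors' stated reward. Only GRPO's group-relative advantage (reward minus group mean, divided by group standard deviation) is shown; the clipped policy-ratio and KL terms of the full objective are omitted.

```python
import json
import statistics

# Hypothetical output schema: the abstract only says the model emits structured
# JSON; the "external_wall" key and [x, y] pixel vertices are assumptions.
GT_SEQ = '{"external_wall": [[2, 2], [12, 2], [12, 12], [2, 12]]}'
CANDIDATE_SEQS = [
    '{"external_wall": [[2, 2], [12, 2], [12, 12], [2, 12]]}',      # exact match
    '{"external_wall": [[4, 2], [14, 2], [14, 12], [4, 12]]}',      # shifted right by 2 px
    '{"external_wall": [[14, 14], [19, 14], [19, 19], [14, 19]]}',  # disjoint
]


def external_wall(seq):
    """Parse a JSON sequence and return the external-wall polygon as (x, y) tuples."""
    return [tuple(p) for p in json.loads(seq)["external_wall"]]


def point_in_polygon(x, y, poly):
    """Even-odd ray casting: toggle on each edge crossed by a ray to the right of (x, y)."""
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def raster_iou(poly_a, poly_b, width, height):
    """Approximate polygon IoU by classifying pixel centers on a width x height grid."""
    inter = union = 0
    for py in range(height):
        for px in range(width):
            a = point_in_polygon(px + 0.5, py + 0.5, poly_a)
            b = point_in_polygon(px + 0.5, py + 0.5, poly_b)
            inter += a and b
            union += a or b
    return inter / union if union else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward against its sampling group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


gt = external_wall(GT_SEQ)
ious = [raster_iou(external_wall(s), gt, 20, 20) for s in CANDIDATE_SEQS]
advantages = group_relative_advantages(ious)
print([round(v, 3) for v in ious])  # [1.0, 0.667, 0.0]
print([round(v, 3) for v in advantages])
```

Rasterizing keeps the sketch dependency-free; a geometry library such as Shapely could compute exact polygon intersections instead. In this toy group the exact prediction receives a positive advantage and the disjoint one a negative advantage, which is the relative signal GRPO uses in place of a learned value function.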
Related papers
- StepVAR: Structure-Texture Guided Pruning for Visual Autoregressive Models [98.72926158261937]
We propose a training-free token pruning framework for Visual AutoRegressive models. We employ a lightweight high-pass filter to capture local texture details, while leveraging Principal Component Analysis (PCA) to preserve global structural information. To maintain valid next-scale prediction under sparse tokens, we introduce a nearest neighbor feature propagation strategy.
arXiv (2026-03-02T11:35:05Z)
- Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction [21.366278792227785]
We propose Raster2Seq, framing floorplan reconstruction as a sequence-to-sequence task. An autoregressive decoder learns to predict the next corner conditioned on image features and previously generated corners. Our method achieves state-of-the-art performance on standard benchmarks.
arXiv (2026-02-09T18:58:46Z)
- TLC-Plan: A Two-Level Codebook Based Network for End-to-End Vector Floorplan Generation [19.063941053235567]
We propose TLC-Plan, a hierarchical generative model that directly synthesizes vector floorplans from spatial input boundaries. TLC-Plan employs a two-level VQ-VAE to encode global layouts as semantically labeled room bounding boxes. Experiments show state-of-the-art performance on the RPLAN dataset and leading results on the LI dataset.
arXiv (2026-02-06T15:36:50Z)
- Enhancing Floor Plan Recognition: A Hybrid Mix-Transformer and U-Net Approach for Precise Wall Segmentation [0.0]
This study introduces MitUNet, a hybrid neural network combining a Mix-Transformer encoder and a U-Net decoder. Our approach achieves a balance between precision and recall, ensuring accurate boundary recovery. Experiments on the CubiCasa5k dataset and a proprietary regional dataset demonstrate MitUNet's superiority in generating structurally correct masks.
arXiv (2025-12-02T04:47:53Z)
- Light-SQ: Structure-aware Shape Abstraction with Superquadrics for Generated Meshes [60.92139345612904]
We present Light-SQ, a novel superquadric-based optimization framework. We propose a block-regrow-fill strategy guided by structure-aware volumetric decomposition. Experiments demonstrate that Light-SQ enables efficient, high-fidelity, and editable shape abstraction with superquadrics.
arXiv (2025-09-29T16:18:32Z)
- CAGE: Continuity-Aware edGE Network Unlocks Robust Floorplan Reconstruction [24.09888364478496]
We present CAGE, a robust framework for reconstructing vector floorplans directly from point-cloud density maps. CAGE achieves state-of-the-art performance, with F1 scores of 99.1% (rooms), 91.7% (corners), and 89.3% (angles).
arXiv (2025-09-18T22:10:37Z)
- GSDiff: Synthesizing Vector Floorplans via Geometry-enhanced Structural Graph Generation [3.78198085695976]
Automating architectural floorplan design is vital for housing and interior design, offering a faster, cost-effective alternative to manual sketches by architects. Existing methods, including rule-based and learning-based approaches, face challenges in design and constrained generation with extensive post-processing. We propose a novel generative framework for vector floorplan design via structural graph generation, called GSDiff.
arXiv (2024-08-29T04:40:31Z)
- 3D Geometric Shape Assembly via Efficient Point Cloud Matching [59.241448711254485]
We introduce Proxy Match Transform (PMT), an approximate high-order feature transform layer that enables reliable matching between mating surfaces of parts.
Building upon PMT, we introduce a new framework, dubbed Proxy Match TransformeR (PMTR), for the geometric assembly task.
We evaluate the proposed PMTR on Breaking Bad, a large-scale 3D geometric shape assembly benchmark dataset.
arXiv (2024-07-15T08:50:02Z)
- 360 Layout Estimation via Orthogonal Planes Disentanglement and Multi-view Geometric Consistency Perception [56.84921040837699]
Existing panoramic layout estimation solutions tend to recover room boundaries from a vertically compressed sequence, yielding imprecise results.
We propose an orthogonal plane disentanglement network (termed DOPNet) to distinguish ambiguous semantics.
We also present an unsupervised adaptation technique tailored for horizon-depth and ratio representations.
Our solution outperforms other SoTA models on both monocular layout estimation and multi-view layout estimation tasks.
arXiv (2023-12-26T12:16:03Z)
- DepGraph: Towards Any Structural Pruning [68.40343338847664]
We study general structural pruning of arbitrary architectures such as CNNs, RNNs, GNNs and Transformers.
We propose a general and fully automatic method, Dependency Graph (DepGraph), to explicitly model the dependency between layers and comprehensively group parameters for pruning.
In this work, we extensively evaluate our method on several architectures and tasks, including ResNe(X)t, DenseNet, MobileNet and Vision transformer for images, GAT for graph, DGCNN for 3D point cloud, alongside LSTM for language, and demonstrate that, even with a
arXiv (2023-01-30T14:02:33Z)
- Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our methods perform favorably against the current state-of-the-art competitors.
arXiv (2022-12-17T15:05:25Z)
- Visual SLAM with Graph-Cut Optimized Multi-Plane Reconstruction [11.215334675788952]
This paper presents a semantic planar SLAM system that improves pose estimation and mapping using cues from an instance planar segmentation network.
While mainstream approaches use RGB-D sensors, employing a monocular camera with such a system still faces challenges such as robust data association and precise geometric model fitting.
arXiv (2021-08-09T18:16:08Z)
This list is automatically generated from the titles and abstracts of the papers in this site.