Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction
- URL: http://arxiv.org/abs/2602.09016v1
- Date: Mon, 09 Feb 2026 18:58:46 GMT
- Title: Raster2Seq: Polygon Sequence Generation for Floorplan Reconstruction
- Authors: Hao Phung, Hadar Averbuch-Elor
- Abstract summary: We propose Raster2Seq, framing floorplan reconstruction as a sequence-to-sequence task. An autoregressive decoder learns to predict the next corner conditioned on image features and previously generated corners. Our method achieves state-of-the-art performance on standard benchmarks.
- Score: 21.366278792227785
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Reconstructing a structured vector-graphics representation from a rasterized floorplan image is typically an important prerequisite for computational tasks involving floorplans, such as automated understanding or CAD workflows. However, existing techniques struggle to faithfully generate the structure and semantics conveyed by complex floorplans that depict large indoor spaces with many rooms and a varying number of polygon corners. To this end, we propose Raster2Seq, framing floorplan reconstruction as a sequence-to-sequence task in which floorplan elements--such as rooms, windows, and doors--are represented as labeled polygon sequences that jointly encode geometry and semantics. Our approach introduces an autoregressive decoder that learns to predict the next corner conditioned on image features and previously generated corners, using guidance from learnable anchors. These anchors represent spatial coordinates in image space, effectively directing the attention mechanism toward informative image regions. By embracing the autoregressive mechanism, our method offers flexibility in the output format, enabling efficient handling of complex floorplans with numerous rooms and diverse polygon structures. Our method achieves state-of-the-art performance on standard benchmarks such as Structured3D, CubiCasa5K, and Raster2Graph, while also demonstrating strong generalization to more challenging datasets like WAFFLE, which contain diverse room structures and complex geometric variations.
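The abstract's core idea is to serialize floorplan elements into labeled polygon sequences that an autoregressive decoder can emit corner by corner. The paper does not specify its token vocabulary, so the following is only a minimal sketch of that serialization, with hypothetical label tokens (`"room"`, `"door"`, `"window"`) and a hypothetical `<end>` delimiter:

```python
# Minimal sketch of a "labeled polygon sequence" encoding (assumption:
# the actual Raster2Seq vocabulary and ordering scheme are not given in
# the abstract; token names here are illustrative only).

END = "<end>"  # hypothetical token closing one polygon

def encode(elements):
    """Flatten labeled polygons into one flat token sequence.

    elements: list of (label, corners), corners being a list of (x, y).
    Each element becomes: label token, x1, y1, x2, y2, ..., <end>.
    """
    seq = []
    for label, corners in elements:
        seq.append(label)
        for x, y in corners:
            seq.extend([x, y])
        seq.append(END)
    return seq

def decode(seq):
    """Recover labeled polygons from a flat token sequence."""
    elements, i = [], 0
    while i < len(seq):
        label = seq[i]
        i += 1
        corners = []
        while seq[i] != END:
            corners.append((seq[i], seq[i + 1]))
            i += 2
        i += 1  # skip <end>
        elements.append((label, corners))
    return elements

floorplan = [("room", [(0, 0), (4, 0), (4, 3), (0, 3)]),
             ("door", [(4, 1), (4, 2)])]
assert decode(encode(floorplan)) == floorplan
```

Because the sequence length grows with the number of corners actually present, this representation naturally accommodates rooms with varying polygon complexity, which is the flexibility the abstract attributes to the autoregressive formulation.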
Related papers
- FloorplanVLM: A Vision-Language Model for Floorplan Vectorization [15.691267151619442]
We present FloorplanVLM, a unified framework that reformulates vectorization as an image-conditioned sequence modeling task. This 'pixels-to-sequence' paradigm enables precise and holistic satisfaction of complex geometric constraints, such as walls and curved arcs.
arXiv Detail & Related papers (2026-02-06T08:57:52Z) - ControlVP: Interactive Geometric Refinement of AI-Generated Images with Consistent Vanishing Points [32.23473666846317]
We propose ControlVP, a user-guided framework for correcting vanishing point inconsistencies in generated images. Our approach extends a pre-trained diffusion model by incorporating structural guidance derived from building contours. Our method enhances global geometric consistency while maintaining visual fidelity comparable to the baselines.
arXiv Detail & Related papers (2025-12-08T12:38:11Z) - IGGT: Instance-Grounded Geometry Transformer for Semantic 3D Reconstruction [82.53307702809606]
Humans naturally perceive the geometric structure and semantic content of a 3D world as intertwined dimensions. We propose Instance-Grounded Geometry Transformer (IGGT) to unify the knowledge for both spatial reconstruction and instance-level contextual understanding.
arXiv Detail & Related papers (2025-10-26T14:57:44Z) - SeqAffordSplat: Scene-level Sequential Affordance Reasoning on 3D Gaussian Splatting [85.87902260102652]
We introduce the novel task of Sequential 3D Gaussian Affordance Reasoning. We then propose SeqSplatNet, an end-to-end framework that directly maps an instruction to a sequence of 3D affordance masks. Our method sets a new state-of-the-art on our challenging benchmark, effectively advancing affordance reasoning from single-step interactions to complex, sequential tasks at the scene level.
arXiv Detail & Related papers (2025-07-31T17:56:55Z) - Boosting Multi-View Indoor 3D Object Detection via Adaptive 3D Volume Construction [10.569056109735735]
This work presents SGCDet, a novel multi-view indoor 3D object detection framework based on adaptive 3D volume construction. We introduce a geometry- and context-aware aggregation module to integrate geometric and contextual information within adaptive regions in each image. We show that SGCDet achieves state-of-the-art performance on the ScanNet, ScanNet200 and ARKitScenes datasets.
arXiv Detail & Related papers (2025-07-24T11:58:01Z) - Open-Vocabulary Octree-Graph for 3D Scene Understanding [54.11828083068082]
Octree-Graph is a novel scene representation for open-vocabulary 3D scene understanding.
An adaptive octree structure is developed that stores semantics and depicts an object's occupancy adjustably according to its shape.
arXiv Detail & Related papers (2024-11-25T10:14:10Z) - Large Spatial Model: End-to-end Unposed Images to Semantic 3D [79.94479633598102]
Large Spatial Model (LSM) processes unposed RGB images directly into semantic radiance fields.
LSM simultaneously estimates geometry, appearance, and semantics in a single feed-forward operation.
It can generate versatile label maps by interacting with language at novel viewpoints.
arXiv Detail & Related papers (2024-10-24T17:54:42Z) - View-Consistent Hierarchical 3D Segmentation Using Ultrametric Feature Fields [52.08335264414515]
We learn a novel feature field within a Neural Radiance Field (NeRF) representing a 3D scene.
Our method takes view-inconsistent multi-granularity 2D segmentations as input and produces a hierarchy of 3D-consistent segmentations as output.
We evaluate our method and several baselines on synthetic datasets with multi-view images and multi-granular segmentation, showcasing improved accuracy and viewpoint-consistency.
arXiv Detail & Related papers (2024-05-30T04:14:58Z) - N2F2: Hierarchical Scene Understanding with Nested Neural Feature Fields [112.02885337510716]
Nested Neural Feature Fields (N2F2) is a novel approach that employs hierarchical supervision to learn a single feature field.
We leverage a 2D class-agnostic segmentation model to provide semantically meaningful pixel groupings at arbitrary scales in the image space.
Our approach outperforms the state-of-the-art feature field distillation methods on tasks such as open-vocabulary 3D segmentation and localization.
arXiv Detail & Related papers (2024-03-16T18:50:44Z) - Flattening-Net: Deep Regular 2D Representation for 3D Point Cloud Analysis [66.49788145564004]
We present an unsupervised deep neural architecture called Flattening-Net to represent irregular 3D point clouds of arbitrary geometry and topology.
Our method performs favorably against the current state-of-the-art competitors.
arXiv Detail & Related papers (2022-12-17T15:05:25Z) - Connecting the Dots: Floorplan Reconstruction Using Two-Level Queries [27.564355569013706]
We develop a novel Transformer architecture that generates polygons of multiple rooms in parallel.
Our method achieves a new state-of-the-art for two challenging datasets, Structured3D and SceneCAD.
It can readily be extended to predict additional information, e.g., semantic room types and architectural elements like doors and windows.
arXiv Detail & Related papers (2022-11-28T18:59:09Z) - MCTS with Refinement for Proposals Selection Games in Scene Understanding [32.92475660892122]
We propose a novel method applicable in many scene understanding problems that adapts the Monte Carlo Tree Search (MCTS) algorithm.
From a generated pool of proposals, our method jointly selects and optimizes proposals that maximize the objective term.
Our method shows high performance on the Matterport3D dataset without introducing hard constraints on room layout configurations.
arXiv Detail & Related papers (2022-07-07T10:15:54Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences.