Related papers: A Modular Framework for Single-View 3D Reconstruction of Indoor Environments

URL: http://arxiv.org/abs/2512.17955v1
Date: Wed, 17 Dec 2025 22:49:43 GMT
Title: A Modular Framework for Single-View 3D Reconstruction of Indoor Environments
Authors: Yuxiao Li
Abstract summary: We propose a modular framework for single-view indoor scene 3D reconstruction. Several core modules are powered by diffusion techniques. The framework holds promising potential for applications in interior design, real estate, and augmented reality.
Score: 1.979245586749314
License: http://creativecommons.org/licenses/by/4.0/
Abstract: We propose a modular framework for single-view indoor scene 3D reconstruction, where several core modules are powered by diffusion techniques. Traditional approaches for this task often struggle with the complex instance shapes and occlusions inherent in indoor environments. They frequently overshoot by attempting to predict 3D shapes directly from incomplete 2D images, which results in limited reconstruction quality. We aim to overcome this limitation by splitting the process into two steps: first, we employ diffusion-based techniques to predict the complete views of the room background and occluded indoor instances, then transform them into 3D. Our modular framework makes contributions to this field through the following components: an amodal completion module for restoring the full view of occluded instances, an inpainting model specifically trained to predict room layouts, a hybrid depth estimation technique that balances overall geometric accuracy with fine detail expressiveness, and a view-space alignment method that exploits both 2D and 3D cues to ensure precise placement of instances within the scene. This approach effectively reconstructs both foreground instances and the room background from a single image. Extensive experiments on the 3D-Front dataset demonstrate that our method outperforms current state-of-the-art (SOTA) approaches in terms of both visual quality and reconstruction accuracy. The framework holds promising potential for applications in interior design, real estate, and augmented reality.

Related papers

Unposed Sparse Views Room Layout Reconstruction in the Age of Pretrain Model [15.892685514932323]
We introduce Plane-DUSt3R, a novel method for multi-view room layout estimation. Plane-DUSt3R incorporates the DUSt3R framework and fine-tunes on a room layout dataset (Structure3D) with a modified objective to estimate structural planes. By generating uniform and parsimonious results, Plane-DUSt3R enables room layout estimation with only a single post-processing step and 2D detection results.
arXiv Detail & Related papers (2025-02-24T02:14:19Z)
Unified Few-shot Crack Segmentation and its Precise 3D Automatic Measurement in Concrete Structures [2.178830801484721]
This study introduces a framework for two-dimensional (2D) crack detection, three-dimensional (3D) reconstruction, and 3D automatic crack measurement. We developed a crack segmentation method with strong generalization across unfamiliar scenarios, enabling the generation of precise 2D crack masks. By leveraging both image- and LiDAR-SLAM, we developed a multi-frame and multi-modal fusion framework that produces dense, colorized point clouds.
arXiv Detail & Related papers (2025-01-15T23:36:05Z)
BIFRÖST: 3D-Aware Image compositing with Language Instructions [27.484947109237964]
Bifröst is a novel 3D-aware framework that is built upon diffusion models to perform instruction-based image composition. Bifröst addresses issues by training MLLM as a 2.5D location predictor and integrating depth maps as an extra condition during the generation process.
arXiv Detail & Related papers (2024-10-24T18:35:12Z)
Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes. We show it outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z)
Part123: Part-aware 3D Reconstruction from a Single-view Image [54.589723979757515]
Part123 is a novel framework for part-aware 3D reconstruction from a single-view image. We introduce contrastive learning into a neural rendering framework to learn a part-aware feature space. A clustering-based algorithm is also developed to automatically derive 3D part segmentation results from the reconstructed models.
arXiv Detail & Related papers (2024-05-27T07:10:21Z)
GEOcc: Geometrically Enhanced 3D Occupancy Network with Implicit-Explicit Depth Fusion and Contextual Self-Supervision [49.839374549646884]
This paper presents GEOcc, a Geometric-Enhanced Occupancy network tailored for vision-only surround-view perception. Our approach achieves State-Of-The-Art performance on the Occ3D-nuScenes dataset with the least image resolution needed and the most weightless image backbone.
arXiv Detail & Related papers (2024-05-17T07:31:20Z)
Single-view 3D Scene Reconstruction with High-fidelity Shape and Texture [47.44029968307207]
We propose a novel framework for simultaneous high-fidelity recovery of object shapes and textures from single-view images. Our approach utilizes the proposed Single-view neural implicit Shape and Radiance field (SSR) representations to leverage both explicit 3D shape supervision and volume rendering. A distinctive feature of our framework is its ability to generate fine-grained textured meshes while seamlessly integrating rendering capabilities into the single-view 3D reconstruction model.
arXiv Detail & Related papers (2023-11-01T11:46:15Z)
StructuredMesh: 3D Structured Optimization of Façade Components on Photogrammetric Mesh Models using Binary Integer Programming [17.985961236568663]
We present StructuredMesh, a novel approach for reconstructing façade structures conforming to the regularity of buildings within photogrammetric mesh models. Our method involves capturing multi-view color and depth images of the building model using a virtual camera. We then utilize the depth image to remap these boxes into 3D space, generating an initial façade layout.
arXiv Detail & Related papers (2023-06-07T06:40:54Z)
Towards High-Fidelity Single-view Holistic Reconstruction of Indoor Scenes [50.317223783035075]
We present a new framework to reconstruct holistic 3D indoor scenes from single-view images. We propose an instance-aligned implicit function (InstPIFu) for detailed object reconstruction. Our code and model will be made publicly available.
arXiv Detail & Related papers (2022-07-18T14:54:57Z)
LiP-Flow: Learning Inference-time Priors for Codec Avatars via Normalizing Flows in Latent Space [90.74976459491303]
We introduce a prior model that is conditioned on the runtime inputs and tie this prior space to the 3D face model via a normalizing flow in the latent space. A normalizing flow bridges the two representation spaces and transforms latent samples from one domain to another, allowing us to define a latent likelihood objective. We show that our approach leads to an expressive and effective prior, capturing facial dynamics and subtle expressions better.
arXiv Detail & Related papers (2022-03-15T13:22:57Z)
SparseFusion: Dynamic Human Avatar Modeling from Sparse RGBD Images [49.52782544649703]
We propose a novel approach to reconstruct 3D human body shapes based on a sparse set of RGBD frames. The main challenge is how to robustly fuse these sparse frames into a canonical 3D model. Our framework is flexible, with potential applications going beyond shape reconstruction.
arXiv Detail & Related papers (2020-06-05T18:53:36Z)

This list is automatically generated from the titles and abstracts of the papers on this site.