SemLayoutDiff: Semantic Layout Generation with Diffusion Model for Indoor Scene Synthesis
- URL: http://arxiv.org/abs/2508.18597v2
- Date: Sat, 06 Sep 2025 19:34:22 GMT
- Title: SemLayoutDiff: Semantic Layout Generation with Diffusion Model for Indoor Scene Synthesis
- Authors: Xiaohao Sun, Divyam Goel, Angel X. Chang
- Abstract summary: SemLayoutDiff is a unified model for diverse 3D indoor scenes across multiple room types. It produces spatially coherent, realistic, and varied scenes, outperforming previous methods.
- Score: 11.874151921903449
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We present SemLayoutDiff, a unified model for synthesizing diverse 3D indoor scenes across multiple room types. The model introduces a scene layout representation combining a top-down semantic map and attributes for each object. Unlike prior approaches, which cannot condition on architectural constraints, SemLayoutDiff employs a categorical diffusion model capable of conditioning scene synthesis explicitly on room masks. It first generates a coherent semantic map, followed by a cross-attention-based network to predict furniture placements that respect the synthesized layout. Our method also accounts for architectural elements such as doors and windows, ensuring that generated furniture arrangements remain practical and unobstructed. Experiments on the 3D-FRONT dataset show that SemLayoutDiff produces spatially coherent, realistic, and varied scenes, outperforming previous methods.
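The abstract describes a two-stage pipeline whose first stage is a categorical diffusion model over a top-down semantic map, conditioned on a room mask so that furniture stays inside the architecture. The sketch below illustrates the general idea of that conditioning step only; it is not the SemLayoutDiff implementation, and the class count, grid size, uniform transition kernel, and class names are illustrative assumptions.

```python
# Minimal sketch of categorical (discrete) diffusion over a top-down
# semantic map, conditioned on a binary room mask. Hypothetical setup,
# not the SemLayoutDiff code: classes, shapes, and the uniform
# resampling kernel are assumptions for illustration.
import numpy as np

NUM_CLASSES = 4   # e.g. 0=floor, 1=bed, 2=table, 3=chair (hypothetical)
H, W = 8, 8

def q_sample(x0, beta, rng):
    """Forward noising step: with probability beta, resample each
    grid cell's category uniformly at random (a uniform kernel)."""
    noise = rng.integers(0, NUM_CLASSES, size=x0.shape)
    flip = rng.random(x0.shape) < beta
    return np.where(flip, noise, x0)

def apply_room_mask(x, mask, outside_class=0):
    """Condition on architecture: force every cell outside the room
    mask to the empty/floor class, so no furniture leaves the room."""
    return np.where(mask, x, outside_class)

rng = np.random.default_rng(0)
x0 = rng.integers(0, NUM_CLASSES, size=(H, W))           # clean semantic map
mask = np.zeros((H, W), dtype=bool)
mask[1:7, 1:7] = True                                    # room interior

xt = q_sample(x0, beta=0.3, rng=rng)   # one noising step
xt = apply_room_mask(xt, mask)         # enforce the room-mask condition

assert xt.shape == (H, W)
assert (xt[~mask] == 0).all()          # all cells outside the room are floor
```

In the paper's actual pipeline a learned denoiser would reverse the noising to produce a coherent semantic map, and a separate cross-attention network would then predict per-object attributes; the mask projection shown here is only the simplest way to hard-enforce an architectural constraint at each step.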
Related papers
- ReSpace: Text-Driven 3D Scene Synthesis and Editing with Preference Alignment [1.0918065824771606]
ReSpace is a generative framework for text-driven 3D indoor scene synthesis and editing. We leverage a dual-stage training approach combining supervised fine-tuning and preference alignment. For scene editing, we employ a zero-shot LLM to handle object removal and prompts for addition.
arXiv Detail & Related papers (2025-06-03T05:22:04Z) - HiScene: Creating Hierarchical 3D Scenes with Isometric View Generation [50.206100327643284]
HiScene is a novel hierarchical framework that bridges the gap between 2D image generation and 3D object generation. We generate 3D content that aligns with 2D representations while maintaining compositional structure.
arXiv Detail & Related papers (2025-04-17T16:33:39Z) - Mixed Diffusion for 3D Indoor Scene Synthesis [55.94569112629208]
We present MiDiffusion, a novel mixed discrete-continuous diffusion model designed to synthesize plausible 3D indoor scenes. We show it outperforms state-of-the-art autoregressive and diffusion models in floor-conditioned 3D scene synthesis.
arXiv Detail & Related papers (2024-05-31T17:54:52Z) - 3D scene generation from scene graphs and self-attention [51.49886604454926]
We present a variant of the conditional variational autoencoder (cVAE) model to synthesize 3D scenes from scene graphs and floor plans.
We exploit the properties of self-attention layers to capture high-level relationships between objects in a scene.
arXiv Detail & Related papers (2024-04-02T12:26:17Z) - InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior [27.773451301040424]
InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2024-02-07T10:09:00Z) - CommonScenes: Generating Commonsense 3D Indoor Scenes with Scene Graph Diffusion [83.30168660888913]
We present CommonScenes, a fully generative model that converts scene graphs into corresponding controllable 3D scenes.
Our pipeline consists of two branches, one predicting the overall scene layout via a variational auto-encoder and the other generating compatible shapes.
The generated scenes can be manipulated by editing the input scene graph and sampling the noise in the diffusion model.
arXiv Detail & Related papers (2023-05-25T17:39:13Z) - DiffuScene: Denoising Diffusion Models for Generative Indoor Scene Synthesis [44.521452102413534]
We present DiffuScene for indoor 3D scene synthesis based on a novel scene configuration denoising diffusion model.
It generates 3D instance properties stored in an unordered object set and retrieves the most similar geometry for each object configuration.
arXiv Detail & Related papers (2023-03-24T18:00:15Z) - Towards 3D Scene Understanding by Referring Synthetic Models [65.74211112607315]
Methods typically require extensive annotations on real scene scans, which this work aims to alleviate. We explore how synthetic models can be used by mapping categories of synthetic and real features into a unified feature space. Experiments show that our method achieves an average mAP of 46.08% on the ScanNet dataset and 55.49% on the S3DIS dataset.
arXiv Detail & Related papers (2022-03-20T13:06:15Z) - ATISS: Autoregressive Transformers for Indoor Scene Synthesis [112.63708524926689]
We present ATISS, a novel autoregressive transformer architecture for creating synthetic indoor environments.
We argue that this formulation is more natural, as it makes ATISS generally useful beyond fully automatic room layout synthesis.
Our model is trained end-to-end as an autoregressive generative model using only labeled 3D bounding boxes as supervision.
arXiv Detail & Related papers (2021-10-07T17:58:05Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this content (including all information) and is not responsible for any consequences.