M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
- URL: http://arxiv.org/abs/2509.23728v1
- Date: Sun, 28 Sep 2025 08:16:08 GMT
- Title: M3DLayout: A Multi-Source Dataset of 3D Indoor Layouts and Structured Descriptions for 3D Generation
- Authors: Yiheng Zhang, Zhuojiang Cai, Mingdao Wang, Meitong Guo, Tianxiao Li, Li Lin, Yuwang Wang
- Abstract summary: In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions with detailed output. We introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 15,080 layouts and over 258k object instances, integrating real-world scans, professional CAD designs, and procedurally generated scenes.
- Score: 14.956470298543534
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In text-driven 3D scene generation, object layout serves as a crucial intermediate representation that bridges high-level language instructions with detailed geometric output. It not only provides a structural blueprint for ensuring physical plausibility but also supports semantic controllability and interactive editing. However, the learning capabilities of current 3D indoor layout generation models are constrained by the limited scale, diversity, and annotation quality of existing datasets. To address this, we introduce M3DLayout, a large-scale, multi-source dataset for 3D indoor layout generation. M3DLayout comprises 15,080 layouts and over 258k object instances, integrating three distinct sources: real-world scans, professional CAD designs, and procedurally generated scenes. Each layout is paired with detailed structured text describing global scene summaries, relational placements of large furniture, and fine-grained arrangements of smaller items. This diverse and richly annotated resource enables models to learn complex spatial and semantic patterns across a wide variety of indoor environments. To assess the potential of M3DLayout, we establish a benchmark using a text-conditioned diffusion model. Experimental results demonstrate that our dataset provides a solid foundation for training layout generation models. Its multi-source composition enhances diversity, notably through the Inf3DLayout subset, which provides rich small-object information, enabling the generation of more complex and detailed scenes. We hope that M3DLayout can serve as a valuable resource for advancing research in text-driven 3D scene synthesis.
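To make the layout-text pairing described in the abstract concrete, the sketch below shows what a single training record might look like. All field names, the box parameterization, and the `to_prompt` helper are illustrative assumptions inferred from the abstract, not the dataset's released schema.

```python
from dataclasses import dataclass, field

# A hypothetical record structure for one M3DLayout sample. Field names and
# the box parameterization are assumptions inferred from the abstract, not
# the dataset's released schema.

@dataclass
class ObjectInstance:
    category: str    # e.g. "bed", "lamp"
    position: tuple  # (x, y, z) box center, meters
    size: tuple      # (w, h, d) box extents, meters
    yaw: float       # rotation about the up axis, radians

@dataclass
class LayoutSample:
    source: str      # "scan", "cad", or "procedural" (the three sources)
    objects: list = field(default_factory=list)
    # Structured description, mirroring the three levels named in the abstract:
    scene_summary: str = ""                                  # global scene summary
    furniture_relations: list = field(default_factory=list)  # large-furniture placements
    small_item_details: list = field(default_factory=list)   # fine-grained small-item arrangements

    def to_prompt(self) -> str:
        """Flatten the structured text into one conditioning string for a
        text-conditioned layout generator."""
        parts = [self.scene_summary, *self.furniture_relations, *self.small_item_details]
        return " ".join(p for p in parts if p)

# Example usage with made-up content:
sample = LayoutSample(
    source="cad",
    objects=[ObjectInstance("bed", (2.0, 0.3, 1.5), (2.0, 0.6, 1.6), 0.0)],
    scene_summary="A small bedroom with a bed against the north wall.",
    furniture_relations=["A nightstand sits to the left of the bed."],
    small_item_details=["A lamp stands on the nightstand."],
)
print(sample.to_prompt())
```

In a setup like the paper's benchmark, a text-conditioned diffusion model would denoise the set of object boxes while conditioning on an encoding of this flattened description; the flattening convention here is, again, only an assumption.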
Related papers
- HouseLayout3D: A Benchmark and Training-Free Baseline for 3D Layout Estimation in the Wild [17.263404264950257]
Current 3D layout estimation models are primarily trained on synthetic datasets containing simple single-room or single-floor environments.
We introduce HouseLayout3D, a real-world benchmark designed to support progress toward full building-scale layout estimation.
arXiv Detail & Related papers (2025-12-02T06:18:08Z)
- DisCo-Layout: Disentangling and Coordinating Semantic and Physical Refinement in a Multi-Agent Framework for 3D Indoor Layout Synthesis [76.7196710324494]
3D indoor layout synthesis is crucial for creating virtual environments.
DisCo-Layout is a novel framework that disentangles and coordinates physical and semantic refinement.
arXiv Detail & Related papers (2025-10-02T16:30:37Z)
- SCENEFORGE: Enhancing 3D-text alignment with Structured Scene Compositions [9.41365281895669]
SceneForge is a framework that enhances contrastive alignment between 3D point clouds and text through structured multi-object scene compositions.
By augmenting contrastive training with structured, compositional samples, SceneForge effectively addresses the scarcity of large-scale 3D-text datasets.
arXiv Detail & Related papers (2025-09-19T07:13:45Z)
- Direct Numerical Layout Generation for 3D Indoor Scene Synthesis via Spatial Reasoning [27.872834485482276]
3D indoor scene synthesis is vital for embodied AI and digital content creation.
Existing methods fail to generate scenes that are both open-vocabulary and aligned with fine-grained user instructions.
We introduce DirectLayout, a framework that directly generates numerical 3D layouts from text descriptions.
arXiv Detail & Related papers (2025-06-05T17:59:42Z)
- Uni3D-MoE: Scalable Multimodal 3D Scene Understanding via Mixture of Experts [49.21162433486564]
We propose Uni3D-MoE, a sparse Mixture-of-Experts (MoE)-based 3D MLLM designed to enable adaptive 3D multimodal fusion.
Uni3D-MoE integrates a comprehensive set of 3D modalities, including multi-view RGB and depth images, bird's-eye-view (BEV) maps, point clouds, and voxel representations.
Our framework employs a learnable routing mechanism within the sparse MoE-based large language model, dynamically selecting appropriate experts at the token level.
arXiv Detail & Related papers (2025-05-27T12:03:30Z)
- ART-DECO: Arbitrary Text Guidance for 3D Detailizer Construction [27.744420022794078]
We introduce a 3D detailizer, a neural model which can instantaneously transform a coarse 3D shape proxy into a high-quality asset.
Our model is trained using the text prompt, which defines the shape class and characterizes the appearance and fine-grained style of the generated details.
Our detailizer is not optimized for a single shape; it is the result of distilling a generative model, so that it can be reused, without retraining, to generate any number of shapes.
arXiv Detail & Related papers (2025-05-26T18:26:16Z)
- LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models [57.92316645992816]
Spatial reasoning is a fundamental aspect of human cognition, enabling intuitive understanding and manipulation of objects in three-dimensional space.
We introduce LayoutVLM, a framework and scene layout representation that exploits the semantic knowledge of Vision-Language Models (VLMs).
We demonstrate that fine-tuning VLMs with the proposed scene layout representation extracted from existing scene datasets can improve their reasoning performance.
arXiv Detail & Related papers (2024-12-03T06:15:04Z)
- LLplace: The 3D Indoor Scene Layout Generation and Editing via Large Language Model [58.24851949945434]
LLplace is a novel 3D indoor scene layout designer based on a lightweight, fine-tuned open-source LLM, Llama3.
LLplace circumvents the need for spatial relationship priors and in-context exemplars, enabling efficient and credible room layout generation.
Our approach demonstrates that LLplace can effectively generate and edit 3D indoor layouts interactively and outperform existing methods in delivering high-quality 3D design solutions.
arXiv Detail & Related papers (2024-06-06T08:53:01Z)
- LayoutGPT: Compositional Visual Planning and Generation with Large Language Models [98.81962282674151]
Large Language Models (LLMs) can serve as visual planners by generating layouts from text conditions.
We propose LayoutGPT, a method to compose in-context visual demonstrations in style sheet language.
arXiv Detail & Related papers (2023-05-24T17:56:16Z)
- LayoutTransformer: Layout Generation and Completion with Self-attention [105.21138914859804]
We address the problem of scene layout generation for diverse domains such as images, mobile applications, documents, and 3D objects.
We propose LayoutTransformer, a novel framework that leverages self-attention to learn contextual relationships between layout elements.
Our framework allows us to generate a new layout either from an empty set or from an initial seed set of primitives, and can easily scale to support an arbitrary number of primitives per layout.
arXiv Detail & Related papers (2020-06-25T17:56:34Z)
- LayoutMP3D: Layout Annotation of Matterport3D [59.11106101006007]
We consider the Matterport3D dataset with its originally provided depth-map ground truths and further release our annotations for layout ground truths from a subset of Matterport3D.
Our dataset provides both the layout and depth information, which enables the opportunity to explore the environment by integrating both cues.
arXiv Detail & Related papers (2020-03-30T14:40:56Z)