Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation
- URL: http://arxiv.org/abs/2510.15564v1
- Date: Fri, 17 Oct 2025 11:48:08 GMT
- Title: Imaginarium: Vision-guided High-Quality 3D Scene Layout Generation
- Authors: Xiaoming Zhu, Xu Huang, Qinghongbing Xie, Zhi Deng, Junsheng Yu, Yirui Guan, Zhongyuan Liu, Lin Zhu, Qijun Zhao, Ligang Liu, Long Zeng
- Abstract summary: This paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. We then employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. We optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images.
- Score: 27.13700598039439
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Generating artistic and coherent 3D scene layouts is crucial in digital content creation. Traditional optimization-based methods are often constrained by cumbersome manual rules, while deep generative models face challenges in producing content with richness and diversity. Furthermore, approaches that utilize large language models frequently lack robustness and fail to accurately capture complex spatial relationships. To address these challenges, this paper presents a novel vision-guided 3D layout generation system. We first construct a high-quality asset library containing 2,037 scene assets and 147 3D scene layouts. Subsequently, we employ an image generation model to expand prompt representations into images, fine-tuning it to align with our asset library. We then develop a robust image parsing module to recover the 3D layout of scenes based on visual semantics and geometric information. Finally, we optimize the scene layout using scene graphs and overall visual semantics to ensure logical coherence and alignment with the images. Extensive user testing demonstrates that our algorithm significantly outperforms existing methods in terms of layout richness and quality. The code and dataset will be available at https://github.com/HiHiAllen/Imaginarium.
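The final stage of the pipeline described above optimizes the layout using scene graphs of spatial relations. As a rough illustration of what such a graph might look like, here is a minimal, hypothetical sketch: the relation rules, thresholds, and object names below are illustrative assumptions, not taken from the paper.

```python
from dataclasses import dataclass

@dataclass
class Placement:
    """An object placed in the scene, identified by its centre position."""
    name: str
    x: float
    y: float
    z: float

def relation(a: Placement, b: Placement, near_thresh: float = 1.5) -> str:
    """Classify the spatial relation of a w.r.t. b using toy rules."""
    dx, dy, dz = a.x - b.x, a.y - b.y, a.z - b.z
    if abs(dx) < 0.3 and abs(dy) < 0.3 and dz > 0:
        return "on_top_of"
    if (dx ** 2 + dy ** 2) ** 0.5 < near_thresh:
        return "near"
    return "far_from"

def scene_graph(placements):
    """Build a directed relation edge for every ordered object pair."""
    return {
        (a.name, b.name): relation(a, b)
        for a in placements
        for b in placements
        if a is not b
    }

layout = [
    Placement("table", 0.0, 0.0, 0.0),
    Placement("lamp", 0.1, 0.0, 0.8),   # sits on the table
    Placement("sofa", 4.0, 0.0, 0.0),   # across the room
]
graph = scene_graph(layout)
print(graph[("lamp", "table")])   # on_top_of
print(graph[("sofa", "table")])   # far_from
```

In the actual system such a graph would presumably be compared against relations parsed from the generated image, with object poses adjusted until the two agree; the sketch only shows the graph-construction side.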
Related papers
- ArtiScene: Language-Driven Artistic 3D Scene Generation Through Image Intermediary [37.41274496314127]
ArtiScene is a training-free automated pipeline for scene design. It generates 2D images from a scene description, then extracts the shape and appearance of objects to create 3D models. It outperforms state-of-the-art baselines by a large margin in layout and aesthetic quality on quantitative metrics.
arXiv Detail & Related papers (2025-05-31T23:03:54Z) - Constructing a 3D Scene from a Single Image [31.11317559252235]
SceneFuse-3D is a training-free framework designed to synthesize coherent 3D scenes from a single top-down view. We decompose the input image into overlapping regions and generate each using a pretrained 3D object generator. This modular design allows us to overcome resolution bottlenecks and preserve spatial structure without requiring 3D supervision or fine-tuning.
arXiv Detail & Related papers (2025-05-21T17:10:47Z) - InsTex: Indoor Scenes Stylized Texture Synthesis [81.12010726769768]
High-quality textures are crucial for 3D scenes in augmented/virtual reality (AR/VR) applications. Current methods suffer from lengthy processing times and visual artifacts. We introduce a two-stage architecture designed to generate high-quality textures for 3D scenes.
arXiv Detail & Related papers (2025-01-22T08:37:59Z) - Architect: Generating Vivid and Interactive 3D Scenes with Hierarchical 2D Inpainting [47.014044892025346]
Architect is a generative framework that creates complex and realistic 3D embodied environments leveraging diffusion-based 2D image inpainting.
Our pipeline is further extended into a hierarchical and iterative inpainting process that continuously places large furniture and small objects to enrich the scene.
arXiv Detail & Related papers (2024-11-14T22:15:48Z) - SceneCraft: Layout-Guided 3D Scene Generation [29.713491313796084]
SceneCraft is a novel method for generating detailed indoor scenes that adhere to textual descriptions and spatial layout preferences. Our method significantly outperforms existing approaches in complex indoor scene generation with diverse textures, consistent geometry, and realistic visual quality.
arXiv Detail & Related papers (2024-10-11T17:59:58Z) - MaGRITTe: Manipulative and Generative 3D Realization from Image, Topview and Text [52.296914125558864]
The generation of 3D scenes from user-specified conditions offers a promising avenue for alleviating the production burden in 3D applications. Previous studies required significant effort to realize the desired scene, owing to limited control conditions. We propose a method for controlling and generating 3D scenes under multimodal conditions using partial images, layout information represented in the top view, and text prompts.
arXiv Detail & Related papers (2024-03-30T12:50:25Z) - 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural-image prior of the 2D diffusion model together with the global 3D information of the current scene.
Our approach supports a wide variety of scene generation tasks and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z) - ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z) - SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z) - Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.