Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
- URL: http://arxiv.org/abs/2408.14819v1
- Date: Tue, 27 Aug 2024 07:01:56 GMT
- Title: Build-A-Scene: Interactive 3D Layout Control for Diffusion-Based Image Generation
- Authors: Abdelrahman Eldesokey, Peter Wonka
- Abstract summary: We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control.
We replace the traditional 2D boxes used in layout control with 3D boxes.
We recast the T2I task as a multi-stage generation process, where at each stage the user can insert, change, and move an object in 3D while preserving objects from earlier stages.
- Score: 44.18315132571804
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose a diffusion-based approach for Text-to-Image (T2I) generation with interactive 3D layout control. Layout control has been widely studied to alleviate the shortcomings of T2I diffusion models in understanding objects' placement and relationships from text descriptions. Nevertheless, existing approaches for layout control are limited to 2D layouts, require the user to provide a static layout beforehand, and fail to preserve generated images under layout changes. This makes these approaches unsuitable for applications that require 3D object-wise control and iterative refinement, e.g., interior design and complex scene generation. To this end, we leverage recent advancements in depth-conditioned T2I models and propose a novel approach for interactive 3D layout control. We replace the traditional 2D boxes used in layout control with 3D boxes. Furthermore, we recast the T2I task as a multi-stage generation process, where at each stage the user can insert, change, and move an object in 3D while preserving objects from earlier stages. We achieve this through our proposed Dynamic Self-Attention (DSA) module and a consistent 3D object translation strategy. Experiments show that our approach can generate complicated scenes based on 3D layouts, doubling the object generation success rate of standard depth-conditioned T2I methods. Moreover, it outperforms competing methods in preserving objects under layout changes. Project Page: https://abdo-eldesokey.github.io/build-a-scene/
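To make the 3D-box conditioning concrete, below is a minimal sketch, assuming a simple pinhole camera and an axis-aligned box, of how a user-placed 3D box could be turned into a coarse depth map for a depth-conditioned T2I model. This is not the authors' implementation: the intrinsics, the box parametrization, the helper names (box_corners, box_to_depth_map), and the bounding-rectangle fill are illustrative assumptions, and the paper's Dynamic Self-Attention module and multi-stage pipeline are not reproduced here.

```python
# Illustrative sketch only: project a user-placed 3D box into a depth map
# that a depth-conditioned T2I model could consume. All camera parameters
# and the box layout below are assumptions made for this example.
import numpy as np


def box_corners(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates."""
    offsets = np.array([[dx, dy, dz]
                        for dx in (-0.5, 0.5)
                        for dy in (-0.5, 0.5)
                        for dz in (-0.5, 0.5)])
    return np.asarray(center) + offsets * np.asarray(size)


def box_to_depth_map(center, size, fx=256.0, fy=256.0, H=512, W=512):
    """Project the box with a pinhole camera and fill its 2D footprint with
    the depth of the box's near face. A real system would rasterize the
    visible box faces and normalize to whatever format the depth-conditioned
    model expects (e.g., inverse depth); the bounding-rectangle fill here is
    a deliberate shortcut."""
    corners = box_corners(center, size)
    z = corners[:, 2]                          # depth of each corner
    u = fx * corners[:, 0] / z + W / 2.0       # pinhole projection to pixels
    v = fy * corners[:, 1] / z + H / 2.0

    depth = np.zeros((H, W), dtype=np.float32)  # 0 = unconstrained background
    u0, u1 = int(max(u.min(), 0)), int(min(u.max(), W - 1))
    v0, v1 = int(max(v.min(), 0)), int(min(v.max(), H - 1))
    depth[v0:v1 + 1, u0:u1 + 1] = z.min()       # near face of the box
    return depth


if __name__ == "__main__":
    # Hypothetical "sofa" box, 1.6 m wide, centered 2 m in front of the camera.
    d = box_to_depth_map(center=(0.0, 0.2, 2.0), size=(1.6, 0.8, 0.9))
    print(d.shape, float(d.max()))  # (512, 512) 1.55
```

A map like this could then be fed to an off-the-shelf depth-conditioned generator (e.g., a depth ControlNet for Stable Diffusion). In the multi-stage pipeline described in the abstract, the projection-and-generation step would be repeated each time the user inserts or moves a box, with the DSA module responsible for preserving the objects generated at earlier stages.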
Related papers
- Layout-your-3D: Controllable and Precise 3D Generation with 2D Blueprint [61.25279122171029]
We present a framework that allows controllable and compositional 3D generation from text prompts.
Our approach leverages 2D layouts as a blueprint to facilitate precise and plausible control over 3D generation.
arXiv Detail & Related papers (2024-10-20T13:41:50Z)
- iControl3D: An Interactive System for Controllable 3D Scene Generation [57.048647153684485]
iControl3D is a novel interactive system that empowers users to generate and render customizable 3D scenes with precise control.
We leverage 3D meshes as an intermediary proxy to iteratively merge individual 2D diffusion-generated images into a cohesive and unified 3D scene representation.
Our neural rendering interface enables users to build a radiance field of their scene online and navigate the entire scene.
arXiv Detail & Related papers (2024-08-03T06:35:09Z)
- Interactive3D: Create What You Want by Interactive 3D Generation [13.003964182554572]
We introduce Interactive3D, an innovative framework for interactive 3D generation that grants users precise control over the generative process.
Our experiments demonstrate that Interactive3D markedly improves the controllability and quality of 3D generation.
arXiv Detail & Related papers (2024-04-25T11:06:57Z)
- Customizing Text-to-Image Diffusion with Camera Viewpoint Control [53.621518249820745]
We introduce a new task -- enabling explicit control of camera viewpoint for model customization.
This allows us to modify object properties amongst various background scenes via text prompts.
We propose to condition the 2D diffusion process on rendered, view-dependent features of the new object.
arXiv Detail & Related papers (2024-04-18T16:59:51Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- DreamControl: Control-Based Text-to-3D Generation with 3D Self-Prior [97.694840981611]
We propose a two-stage 2D-lifting framework, namely DreamControl.
It generates fine-grained objects with control-based score distillation.
DreamControl can generate high-quality 3D content in terms of both geometry consistency and texture fidelity.
arXiv Detail & Related papers (2023-12-11T15:12:50Z)
- Ctrl-Room: Controllable Text-to-3D Room Meshes Generation with Layout Constraints [35.073500525250346]
We present Ctrl-Room, which can generate convincing 3D rooms with designer-style layouts and high-fidelity textures from just a text prompt.
Ctrl-Room enables versatile interactive editing operations such as resizing or moving individual furniture items.
arXiv Detail & Related papers (2023-10-05T15:29:52Z)