SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
- URL: http://arxiv.org/abs/2412.01801v2
- Date: Tue, 03 Dec 2024 10:32:05 GMT
- Title: SceneFactor: Factored Latent 3D Diffusion for Controllable 3D Scene Generation
- Authors: Alexey Bokhovkin, Quan Meng, Shubham Tulsiani, Angela Dai
- Abstract summary: SceneFactor is a diffusion-based approach for large-scale 3D scene generation.
It enables controllable generation and effortless editing.
Our approach enables high-fidelity 3D scene synthesis with effective controllable editing.
- Abstract: We present SceneFactor, a diffusion-based approach for large-scale 3D scene generation that enables controllable generation and effortless editing. SceneFactor enables text-guided 3D scene synthesis through our factored diffusion formulation, leveraging latent semantic and geometric manifolds for generation of arbitrary-sized 3D scenes. While text input enables easy, controllable generation, text guidance remains imprecise for intuitive, localized editing and manipulation of the generated 3D scenes. Our factored semantic diffusion generates a proxy semantic space composed of semantic 3D boxes, enabling controllable editing of generated scenes by adding, removing, or resizing the semantic 3D proxy boxes, which in turn guide high-fidelity, consistent 3D geometric editing. Extensive experiments demonstrate that our approach enables high-fidelity 3D scene synthesis with effective controllable editing through our factored diffusion approach.
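The factored formulation described above can be illustrated as a two-stage sampling pipeline: a first diffusion model produces a coarse semantic latent (the proxy 3D boxes) from a text condition, and a second diffusion model produces the geometric latent conditioned on that semantic output. The sketch below is a minimal toy illustration of this structure only, assuming stand-in denoisers and a simplified noise schedule; it is not the paper's actual architecture or training setup.

```python
import numpy as np

def ddpm_sample(denoise, shape, cond=None, steps=50, rng=None):
    """Toy ancestral sampling loop with a simplified noise schedule.

    `denoise` is a stand-in for a trained 3D denoising network; it
    predicts the noise to remove given the current latent, the
    timestep, and an optional conditioning signal.
    """
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)  # start from pure Gaussian noise
    for t in range(steps, 0, -1):
        eps_hat = denoise(x, t, cond)      # predicted noise
        alpha = 1.0 - t / (steps + 1)      # toy linear schedule
        x = (x - (1 - alpha) * eps_hat) / np.sqrt(alpha)
        if t > 1:
            x += np.sqrt(1 - alpha) * rng.standard_normal(shape)
    return x

# Stand-in denoisers (a real system would use trained 3D UNets).
def semantic_denoiser(x, t, text_cond):
    return 0.5 * x  # placeholder: simply attenuates the latent

def geometric_denoiser(x, t, semantic_cond):
    return 0.5 * x + 0.1 * semantic_cond  # conditions on semantics

# Stage 1: text condition -> semantic latent (proxy box grid).
text_embedding = np.zeros(16)  # hypothetical text prompt embedding
sem = ddpm_sample(semantic_denoiser, (8, 8, 8), cond=text_embedding)

# Stage 2: semantic latent -> geometric latent of the same scene.
geo = ddpm_sample(geometric_denoiser, (8, 8, 8), cond=sem)
print(sem.shape, geo.shape)
```

Editing in this factored setup amounts to modifying `sem` (adding, removing, or resizing boxes) and re-running only the second stage, which is what makes localized edits cheap relative to regenerating the whole scene.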
Related papers
- Layout2Scene: 3D Semantic Layout Guided Scene Generation via Geometry and Appearance Diffusion Priors [52.63385546943866]
We present a text-to-scene generation method (namely, Layout2Scene) using additional semantic layout as the prompt to inject precise control of 3D object positions.
To fully leverage 2D diffusion priors in geometry and appearance generation, we introduce a semantic-guided geometry diffusion model and a semantic-geometry guided diffusion model.
Our method can generate more plausible and realistic scenes as compared to state-of-the-art approaches.
arXiv Detail & Related papers (2025-01-05T12:20:13Z)
- LT3SD: Latent Trees for 3D Scene Diffusion [71.91446143124648]
We present LT3SD, a novel latent diffusion model for large-scale 3D scene generation.
We demonstrate the efficacy and benefits of LT3SD for large-scale, high-quality unconditional 3D scene generation.
arXiv Detail & Related papers (2024-09-12T16:55:51Z)
- Coin3D: Controllable and Interactive 3D Assets Generation with Proxy-Guided Conditioning [52.81032340916171]
Coin3D allows users to control the 3D generation using a coarse geometry proxy assembled from basic shapes.
Our method achieves superior controllability and flexibility in the 3D assets generation task.
arXiv Detail & Related papers (2024-05-13T17:56:13Z)
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z)
- Progressive3D: Progressively Local Editing for Text-to-3D Content Creation with Complex Semantic Prompts [38.94299662658179]
We propose a general framework named Progressive3D, which decomposes the entire generation into a series of locally progressive editing steps.
We constrain the content change to only occur in regions determined by user-defined region prompts in each editing step.
Extensive experiments demonstrate that the proposed Progressive3D framework generates precise 3D content for prompts with complex semantics.
arXiv Detail & Related papers (2023-10-18T08:23:14Z)
- Compositional 3D Scene Generation using Locally Conditioned Diffusion [49.5784841881488]
We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
arXiv Detail & Related papers (2023-03-21T22:37:16Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.