Compositional 3D Scene Generation using Locally Conditioned Diffusion
- URL: http://arxiv.org/abs/2303.12218v2
- Date: Thu, 23 Mar 2023 00:29:24 GMT
- Title: Compositional 3D Scene Generation using Locally Conditioned Diffusion
- Authors: Ryan Po, Gordon Wetzstein
- Abstract summary: We introduce locally conditioned diffusion as an approach to compositional scene diffusion.
We demonstrate a score distillation sampling-based text-to-3D synthesis pipeline that enables compositional 3D scene generation at a higher fidelity than relevant baselines.
- Score: 49.5784841881488
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Designing complex 3D scenes has been a tedious, manual process requiring
domain expertise. Emerging text-to-3D generative models show great promise for
making this task more intuitive, but existing approaches are limited to
object-level generation. We introduce locally conditioned diffusion as
an approach to compositional scene diffusion, providing control over semantic
parts using text prompts and bounding boxes while ensuring seamless transitions
between these parts. We demonstrate a score distillation sampling-based
text-to-3D synthesis pipeline that enables compositional 3D scene generation at
a higher fidelity than relevant baselines.
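The abstract describes locally conditioned diffusion only at a high level, so the following is a minimal 2D sketch of the idea as stated there: each bounding box carries its own text prompt, and the per-prompt noise predictions are blended with the box masks so that every pixel is denoised under its local condition. The denoiser stub, prompts, box coordinates, and schedule value are illustrative placeholders, not the paper's implementation.
```python
import numpy as np

H = W = 64                       # toy image resolution
rng = np.random.default_rng(0)

def eps_model(x_t, t, prompt):
    # Stand-in for a pretrained text-conditioned denoiser eps_phi(x_t; y, t);
    # a real pipeline would call a text-to-image diffusion model here.
    return rng.standard_normal(x_t.shape)

def box_mask(h, w, box):
    # box = (top, left, bottom, right) in pixels.
    top, left, bottom, right = box
    m = np.zeros((h, w))
    m[top:bottom, left:right] = 1.0
    return m

# Hypothetical user input: a global prompt plus (prompt, bounding box) pairs.
scene = [
    ("a cozy living room", (0, 0, H, W)),      # global / background prompt
    ("a red armchair",     (20, 4, 60, 28)),   # local region 1
    ("a wooden bookshelf", (8, 36, 60, 62)),   # local region 2
]

def locally_conditioned_eps(x_t, t, scene):
    # Blend per-prompt noise predictions using their region masks. Later
    # (more specific) entries overwrite earlier ones inside their boxes, so
    # every pixel is denoised under exactly one prompt at each step, which is
    # what keeps the transitions between parts seamless.
    eps = np.zeros_like(x_t)
    for prompt, box in scene:
        m = box_mask(H, W, box)[..., None]      # (H, W, 1) for broadcasting
        eps = np.where(m > 0, eps_model(x_t, t, prompt), eps)
    return eps

# One DDPM-style update with the composed prediction (alpha_bar_t is a
# placeholder; a real sampler would take it from the model's noise schedule).
x_t = rng.standard_normal((H, W, 3))
t, alpha_bar_t = 500, 0.5
eps_hat = locally_conditioned_eps(x_t, t, scene)
x0_pred = (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_hat) / np.sqrt(alpha_bar_t)
```
In the full pipeline described in the abstract, a composition of this kind would presumably be applied to rendered views inside a score distillation sampling loop that optimizes a 3D representation; a separate sketch of the plain SDS gradient follows the related-papers list below.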
Related papers
- Semantic Score Distillation Sampling for Compositional Text-to-3D Generation [28.88237230872795]
Generating high-quality 3D assets from textual descriptions remains a pivotal challenge in computer graphics and vision research.
We introduce a novel SDS approach, designed to improve the expressiveness and accuracy of compositional text-to-3D generation.
Our approach integrates new semantic embeddings that maintain consistency across different rendering views.
By leveraging explicit semantic guidance, our method unlocks the compositional capabilities of existing pre-trained diffusion models.
arXiv Detail & Related papers (2024-10-11T17:26:00Z)
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z)
- InstructScene: Instruction-Driven 3D Indoor Scene Synthesis with Semantic Graph Prior [27.773451301040424]
InstructScene is a novel generative framework that integrates a semantic graph prior and a layout decoder.
We show that the proposed method surpasses existing state-of-the-art approaches by a large margin.
arXiv Detail & Related papers (2024-02-07T10:09:00Z)
- SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z)
- CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting [57.14748263512924]
CG3D is a method for compositionally generating scalable 3D assets.
Gaussian radiance fields, parameterized to allow for compositions of objects, enable semantically and physically consistent scenes.
arXiv Detail & Related papers (2023-11-29T18:55:38Z)
- 3D Scene Diffusion Guidance using Scene Graphs [3.207455883863626]
We propose a novel approach for 3D scene diffusion guidance using scene graphs.
To leverage the relative spatial information the scene graphs provide, we make use of relational graph convolutional blocks within our denoising network.
arXiv Detail & Related papers (2023-08-08T06:16:37Z)
- ATT3D: Amortized Text-to-3D Object Synthesis [78.96673650638365]
We amortize optimization over text prompts by training on many prompts simultaneously with a unified model, instead of separately.
Our framework - Amortized text-to-3D (ATT3D) - enables knowledge-sharing between prompts to generalize to unseen setups and smooths between text for novel assets and simple animations.
arXiv Detail & Related papers (2023-06-06T17:59:10Z)
- CompoNeRF: Text-guided Multi-object Compositional NeRF with Editable 3D Scene Layout [13.364394556439992]
Text-to-3D generation plays a crucial role in creating editable 3D scenes for AR/VR.
Recent advances have shown promise in merging neural radiance fields (NeRFs) with pre-trained diffusion models for text-to-3D object generation.
We propose a novel framework, dubbed CompoNeRF, by integrating an editable 3D scene layout with object-specific and scene-wide guidance mechanisms.
Our framework achieves up to a 54% improvement as measured by the multi-view CLIP score metric.
arXiv Detail & Related papers (2023-03-24T07:37:09Z)
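Several of the works above (Semantic Score Distillation Sampling, ATT3D, CompoNeRF), as well as the main paper's pipeline, build on score distillation sampling. As a reference point only, here is a minimal sketch of the standard DreamFusion-style SDS gradient, using a stub denoiser and a trivial "renderer" whose parameters are just an image so that dx/dtheta is the identity; the prompt, schedule values, and learning rate are illustrative and not taken from any of the papers listed.
```python
import numpy as np

rng = np.random.default_rng(0)
H = W = 64

def eps_model(x_t, t, prompt):
    # Stand-in for a frozen, pretrained text-conditioned denoiser.
    return rng.standard_normal(x_t.shape)

# Toy "renderer": the scene parameters are just an image, so the Jacobian
# dx/dtheta is the identity and the SDS gradient applies to theta directly.
# A real pipeline would render a NeRF or Gaussian-splatting scene instead.
theta = 0.1 * rng.standard_normal((H, W, 3))

def sds_gradient(theta, prompt, t, alpha_bar_t, weight=1.0):
    # grad_theta L_SDS ~= w(t) * (eps_hat(x_t; y, t) - eps) * dx/dtheta
    x = theta                                       # "rendered" image
    eps = rng.standard_normal(x.shape)              # forward-process noise
    x_t = np.sqrt(alpha_bar_t) * x + np.sqrt(1.0 - alpha_bar_t) * eps
    eps_hat = eps_model(x_t, t, prompt)
    return weight * (eps_hat - eps)                 # dx/dtheta = identity here

# A few toy optimization steps of the scene parameters under a single prompt.
for step in range(3):
    t = int(rng.integers(20, 980))
    theta -= 1e-2 * sds_gradient(theta, "a cozy living room", t, alpha_bar_t=0.5)
```
In practice the (eps_hat - eps) term is backpropagated through a differentiable renderer rather than applied to pixels directly.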
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the accuracy of this automatically generated content and is not responsible for any consequences of its use.