CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting
- URL: http://arxiv.org/abs/2311.17907v1
- Date: Wed, 29 Nov 2023 18:55:38 GMT
- Title: CG3D: Compositional Generation for Text-to-3D via Gaussian Splatting
- Authors: Alexander Vilesov, Pradyumna Chari, Achuta Kadambi
- Abstract summary: CG3D is a method for compositionally generating scalable 3D assets.
Gamma radiance fields, parameterized to allow for compositions of objects, possess the capability to enable semantically and physically consistent scenes.
- Score: 57.14748263512924
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: With the onset of diffusion-based generative models and their ability to
generate text-conditioned images, content generation has received a massive
invigoration. Recently, these models have been shown to provide useful guidance
for the generation of 3D graphics assets. However, existing work in
text-conditioned 3D generation faces fundamental constraints: (i) inability to
generate detailed, multi-object scenes, (ii) inability to textually control
multi-object configurations, and (iii) physically realistic scene composition.
In this work, we propose CG3D, a method for compositionally generating scalable
3D assets that resolves these constraints. We find that explicit Gaussian
radiance fields, parameterized to allow for compositions of objects, possess
the capability to enable semantically and physically consistent scenes. By
utilizing a guidance framework built around this explicit representation, we
show state of the art results, capable of even exceeding the guiding diffusion
model in terms of object combinations and physics accuracy.
Related papers
- Enhancing Single Image to 3D Generation using Gaussian Splatting and Hybrid Diffusion Priors [17.544733016978928]
3D object generation from a single image involves estimating the full 3D geometry and texture of unseen views from an unposed RGB image captured in the wild.
Recent advancements in 3D object generation have introduced techniques that reconstruct an object's 3D shape and texture.
We propose bridging the gap between 2D and 3D diffusion models to address this limitation.
arXiv Detail & Related papers (2024-10-12T10:14:11Z) - Geometry Image Diffusion: Fast and Data-Efficient Text-to-3D with Image-Based Surface Representation [2.3213238782019316]
GIMDiffusion is a novel Text-to-3D model that utilizes geometry images to efficiently represent 3D shapes using 2D images.
We exploit the rich 2D priors of existing Text-to-Image models such as Stable Diffusion.
In short, GIMDiffusion enables the generation of 3D assets at speeds comparable to current Text-to-Image models.
arXiv Detail & Related papers (2024-09-05T17:21:54Z) - L3GO: Language Agents with Chain-of-3D-Thoughts for Generating
Unconventional Objects [53.4874127399702]
We propose a language agent with chain-of-3D-thoughts (L3GO), an inference-time approach that can reason about part-based 3D mesh generation.
We develop a new benchmark, Unconventionally Feasible Objects (UFO), as well as SimpleBlenv, a wrapper environment built on top of Blender.
Our approach surpasses the standard GPT-4 and other language agents for 3D mesh generation on ShapeNet.
arXiv Detail & Related papers (2024-02-14T09:51:05Z) - GALA3D: Towards Text-to-3D Complex Scene Generation via Layout-guided Generative Gaussian Splatting [52.150502668874495]
We present GALA3D, generative 3D GAussians with LAyout-guided control, for effective compositional text-to-3D generation.
GALA3D is a user-friendly, end-to-end framework for state-of-the-art scene-level 3D content generation and controllable editing.
arXiv Detail & Related papers (2024-02-11T13:40:08Z) - FMGS: Foundation Model Embedded 3D Gaussian Splatting for Holistic 3D Scene Understanding [11.118857208538039]
We present Foundation Model Embedded Gaussian Splatting (S), which incorporates vision-language embeddings of foundation models into 3D Gaussian Splatting (GS)
Results demonstrate remarkable multi-view semantic consistency, facilitating diverse downstream tasks, beating state-of-the-art methods by 10.2 percent on open-vocabulary language-based object detection.
This research explores the intersection of vision, language, and 3D scene representation, paving the way for enhanced scene understanding in uncontrolled real-world environments.
arXiv Detail & Related papers (2024-01-03T20:39:02Z) - SceneWiz3D: Towards Text-guided 3D Scene Composition [134.71933134180782]
Existing approaches either leverage large text-to-image models to optimize a 3D representation or train 3D generators on object-centric datasets.
We introduce SceneWiz3D, a novel approach to synthesize high-fidelity 3D scenes from text.
arXiv Detail & Related papers (2023-12-13T18:59:30Z) - CC3D: Layout-Conditioned Generation of Compositional 3D Scenes [49.281006972028194]
We introduce CC3D, a conditional generative model that synthesizes complex 3D scenes conditioned on 2D semantic scene layouts.
Our evaluations on synthetic 3D-FRONT and real-world KITTI-360 datasets demonstrate that our model generates scenes of improved visual and geometric quality.
arXiv Detail & Related papers (2023-03-21T17:59:02Z) - Towards Realistic 3D Embedding via View Alignment [53.89445873577063]
This paper presents an innovative View Alignment GAN (VA-GAN) that composes new images by embedding 3D models into 2D background images realistically and automatically.
VA-GAN consists of a texture generator and a differential discriminator that are inter-connected and end-to-end trainable.
arXiv Detail & Related papers (2020-07-14T14:45:00Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.