BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis
- URL: http://arxiv.org/abs/2403.11273v2
- Date: Mon, 18 Nov 2024 14:19:52 GMT
- Title: BrightDreamer: Generic 3D Gaussian Generative Framework for Fast Text-to-3D Synthesis
- Authors: Lutao Jiang, Xu Zheng, Yuanhuiyi Lyu, Jiazhou Zhou, Lin Wang
- Abstract summary: This paper presents BrightDreamer, an end-to-end feed-forward approach that can achieve generalizable and fast (77 ms) text-to-3D generation.
We first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions.
We then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object.
- Score: 10.151307760539071
- License:
- Abstract: Text-to-3D synthesis has recently seen intriguing advances by combining the text-to-image priors with 3D representation methods, e.g., 3D Gaussian Splatting (3D GS), via Score Distillation Sampling (SDS). However, a hurdle of existing methods is the low efficiency, per-prompt optimization for a single 3D object. Therefore, it is imperative for a paradigm shift from per-prompt optimization to feed-forward generation for any unseen text prompts, which yet remains challenging. An obstacle is how to directly generate a set of millions of 3D Gaussians to represent a 3D object. This paper presents BrightDreamer, an end-to-end feed-forward approach that can achieve generalizable and fast (77 ms) text-to-3D generation. Our key idea is to formulate the generation process as estimating the 3D deformation from an anchor shape with predefined positions. For this, we first propose a Text-guided Shape Deformation (TSD) network to predict the deformed shape and its new positions, used as the centers (one attribute) of 3D Gaussians. To estimate the other four attributes (i.e., scaling, rotation, opacity, and SH), we then design a novel Text-guided Triplane Generator (TTG) to generate a triplane representation for a 3D object. The center of each Gaussian enables us to transform the spatial feature into the four attributes. The generated 3D Gaussians can be finally rendered at 705 frames per second. Extensive experiments demonstrate the superiority of our method over existing methods. Also, BrightDreamer possesses a strong semantic understanding capability even for complex text prompts. The code is available in the project page.
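The pipeline described in the abstract (anchor positions, a predicted deformation giving the Gaussian centers, then triplane features at each center decoded into the other four attributes) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. The grid layout, the `toy_deformation` stand-in for the TSD network, the random planes standing in for the TTG output, summing the three plane features, and the 11-channel decoder are all assumptions made for the example.

```python
import math
import random


def make_anchor_grid(n):
    """Anchor positions on a regular n*n*n grid in [-1, 1]^3 (assumed layout)."""
    ticks = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    return [(x, y, z) for x in ticks for y in ticks for z in ticks]


def toy_deformation(anchor, text_code, max_shift=0.1):
    """Hypothetical stand-in for the TSD network: a bounded offset per anchor."""
    return tuple(max_shift * math.tanh(text_code * a) for a in anchor)


def random_plane(rng, res, channels):
    """Hypothetical stand-in for one TTG output plane: res x res x channels."""
    return [[[rng.uniform(-1.0, 1.0) for _ in range(channels)]
             for _ in range(res)] for _ in range(res)]


def bilinear(plane, u, v):
    """Bilinearly sample a square feature plane at (u, v) in [-1, 1]^2."""
    res = len(plane)
    # Map [-1, 1] to continuous pixel coordinates, clamped to the plane.
    fu = min(max((u + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    fv = min(max((v + 1.0) * 0.5 * (res - 1), 0.0), res - 1.0)
    i0, j0 = int(fu), int(fv)
    i1, j1 = min(i0 + 1, res - 1), min(j0 + 1, res - 1)
    a, b = fu - i0, fv - j0
    return [
        (1 - a) * (1 - b) * plane[i0][j0][c] + a * (1 - b) * plane[i1][j0][c]
        + (1 - a) * b * plane[i0][j1][c] + a * b * plane[i1][j1][c]
        for c in range(len(plane[0][0]))
    ]


def sample_triplane(planes, center):
    """Project the center onto the XY, XZ and YZ planes and sum the features."""
    x, y, z = center
    feats = (bilinear(planes["xy"], x, y),
             bilinear(planes["xz"], x, z),
             bilinear(planes["yz"], y, z))
    return [sum(vals) for vals in zip(*feats)]


def decode_attributes(feat):
    """Toy decoder: split 11 channels into scaling, rotation, opacity, SH."""
    scaling = [0.01 * math.exp(f) for f in feat[0:3]]      # strictly positive
    quat = feat[3:7]
    norm = math.sqrt(sum(q * q for q in quat))
    rotation = [q / norm for q in quat] if norm > 1e-8 else [1.0, 0.0, 0.0, 0.0]
    opacity = 1.0 / (1.0 + math.exp(-feat[7]))             # sigmoid, in (0, 1)
    sh_dc = feat[8:11]                                     # degree-0 SH only
    return scaling, rotation, opacity, sh_dc


def generate_gaussians(text_code, n=4, res=8, seed=0):
    """Single forward pass: anchors -> deformed centers -> per-center attributes."""
    rng = random.Random(seed)
    planes = {k: random_plane(rng, res, 11) for k in ("xy", "xz", "yz")}
    gaussians = []
    for anchor in make_anchor_grid(n):
        offset = toy_deformation(anchor, text_code)
        center = tuple(a + d for a, d in zip(anchor, offset))
        scaling, rotation, opacity, sh_dc = decode_attributes(
            sample_triplane(planes, center))
        gaussians.append({"center": center, "scaling": scaling,
                          "rotation": rotation, "opacity": opacity, "sh": sh_dc})
    return gaussians
```

Calling `generate_gaussians(text_code=0.7, n=4)` returns one Gaussian per anchor, so 64 in this toy setting. The point of the structure is that the number of Gaussians is fixed by the anchor set, so nothing is optimized per prompt: the text only changes the offsets and the plane contents.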
Related papers
- A General Framework to Boost 3D GS Initialization for Text-to-3D Generation by Lexical Richness [10.09002362480534]
This paper proposes a novel framework to boost the 3D GS Initialization for text-to-3D generation upon the lexical richness.
Our key idea is to aggregate 3D Gaussians into spatially uniform voxels to represent complex shapes.
Our framework can be seamlessly plugged into SoTA training frameworks, e.g., LucidDreamer, for semantically consistent text-to-3D generation.
arXiv Detail & Related papers (2024-08-02T13:46:15Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- GVGEN: Text-to-3D Generation with Volumetric Representation [89.55687129165256]
3D Gaussian splatting has emerged as a powerful technique for 3D reconstruction and generation, known for its fast and high-quality rendering capabilities.
This paper introduces a novel diffusion-based framework, GVGEN, designed to efficiently generate 3D Gaussian representations from text input.
arXiv Detail & Related papers (2024-03-19T17:57:52Z)
- Hyper-3DG: Text-to-3D Gaussian Generation via Hypergraph [20.488040789522604]
We propose a method named "3D Gaussian Generation via Hypergraph (Hyper-3DG)", designed to capture the sophisticated high-order correlations present within 3D objects.
Our framework allows for the production of finely generated 3D objects within a cohesive optimization, effectively circumventing degradation.
arXiv Detail & Related papers (2024-03-14T09:59:55Z)
- AGG: Amortized Generative 3D Gaussians for Single Image to 3D [108.38567665695027]
We introduce an Amortized Generative 3D Gaussian framework (AGG) that instantly produces 3D Gaussians from a single image.
AGG decomposes the generation of 3D Gaussian locations and other appearance attributes for joint optimization.
We propose a cascaded pipeline that first generates a coarse representation of the 3D data and later upsamples it with a 3D Gaussian super-resolution module.
arXiv Detail & Related papers (2024-01-08T18:56:33Z)
- Align Your Gaussians: Text-to-4D with Dynamic 3D Gaussians and Composed Diffusion Models [94.07744207257653]
We focus on the underexplored text-to-4D setting and synthesize dynamic, animated 3D objects.
We combine text-to-image, text-to-video, and 3D-aware multiview diffusion models to provide feedback during 4D object optimization.
arXiv Detail & Related papers (2023-12-21T11:41:02Z)
- TPA3D: Triplane Attention for Fast Text-to-3D Generation [28.33270078863519]
We propose Triplane Attention for text-guided 3D generation (TPA3D).
TPA3D is an end-to-end trainable GAN-based deep learning model for fast text-to-3D generation.
We show that TPA3D generates high-quality 3D textured shapes aligned with fine-grained descriptions.
arXiv Detail & Related papers (2023-12-05T10:39:37Z)
- Instant3D: Instant Text-to-3D Generation [101.25562463919795]
We propose a novel framework for fast text-to-3D generation, dubbed Instant3D.
Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network.
arXiv Detail & Related papers (2023-11-14T18:59:59Z)
- GaussianDreamer: Fast Generation from Text to 3D Gaussians by Bridging 2D and 3D Diffusion Models [102.22388340738536]
2D and 3D diffusion models can generate decent 3D objects based on prompts.
3D diffusion models have good 3D consistency, but their quality and generalization are limited as trainable 3D data is expensive and hard to obtain.
This paper attempts to bridge the power from the two types of diffusion models via the recent explicit and efficient 3D Gaussian splatting representation.
arXiv Detail & Related papers (2023-10-12T17:22:24Z)
- Text-to-3D using Gaussian Splatting [18.163413810199234]
This paper proposes GSGEN, a novel method that adopts Gaussian Splatting, a recent state-of-the-art representation, to text-to-3D generation.
GSGEN aims at generating high-quality 3D objects and addressing existing shortcomings by exploiting the explicit nature of Gaussian Splatting.
Our approach can generate 3D assets with delicate details and accurate geometry.
arXiv Detail & Related papers (2023-09-28T16:44:31Z)
- Dream3D: Zero-Shot Text-to-3D Synthesis Using 3D Shape Prior and Text-to-Image Diffusion Models [44.34479731617561]
We introduce explicit 3D shape priors into the CLIP-guided 3D optimization process.
We present a simple yet effective approach that directly bridges the text and image modalities with a powerful text-to-image diffusion model.
Our method, Dream3D, is capable of generating imaginative 3D content with superior visual quality and shape accuracy.
arXiv Detail & Related papers (2022-12-28T18:23:47Z)
This list is automatically generated from the titles and abstracts of the papers in this site.