SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent
Text-to-3D
- URL: http://arxiv.org/abs/2310.02596v2
- Date: Fri, 20 Oct 2023 04:02:22 GMT
- Title: SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent
Text-to-3D
- Authors: Weiyu Li, Rui Chen, Xuelin Chen, Ping Tan
- Abstract summary: It is inherently ambiguous to lift 2D results from pre-trained diffusion models to a 3D world for text-to-3D generation.
We improve consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during the lifting.
Our method achieves a new state of the art with an 85+% consistency rate in human evaluation.
- Score: 40.088688751115214
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: It is inherently ambiguous to lift 2D results from pre-trained diffusion
models to a 3D world for text-to-3D generation. 2D diffusion models solely
learn view-agnostic priors and thus lack 3D knowledge during the lifting,
leading to the multi-view inconsistency problem. We find that this problem
primarily stems from geometric inconsistency, and avoiding misplaced geometric
structures substantially mitigates the problem in the final outputs. Therefore,
we improve the consistency by aligning the 2D geometric priors in diffusion
models with well-defined 3D shapes during the lifting, addressing the vast
majority of the problem. This is achieved by fine-tuning the 2D diffusion model
to be viewpoint-aware and to produce view-specific coordinate maps of
canonically oriented 3D objects. In our process, only coarse 3D information is
used for aligning. This "coarse" alignment not only resolves the multi-view
inconsistency in geometries but also retains the ability of 2D diffusion models
to generate detailed, diverse, and high-quality objects unseen in the 3D
datasets. Furthermore, our aligned geometric priors (AGP) are generic and can
be seamlessly integrated into various state-of-the-art pipelines, obtaining
high generalizability in terms of unseen shapes and visual appearance while
greatly alleviating the multi-view inconsistency problem. Our method achieves
a new state of the art with an 85+% consistency rate in human evaluation, while
many previous methods achieve around 30%. Our project page is
https://sweetdreamer3d.github.io/
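The abstract describes fine-tuning a 2D diffusion model to be viewpoint-aware and to produce view-specific coordinate maps of canonically oriented 3D objects. The authors' implementation is not reproduced here; the following is a minimal PyTorch sketch of the two ingredients that description implies: unprojecting rendered depth of a canonically oriented shape into a canonical coordinate map (CCM), and a denoising training step conditioned on the camera viewpoint. All names (depth_to_canonical_coordinate_map, TinyViewpointAwareDenoiser, agp_finetune_step) are hypothetical, the noise schedule is a placeholder, and the tiny convolutional network stands in for the pre-trained text-to-image UNet that the paper actually fine-tunes.
```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def depth_to_canonical_coordinate_map(depth, K, cam_to_canonical):
    """Unproject a rendered depth map into a 3-channel image of canonical-frame
    XYZ coordinates (a canonical coordinate map, CCM).

    depth:            (H, W) depth along the camera z-axis
    K:                (3, 3) camera intrinsics
    cam_to_canonical: (4, 4) rigid transform from camera to the object's canonical frame
    """
    H, W = depth.shape
    v, u = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    pix = torch.stack([u.float(), v.float(), torch.ones(H, W)], dim=-1)   # (H, W, 3) pixels
    rays = pix @ torch.linalg.inv(K).T                                    # camera-frame rays
    pts_cam = rays * depth.unsqueeze(-1)                                  # (H, W, 3) points
    pts_h = F.pad(pts_cam, (0, 1), value=1.0)                             # homogeneous coords
    pts_canon = (pts_h.reshape(-1, 4) @ cam_to_canonical.T)[:, :3]
    return pts_canon.reshape(H, W, 3).permute(2, 0, 1)                    # (3, H, W) CCM


class TinyViewpointAwareDenoiser(nn.Module):
    """Stand-in for the fine-tuned diffusion UNet; conditioned on azimuth,
    elevation, and noise level so the geometric prior becomes viewpoint-aware."""

    def __init__(self, ch=32):
        super().__init__()
        self.cond = nn.Linear(5, ch)  # sin/cos azimuth, sin/cos elevation, noise level
        self.inp = nn.Conv2d(3, ch, 3, padding=1)
        self.body = nn.Sequential(
            nn.SiLU(), nn.Conv2d(ch, ch, 3, padding=1),
            nn.SiLU(), nn.Conv2d(ch, 3, 3, padding=1),
        )

    def forward(self, noisy_ccm, t, azim, elev):
        cond = torch.stack([azim.sin(), azim.cos(), elev.sin(), elev.cos(), t], dim=-1)
        h = self.inp(noisy_ccm) + self.cond(cond)[:, :, None, None]
        return self.body(h)  # predicted noise


def agp_finetune_step(model, optimizer, ccm, azim, elev):
    """One denoising step on CCMs: the prior learns what a canonically oriented
    object should look like from a given viewpoint (placeholder noise schedule)."""
    b = ccm.shape[0]
    t = torch.rand(b)                                    # noise level in [0, 1)
    alpha = (1.0 - t).view(b, 1, 1, 1)
    noise = torch.randn_like(ccm)
    noisy = alpha.sqrt() * ccm + (1.0 - alpha).sqrt() * noise
    loss = F.mse_loss(model(noisy, t, azim, elev), noise)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```
A training loop under these assumptions would sample shapes and camera poses from a 3D dataset (the abstract notes that only coarse 3D information is needed), render depth, convert each render to a CCM, and call agp_finetune_step; at lifting time, the viewpoint-aware prior then scores the geometry of the 3D representation being optimized.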
Related papers
- Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias.
We employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure.
By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects (a hedged sketch of depth-conditioned generation follows after this list).
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- Any2Point: Empowering Any-modality Large Models for Efficient 3D Understanding [83.63231467746598]
We introduce Any2Point, a parameter-efficient method to empower any-modality large models (vision, language, audio) for 3D understanding.
We propose a 3D-to-any (1D or 2D) virtual projection strategy that correlates the input 3D points to the original 1D or 2D positions within the source modality.
arXiv Detail & Related papers (2024-04-11T17:59:45Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework, Sculpt3D, that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
- GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and Consistent 3D Generation [66.46683554587352]
We present GeoDream, a novel method that incorporates explicit generalized 3D priors with 2D diffusion priors.
Specifically, we first utilize a multi-view diffusion model to generate posed images and then construct a cost volume from the predicted images.
We further propose to harness 3D geometric priors to unlock the great potential of 3D awareness in 2D diffusion priors via a disentangled design.
arXiv Detail & Related papers (2023-11-29T15:48:48Z)
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors [104.79392615848109]
We present Magic123, a two-stage coarse-to-fine approach for generating high-quality, textured 3D meshes from a single unposed image.
In the first stage, we optimize a neural radiance field to produce a coarse geometry.
In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture.
arXiv Detail & Related papers (2023-06-30T17:59:08Z)
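The MT3D entry above mentions depth maps rendered from a high-quality 3D asset being used as control signals so that generated 2D images preserve the asset's shape. MT3D's actual pipeline is not specified in this list; the snippet below is only a hedged sketch of that general idea using an off-the-shelf depth ControlNet with the diffusers library. The prompt, the depth-image path, and the checkpoint IDs are illustrative assumptions.
```python
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Depth map rendered from the reference 3D asset at the desired viewpoint
# (producing it, e.g. with a rasterizer, is out of scope; the path is a placeholder).
depth_image = Image.open("asset_depth_view0.png").convert("RGB")

# Off-the-shelf depth-conditioned ControlNet on top of Stable Diffusion.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-depth", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

# The depth control signal constrains layout, so the generated view keeps the
# fundamental shape and structure of the 3D asset.
image = pipe(
    "a photo of a ceramic teapot, studio lighting",
    image=depth_image,
    num_inference_steps=30,
).images[0]
image.save("teapot_view0.png")
```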
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.