GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and
Consistent 3D Generation
- URL: http://arxiv.org/abs/2311.17971v2
- Date: Fri, 1 Dec 2023 01:36:04 GMT
- Title: GeoDream: Disentangling 2D and Geometric Priors for High-Fidelity and
Consistent 3D Generation
- Authors: Baorui Ma, Haoge Deng, Junsheng Zhou, Yu-Shen Liu, Tiejun Huang,
Xinlong Wang
- Abstract summary: We present GeoDream, a novel method that incorporates explicit generalized 3D priors with 2D diffusion priors.
Specifically, we first utilize a multi-view diffusion model to generate posed images and then construct a cost volume from the predicted images.
We further propose to harness 3D geometric priors to unlock the great potential of 3D awareness in 2D diffusion priors via a disentangled design.
- Score: 66.46683554587352
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Text-to-3D generation by distilling pretrained large-scale text-to-image
diffusion models has shown great promise but still suffers from inconsistent 3D
geometric structures (Janus problems) and severe artifacts. The aforementioned
problems mainly stem from 2D diffusion models lacking 3D awareness during the
lifting. In this work, we present GeoDream, a novel method that incorporates
explicit generalized 3D priors with 2D diffusion priors to enhance the
capability of obtaining unambiguous 3D consistent geometric structures without
sacrificing diversity or fidelity. Specifically, we first utilize a multi-view
diffusion model to generate posed images and then construct a cost volume from
the predicted images, which serves as a native 3D geometric prior, ensuring
spatial consistency in 3D space. Subsequently, we further propose to harness 3D
geometric priors to unlock the great potential of 3D awareness in 2D diffusion
priors via a disentangled design. Notably, disentangling 2D and 3D priors
allows us to refine the 3D geometric priors further. We show that the refined 3D
geometric priors aid the 3D-aware capability of 2D diffusion priors, which
in turn provides superior guidance for refining the 3D geometric priors.
Our numerical and visual comparisons demonstrate that GeoDream generates more
3D consistent textured meshes with high-resolution realistic renderings (i.e.,
1024 $\times$ 1024) and adheres more closely to semantic coherence.
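A minimal, hypothetical sketch may help make the disentangled-prior idea in the abstract concrete: a cost-volume-derived geometric prior supplies 3D density while a 2D-diffusion-guided appearance field supplies texture, and the two are kept as separate parameter groups so each can be refined without destabilizing the other. This is not the authors' implementation; the module names, shapes, and placeholder losses below are all assumptions standing in for the paper's cost-volume aggregation and score-distillation guidance.

```python
# Sketch only: illustrates keeping 3D geometric priors and 2D-guided appearance
# as separate ("disentangled") parameter groups; every name and loss here is a
# placeholder, not GeoDream's actual code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GeometricPrior(nn.Module):
    """Stand-in for a cost-volume-based density field (hypothetical)."""
    def __init__(self, resolution=32):
        super().__init__()
        # Learnable volume, assumed to be initialized from multi-view cost aggregation.
        self.volume = nn.Parameter(torch.zeros(1, 1, resolution, resolution, resolution))

    def density(self, points):
        # points: (N, 3) in [-1, 1]; trilinear lookup into the volume.
        grid = points.view(1, -1, 1, 1, 3)
        d = F.grid_sample(self.volume, grid, align_corners=True)
        return torch.sigmoid(d.view(-1))

class AppearanceField(nn.Module):
    """Stand-in for a color field guided by the 2D diffusion prior (hypothetical)."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, points):
        return torch.sigmoid(self.mlp(points))

geometry, appearance = GeometricPrior(), AppearanceField()

# Disentangled optimization: separate optimizers (and, in practice, separate
# schedules) for the 3D geometric prior and the 2D-guided appearance.
opt_geo = torch.optim.Adam(geometry.parameters(), lr=1e-3)
opt_app = torch.optim.Adam(appearance.parameters(), lr=1e-2)

for step in range(100):
    pts = torch.rand(1024, 3) * 2 - 1          # random query points in [-1, 1]^3
    sigma = geometry.density(pts)
    rgb = appearance(pts)

    # Placeholder objectives: a real pipeline would render both fields and apply
    # score-distillation gradients from a pretrained 2D diffusion model plus
    # consistency terms against the multi-view cost volume.
    loss_geo = ((sigma - 0.5) ** 2).mean()
    loss_app = rgb.var(dim=0).mean()

    opt_geo.zero_grad(); loss_geo.backward(); opt_geo.step()
    opt_app.zero_grad(); loss_app.backward(); opt_app.step()
```

Keeping the two priors in separate parameter groups mirrors the abstract's claim that refined 3D geometric priors and 3D-aware 2D guidance can improve each other in turn during optimization.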
Related papers
- Deep Geometric Moments Promote Shape Consistency in Text-to-3D Generation [27.43973967994717]
MT3D is a text-to-3D generative model that leverages a high-fidelity 3D object to overcome viewpoint bias.
We employ depth maps derived from a high-quality 3D model as control signals to guarantee that the generated 2D images preserve the fundamental shape and structure.
By incorporating geometric details from a 3D asset, MT3D enables the creation of diverse and geometrically consistent objects.
arXiv Detail & Related papers (2024-08-12T06:25:44Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present Sculpt3D, a new framework that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce ReDream, a novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
- Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior [52.44678180286886]
Distillation from 2D diffusion models achieves excellent generalization and rich details without any 3D data.
We propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously.
arXiv Detail & Related papers (2023-12-11T18:59:18Z)
- SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D [40.088688751115214]
It is inherently ambiguous to lift 2D results from pre-trained diffusion models to a 3D world for text-to-3D generation.
We improve consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during the lifting.
Our method represents a new state-of-the-art performance with an 85+% consistency rate by human evaluation.
arXiv Detail & Related papers (2023-10-04T05:59:50Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
We introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- Magic123: One Image to High-Quality 3D Object Generation Using Both 2D and 3D Diffusion Priors [104.79392615848109]
We present Magic123, a two-stage coarse-to-fine approach for high-quality, textured 3D meshes from a single unposed image.
In the first stage, we optimize a neural radiance field to produce a coarse geometry.
In the second stage, we adopt a memory-efficient differentiable mesh representation to yield a high-resolution mesh with a visually appealing texture.
arXiv Detail & Related papers (2023-06-30T17:59:08Z)
- XDGAN: Multi-Modal 3D Shape Generation in 2D Space [60.46777591995821]
We propose a novel method to convert 3D shapes into compact 1-channel geometry images and leverage StyleGAN3 and image-to-image translation networks to generate 3D objects in 2D space.
The generated geometry images are quick to convert to 3D meshes, enabling real-time 3D object synthesis, visualization and interactive editing.
We show both quantitatively and qualitatively that our method is highly effective at various tasks such as 3D shape generation, single view reconstruction and shape manipulation, while being significantly faster and more flexible compared to recent 3D generative models.
arXiv Detail & Related papers (2022-10-06T15:54:01Z)