Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
- URL: http://arxiv.org/abs/2312.06655v1
- Date: Mon, 11 Dec 2023 18:59:18 GMT
- Title: Sherpa3D: Boosting High-Fidelity Text-to-3D Generation via Coarse 3D Prior
- Authors: Fangfu Liu, Diankun Wu, Yi Wei, Yongming Rao, Yueqi Duan
- Abstract summary: 2D diffusion models offer a distillation approach that achieves excellent generalization and rich details without any 3D data.
We propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity, generalizability, and geometric consistency simultaneously.
- Score: 52.44678180286886
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recently, 3D content creation from text prompts has demonstrated remarkable
progress by utilizing 2D and 3D diffusion models. While 3D diffusion models
ensure great multi-view consistency, their ability to generate high-quality and
diverse 3D assets is hindered by the limited 3D data. In contrast, 2D diffusion
models offer a distillation approach that achieves excellent generalization and
rich details without any 3D data. However, 2D lifting methods suffer from
inherent view-agnostic ambiguity, thereby leading to serious multi-face Janus
issues, where text prompts fail to provide sufficient guidance to learn
coherent 3D results. Instead of retraining a costly viewpoint-aware model, we
study how to fully exploit easily accessible coarse 3D knowledge to enhance the
prompts and guide 2D lifting optimization for refinement. In this paper, we
propose Sherpa3D, a new text-to-3D framework that achieves high-fidelity,
generalizability, and geometric consistency simultaneously. Specifically, we
design a pair of guiding strategies derived from the coarse 3D prior generated
by the 3D diffusion model: a structural guidance for geometric fidelity and a
semantic guidance for 3D coherence. Employing the two types of guidance, the 2D
diffusion model enriches the 3D content with diversified and high-quality
results. Extensive experiments show the superiority of our Sherpa3D over the
state-of-the-art text-to-3D methods in terms of quality and 3D consistency.
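As context for the "2D lifting optimization" referenced above, such methods optimize a DreamFusion-style score distillation sampling (SDS) gradient, shown below; the combined objective on the second line is only a schematic of how structural and semantic guidance might enter as additional weighted terms, with the weights \lambda_{\mathrm{struct}}, \lambda_{\mathrm{sem}} and the exact form of the guidance losses being assumptions not specified in the abstract.

\[
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}(\phi, x = g(\theta)) = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(x_t; y, t) - \epsilon\big)\,\tfrac{\partial x}{\partial \theta} \right],
\qquad
\mathcal{L}_{\mathrm{total}} = \mathcal{L}_{\mathrm{SDS}} + \lambda_{\mathrm{struct}}\,\mathcal{L}_{\mathrm{struct}} + \lambda_{\mathrm{sem}}\,\mathcal{L}_{\mathrm{sem}}
\]

Here g(\theta) renders the 3D representation \theta into an image x, x_t is its noised version at timestep t, y is the text prompt, \epsilon is the injected Gaussian noise, \hat{\epsilon}_\phi is the frozen 2D diffusion model's noise prediction, and w(t) is a timestep-dependent weighting.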
Related papers
- DIRECT-3D: Learning Direct Text-to-3D Generation on Massive Noisy 3D Data [50.164670363633704]
We present DIRECT-3D, a diffusion-based 3D generative model for creating high-quality 3D assets from text prompts.
Our model is directly trained on extensive noisy and unaligned 'in-the-wild' 3D assets.
We achieve state-of-the-art performance in both single-class generation and text-to-3D generation.
arXiv Detail & Related papers (2024-06-06T17:58:15Z)
- DiffTF++: 3D-aware Diffusion Transformer for Large-Vocabulary 3D Generation [53.20147419879056]
We introduce a diffusion-based feed-forward framework that addresses the challenges of large-vocabulary 3D generation with a single model.
Building upon our 3D-aware Diffusion model with TransFormer, we propose a stronger version for 3D generation, i.e., DiffTF++.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate the effectiveness of our proposed modules.
arXiv Detail & Related papers (2024-05-13T17:59:51Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoint supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce a novel framework, ReDream, for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
- BoostDream: Efficient Refining for High-Quality Text-to-3D Generation from Multi-View Diffusion [0.0]
BoostDream is a highly efficient plug-and-play 3D refining method designed to transform coarse 3D assets into high-quality ones.
We introduce 3D model distillation that fits differentiable representations from the 3D assets obtained through feed-forward generation.
A novel multi-view SDS loss is designed, which utilizes a multi-view aware 2D diffusion model to refine the 3D assets (a toy sketch of such a multi-view SDS update appears after this list).
arXiv Detail & Related papers (2024-01-30T05:59:00Z)
- Large-Vocabulary 3D Diffusion Model with Transformer [57.076986347047]
We introduce a diffusion-based feed-forward framework for synthesizing massive categories of real-world 3D objects with a single generative model.
We propose DiffTF, a novel triplane-based 3D-aware Diffusion model with TransFormer, which tackles the challenges of large-vocabulary generation from three aspects.
Experiments on ShapeNet and OmniObject3D convincingly demonstrate that a single DiffTF model achieves state-of-the-art large-vocabulary 3D object generation performance.
arXiv Detail & Related papers (2023-09-14T17:59:53Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
We introduce a novel 2D diffusion model that generates an image consisting of four orthogonal-view sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
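The BoostDream entry above mentions a multi-view SDS loss. Below is a minimal toy sketch of how averaging SDS-style updates over several rendered views might look; toy_render and toy_noise_prediction are hypothetical stand-ins for the differentiable renderer and the multi-view aware 2D diffusion model, neither of which is described in the summaries above.

import torch

# Toy sketch of a multi-view SDS-style update. All components are
# placeholders (assumptions), not the actual BoostDream implementation.

def toy_render(params: torch.Tensor, view_idx: int) -> torch.Tensor:
    # Hypothetical differentiable renderer: one "image" per view.
    # A real pipeline would rasterize or ray-march a 3D representation.
    return torch.tanh(params + 0.1 * view_idx)

def toy_noise_prediction(noisy: torch.Tensor, t: float) -> torch.Tensor:
    # Stand-in for a frozen, multi-view aware 2D diffusion denoiser.
    return 0.5 * noisy

def multi_view_sds_step(params: torch.Tensor, views=(0, 1, 2, 3), lr=1e-2):
    params = params.clone().requires_grad_(True)
    loss = 0.0
    for v in views:
        x = toy_render(params, v)
        eps = torch.randn_like(x)
        x_t = x + eps  # toy forward diffusion at a fixed timestep
        eps_hat = toy_noise_prediction(x_t, t=0.5)
        # SDS treats (eps_hat - eps) as a gradient on the rendering; detach it
        # so only d(x)/d(params) carries gradient, as in score distillation.
        grad = (eps_hat - eps).detach()
        loss = loss + (grad * x).sum() / len(views)
    loss.backward()
    with torch.no_grad():
        params -= lr * params.grad
    return params.detach()

if __name__ == "__main__":
    theta = torch.zeros(4, 4)  # stand-in 3D parameters
    theta = multi_view_sds_step(theta)
    print(theta.shape)

Averaging the per-view gradients is what couples the views; a real multi-view pipeline would typically also condition the denoiser on camera pose and the text prompt.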
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences arising from its use.