Debiasing Scores and Prompts of 2D Diffusion for View-consistent
Text-to-3D Generation
- URL: http://arxiv.org/abs/2303.15413v5
- Date: Tue, 19 Dec 2023 22:03:12 GMT
- Title: Debiasing Scores and Prompts of 2D Diffusion for View-consistent
Text-to-3D Generation
- Authors: Susung Hong, Donghoon Ahn, Seungryong Kim
- Abstract summary: We propose two approaches to debias the score-distillation frameworks for view-consistent text-to-3D generation.
One of the most notable issues is the Janus problem, where the most canonical view of an object appears in other views.
Our methods improve the realism of the generated 3D objects by significantly reducing artifacts and achieve a good trade-off between faithfulness to the 2D diffusion models and 3D consistency with little overhead.
- Score: 38.032010026146146
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Existing score-distilling text-to-3D generation techniques, despite their
considerable promise, often encounter the view inconsistency problem. One of
the most notable issues is the Janus problem, where the most canonical view of
an object (e.g., face or head) appears in other views. In this work,
we explore existing frameworks for score-distilling text-to-3D generation and
identify the main causes of the view inconsistency problem -- the embedded bias
of 2D diffusion models. Based on these findings, we propose two approaches to
debias the score-distillation frameworks for view-consistent text-to-3D
generation. Our first approach, called score debiasing, involves cutting off
the score estimated by 2D diffusion models and gradually increasing the
truncation value throughout the optimization process. Our second approach,
called prompt debiasing, identifies conflicting words between user prompts and
view prompts using a language model, and adjusts the discrepancy between view
prompts and the viewing direction of an object. Our experimental results show
that our methods improve the realism of the generated 3D objects by
significantly reducing artifacts and achieve a good trade-off between
faithfulness to the 2D diffusion models and 3D consistency with little
overhead. Our project page is available at
https://susunghong.github.io/Debiased-Score-Distillation-Sampling/.
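The score debiasing idea in the abstract (cut off the estimated score and gradually raise the truncation value during optimization) can be illustrated with a minimal sketch. This is not the authors' implementation: the linear schedule, the bounds `c_min` and `c_max`, and the function names `truncation_value` and `debias_score` are all assumptions made for illustration, and the score is a plain list of floats standing in for a diffusion model's output.

```python
def truncation_value(step, total_steps, c_min=0.5, c_max=2.0):
    """Return the clipping threshold for this optimization step.

    Assumption: the threshold grows linearly from c_min to c_max.
    The abstract only says it increases gradually.
    """
    frac = min(max(step / max(total_steps - 1, 1), 0.0), 1.0)
    return c_min + frac * (c_max - c_min)


def debias_score(score, step, total_steps, c_min=0.5, c_max=2.0):
    """Clip every element of the estimated score to [-c, c]."""
    c = truncation_value(step, total_steps, c_min, c_max)
    return [min(max(s, -c), c) for s in score]


# Hypothetical score values standing in for a 2D diffusion score estimate.
score = [-3.0, -0.2, 0.1, 1.5, 4.0]
early = debias_score(score, step=0, total_steps=100)   # tight threshold
late = debias_score(score, step=99, total_steps=100)   # loose threshold
print(early)
print(late)
```

Under these assumptions, early steps clip large score magnitudes aggressively, while later steps let more of the original score through.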
Related papers
- Vista3D: Unravel the 3D Darkside of a Single Image [64.00066024235088]
Vista3D is a framework that realizes swift and consistent 3D generation within a mere 5 minutes.
In the coarse phase, we rapidly generate initial geometry with Gaussian Splatting from a single image.
It elevates the quality of generation by using a disentangled representation with two independent implicit functions.
arXiv Detail & Related papers (2024-09-18T17:59:44Z)
- VCD-Texture: Variance Alignment based 3D-2D Co-Denoising for Text-Guided Texturing [22.39760469467524]
We propose a Variance texture synthesis to address the modal gap between the 2D and 3D diffusion models.
We present an inpainting module to improve details with conflicting regions.
arXiv Detail & Related papers (2024-07-05T12:11:33Z)
- Sculpt3D: Multi-View Consistent Text-to-3D Generation with Sparse 3D Prior [57.986512832738704]
We present a new framework Sculpt3D that equips the current pipeline with explicit injection of 3D priors from retrieved reference objects without re-training the 2D diffusion model.
Specifically, we demonstrate that high-quality and diverse 3D geometry can be guaranteed by keypoints supervision through a sparse ray sampling approach.
These two decoupled designs effectively harness 3D information from reference objects to generate 3D objects while preserving the generation quality of the 2D diffusion model.
arXiv Detail & Related papers (2024-03-14T07:39:59Z)
- Retrieval-Augmented Score Distillation for Text-to-3D Generation [30.57225047257049]
We introduce a novel framework for retrieval-based quality enhancement in text-to-3D generation.
We conduct extensive experiments to demonstrate that ReDream exhibits superior quality with increased geometric consistency.
arXiv Detail & Related papers (2024-02-05T12:50:30Z)
- X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation [61.48050470095969]
X-Dreamer is a novel approach for high-quality text-to-3D content creation.
It bridges the gap between text-to-2D and text-to-3D synthesis.
arXiv Detail & Related papers (2023-11-30T07:23:00Z)
- SweetDreamer: Aligning Geometric Priors in 2D Diffusion for Consistent Text-to-3D [40.088688751115214]
It is inherently ambiguous to lift 2D results from pre-trained diffusion models to a 3D world for text-to-3D generation.
We improve consistency by aligning the 2D geometric priors in diffusion models with well-defined 3D shapes during the lifting.
Our method sets a new state of the art, with an 85+% consistency rate in human evaluation.
arXiv Detail & Related papers (2023-10-04T05:59:50Z)
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models [71.25937799010407]
We equip text-guided diffusion models to achieve 3D-consistent generation.
We study 3D local editing and propose a two-step solution.
We extend our model to perform one-shot novel view synthesis.
arXiv Detail & Related papers (2022-11-25T13:50:00Z)
- Homography Loss for Monocular 3D Object Detection [54.04870007473932]
A differentiable loss function, termed Homography Loss, is proposed; it exploits both 2D and 3D information.
Our method outperforms other state-of-the-art methods by a large margin on the KITTI 3D datasets.
arXiv Detail & Related papers (2022-04-02T03:48:03Z)