PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain
Gap Using Pose-Preserved Text-to-Image Diffusion
- URL: http://arxiv.org/abs/2304.01900v1
- Date: Tue, 4 Apr 2023 15:49:01 GMT
- Title: PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain
Gap Using Pose-Preserved Text-to-Image Diffusion
- Authors: Gwanghyun Kim, Ji Ha Jang, Se Young Chun
- Abstract summary: We propose PODIA-3D, which uses pose-preserved text-to-image diffusion-based domain adaptation for 3D generative models.
We also propose specialized-to-general sampling strategies to improve the details of the generated samples.
Our approach outperforms existing 3D text-guided domain adaptation methods in terms of text-image correspondence, realism, diversity of rendered images, and sense of depth of 3D shapes in the generated samples.
- Score: 15.543034329968465
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Recently, significant advancements have been made in 3D generative models;
however, training these models across diverse domains is challenging and requires a
huge amount of training data and knowledge of pose distributions. Text-guided domain
adaptation methods allow a generator to be adapted to target domains using text
prompts, obviating the need to assemble large datasets. Recently, DATID-3D has
demonstrated impressive sample quality in text-guided domains, preserving the
diversity expressed in text by leveraging text-to-image diffusion. However, adapting
3D generators to domains with large gaps from the source domain remains challenging
due to the following issues in current text-to-image diffusion models: 1) the
shape-pose trade-off in diffusion-based translation, 2) pose bias, and 3) instance
bias in the target domain, which result in inferior 3D shapes, low text-image
correspondence, and low intra-domain diversity in the generated samples. To address
these issues, we propose a novel pipeline called PODIA-3D, which uses pose-preserved
text-to-image diffusion-based domain adaptation for 3D generative models. We construct
a pose-preserved text-to-image diffusion model that allows the use of extremely high
levels of noise for significant domain changes. We also propose specialized-to-general
sampling strategies to improve the details of the generated samples. Moreover, to
overcome the instance bias, we introduce a text-guided debiasing method that improves
intra-domain diversity. Consequently, our method successfully adapts 3D generators
across significant domain gaps. Our qualitative results and a user study demonstrate
that our approach outperforms existing 3D text-guided domain adaptation methods in
terms of text-image correspondence, realism, diversity of rendered images, and the
sense of depth of 3D shapes in the generated samples.
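The core translation step described above, keeping the source camera pose while injecting near-maximal noise so that a large domain change is possible, can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes the diffusers library's off-the-shelf depth-conditioned Stable Diffusion pipeline as a stand-in for the paper's specialized pose-preserved diffusion model, and the model id, prompt, and strength value are illustrative assumptions only.

```python
# Minimal sketch (assumption: a depth-conditioned Stable Diffusion pipeline stands in
# for PODIA-3D's pose-preserved text-to-image diffusion model; prompt, strength, and
# model id are illustrative, not the authors' configuration).
import torch
from PIL import Image
from diffusers import StableDiffusionDepth2ImgPipeline

pipe = StableDiffusionDepth2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-depth",
    torch_dtype=torch.float16,
).to("cuda")

# An image rendered by the source 3D generator at a known camera pose (hypothetical path).
source_render = Image.open("render_from_source_generator.png").convert("RGB")

# A high "strength" injects noise at a very high diffusion timestep, enabling a large
# domain change, while the depth condition preserves the camera pose and coarse
# geometry, mitigating the shape-pose trade-off discussed in the abstract.
translated = pipe(
    prompt="a 3D render of a face of a zombie",  # example target-domain text prompt
    image=source_render,
    strength=0.98,               # near-maximal noise for a large domain gap
    num_inference_steps=50,
    guidance_scale=7.5,
).images[0]

translated.save("pose_preserved_target_sample.png")
```

In the full method, such pose-preserved target-domain images would then be used to fine-tune the 3D generator; that adaptation stage is not covered by this sketch.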
Related papers
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors [26.0337715783954]
DiffusionGAN3D boosts text-guided 3D domain adaptation and generation by combining 3D GANs and diffusion priors.
The proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks.
arXiv Detail & Related papers (2023-12-28T05:46:26Z)
- CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
- X-Dreamer: Creating High-quality 3D Content by Bridging the Domain Gap Between Text-to-2D and Text-to-3D Generation [61.48050470095969]
X-Dreamer is a novel approach for high-quality text-to-3D content creation.
It bridges the gap between text-to-2D and text-to-3D synthesis.
arXiv Detail & Related papers (2023-11-30T07:23:00Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, powered by LDMs, to generate high-quality posed images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- StyleAvatar3D: Leveraging Image-Text Diffusion Models for High-Fidelity 3D Avatar Generation [103.88928334431786]
We present a novel method for generating high-quality, stylized 3D avatars.
We use pre-trained image-text diffusion models for data generation and a Generative Adversarial Network (GAN)-based 3D generation network for training.
Our approach demonstrates superior performance over current state-of-the-art methods in terms of visual quality and diversity of the produced avatars.
arXiv Detail & Related papers (2023-05-30T13:09:21Z)
- Vox-E: Text-guided Voxel Editing of 3D Objects [14.88446525549421]
Large scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images.
We present a technique that harnesses the power of latent diffusion models for editing existing 3D objects.
arXiv Detail & Related papers (2023-03-21T17:36:36Z)
- DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image Diffusion for 3D Generative Model [18.362036050304987]
3D generative models have achieved remarkable performance in synthesizing high resolution photorealistic images with view consistency and detailed 3D shapes.
Text-guided domain adaptation methods have shown impressive performance on converting the 2D generative model on one domain into the models on other domains with different styles.
Here we propose DATID-3D, a domain adaptation method tailored for 3D generative models using text-to-image diffusion models.
arXiv Detail & Related papers (2022-11-29T16:54:34Z)