DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image
Diffusion for 3D Generative Model
- URL: http://arxiv.org/abs/2211.16374v2
- Date: Fri, 31 Mar 2023 02:15:49 GMT
- Title: DATID-3D: Diversity-Preserved Domain Adaptation Using Text-to-Image
Diffusion for 3D Generative Model
- Authors: Gwanghyun Kim and Se Young Chun
- Abstract summary: 3D generative models have achieved remarkable performance in synthesizing high-resolution photorealistic images with view consistency and detailed 3D shapes.
Text-guided domain adaptation methods have shown impressive performance in converting a 2D generative model trained on one domain into models for other domains with different styles.
Here we propose DATID-3D, a domain adaptation method tailored for 3D generative models using text-to-image diffusion models.
- Score: 18.362036050304987
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Recent 3D generative models have achieved remarkable performance in
synthesizing high-resolution photorealistic images with view consistency and
detailed 3D shapes, but training them for diverse domains is challenging since
it requires massive training images and their camera distribution information.
Text-guided domain adaptation methods have shown impressive performance in
converting a 2D generative model trained on one domain into models for other
domains with different styles by leveraging CLIP (Contrastive Language-Image
Pre-training), rather than collecting massive datasets for those domains.
However, one drawback of these methods is that the sample diversity of the
original generative model is not well preserved in the domain-adapted
generative models due to the deterministic nature of the CLIP text encoder.
Text-guided domain adaptation will be even more challenging for 3D generative
models, not only because of catastrophic diversity loss, but also because of
inferior text-image correspondence and poor image quality. Here we propose
DATID-3D, a domain adaptation method tailored for 3D generative models using
text-to-image diffusion models that can synthesize diverse images per text
prompt without collecting additional images and camera information for the
target domain. Unlike 3D extensions of prior text-guided domain adaptation
methods, our novel pipeline is able to fine-tune the state-of-the-art 3D
generator of the source domain to synthesize high-resolution, multi-view-consistent
images in text-guided target domains without additional data, outperforming
existing text-guided domain adaptation methods in diversity and text-image
correspondence. Furthermore, we propose and demonstrate diverse 3D image
manipulations, such as one-shot instance-selected adaptation and single-view
manipulated 3D reconstruction, to fully exploit the diversity offered by text.
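The abstract describes a two-stage recipe: first, a text-to-image diffusion model turns renders of the pretrained source-domain 3D generator into a diverse, pose-annotated target-domain dataset; then the 3D generator is fine-tuned on that synthetic data. The sketch below is a minimal, conceptual reading of that pipeline, not the authors' released code; every callable, name, and shape in it is a hypothetical placeholder.

```python
# A minimal, conceptual sketch of the two-stage pipeline described in the abstract:
# (1) turn renders of the pretrained source-domain 3D generator into a diverse,
# pose-annotated target-domain dataset with a text-to-image diffusion model, and
# (2) fine-tune the 3D generator on that synthetic dataset. All callables, names,
# and shapes below are hypothetical placeholders, not the authors' released code.

from typing import Callable, List, Tuple

import torch


def build_target_dataset(
    render: Callable[[torch.Tensor, torch.Tensor], torch.Tensor],  # 3D GAN: (latent, pose) -> image
    sample_pose: Callable[[], torch.Tensor],                       # source-domain camera distribution
    img2img: Callable[[torch.Tensor, str], torch.Tensor],          # text-guided diffusion translation
    prompt: str,
    num_samples: int = 10_000,
) -> List[Tuple[torch.Tensor, torch.Tensor]]:
    """Stage 1: synthesize target-domain images without collecting real data.

    The camera pose used for the source render is kept with the translated image,
    so no new camera distribution information is needed for the target domain.
    """
    dataset = []
    for _ in range(num_samples):
        z = torch.randn(1, 512)      # latent code of the source 3D generator
        pose = sample_pose()         # reuse the source camera distribution
        src = render(z, pose)        # photorealistic source-domain render
        tgt = img2img(src, prompt)   # diverse target-domain image for this prompt
        dataset.append((tgt, pose))
    return dataset


def adapt_generator(generator, finetune_adversarial, dataset):
    """Stage 2: fine-tune the source 3D generator on the pose-annotated synthetic set."""
    return finetune_adversarial(generator, dataset)  # ordinary pose-conditioned GAN fine-tuning
```

Keeping each translated image paired with the pose of its source render is what lets the second stage run an ordinary pose-conditioned adversarial fine-tuning loop, which is how the method avoids collecting images or camera information for the target domain.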
Related papers
- DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward such a domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z)
- 3D-VirtFusion: Synthetic 3D Data Augmentation through Generative Diffusion Models and Controllable Editing [52.68314936128752]
We propose a new paradigm to automatically generate 3D labeled training data by harnessing the power of pretrained large foundation models.
For each target semantic class, we first generate 2D images of a single object with varied structure and appearance via diffusion models and ChatGPT-generated text prompts.
We transform these augmented images into 3D objects and construct virtual scenes by random composition.
arXiv Detail & Related papers (2024-08-25T09:31:22Z)
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- Make-Your-3D: Fast and Consistent Subject-Driven 3D Content Generation [12.693847842218604]
We introduce a novel 3D customization method, dubbed Make-Your-3D, that can personalize high-fidelity and consistent 3D content within 5 minutes.
Our key insight is to harmonize the distributions of a multi-view diffusion model and an identity-specific 2D generative model, aligning them with the distribution of the desired 3D subject.
Our method can produce high-quality, consistent, and subject-specific 3D content with text-driven modifications that are unseen in the subject image.
arXiv Detail & Related papers (2024-03-14T17:57:04Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaptation by Combining 3D GANs and Diffusion Priors [26.0337715783954]
DiffusionGAN3D boosts text-guided 3D domain adaptation and generation by combining 3D GANs and diffusion priors.
The proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks.
arXiv Detail & Related papers (2023-12-28T05:46:26Z)
- IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach uses image-to-image pipelines, powered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data (a minimal sketch of this discriminator objective appears after this list).
arXiv Detail & Related papers (2023-08-22T14:39:17Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- PODIA-3D: Domain Adaptation of 3D Generative Model Across Large Domain Gap Using Pose-Preserved Text-to-Image Diffusion [15.543034329968465]
We propose PODIA-3D, which uses pose-preserved text-to-image diffusion-based domain adaptation for 3D generative models.
We also propose specialized-to-general sampling strategies to improve the details of the generated samples.
Our approach outperforms existing 3D text-guided domain adaptation methods in terms of text-image correspondence, realism, diversity of rendered images, and sense of depth of 3D shapes in the generated samples.
arXiv Detail & Related papers (2023-04-04T15:49:01Z)
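The IT3D entry above describes a discriminator that treats the synthesized multi-view images as real data and the renders of the 3D model being optimized as fake data. The sketch below illustrates that role assignment with a standard non-saturating GAN loss; the discriminator module, the tensor names, and the exact objective are illustrative assumptions, not IT3D's released implementation.

```python
# A minimal sketch, assuming a standard non-saturating GAN objective, of the role
# assignment described in the IT3D entry: diffusion-synthesized multi-view images
# act as "real" data and renders of the 3D model being optimized act as "fake" data.
# The discriminator module and the exact loss are illustrative assumptions, not
# the paper's released implementation.

import torch
import torch.nn.functional as F


def discriminator_loss(
    disc: torch.nn.Module,
    synthesized_views: torch.Tensor,  # images from the image-to-image pipeline ("real")
    rendered_views: torch.Tensor,     # renders of the optimized 3D model ("fake")
) -> torch.Tensor:
    real_logits = disc(synthesized_views)
    fake_logits = disc(rendered_views.detach())  # do not backprop into the 3D model here
    # -log sigmoid(real) - log(1 - sigmoid(fake)), written with softplus for stability
    return F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()


def generator_loss(disc: torch.nn.Module, rendered_views: torch.Tensor) -> torch.Tensor:
    # The 3D representation is updated so that its renders fool the discriminator.
    return F.softplus(-disc(rendered_views)).mean()
```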