IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
- URL: http://arxiv.org/abs/2308.11473v1
- Date: Tue, 22 Aug 2023 14:39:17 GMT
- Title: IT3D: Improved Text-to-3D Generation with Explicit View Synthesis
- Authors: Yiwen Chen, Chi Zhang, Xiaofeng Yang, Zhongang Cai, Gang Yu, Lei Yang,
Guosheng Lin
- Abstract summary: This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
- Score: 71.68595192524843
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent strides in Text-to-3D techniques have been propelled by distilling
knowledge from powerful large text-to-image diffusion models (LDMs).
Nonetheless, existing Text-to-3D approaches often grapple with challenges such
as over-saturation, inadequate detailing, and unrealistic outputs. This study
presents a novel strategy that leverages explicitly synthesized multi-view
images to address these issues. Our approach involves the utilization of
image-to-image pipelines, empowered by LDMs, to generate posed high-quality
images based on the renderings of coarse 3D models. Although the generated
images mostly alleviate the aforementioned issues, challenges such as view
inconsistency and significant content variance persist due to the inherent
generative nature of large diffusion models, posing extensive difficulties in
leveraging these images effectively. To overcome this hurdle, we advocate
integrating a discriminator alongside a novel Diffusion-GAN dual training
strategy to guide the training of 3D models. For the incorporated
discriminator, the synthesized multi-view images are considered real data,
while the renderings of the optimized 3D models function as fake data. We
conduct a comprehensive set of experiments that demonstrate the effectiveness
of our method over baseline approaches.
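As a rough illustration of the first stage the abstract describes, the sketch below refines a rendering of a coarse 3D model with an off-the-shelf image-to-image diffusion pipeline. It assumes the Hugging Face diffusers library; the model id, prompt, file names, and the strength/guidance values are illustrative choices, not the paper's actual settings.

```python
# Minimal sketch, not the authors' implementation: turn a rendering of a
# coarse 3D model into a higher-quality posed image via img2img diffusion.
import torch
from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed base model
    torch_dtype=torch.float16,
).to("cuda")

coarse_render = Image.open("coarse_render_view_012.png").convert("RGB")  # hypothetical input
refined = pipe(
    prompt="a photorealistic ceramic teapot, studio lighting",  # example prompt
    image=coarse_render,
    strength=0.5,        # how far the output may deviate from the rendering
    guidance_scale=7.5,
).images[0]
refined.save("refined_view_012.png")
```

For the discriminator term, the abstract only states that synthesized multi-view images are treated as real data and renderings of the optimized 3D model as fake data. The following PyTorch sketch fills in those roles with a simple patch discriminator and a non-saturating GAN loss; the architecture and loss choice are assumptions, and the diffusion-distillation term that IT3D pairs with this adversarial term is omitted.

```python
# Minimal sketch of the adversarial term under assumed choices (patch
# discriminator, non-saturating loss); not the paper's exact formulation.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PatchDiscriminator(nn.Module):
    def __init__(self, ch: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, ch, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch, ch * 2, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 2, ch * 4, 4, 2, 1), nn.LeakyReLU(0.2),
            nn.Conv2d(ch * 4, 1, 4, 1, 0),  # per-patch real/fake logits
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

def d_loss(D, synthesized: torch.Tensor, rendered: torch.Tensor) -> torch.Tensor:
    # Synthesized multi-view images play the role of real data,
    # renderings of the optimized 3D model the role of fake data.
    real_logits = D(synthesized)
    fake_logits = D(rendered.detach())
    return (F.softplus(-real_logits) + F.softplus(fake_logits)).mean()

def g_adv_loss(D, rendered: torch.Tensor) -> torch.Tensor:
    # Gradient flows back through the renderer to the 3D representation,
    # pulling renderings toward the synthesized-image distribution.
    return F.softplus(-D(rendered)).mean()
```

In a dual training loop of this kind, d_loss would update the discriminator while the adversarial generator term supplements the diffusion-distillation signal for the 3D model; the exact weighting and schedule are not specified in the abstract.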
Related papers
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- ViewDiff: 3D-Consistent Image Generation with Text-to-Image Models [65.22994156658918]
We present a method that learns to generate multi-view images in a single denoising process from real-world data.
We design an autoregressive generation scheme that renders more 3D-consistent images at any viewpoint.
arXiv Detail & Related papers (2024-03-04T07:57:05Z)
- VolumeDiffusion: Flexible Text-to-3D Generation with Efficient Volumetric Encoder [56.59814904526965]
This paper introduces a pioneering 3D encoder designed for text-to-3D generation.
A lightweight network is developed to efficiently acquire feature volumes from multi-view images.
A diffusion model with a 3D U-Net is then trained on these feature volumes for text-to-3D generation.
arXiv Detail & Related papers (2023-12-18T18:59:05Z)
- CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
- EfficientDreamer: High-Fidelity and Robust 3D Creation via Orthogonal-view Diffusion Prior [59.25950280610409]
We propose a robust high-quality 3D content generation pipeline by exploiting orthogonal-view image guidance.
In this paper, we introduce a novel 2D diffusion model that generates an image consisting of four sub-images based on the given text prompt.
We also present a 3D synthesis network that can further improve the details of the generated 3D contents.
arXiv Detail & Related papers (2023-08-25T07:39:26Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- Vox-E: Text-guided Voxel Editing of 3D Objects [14.88446525549421]
Large-scale text-guided diffusion models have garnered significant attention due to their ability to synthesize diverse images.
We present a technique that harnesses the power of latent diffusion models for editing existing 3D objects.
arXiv Detail & Related papers (2023-03-21T17:36:36Z)