Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation
- URL: http://arxiv.org/abs/2312.11774v1
- Date: Tue, 19 Dec 2023 01:09:49 GMT
- Title: Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation
- Authors: Yuze He, Yushi Bai, Matthieu Lin, Jenny Sheng, Yubin Hu, Qi Wang,
Yu-Hui Wen, Yong-Jin Liu
- Abstract summary: We propose a text-to-3D method for Neural Radiance Fields (NeRFs) that explicitly enforces fine-grained view consistency.
Our method achieves state-of-the-art performance over existing text-to-3D methods.
- Score: 28.079441901818296
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: By lifting the pre-trained 2D diffusion models into Neural Radiance Fields
(NeRFs), text-to-3D generation methods have made great progress. Many
state-of-the-art approaches usually apply score distillation sampling (SDS) to
optimize the NeRF representations, which supervises the NeRF optimization with
pre-trained text-conditioned 2D diffusion models such as Imagen. However, the
supervision signal provided by such pre-trained diffusion models only depends
on text prompts and does not constrain the multi-view consistency. To inject
the cross-view consistency into diffusion priors, some recent works finetune
the 2D diffusion model with multi-view data, but still lack fine-grained view
coherence. To tackle this challenge, we incorporate multi-view image conditions
into the supervision signal of NeRF optimization, which explicitly enforces
fine-grained view consistency. With such stronger supervision, our proposed
text-to-3D method effectively mitigates the generation of floaters (due to
excessive densities) and completely empty spaces (due to insufficient
densities). Our quantitative evaluations on the T$^3$Bench dataset demonstrate
that our method achieves state-of-the-art performance over existing text-to-3D
methods. We will make the code publicly available.
Related papers
- OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z) - Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z) - StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z) - IT3D: Improved Text-to-3D Generation with Explicit View Synthesis [71.68595192524843]
This study presents a novel strategy that leverages explicitly synthesized multi-view images to address these issues.
Our approach involves the utilization of image-to-image pipelines, empowered by LDMs, to generate posed high-quality images.
For the incorporated discriminator, the synthesized multi-view images are considered real data, while the renderings of the optimized 3D models function as fake data.
arXiv Detail & Related papers (2023-08-22T14:39:17Z) - HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion
Guidance [19.252300247300145]
This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation.
We compute denoising scores in the text-to-image diffusion model's latent and image spaces.
To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays.
arXiv Detail & Related papers (2023-05-30T05:56:58Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - Let 2D Diffusion Model Know 3D-Consistency for Robust Text-to-3D
Generation [39.50894560861625]
3DFuse is a novel framework that incorporates 3D awareness into pretrained 2D diffusion models.
We introduce a training strategy that enables the 2D diffusion model learns to handle the errors and sparsity within the coarse 3D structure for robust generation.
arXiv Detail & Related papers (2023-03-14T14:24:31Z) - DreamFusion: Text-to-3D using 2D Diffusion [52.52529213936283]
Recent breakthroughs in text-to-image synthesis have been driven by diffusion models trained on billions of image-text pairs.
In this work, we circumvent these limitations by using a pretrained 2D text-to-image diffusion model to perform text-to-3D synthesis.
Our approach requires no 3D training data and no modifications to the image diffusion model, demonstrating the effectiveness of pretrained image diffusion models as priors.
arXiv Detail & Related papers (2022-09-29T17:50:40Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.