OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control
- URL: http://arxiv.org/abs/2406.10000v1
- Date: Fri, 14 Jun 2024 13:16:18 GMT
- Title: OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control
- Authors: Yuzhong Huang, Zhong Li, Zhang Chen, Zhiyuan Ren, Guosheng Lin, Fred Morstatter, Yi Xu
- Abstract summary: OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy conditions the pre-training of a 2D text-to-image diffusion module on explicit camera orientation.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods.
- Score: 66.03885917320189
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In text-to-3D generation, DreamFusion demonstrated that Score Distillation Sampling (SDS) can optimize implicit representations such as NeRF by distilling pretrained large-scale text-to-image diffusion models. However, DreamFusion faces fidelity and efficiency constraints: it suffers from the multi-head Janus issue and optimizes slowly. To address these challenges, we introduce OrientDream, a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts. Our strategy is to condition the pre-training of a 2D text-to-image diffusion module on explicit camera orientation, training this conditioning on MVImgNet, a large external multi-view dataset. We then use the orientation-conditioned 2D diffusion model to optimize a randomly initialized implicit representation (NeRF). This optimization is significantly expedited by a decoupled back-propagation technique that allows multiple updates of the implicit parameters per optimization cycle. Our experiments show that our method not only produces high-quality NeRF models with consistent multi-view properties but also optimizes significantly faster than existing methods, as quantified by comparative metrics.
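The abstract describes two concrete mechanisms: an SDS-style loss whose 2D diffusion prior is additionally conditioned on explicit camera orientation, and a decoupled back-propagation loop that reuses one diffusion evaluation for several NeRF parameter updates. The sketch below illustrates that combination in PyTorch; the tiny renderer and denoiser stubs, the quaternion conditioning interface, the weighting w(t) = 1 - alpha_t, and the inner-loop count are all illustrative assumptions, not OrientDream's actual implementation.
```python
# Hedged sketch of camera-orientation-conditioned SDS with a decoupled inner loop.
# All module names, shapes, and the conditioning interface are illustrative stand-ins.
import torch
import torch.nn as nn

class TinyRenderer(nn.Module):
    """Stand-in for a NeRF: maps a camera orientation to an RGB image."""
    def __init__(self, res: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(4, 256), nn.ReLU(), nn.Linear(256, 3 * res * res))
        self.res = res
    def forward(self, cam: torch.Tensor) -> torch.Tensor:
        return self.net(cam).view(-1, 3, self.res, self.res)

class TinyDenoiser(nn.Module):
    """Stand-in for a text-to-image UNet that also takes camera orientation."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 3, 3, padding=1)
        self.cam_proj = nn.Linear(4, 3)
    def forward(self, x_t, t, cam):
        # Inject the explicit camera-orientation condition as a per-channel bias
        # (t is unused by this stub; a real UNet would embed it).
        return self.conv(x_t) + self.cam_proj(cam)[:, :, None, None]

nerf, denoiser = TinyRenderer(), TinyDenoiser()
opt = torch.optim.Adam(nerf.parameters(), lr=1e-3)
alphas = torch.linspace(0.999, 0.01, 1000)  # assumed noise schedule

for step in range(10):
    cam = torch.randn(1, 4)                        # e.g. a quaternion for orientation
    t = torch.randint(20, 980, (1,))
    a = alphas[t].view(1, 1, 1, 1)

    with torch.no_grad():                          # diffusion prior is frozen
        img = nerf(cam)
        eps = torch.randn_like(img)
        x_t = a.sqrt() * img + (1 - a).sqrt() * eps
        eps_hat = denoiser(x_t, t, cam)
        grad = (1 - a) * (eps_hat - eps)           # SDS gradient, assuming w(t) = 1 - alpha_t

    # "Decoupled back-propagation" (my reading of the abstract): reuse one
    # image-space gradient for several cheap updates of the implicit parameters.
    for _ in range(4):
        opt.zero_grad()
        loss = (grad * nerf(cam)).sum()            # d(loss)/d(theta) = grad * d(img)/d(theta)
        loss.backward()
        opt.step()
```
The key point of the decoupling is that the expensive diffusion model runs once per cycle under torch.no_grad(), while the cheap rendering pass is differentiated several times against that frozen image-space gradient.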
Related papers
- Grounded Compositional and Diverse Text-to-3D with Pretrained Multi-View Diffusion Model [65.58911408026748]
We propose Grounded-Dreamer to generate 3D assets that can accurately follow complex, compositional text prompts.
We first advocate leveraging text-guided 4-view images as the bottleneck in the text-to-3D pipeline.
We then introduce an attention refocusing mechanism to encourage text-aligned 4-view image generation.
arXiv Detail & Related papers (2024-04-28T04:05:10Z)
- DreamFlow: High-Quality Text-to-3D Generation by Approximating Probability Flow [72.9209434105892]
We propose to enhance the text-to-3D optimization by leveraging the T2I diffusion prior in the generative sampling process with a predetermined timestep schedule.
By leveraging the proposed novel optimization algorithm, we design DreamFlow, a practical three-stage coarse-to-fine text-to-3D optimization framework.
arXiv Detail & Related papers (2024-03-22T05:38:15Z)
- Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation [28.079441901818296]
We propose a text-to-3D method for Neural Radiance Fields (NeRFs) that explicitly enforces fine-grained view consistency.
Our method achieves state-of-the-art performance over existing text-to-3D methods.
arXiv Detail & Related papers (2023-12-19T01:09:49Z)
- Instant3D: Fast Text-to-3D with Sparse-View Generation and Large Reconstruction Model [68.98311213582949]
We propose Instant3D, a novel method that generates high-quality and diverse 3D assets from text prompts in a feed-forward manner.
Our method can generate diverse 3D assets of high visual quality within 20 seconds, two orders of magnitude faster than previous optimization-based methods.
arXiv Detail & Related papers (2023-11-10T18:03:44Z)
- Wonder3D: Single Image to 3D using Cross-Domain Diffusion [105.16622018766236]
Wonder3D is a novel method for efficiently generating high-fidelity textured meshes from single-view images.
To holistically improve the quality, consistency, and efficiency of image-to-3D tasks, we propose a cross-domain diffusion model.
arXiv Detail & Related papers (2023-10-23T15:02:23Z)
- Guide3D: Create 3D Avatars from Text and Image Guidance [55.71306021041785]
Guide3D is a text-and-image-guided generative model for 3D avatar generation based on diffusion models.
Our framework produces topologically and structurally correct geometry and high-resolution textures.
arXiv Detail & Related papers (2023-08-18T17:55:47Z)
- DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation [24.042803966469066]
We show that the conflict between the 3D optimization process and uniform timestep sampling in score distillation is the main reason for its slow convergence and quality limitations.
We propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns the 3D optimization process with the sampling process of diffusion model.
Our simple redesign significantly improves 3D content creation with faster convergence, better quality, and diversity; a minimal sketch of such a schedule follows this entry.
arXiv Detail & Related papers (2023-06-21T17:59:45Z)
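DreamTime's summary states a directly implementable idea: replace uniform timestep sampling in score distillation with a monotonically non-increasing mapping from optimization step to diffusion timestep, so early iterations use large, coarse-structure timesteps and later iterations use small, detail-refining ones. Below is a minimal sketch of one such schedule; the linear form and the t_max/t_min bounds are illustrative assumptions, not the paper's exact prior.
```python
# Minimal sketch of DreamTime-style timestep scheduling for score distillation:
# a monotonically non-increasing map from optimization step to diffusion timestep t.
def scheduled_timestep(step: int, total_steps: int, t_max: int = 980, t_min: int = 20) -> int:
    """Linearly anneal t from t_max down to t_min over the optimization run."""
    frac = step / max(total_steps - 1, 1)
    return round(t_max + frac * (t_min - t_max))

# Usage: t decreases steadily instead of being drawn uniformly at every step.
ts = [scheduled_timestep(s, 5000) for s in range(0, 5000, 1000)]
print(ts)  # [980, 788, 596, 404, 212]
```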
- Efficient Text-Guided 3D-Aware Portrait Generation with Score Distillation Sampling on Distribution [28.526714129927093]
We propose DreamPortrait, which aims to generate text-guided 3D-aware portraits in a single-forward pass for efficiency.
We further design a 3D-aware gated cross-attention mechanism to explicitly let the model perceive the correspondence between the text and the 3D-aware space; one possible reading of this block is sketched after this entry.
arXiv Detail & Related papers (2023-06-03T11:08:38Z)
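DreamPortrait's summary names a 3D-aware gated cross-attention mechanism without spelling out its design. The sketch below shows one plausible reading, assuming a zero-initialized tanh gate on a standard cross-attention block in which 3D feature tokens attend to text tokens; the paper's actual architecture may differ.
```python
# Hedged sketch of a gated cross-attention block: 3D feature tokens query text tokens,
# and a learnable scalar gate controls how strongly the text signal is injected.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # zero init: block is the identity at start

    def forward(self, feats_3d: torch.Tensor, text: torch.Tensor) -> torch.Tensor:
        attended, _ = self.attn(query=feats_3d, key=text, value=text)
        return feats_3d + torch.tanh(self.gate) * attended

# Usage: 128 3D feature tokens attend to 77 text tokens.
block = GatedCrossAttention()
out = block(torch.randn(2, 128, 64), torch.randn(2, 77, 64))
print(out.shape)  # torch.Size([2, 128, 64])
```
The zero-initialized gate is a common stabilization choice (the conditioning is phased in as training opens the gate), assumed here rather than taken from the paper.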
- HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance [19.252300247300145]
This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation.
We compute denoising scores in the text-to-image diffusion model's latent and image spaces.
To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays (sketched after this entry).
arXiv Detail & Related papers (2023-05-30T05:56:58Z)
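HiFA's z-coordinate variance regularization can be stated compactly: along each NeRF ray, penalize the spread of sample depths under the volume-rendering weights, which encourages the density to concentrate at a single surface. The sketch below assumes standard (num_rays, samples_per_ray) tensors from a NeRF renderer; the per-ray normalization is my assumption for illustration.
```python
# Minimal sketch of a z-variance regularizer: weighted variance of sample depths per ray.
import torch

def z_variance_loss(weights: torch.Tensor, z_vals: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """weights, z_vals: (num_rays, samples_per_ray) from NeRF volume rendering."""
    w = weights / (weights.sum(dim=-1, keepdim=True) + eps)  # normalize weights per ray
    depth = (w * z_vals).sum(dim=-1, keepdim=True)           # expected depth per ray
    var = (w * (z_vals - depth) ** 2).sum(dim=-1)            # weighted variance of z
    return var.mean()

# Usage with dummy rendering outputs: 1024 rays, 64 samples per ray.
loss = z_variance_loss(torch.rand(1024, 64), torch.linspace(2.0, 6.0, 64).expand(1024, 64))
print(float(loss))
```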