HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion
Guidance
- URL: http://arxiv.org/abs/2305.18766v4
- Date: Mon, 11 Mar 2024 06:14:31 GMT
- Title: HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion
Guidance
- Authors: Junzhe Zhu and Peiye Zhuang and Sanmi Koyejo
- Abstract summary: This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation.
We compute denoising scores in the text-to-image diffusion model's latent and image spaces.
To generate high-quality renderings in a single-stage optimization, we propose regularization for the variance of z-coordinates along NeRF rays.
- Score: 19.252300247300145
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advancements in automatic text-to-3D generation have been remarkable.
Most existing methods use pre-trained text-to-image diffusion models to
optimize 3D representations like Neural Radiance Fields (NeRFs) via
latent-space denoising score matching. Yet, these methods often result in
artifacts and inconsistencies across different views due to their suboptimal
optimization approaches and limited understanding of 3D geometry. Moreover, the
inherent constraints of NeRFs in rendering crisp geometry and stable textures
usually lead to a two-stage optimization to attain high-resolution details.
This work proposes holistic sampling and smoothing approaches to achieve
high-quality text-to-3D generation, all in a single-stage optimization. We
compute denoising scores in the text-to-image diffusion model's latent and
image spaces. Instead of randomly sampling timesteps (also referred to as noise
levels in denoising score matching), we introduce a novel timestep annealing
approach that progressively reduces the sampled timestep throughout
optimization. To generate high-quality renderings in a single-stage
optimization, we propose regularization for the variance of z-coordinates along
NeRF rays. To address texture flickering issues in NeRFs, we introduce a kernel
smoothing technique that refines importance sampling weights coarse-to-fine,
ensuring accurate and thorough sampling in high-density regions. Extensive
experiments demonstrate the superiority of our method over previous approaches,
enabling the generation of highly detailed and view-consistent 3D assets
through a single-stage training process.
Related papers
- DreamPolish: Domain Score Distillation With Progressive Geometry Generation [66.94803919328815]
We introduce DreamPolish, a text-to-3D generation model that excels in producing refined geometry and high-quality textures.
In the geometry construction phase, our approach leverages multiple neural representations to enhance the stability of the synthesis process.
In the texture generation phase, we introduce a novel score distillation objective, namely domain score distillation (DSD), to guide neural representations toward such a domain.
arXiv Detail & Related papers (2024-11-03T15:15:01Z) - MVGaussian: High-Fidelity text-to-3D Content Generation with Multi-View Guidance and Surface Densification [13.872254142378772]
This paper introduces a unified framework for text-to-3D content generation.
Our approach utilizes multi-view guidance to iteratively form the structure of the 3D model.
We also introduce a novel densification algorithm that aligns gaussians close to the surface.
arXiv Detail & Related papers (2024-09-10T16:16:34Z) - OrientDream: Streamlining Text-to-3D Generation with Explicit Orientation Control [66.03885917320189]
OrientDream is a camera orientation conditioned framework for efficient and multi-view consistent 3D generation from textual prompts.
Our strategy emphasizes the implementation of an explicit camera orientation conditioned feature in the pre-training of a 2D text-to-image diffusion module.
Our experiments reveal that our method not only produces high-quality NeRF models with consistent multi-view properties but also achieves an optimization speed significantly greater than existing methods.
arXiv Detail & Related papers (2024-06-14T13:16:18Z) - Improving Robustness for Joint Optimization of Camera Poses and
Decomposed Low-Rank Tensorial Radiance Fields [26.4340697184666]
We propose an algorithm that allows joint refinement of camera pose and scene geometry represented by decomposed low-rank tensor.
We also propose techniques of smoothed 2D supervision, randomly scaled kernel parameters, and edge-guided loss mask.
arXiv Detail & Related papers (2024-02-20T18:59:02Z) - Text-Image Conditioned Diffusion for Consistent Text-to-3D Generation [28.079441901818296]
We propose a text-to-3D method for Neural Radiance Fields (NeRFs) that explicitly enforces fine-grained view consistency.
Our method achieves state-of-the-art performance over existing text-to-3D methods.
arXiv Detail & Related papers (2023-12-19T01:09:49Z) - Learn to Optimize Denoising Scores for 3D Generation: A Unified and
Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z) - StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D [88.66678730537777]
We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
arXiv Detail & Related papers (2023-12-02T02:27:58Z) - AdaDiff: Adaptive Step Selection for Fast Diffusion [88.8198344514677]
We introduce AdaDiff, a framework designed to learn instance-specific step usage policies.
AdaDiff is optimized using a policy gradient method to maximize a carefully designed reward function.
Our approach achieves similar results in terms of visual quality compared to the baseline using a fixed 50 denoising steps.
arXiv Detail & Related papers (2023-11-24T11:20:38Z) - DreamGaussian: Generative Gaussian Splatting for Efficient 3D Content Creation [55.661467968178066]
We propose DreamGaussian, a novel 3D content generation framework that achieves both efficiency and quality simultaneously.
Our key insight is to design a generative 3D Gaussian Splatting model with companioned mesh extraction and texture refinement in UV space.
In contrast to the occupancy pruning used in Neural Radiance Fields, we demonstrate that the progressive densification of 3D Gaussians converges significantly faster for 3D generative tasks.
arXiv Detail & Related papers (2023-09-28T17:55:05Z) - DreamTime: An Improved Optimization Strategy for Diffusion-Guided 3D Generation [24.042803966469066]
We show that the conflict between the 3D optimization process and uniform timestep sampling in score distillation is the main reason for these limitations.
We propose to prioritize timestep sampling with monotonically non-increasing functions, which aligns the 3D optimization process with the sampling process of diffusion model.
Our simple redesign significantly improves 3D content creation with faster convergence, better quality and diversity.
arXiv Detail & Related papers (2023-06-21T17:59:45Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.