StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
- URL: http://arxiv.org/abs/2312.02189v1
- Date: Sat, 2 Dec 2023 02:27:58 GMT
- Title: StableDreamer: Taming Noisy Score Distillation Sampling for Text-to-3D
- Authors: Pengsheng Guo, Hans Hao, Adam Caccavale, Zhongzheng Ren, Edward Zhang,
Qi Shan, Aditya Sankar, Alexander G. Schwing, Alex Colburn, Fangchang Ma
- Abstract summary: We present StableDreamer, a methodology incorporating three advances.
First, we formalize the equivalence of the SDS generative prior and a simple supervised L2 reconstruction loss.
Second, our analysis shows that while image-space diffusion contributes to geometric precision, latent-space diffusion is crucial for vivid color rendition.
Third, we adopt anisotropic 3D Gaussians in place of Neural Radiance Fields (NeRFs) to improve quality, reduce training memory, and speed up rendering.
- Score: 88.66678730537777
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In the realm of text-to-3D generation, utilizing 2D diffusion models through
score distillation sampling (SDS) frequently leads to issues such as blurred
appearances and multi-faced geometry, primarily due to the intrinsically noisy
nature of the SDS loss. Our analysis identifies the core of these challenges as
the interaction among noise levels in the 2D diffusion process, the
architecture of the diffusion network, and the 3D model representation. To
overcome these limitations, we present StableDreamer, a methodology
incorporating three advances. First, inspired by InstructNeRF2NeRF, we
formalize the equivalence of the SDS generative prior and a simple supervised
L2 reconstruction loss. This finding provides a novel tool to debug SDS, which
we use to show the impact of time-annealing noise levels on reducing
multi-faced geometries. Second, our analysis shows that while image-space
diffusion contributes to geometric precision, latent-space diffusion is crucial
for vivid color rendition. Based on this observation, StableDreamer introduces
a two-stage training strategy that effectively combines these aspects,
resulting in high-fidelity 3D models. Third, we adopt an anisotropic 3D
Gaussian representation, replacing Neural Radiance Fields (NeRFs), to enhance
overall quality, reduce memory usage during training, accelerate rendering,
and better capture semi-transparent objects. StableDreamer
reduces multi-face geometries, generates fine details, and converges stably.
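To make the first advance concrete: with a noise-prediction diffusion model, the SDS gradient coincides (up to a time-dependent weight) with the gradient of an L2 loss between the rendering and a detached one-step denoised estimate. The sketch below is a reconstruction from the abstract, not the paper's code; `render`, `unet_eps`, and `text_emb` are hypothetical names.

```python
import torch
import torch.nn.functional as F

def sds_as_l2(render, unet_eps, alphas_cumprod, t, text_emb):
    """SDS rewritten as a supervised L2 reconstruction loss (sketch).

    render:   differentiable rendering of the 3D model, shape (B, C, H, W)
    unet_eps: frozen diffusion network predicting eps_hat(x_t, t, y)
    """
    a_t = alphas_cumprod[t].view(-1, 1, 1, 1)              # \bar{alpha}_t
    noise = torch.randn_like(render)
    x_t = a_t.sqrt() * render + (1 - a_t).sqrt() * noise   # forward diffusion

    with torch.no_grad():                                  # prior stays frozen
        eps_hat = unet_eps(x_t, t, text_emb)
        # one-step estimate of the clean image implied by the prior
        x0_hat = (x_t - (1 - a_t).sqrt() * eps_hat) / a_t.sqrt()

    # Since x0_hat is detached, the gradient of this loss w.r.t. the 3D
    # parameters is proportional to (eps_hat - noise), i.e. the SDS update.
    return F.mse_loss(render, x0_hat)
```

Viewing SDS this way yields a per-iteration pseudo ground-truth image (`x0_hat`) that can be inspected directly, which is the kind of debugging tool the abstract describes for studying time-annealed noise levels.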
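For the third advance, the anisotropic 3D Gaussians follow the representation popularized by 3D Gaussian Splatting: each primitive carries a covariance factored into a rotation and per-axis scales, which keeps it positive semi-definite during optimization and lets a single splat stretch along a surface. A minimal sketch, assuming the standard quaternion-plus-log-scale parameterization (the paper's exact fields may differ):

```python
import torch
import torch.nn.functional as F

def gaussian_covariances(quats, log_scales):
    """Per-Gaussian anisotropic covariances Sigma = (R S)(R S)^T.

    quats: (N, 4) quaternions (w, x, y, z); log_scales: (N, 3).
    """
    w, x, y, z = F.normalize(quats, dim=-1).unbind(-1)
    R = torch.stack([
        1 - 2 * (y * y + z * z), 2 * (x * y - w * z),     2 * (x * z + w * y),
        2 * (x * y + w * z),     1 - 2 * (x * x + z * z), 2 * (y * z - w * x),
        2 * (x * z - w * y),     2 * (y * z + w * x),     1 - 2 * (x * x + y * y),
    ], dim=-1).reshape(-1, 3, 3)
    S = torch.diag_embed(log_scales.exp())      # anisotropic per-axis scales
    M = R @ S
    return M @ M.transpose(-1, -2)              # (N, 3, 3), PSD by construction
```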
Related papers
- FlowDreamer: Exploring High Fidelity Text-to-3D Generation via Rectified Flow [17.919092916953183]
We propose a novel framework, named FlowDreamer, which yields high-fidelity results with richer texture details and faster convergence.
The key insight is to leverage the coupling and reversible properties of the rectified flow model to search for the corresponding noise.
We introduce a novel Unique Couple Matching (UCM) loss, which guides the 3D model to optimize along the same trajectory.
arXiv Detail & Related papers (2024-08-09T11:40:20Z)
- VividDreamer: Towards High-Fidelity and Efficient Text-to-3D Generation [69.68568248073747]
We propose Pose-dependent Consistency Distillation Sampling (PCDS), a novel and efficient objective for diffusion-based 3D generation tasks.
PCDS builds a pose-dependent consistency function within diffusion trajectories, allowing true gradients to be approximated through minimal sampling steps.
For efficient generation, we propose a coarse-to-fine optimization strategy that first uses 1-step PCDS to create the basic structure of 3D objects and then gradually increases the PCDS steps to generate fine-grained details (see the sketch after this entry).
arXiv Detail & Related papers (2024-06-21T08:21:52Z)
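As a rough illustration of the coarse-to-fine strategy above, a training loop might schedule the number of PCDS steps as follows. Everything here (`pcds_loss`, `sample_pose`, the warmup fraction) is a hypothetical placeholder, not VividDreamer's actual interface.

```python
def train_coarse_to_fine(model, optimizer, pcds_loss, sample_pose,
                         total_iters=10_000, warmup_frac=0.5, max_steps=4):
    """Hypothetical schedule: 1-step PCDS during warmup builds coarse
    structure; the step count then ramps up to refine detail.
    """
    warmup = int(total_iters * warmup_frac)
    for it in range(total_iters):
        if it < warmup:
            steps = 1                                    # coarse structure
        else:                                            # ramp 1 -> max_steps
            frac = (it - warmup) / max(total_iters - warmup - 1, 1)
            steps = 1 + round(frac * (max_steps - 1))
        loss = pcds_loss(model, sample_pose(), steps=steps)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```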
- CVT-xRF: Contrastive In-Voxel Transformer for 3D Consistent Radiance Fields from Sparse Inputs [65.80187860906115]
We propose a novel approach to improve NeRF's performance with sparse inputs.
We first adopt a voxel-based ray sampling strategy to ensure that each sampled ray intersects an occupied voxel in 3D space.
We then randomly sample additional points within the voxel and apply a Transformer to infer the properties of other points on each ray, which are then incorporated into the volume rendering (sketched below).
arXiv Detail & Related papers (2024-03-25T15:56:17Z)
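The voxel-based ray sampling described above can be sketched as a filter over candidate rays; the occupancy-grid layout and every name below are illustrative assumptions, not CVT-xRF's actual code.

```python
import torch

def voxel_guided_rays(origins, dirs, occupancy, grid_min, voxel_size,
                      n_rays, n_coarse=64, near=0.1, far=6.0):
    """Keep only rays whose coarse samples land in an occupied voxel.

    origins, dirs: (R, 3) camera rays; occupancy: (G, G, G) bool grid.
    """
    t = torch.linspace(near, far, n_coarse)
    pts = origins[:, None] + dirs[:, None] * t[None, :, None]       # (R, S, 3)
    idx = ((pts - grid_min) / voxel_size).long()
    idx = idx.clamp(0, occupancy.shape[0] - 1)                      # stay in grid
    hit = occupancy[idx[..., 0], idx[..., 1], idx[..., 2]].any(-1)  # (R,)
    keep = hit.nonzero(as_tuple=True)[0]
    pick = keep[torch.randperm(keep.numel())[:n_rays]]              # random subset
    return origins[pick], dirs[pick]
```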
- UniDream: Unifying Diffusion Priors for Relightable Text-to-3D Generation [101.2317840114147]
We present UniDream, a text-to-3D generation framework that incorporates unified diffusion priors.
Our approach consists of three main components: (1) a dual-phase training process to obtain albedo-normal aligned multi-view diffusion and reconstruction models, (2) a progressive generation procedure for geometry and albedo textures based on Score Distillation Sampling (SDS) using the trained reconstruction and diffusion models, and (3) an innovative application of SDS for finalizing PBR generation while keeping the albedo fixed, based on the Stable Diffusion model.
arXiv Detail & Related papers (2023-12-14T09:07:37Z)
- Learn to Optimize Denoising Scores for 3D Generation: A Unified and Improved Diffusion Prior on NeRF and 3D Gaussian Splatting [60.393072253444934]
We propose a unified framework aimed at enhancing the diffusion priors for 3D generation tasks.
We identify a divergence between the diffusion priors and the training procedures of diffusion models that substantially impairs the quality of 3D generation.
arXiv Detail & Related papers (2023-12-08T03:55:34Z)
- NeuSD: Surface Completion with Multi-View Text-to-Image Diffusion [56.98287481620215]
We present a novel method for 3D surface reconstruction from multiple images where only a part of the object of interest is captured.
Our approach builds on two recent developments: surface reconstruction with neural radiance fields for the visible parts of the surface, and guidance from pre-trained 2D diffusion models via Score Distillation Sampling (SDS) to plausibly complete the shape in unobserved regions (see the sketch after this entry).
arXiv Detail & Related papers (2023-12-07T19:30:55Z)
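The two-part objective in the NeuSD summary above, reconstruction where the object is observed plus SDS guidance where it is not, could be combined as in this hedged sketch; `render_fn`, `sds_loss`, and the weighting are placeholders, not the paper's API.

```python
import torch
import torch.nn.functional as F

def completion_loss(render_fn, observed_views, novel_pose, sds_loss, w_sds=0.1):
    """Reconstruction on captured views + SDS on an unobserved pose (sketch)."""
    recon = torch.stack([
        F.mse_loss(render_fn(v["pose"]), v["image"]) for v in observed_views
    ]).mean()
    # the diffusion prior constrains regions that no captured view observes
    return recon + w_sds * sds_loss(render_fn(novel_pose))
```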
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
- HiFA: High-fidelity Text-to-3D Generation with Advanced Diffusion Guidance [19.252300247300145]
This work proposes holistic sampling and smoothing approaches to achieve high-quality text-to-3D generation.
We compute denoising scores in the text-to-image diffusion model's latent and image spaces.
To generate high-quality renderings in a single-stage optimization, we propose a regularization on the variance of z-coordinates along NeRF rays (sketched below).
arXiv Detail & Related papers (2023-05-30T05:56:58Z)
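The z-coordinate variance regularization mentioned above can be read as penalizing the spread of volume-rendering weights along each ray, so that density concentrates near a single surface crossing. A minimal sketch, reconstructed from the summary rather than the paper:

```python
import torch

def z_variance_loss(weights, z_vals, eps=1e-6):
    """weights, z_vals: (R, S) per-ray sample weights and depths."""
    w_sum = weights.sum(-1, keepdim=True) + eps
    z_mean = (weights * z_vals).sum(-1, keepdim=True) / w_sum    # expected depth
    var = (weights * (z_vals - z_mean) ** 2).sum(-1) / w_sum.squeeze(-1)
    return var.mean()
```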