Optimized View and Geometry Distillation from Multi-view Diffuser
- URL: http://arxiv.org/abs/2312.06198v3
- Date: Fri, 8 Mar 2024 07:36:58 GMT
- Title: Optimized View and Geometry Distillation from Multi-view Diffuser
- Authors: Youjia Zhang, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang
- Abstract summary: We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model.
We develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising.
Finally, we recover faithful geometry and texture directly from the refined multi-view images.
- Score: 20.47237377203664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating multi-view images from a single input view using image-conditioned
diffusion models is a recent advancement and has shown considerable potential.
However, issues such as the lack of consistency in synthesized views and
over-smoothing in extracted geometry persist. Previous methods integrate
multi-view consistency modules or impose additional supervision to enhance view
consistency, but at the cost of flexibility in camera positioning and
versatility of view synthesis. In this study, we consider the
radiance field optimized during geometry extraction as a more rigid consistency
prior, compared to volume and ray aggregation used in previous works. We
further identify and rectify a critical bias in the traditional radiance field
optimization process through score distillation from a multi-view diffuser. We
introduce an Unbiased Score Distillation (USD) that utilizes unconditioned
noises from a 2D diffusion model, greatly refining the radiance field fidelity.
We leverage the rendered views from the optimized radiance field as the basis
and develop a two-step specialization process of a 2D diffusion model, which is
adept at conducting object-specific denoising and generating high-quality
multi-view images. Finally, we recover faithful geometry and texture directly
from the refined multi-view images. Empirical evaluations demonstrate that our
optimized geometry and view distillation technique generates results comparable
to state-of-the-art models trained on extensive datasets, all while
maintaining freedom in camera positioning. Please see our project page at
https://youjiazhang.github.io/USD/.
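As a rough illustration of the unbiased score distillation idea summarized above, the sketch below replaces the randomly sampled noise in a standard SDS-style update with the diffusion model's unconditional noise prediction. This is a minimal, hedged sketch under stated assumptions, not the authors' implementation: the `ToyDenoiser`, the conditioning tensor, the noise schedule, and all sizes are placeholders invented so the example runs end to end.

```python
import torch


class ToyDenoiser(torch.nn.Module):
    """Stand-in for a pretrained image-conditioned diffusion noise predictor.
    The real multi-view diffuser is far larger; this toy network only exists
    so the sketch is self-contained and runnable."""

    def __init__(self, channels: int = 3):
        super().__init__()
        self.net = torch.nn.Conv2d(channels, channels, 3, padding=1)
        self.cond_proj = torch.nn.Conv2d(channels, channels, 1)

    def forward(self, x_noisy, t, cond=None):
        eps = self.net(x_noisy)
        if cond is not None:  # placeholder view/image condition
            eps = eps + self.cond_proj(cond)
        return eps


def usd_style_loss(rendered, cond_image, denoiser, alphas_cumprod, t):
    """Score-distillation-style loss in which the term subtracted from the
    conditioned noise prediction is the model's *unconditional* prediction
    rather than the sampled Gaussian noise (a sketch of the USD idea, not
    the authors' exact formulation)."""
    noise = torch.randn_like(rendered)
    a_t = alphas_cumprod[t]
    # Forward-diffuse the rendered view to timestep t.
    x_t = a_t.sqrt() * rendered + (1.0 - a_t).sqrt() * noise
    with torch.no_grad():
        eps_cond = denoiser(x_t, t, cond=cond_image)  # conditioned prediction
        eps_uncond = denoiser(x_t, t, cond=None)      # unconditional prediction
    grad = eps_cond - eps_uncond
    # Usual SDS surrogate: its gradient w.r.t. `rendered` equals `grad`.
    return (grad.detach() * rendered).sum() / rendered.numel()


if __name__ == "__main__":
    torch.manual_seed(0)
    denoiser = ToyDenoiser().eval()
    alphas_cumprod = torch.linspace(0.9999, 0.01, 1000)       # toy noise schedule
    rendered = torch.rand(1, 3, 64, 64, requires_grad=True)   # placeholder NeRF render
    cond_image = torch.rand(1, 3, 64, 64)                     # placeholder input view
    t = torch.randint(0, 1000, (1,)).item()
    loss = usd_style_loss(rendered, cond_image, denoiser, alphas_cumprod, t)
    loss.backward()  # gradient would flow back into the radiance field
    print(loss.item(), rendered.grad.abs().mean().item())
```

In an actual pipeline the gradient would update radiance field parameters through the rendered views, and the refined renders would then seed the two-step specialization of the 2D diffusion model described in the abstract.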
Related papers
- Generative Detail Enhancement for Physically Based Materials [25.631270458028066]
We present a tool for enhancing the detail of physically based materials using an off-the-shelf diffusion model and inverse rendering.
Our goal is to enhance the visual fidelity of materials with detail that is often tedious to author, by adding signs of wear, aging, weathering, etc.
arXiv Detail & Related papers (2025-02-19T06:39:51Z) - ConsistentDreamer: View-Consistent Meshes Through Balanced Multi-View Gaussian Optimization [5.55656676725821]
We present ConsistentDreamer, where we first generate a set of fixed multi-view prior images and sample random views between them.
Thereby, we limit the discrepancies between the views guided by the SDS loss and ensure a consistent rough shape.
In each iteration, we also use our generated multi-view prior images for fine-detail reconstruction.
arXiv Detail & Related papers (2025-02-13T12:49:25Z) - Towards High-Fidelity 3D Portrait Generation with Rich Details by Cross-View Prior-Aware Diffusion [63.81544586407943]
Single-image 3D portrait generation methods typically employ 2D diffusion models to provide multi-view knowledge, which is then distilled into 3D representations.
We propose a Hybrid Priors Diffusion model, which explicitly and implicitly incorporates multi-view priors as conditions to enhance the status consistency of the generated multi-view portraits.
Experiments demonstrate that our method can produce 3D portraits with accurate geometry and rich details from a single image.
arXiv Detail & Related papers (2024-11-15T17:19:18Z) - PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z) - MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z) - MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z) - Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z) - Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z) - GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)