Optimized View and Geometry Distillation from Multi-view Diffuser
- URL: http://arxiv.org/abs/2312.06198v3
- Date: Fri, 8 Mar 2024 07:36:58 GMT
- Title: Optimized View and Geometry Distillation from Multi-view Diffuser
- Authors: Youjia Zhang, Zikai Song, Junqing Yu, Yawei Luo, Wei Yang
- Abstract summary: We introduce an Unbiased Score Distillation (USD) that utilizes unconditioned noises from a 2D diffusion model.
We develop a two-step specialization process of a 2D diffusion model, which is adept at conducting object-specific denoising.
Finally, we recover faithful geometry and texture directly from the refined multi-view images.
- Score: 20.47237377203664
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating multi-view images from a single input view using image-conditioned
diffusion models is a recent advancement and has shown considerable potential.
However, issues such as the lack of consistency in synthesized views and
over-smoothing in extracted geometry persist. Previous methods integrate
multi-view consistency modules or impose additional supervision to enhance view
consistency while compromising on the flexibility of camera positioning and
limiting the versatility of view synthesis. In this study, we consider the
radiance field optimized during geometry extraction as a more rigid consistency
prior, compared to volume and ray aggregation used in previous works. We
further identify and rectify a critical bias in the traditional radiance field
optimization process through score distillation from a multi-view diffuser. We
introduce an Unbiased Score Distillation (USD) that utilizes unconditioned
noises from a 2D diffusion model, greatly refining the radiance field fidelity.
We leverage the rendered views from the optimized radiance field as the basis
and develop a two-step specialization process of a 2D diffusion model, which is
adept at conducting object-specific denoising and generating high-quality
multi-view images. Finally, we recover faithful geometry and texture directly
from the refined multi-view images. Empirical evaluations demonstrate that our
optimized geometry and view distillation technique generates comparable results
to the state-of-the-art models trained on extensive datasets, all while
maintaining freedom in camera positioning. Please see our project page at
https://youjiazhang.github.io/USD/.
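The abstract's key idea — replacing the condition-biased noise prediction in score distillation with an unconditioned one — can be illustrated with a toy sketch. This is NOT the paper's implementation: the noise predictor, weighting schedule `w(t)`, and bias value below are stand-in assumptions chosen only to make the contrast between conventional SDS and a USD-style unconditioned variant runnable; the actual formulation is defined in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def predict_noise(x_t, t, cond=None):
    # Stand-in for a 2D diffusion model's noise prediction eps_theta(x_t, t, cond).
    # A real system would call a pretrained denoiser; here a deterministic toy
    # function with a fixed condition-induced bias keeps the sketch self-contained.
    bias = 0.1 if cond is not None else 0.0
    return np.tanh(x_t) * (1.0 - t) + bias

def sds_gradient(x_t, eps, t, cond):
    # Conventional score distillation: w(t) * (eps_theta(x_t; cond, t) - eps),
    # using the condition-dependent (and, in this toy, biased) prediction.
    w = 1.0 - t  # simple weighting schedule (assumption, not the paper's)
    return w * (predict_noise(x_t, t, cond) - eps)

def usd_gradient(x_t, eps, t):
    # USD-style variant: distill from the *unconditioned* noise prediction,
    # so the condition-induced bias term drops out of the gradient.
    w = 1.0 - t
    return w * (predict_noise(x_t, t, cond=None) - eps)

x_t = rng.standard_normal((4, 4))
eps = rng.standard_normal((4, 4))
g_sds = sds_gradient(x_t, eps, t=0.5, cond="view")
g_usd = usd_gradient(x_t, eps, t=0.5)
# In this toy setup the two gradients differ exactly by w(t) * bias = 0.5 * 0.1.
print(np.allclose(g_sds - g_usd, 0.05))  # → True
```

The point of the contrast is that the conditional prediction carries a systematic offset that the unconditioned prediction does not, which is the bias the paper identifies and removes during radiance field optimization.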
Related papers
- PlacidDreamer: Advancing Harmony in Text-to-3D Generation [20.022078051436846]
PlacidDreamer is a text-to-3D framework that harmonizes multi-view generation and text-conditioned generation.
It employs a novel score distillation algorithm to achieve balanced saturation.
arXiv Detail & Related papers (2024-07-19T02:00:04Z)
- MultiDiff: Consistent Novel View Synthesis from a Single Image [60.04215655745264]
MultiDiff is a novel approach for consistent novel view synthesis of scenes from a single RGB image.
Our results demonstrate that MultiDiff outperforms state-of-the-art methods on the challenging, real-world datasets RealEstate10K and ScanNet.
arXiv Detail & Related papers (2024-06-26T17:53:51Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Multi-View Unsupervised Image Generation with Cross Attention Guidance [23.07929124170851]
This paper introduces a novel pipeline for unsupervised training of a pose-conditioned diffusion model on single-category datasets.
We identify object poses by clustering the dataset through comparing visibility and locations of specific object parts.
Our model, MIRAGE, surpasses prior work in novel view synthesis on real images.
arXiv Detail & Related papers (2023-12-07T14:55:13Z)
- Curved Diffusion: A Generative Model With Optical Geometry Control [56.24220665691974]
The influence of different optical systems on the final scene appearance is frequently overlooked.
This study introduces a framework that tightly integrates a text-to-image diffusion model with the particular lens used in image rendering.
arXiv Detail & Related papers (2023-11-29T13:06:48Z)
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model turning noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework Generalizable Model-based Neural Radiance Fields to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
This list is automatically generated from the titles and abstracts of the papers in this site.