Novel View Synthesis with Diffusion Models
- URL: http://arxiv.org/abs/2210.04628v1
- Date: Thu, 6 Oct 2022 16:59:56 GMT
- Title: Novel View Synthesis with Diffusion Models
- Authors: Daniel Watson, William Chan, Ricardo Martin-Brualla, Jonathan Ho,
Andrea Tagliasacchi, Mohammad Norouzi
- Abstract summary: We present 3DiM, a diffusion model for 3D novel view synthesis.
It is able to translate a single input view into consistent and sharp completions across many views.
3DiM can generate multiple views that are 3D consistent using a novel technique called stochastic conditioning.
- Score: 56.55571338854636
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present 3DiM, a diffusion model for 3D novel view synthesis, which is able
to translate a single input view into consistent and sharp completions across
many views. The core component of 3DiM is a pose-conditional image-to-image
diffusion model, which takes a source view and its pose as inputs, and
generates a novel view for a target pose as output. 3DiM can generate multiple
views that are 3D consistent using a novel technique called stochastic
conditioning. The output views are generated autoregressively, and during the
generation of each novel view, one selects a random conditioning view from the
set of available views at each denoising step. We demonstrate that stochastic
conditioning significantly improves the 3D consistency of a naive sampler for
an image-to-image diffusion model, which involves conditioning on a single
fixed view. We compare 3DiM to prior work on the SRN ShapeNet dataset,
demonstrating that 3DiM's generated completions from a single view achieve much
higher fidelity, while being approximately 3D consistent. We also introduce a
new evaluation methodology, 3D consistency scoring, to measure the 3D
consistency of a generated object by training a neural field on the model's
output views. 3DiM is geometry free, does not rely on hyper-networks or
test-time optimization for novel view synthesis, and allows a single model to
easily scale to a large number of scenes.
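As a concrete illustration of the stochastic conditioning sampler described above, the short Python sketch below mirrors the procedure at a high level: views are generated autoregressively, and at every denoising step the sampler draws a fresh conditioning view at random from the set of views available so far. This is a minimal sketch under stated assumptions, not the authors' implementation: `denoise_step` is a hypothetical placeholder for one reverse-diffusion update of a pose-conditional image-to-image model, and the image shape and step count are arbitrary.

```python
import random
import numpy as np

def denoise_step(x_t, t, cond_view, cond_pose, target_pose):
    """Hypothetical stand-in for one reverse-diffusion update of a
    pose-conditional image-to-image model; a real model would run a
    denoising network conditioned on the source view/pose and target pose."""
    return 0.99 * x_t  # toy update: gradually shrink the noise

def sample_view_stochastic_conditioning(views, poses, target_pose,
                                        num_steps=256, shape=(64, 64, 3)):
    """Generate one novel view, re-drawing the conditioning view each step."""
    x_t = np.random.randn(*shape)            # start from Gaussian noise
    for t in reversed(range(num_steps)):
        k = random.randrange(len(views))     # random conditioning view per step
        x_t = denoise_step(x_t, t, views[k], poses[k], target_pose)
    return x_t

def generate_novel_views(input_view, input_pose, target_poses):
    """Autoregressive generation: each completed view joins the conditioning
    set, so later views can condition on earlier outputs as well as the input."""
    views, poses = [input_view], [input_pose]
    for pose in target_poses:
        new_view = sample_view_stochastic_conditioning(views, poses, pose)
        views.append(new_view)
        poses.append(pose)
    return views[1:]  # generated views only
```

Per the abstract, re-drawing the conditioning view at each denoising step is what distinguishes this sampler from the naive alternative of conditioning every step on a single fixed view, and it is what improves 3D consistency across the generated views.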
Related papers
- 3D-free meets 3D priors: Novel View Synthesis from a Single Image with Pretrained Diffusion Guidance [61.06034736050515]
We introduce a method capable of generating camera-controlled viewpoints from a single input image.
Our method excels in handling complex and diverse scenes without extensive training or additional 3D and multiview data.
arXiv Detail & Related papers (2024-08-12T13:53:40Z)
- MVDiff: Scalable and Flexible Multi-View Diffusion for 3D Object Reconstruction from Single-View [0.0]
This paper proposes a general framework for generating consistent multi-view images from a single image by leveraging a scene representation transformer and a view-conditioned diffusion model.
Our model is able to generate 3D meshes surpassing baseline methods on evaluation metrics, including PSNR, SSIM and LPIPS.
arXiv Detail & Related papers (2024-05-06T22:55:53Z)
- MVD-Fusion: Single-view 3D via Depth-consistent Multi-view Generation [54.27399121779011]
We present MVD-Fusion: a method for single-view 3D inference via generative modeling of multi-view-consistent RGB-D images.
We show that our approach can yield more accurate synthesis compared to recent state-of-the-art, including distillation-based 3D inference and prior multi-view generation methods.
arXiv Detail & Related papers (2024-04-04T17:59:57Z)
- Consistent-1-to-3: Consistent Image to 3D View Synthesis via Geometry-aware Diffusion Models [16.326276673056334]
Consistent-1-to-3 is a generative framework that significantly mitigates the inconsistency among novel views synthesized from a single image.
We decompose the NVS task into two stages: (i) transforming observed regions to a novel view, and (ii) hallucinating unseen regions.
We propose to employ epipolar-guided attention to incorporate geometry constraints, and multi-view attention to better aggregate multi-view information.
arXiv Detail & Related papers (2023-10-04T17:58:57Z)
- Sparse3D: Distilling Multiview-Consistent Diffusion for Object Reconstruction from Sparse Views [47.215089338101066]
We present Sparse3D, a novel 3D reconstruction method tailored for sparse view inputs.
Our approach distills robust priors from a multiview-consistent diffusion model to refine a neural radiance field.
By tapping into 2D priors from powerful image diffusion models, our integrated model consistently delivers high-quality results.
arXiv Detail & Related papers (2023-08-27T11:52:00Z)
- Generative Novel View Synthesis with 3D-Aware Diffusion Models [96.78397108732233]
We present a diffusion-based model for 3D-aware generative novel view synthesis from as few as a single input image.
Our method makes use of existing 2D diffusion backbones but, crucially, incorporates geometry priors in the form of a 3D feature volume.
In addition to generating novel views, our method has the ability to autoregressively synthesize 3D-consistent sequences.
arXiv Detail & Related papers (2023-04-05T17:15:47Z)
- 3DDesigner: Towards Photorealistic 3D Object Generation and Editing with Text-guided Diffusion Models [71.25937799010407]
We equip text-guided diffusion models with the ability to perform 3D-consistent generation.
We study 3D local editing and propose a two-step solution.
We extend our model to perform one-shot novel view synthesis.
arXiv Detail & Related papers (2022-11-25T13:50:00Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- AUTO3D: Novel view synthesis through unsupervisely learned variational viewpoint and global 3D representation [27.163052958878776]
This paper targets learning-based novel view synthesis from a single image or a limited number of 2D images without pose supervision.
We construct an end-to-end trainable conditional variational framework to disentangle the unsupervisedly learned relative pose/rotation from the implicit global 3D representation.
Our system can achieve implicit 3D understanding without explicit 3D reconstruction.
arXiv Detail & Related papers (2020-07-13T18:51:27Z)
This list is automatically generated from the titles and abstracts of the papers in this site.