Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and
Reconstruction
- URL: http://arxiv.org/abs/2304.06714v4
- Date: Fri, 25 Aug 2023 14:36:57 GMT
- Title: Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and
Reconstruction
- Authors: Hansheng Chen, Jiatao Gu, Anpei Chen, Wei Tian, Zhuowen Tu, Lingjie
Liu, Hao Su
- Abstract summary: 3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
- Score: 77.69363640021503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: 3D-aware image synthesis encompasses a variety of tasks, such as scene
generation and novel view synthesis from images. Despite numerous task-specific
methods, developing a comprehensive model remains challenging. In this paper,
we present SSDNeRF, a unified approach that employs an expressive diffusion
model to learn a generalizable prior of neural radiance fields (NeRF) from
multi-view images of diverse objects. Previous studies have used two-stage
approaches that rely on pretrained NeRFs as real data to train diffusion
models. In contrast, we propose a new single-stage training paradigm with an
end-to-end objective that jointly optimizes a NeRF auto-decoder and a latent
diffusion model, enabling simultaneous 3D reconstruction and prior learning,
even from sparsely available views. At test time, we can directly sample the
diffusion prior for unconditional generation, or combine it with arbitrary
observations of unseen objects for NeRF reconstruction. SSDNeRF demonstrates
robust results comparable to or better than leading task-specific methods in
unconditional generation and single/sparse-view 3D reconstruction.
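Below is a minimal sketch (in PyTorch) of how such a single-stage objective could be organized: each training scene keeps a learnable latent code (the auto-decoder), and every step backpropagates both a rendering loss and a latent diffusion loss into that code and the shared networks. The tiny MLP stand-ins, the noise schedule, and the loss weight are illustrative assumptions, not the authors' actual SSDNeRF architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

num_scenes, latent_dim = 8, 64

# Auto-decoder: one learnable latent code per training scene (no image encoder).
scene_codes = nn.Parameter(0.01 * torch.randn(num_scenes, latent_dim))

# Toy stand-ins for the NeRF decoder and the latent diffusion denoiser.
nerf_decoder = nn.Sequential(nn.Linear(latent_dim + 3, 128), nn.ReLU(), nn.Linear(128, 3))
denoiser = nn.Sequential(nn.Linear(latent_dim + 1, 128), nn.ReLU(), nn.Linear(128, latent_dim))

opt = torch.optim.Adam([scene_codes, *nerf_decoder.parameters(), *denoiser.parameters()], lr=1e-3)

def render_loss(code, ray_dirs, target_rgb):
    # Placeholder for volume rendering: decode a per-ray color from the scene code.
    pred = nerf_decoder(torch.cat([code.expand(ray_dirs.shape[0], -1), ray_dirs], dim=-1))
    return F.mse_loss(pred, target_rgb)

def diffusion_loss(code):
    # Simplified denoising objective on the latent code (epsilon prediction).
    t = torch.rand(1)
    noise = torch.randn_like(code)
    noisy = (1 - t).sqrt() * code + t.sqrt() * noise
    return F.mse_loss(denoiser(torch.cat([noisy, t])), noise)

for step in range(1000):
    i = int(torch.randint(num_scenes, (1,)))
    ray_dirs, target_rgb = torch.randn(32, 3), torch.rand(32, 3)  # stand-ins for sampled training rays
    # Single-stage objective: reconstruction and prior learning share one backward pass,
    # so the latent codes are shaped jointly by the observed views and the diffusion prior.
    loss = render_loss(scene_codes[i], ray_dirs, target_rgb) + 0.1 * diffusion_loss(scene_codes[i])
    opt.zero_grad()
    loss.backward()
    opt.step()
```

At test time, the abstract's two uses correspond to sampling a latent from the denoiser for unconditional generation, or optimizing a latent against new observations with the diffusion term acting as a prior.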
Related papers
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- Template-Free Single-View 3D Human Digitalization with Diffusion-Guided LRM [29.13412037370585]
We present Human-LRM, a diffusion-guided feed-forward model that predicts the implicit field of a human from a single image.
Our method is able to capture humans without any template prior (e.g., SMPL) and effectively enhances occluded parts with rich and realistic details.
arXiv Detail & Related papers (2024-01-22T18:08:22Z)
- CAD: Photorealistic 3D Generation via Adversarial Distillation [28.07049413820128]
We propose a novel learning paradigm for 3D synthesis that utilizes pre-trained diffusion models.
Our method unlocks the generation of high-fidelity and photorealistic 3D content conditioned on a single image and prompt.
arXiv Detail & Related papers (2023-12-11T18:59:58Z)
- Deceptive-NeRF/3DGS: Diffusion-Generated Pseudo-Observations for High-Quality Sparse-View Reconstruction [60.52716381465063]
We introduce Deceptive-NeRF/3DGS to enhance sparse-view reconstruction with only a limited set of input images.
Specifically, we propose a deceptive diffusion model that turns noisy images rendered from few-view reconstructions into high-quality pseudo-observations.
Our system progressively incorporates diffusion-generated pseudo-observations into the training image sets, ultimately densifying the sparse input observations by 5 to 10 times.
arXiv Detail & Related papers (2023-05-24T14:00:32Z)
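As a rough illustration of the progressive loop summarized above, the sketch below treats `fit_model`, `render_novel_views`, and `deceptive_diffusion` as hypothetical callables standing in for the reconstruction backend, novel-view rendering, and the deceptive diffusion model; it is not the paper's actual interface.

```python
def progressive_densification(fit_model, render_novel_views, deceptive_diffusion,
                              sparse_views, rounds=4, views_per_round=8):
    views = list(sparse_views)                    # start from the limited real inputs
    for _ in range(rounds):
        model = fit_model(views)                  # fit NeRF/3DGS on the current view set
        noisy = render_novel_views(model, views_per_round)   # few-view renders are noisy
        # The deceptive diffusion model cleans them into pseudo-observations,
        # which are added to the training set for the next round.
        views.extend(deceptive_diffusion(image) for image in noisy)
    return fit_model(views)                       # final fit on the densified view set
```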
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion [107.67277084886929]
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
arXiv Detail & Related papers (2023-02-20T17:12:00Z)
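The sketch below shows one way the alternating scheme summarized above could be organized; `render`, `sample_cdm`, and `finetune` are assumed callables for the NeRF renderer, the 3D-aware conditional diffusion sampler, and the NeRF update, not the authors' interface.

```python
def nerf_guided_distillation(render, sample_cdm, finetune, input_view, virtual_poses, rounds=3):
    for _ in range(rounds):
        # Render the current NeRF at every virtual camera pose.
        guides = [render(pose) for pose in virtual_poses]
        # Sample the conditional diffusion model, guided by the renders so the
        # resulting virtual views stay 3D-consistent with the NeRF and each other.
        virtual_views = [sample_cdm(input_view, pose, guide)
                         for pose, guide in zip(virtual_poses, guides)]
        # Finetune the NeRF on the refined virtual views.
        finetune(list(zip(virtual_poses, virtual_views)))
```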
- NeRDi: Single-View NeRF Synthesis with Language-Guided Diffusion as General Image Priors [24.05480789681139]
We propose NeRDi, a single-view NeRF synthesis framework with general image priors from 2D diffusion models.
We leverage off-the-shelf vision-language models and introduce a two-section language guidance as conditioning inputs to the diffusion model.
We also demonstrate our generalizability in zero-shot NeRF synthesis for in-the-wild images.
arXiv Detail & Related papers (2022-12-06T19:00:07Z)
- VMRF: View Matching Neural Radiance Fields [57.93631771072756]
VMRF is an innovative view matching NeRF that enables effective NeRF training without requiring prior knowledge of camera poses or camera pose distributions.
VMRF introduces a view matching scheme that exploits unbalanced optimal transport to produce a feature transport plan mapping an image rendered from a randomly sampled camera pose to the corresponding real image.
With the feature transport plan as guidance, a novel pose calibration technique rectifies the initially randomized camera poses by predicting the relative pose between each pair of rendered and real images.
arXiv Detail & Related papers (2022-07-06T12:26:40Z)
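A compact sketch of the matching-and-calibration step described above, with `render`, `extract_features`, `unbalanced_ot_plan`, and `predict_relative_pose` as hypothetical callables and poses assumed to be homogeneous 4x4 matrices; none of these names come from the paper.

```python
def calibrate_random_pose(render, extract_features, unbalanced_ot_plan,
                          predict_relative_pose, real_image, random_pose):
    rendered = render(random_pose)
    # Feature transport plan between the rendered and real images; the unbalanced
    # formulation tolerates the content mismatch caused by the wrong pose.
    plan = unbalanced_ot_plan(extract_features(rendered), extract_features(real_image))
    # Predict the relative pose under the plan's guidance and rectify the random pose.
    relative = predict_relative_pose(rendered, real_image, plan)
    return random_pose @ relative   # compose, assuming 4x4 pose matrices
```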
- NeRF-VAE: A Geometry Aware 3D Scene Generative Model [14.593550382914767]
We propose NeRF-VAE, a 3D scene generative model that incorporates geometric structure via NeRF and differentiable volume rendering.
NeRF-VAE's explicit 3D rendering process contrasts with previous generative models, which rely on convolution-based rendering.
We show that, once trained, NeRF-VAE is able to infer and render geometrically-consistent scenes from previously unseen 3D environments.
arXiv Detail & Related papers (2021-04-01T16:16:31Z)
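As a toy illustration of the structure implied above, the sketch below pairs an amortized encoder with a latent-conditioned radiance field as decoder; the flattening encoder, the single-scene batch, the sizes, and the omission of ray-wise volume rendering are all simplifying assumptions.

```python
import torch
import torch.nn as nn

class ToyNeRFVAE(nn.Module):
    def __init__(self, latent_dim=32, view_pixels=3 * 16 * 16):
        super().__init__()
        # Amortized encoder: input view (batch of 1) -> mean and log-variance of a scene latent.
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(view_pixels, 2 * latent_dim))
        # Decoder: latent-conditioned field mapping (latent, 3D point) -> RGB;
        # the differentiable volume rendering along rays is omitted in this toy.
        self.field = nn.Sequential(nn.Linear(latent_dim + 3, 64), nn.ReLU(), nn.Linear(64, 3))

    def forward(self, view, points):
        mu, logvar = self.encoder(view).chunk(2, dim=-1)
        z = mu + torch.randn_like(mu) * (0.5 * logvar).exp()        # reparameterization trick
        rgb = self.field(torch.cat([z.expand(points.shape[0], -1), points], dim=-1))
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum()   # KL to the unit Gaussian prior
        return rgb, kl
```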
This list is automatically generated from the titles and abstracts of the papers on this site.