Denoising Diffusion via Image-Based Rendering
- URL: http://arxiv.org/abs/2402.03445v2
- Date: Tue, 20 Feb 2024 20:59:48 GMT
- Title: Denoising Diffusion via Image-Based Rendering
- Authors: Titas Anciukevičius, Fabian Manhardt, Federico Tombari, Paul Henderson
- Abstract summary: We introduce the first diffusion model able to perform fast, detailed reconstruction and generation of real-world 3D scenes.
First, we introduce a new neural scene representation, IB-planes, that can efficiently and accurately represent large 3D scenes.
Second, we propose a denoising-diffusion framework to learn a prior over this novel 3D scene representation, using only 2D images.
- Score: 54.20828696348574
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generating 3D scenes is a challenging open problem, which requires
synthesizing plausible content that is fully consistent in 3D space. While
recent methods such as neural radiance fields excel at view synthesis and 3D
reconstruction, they cannot synthesize plausible details in unobserved regions
since they lack a generative capability. Conversely, existing generative
methods are typically not capable of reconstructing detailed, large-scale
scenes in the wild, as they use limited-capacity 3D scene representations,
require aligned camera poses, or rely on additional regularizers. In this work,
we introduce the first diffusion model able to perform fast, detailed
reconstruction and generation of real-world 3D scenes. To achieve this, we make
three contributions. First, we introduce a new neural scene representation,
IB-planes, that can efficiently and accurately represent large 3D scenes,
dynamically allocating more capacity as needed to capture details visible in
each image. Second, we propose a denoising-diffusion framework to learn a prior
over this novel 3D scene representation, using only 2D images without the need
for any additional supervision signal such as masks or depths. This supports 3D
reconstruction and generation in a unified architecture. Third, we develop a
principled approach to avoid trivial 3D solutions when integrating the
image-based rendering with the diffusion model, by dropping out representations
of some images. We evaluate the model on several challenging datasets of real
and synthetic images, and demonstrate superior results on generation, novel
view synthesis and 3D reconstruction.
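The third contribution lends itself to a short illustration. Below is a minimal, hedged sketch (PyTorch; the tensor layout, names, and mean-pooling aggregator are assumptions, not the authors' implementation) of pooling per-image features gathered through image-based rendering while dropping the representation contributed by the view currently being rendered, so that view cannot trivially be copied through the renderer.

```python
# Minimal sketch of dropping a view's own representation during image-based
# rendering, as described in the abstract's third contribution. The tensor
# layout and the mean-pooling aggregator are assumptions for illustration.
import torch

def aggregate_ib_features(per_image_feats: torch.Tensor,
                          target_idx: int,
                          drop_own_view: bool = True) -> torch.Tensor:
    """per_image_feats: (N_images, N_points, C) features gathered by projecting
    3D sample points into each input image's feature planes (hypothetical layout).
    Returns (N_points, C) pooled features used to predict density and colour."""
    n_images = per_image_feats.shape[0]
    keep = torch.ones(n_images, dtype=torch.bool)
    if drop_own_view:
        # Exclude the view currently being rendered/denoised, so its pixels
        # cannot leak back into its own reconstruction (the trivial solution).
        keep[target_idx] = False
    kept = per_image_feats[keep]      # (N_images - 1, N_points, C)
    return kept.mean(dim=0)           # simple pooling; a learned aggregator is plausible

# Toy usage: 4 input views, 1024 ray samples, 32-dim features per sample.
feats = torch.randn(4, 1024, 32)
pooled = aggregate_ib_features(feats, target_idx=2)
print(pooled.shape)  # torch.Size([1024, 32])
```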
Related papers
- ReconX: Reconstruct Any Scene from Sparse Views with Video Diffusion Model [16.14713604672497]
ReconX is a novel 3D scene reconstruction paradigm that reframes the ambiguous reconstruction challenge as a temporal generation task.
The proposed ReconX first constructs a global point cloud and encodes it into a contextual space as the 3D structure condition.
Guided by the condition, the video diffusion model then synthesizes video frames that are both detail-preserved and exhibit a high degree of 3D consistency.
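As a rough illustration of the conditioning step described above, the sketch below (a hypothetical PointNet-style encoder; names, token count and layer sizes are assumptions, not ReconX's code) maps a global point cloud to a small set of context tokens that could condition a video diffusion model, e.g. via cross-attention.

```python
# Hedged sketch: encode a global point cloud into a compact "contextual space"
# that could condition a video diffusion model. The PointNet-style encoder and
# token count below are assumptions for illustration only.
import torch
import torch.nn as nn

class PointCloudContextEncoder(nn.Module):
    def __init__(self, point_dim: int = 3, token_dim: int = 256, n_tokens: int = 8):
        super().__init__()
        # Shared per-point MLP, symmetric (max) pooling, then projection
        # to a fixed number of context tokens.
        self.point_mlp = nn.Sequential(
            nn.Linear(point_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.ReLU(),
        )
        self.to_tokens = nn.Linear(256, n_tokens * token_dim)
        self.n_tokens, self.token_dim = n_tokens, token_dim

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (B, N, 3) -> context tokens: (B, n_tokens, token_dim)
        feats = self.point_mlp(points)          # (B, N, 256)
        pooled = feats.max(dim=1).values        # order-invariant global feature
        return self.to_tokens(pooled).view(-1, self.n_tokens, self.token_dim)

# Toy usage with a point cloud of 4096 points.
encoder = PointCloudContextEncoder()
tokens = encoder(torch.randn(2, 4096, 3))
print(tokens.shape)  # torch.Size([2, 8, 256])
```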
arXiv Detail & Related papers (2024-08-29T17:59:40Z)
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- 3D-SceneDreamer: Text-Driven 3D-Consistent Scene Generation [51.64796781728106]
We propose a generative refinement network to synthesize new content with higher quality by exploiting the natural image prior of the 2D diffusion model together with the global 3D information of the current scene.
Our approach supports a wide variety of scene generation and arbitrary camera trajectories with improved visual quality and 3D consistency.
arXiv Detail & Related papers (2024-03-14T14:31:22Z)
- ARTIC3D: Learning Robust Articulated 3D Shapes from Noisy Web Image Collections [71.46546520120162]
Estimating 3D articulated shapes like animal bodies from monocular images is inherently challenging.
We propose ARTIC3D, a self-supervised framework to reconstruct per-instance 3D shapes from a sparse image collection in-the-wild.
We produce realistic animations by fine-tuning the rendered shape and texture under rigid part transformations.
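The rigid part transformations mentioned above reduce to a simple operation; the sketch below (generic, not ARTIC3D's code; the vertex/part layout is an assumption) poses a shape by applying each part's rotation and translation to the vertices assigned to it.

```python
# Generic sketch of animating an articulated shape by applying a rigid
# transform (rotation R, translation t) to each part's vertices.
import torch

def apply_rigid_part_transforms(vertices: torch.Tensor,
                                part_ids: torch.Tensor,
                                rotations: torch.Tensor,
                                translations: torch.Tensor) -> torch.Tensor:
    """vertices: (V, 3), part_ids: (V,) integer part index per vertex,
    rotations: (P, 3, 3), translations: (P, 3). Returns posed vertices (V, 3)."""
    R = rotations[part_ids]       # (V, 3, 3): rotation of each vertex's part
    t = translations[part_ids]    # (V, 3)
    return torch.einsum('vij,vj->vi', R, vertices) + t

# Toy usage: 5 vertices over 2 parts; identity for part 0,
# a 90-degree rotation about z for part 1.
verts = torch.tensor([[1., 0., 0.], [0., 1., 0.], [0., 0., 1.],
                      [1., 1., 0.], [0., 1., 1.]])
parts = torch.tensor([0, 0, 1, 1, 1])
Rz = torch.tensor([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
rots = torch.stack([torch.eye(3), Rz])
trans = torch.zeros(2, 3)
print(apply_rigid_part_transforms(verts, parts, rots, trans))
```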
arXiv Detail & Related papers (2023-06-07T17:47:50Z)
- RenderDiffusion: Image Diffusion for 3D Reconstruction, Inpainting and Generation [68.06991943974195]
We present RenderDiffusion, the first diffusion model for 3D generation and inference, trained using only monocular 2D supervision.
We evaluate RenderDiffusion on FFHQ, AFHQ, ShapeNet and CLEVR datasets, showing competitive performance for generation of 3D scenes and inference of 3D scenes from 2D images.
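A hedged sketch of the training idea implied above: the denoiser infers a 3D representation from a noised image and is supervised only by rendering that representation back to the clean image. The toy encoder, renderer and noise schedule below are placeholders, not RenderDiffusion's actual architecture.

```python
# Hedged sketch of an "infer 3D, render back, supervise in 2D" diffusion
# training step. ToyEncoder/ToyRenderer are placeholders for illustration.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Maps a noised image to a (placeholder) 3D latent."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(3, 16, 3, padding=1)
    def forward(self, x):
        return self.net(x)

class ToyRenderer(nn.Module):
    """Renders the latent back to an image; pose is ignored in this toy."""
    def __init__(self):
        super().__init__()
        self.net = nn.Conv2d(16, 3, 3, padding=1)
    def forward(self, latent, pose):
        return self.net(latent)

def training_step(encoder, renderer, x0, pose, t, alphas_cumprod):
    """One denoising step supervised purely with 2D images."""
    noise = torch.randn_like(x0)
    a = alphas_cumprod[t].view(-1, 1, 1, 1)
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * noise   # standard DDPM forward process
    latent3d = encoder(x_t)                        # infer a 3D representation
    x0_pred = renderer(latent3d, pose)             # render it back to 2D
    return F.mse_loss(x0_pred, x0)                 # 2D reconstruction loss only

# Toy usage.
enc, ren = ToyEncoder(), ToyRenderer()
alphas = torch.linspace(0.999, 0.01, steps=1000).cumprod(dim=0)
loss = training_step(enc, ren, torch.rand(2, 3, 32, 32), pose=None,
                     t=torch.tensor([10, 500]), alphas_cumprod=alphas)
loss.backward()
```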
arXiv Detail & Related papers (2022-11-17T20:17:04Z)
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
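The volume-rendering step mentioned above is standard and easy to sketch: a small MLP (a toy stand-in here, conditioned on an assumed per-sample feature) predicts density and colour along each ray, and the samples are alpha-composited into a pixel colour. This illustrates the rendering math only, not the paper's model.

```python
# Minimal sketch of NeRF-style volume rendering with a conditioned MLP.
# The MLP architecture and feature conditioning are illustrative assumptions.
import torch
import torch.nn as nn

class ToyNeRFMLP(nn.Module):
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(3 + feat_dim, 64), nn.ReLU(),
                                 nn.Linear(64, 4))   # -> (density, r, g, b)
    def forward(self, xyz, cond):
        out = self.net(torch.cat([xyz, cond], dim=-1))
        return torch.relu(out[..., :1]), torch.sigmoid(out[..., 1:])

def composite(density, rgb, deltas):
    """Alpha-composite samples along each ray.
    density: (R, S, 1), rgb: (R, S, 3), deltas: (R, S, 1) sample spacing."""
    alpha = 1.0 - torch.exp(-density * deltas)                        # (R, S, 1)
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[:, :1]),
                                     1.0 - alpha + 1e-10], dim=1), dim=1)[:, :-1]
    weights = alpha * trans                                           # (R, S, 1)
    return (weights * rgb).sum(dim=1)                                 # (R, 3)

# Toy usage: 4 rays, 16 samples each, 32-dim conditioning feature per sample.
mlp = ToyNeRFMLP()
sigma, colour = mlp(torch.rand(4, 16, 3), torch.randn(4, 16, 32))
pixel = composite(sigma, colour, deltas=torch.full((4, 16, 1), 0.05))
print(pixel.shape)  # torch.Size([4, 3])
```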
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
- 3D-aware Image Synthesis via Learning Structural and Textural Representations [39.681030539374994]
We propose VolumeGAN, for high-fidelity 3D-aware image synthesis, through explicitly learning a structural representation and a textural representation.
Our approach achieves substantially higher image quality and better 3D control than previous methods.
arXiv Detail & Related papers (2021-12-20T18:59:40Z)
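For the last entry above, the two-branch idea (a structural 3D representation plus a textural mapping to colour) can be sketched generically as below; the trilinear feature lookup and layer sizes are assumptions, not VolumeGAN's actual architecture.

```python
# Hedged, generic sketch of a generator with a structural branch (latent ->
# coarse 3D feature volume) and a textural branch (sampled feature -> colour).
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoBranchGenerator(nn.Module):
    def __init__(self, z_dim: int = 64, feat_dim: int = 16, grid: int = 8):
        super().__init__()
        self.grid, self.feat_dim = grid, feat_dim
        # Structure branch: latent code -> coarse 3D feature volume.
        self.structure = nn.Linear(z_dim, feat_dim * grid ** 3)
        # Texture branch: sampled volume feature (+ xyz) -> density and colour.
        self.texture = nn.Sequential(nn.Linear(feat_dim + 3, 64), nn.ReLU(),
                                     nn.Linear(64, 4))

    def forward(self, z: torch.Tensor, xyz: torch.Tensor) -> torch.Tensor:
        # z: (B, z_dim), xyz: (B, N, 3) query points in [-1, 1]^3 -> (B, N, 4)
        vol = self.structure(z).view(-1, self.feat_dim, self.grid, self.grid, self.grid)
        # Trilinearly sample the structural volume at the query points.
        grid_pts = xyz.view(xyz.shape[0], -1, 1, 1, 3)            # (B, N, 1, 1, 3)
        feats = F.grid_sample(vol, grid_pts, align_corners=True)  # (B, C, N, 1, 1)
        feats = feats.squeeze(-1).squeeze(-1).permute(0, 2, 1)    # (B, N, C)
        return self.texture(torch.cat([feats, xyz], dim=-1))      # density + RGB

# Toy usage: 2 latents, 128 query points each.
gen = TwoBranchGenerator()
out = gen(torch.randn(2, 64), torch.rand(2, 128, 3) * 2 - 1)
print(out.shape)  # torch.Size([2, 128, 4])
```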
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information it presents and is not responsible for any consequences of its use.