DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
- URL: http://arxiv.org/abs/2304.02797v1
- Date: Thu, 6 Apr 2023 00:16:25 GMT
- Title: DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
- Authors: Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Sergey
Zakharov, Vincent Sitzmann, Adrien Gaidon
- Abstract summary: Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis.
Standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity.
In this work, we propose to use the multi-view photometric objective as a geometric regularizer for volumetric rendering.
- Score: 32.350984950639656
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Differentiable volumetric rendering is a powerful paradigm for 3D
reconstruction and novel view synthesis. However, standard volume rendering
approaches struggle with degenerate geometries in the case of limited viewpoint
diversity, a common scenario in robotics applications. In this work, we propose
to use the multi-view photometric objective from the self-supervised depth
estimation literature as a geometric regularizer for volumetric rendering,
significantly improving novel view synthesis without requiring additional
information. Building upon this insight, we explore the explicit modeling of
scene geometry using a generalist Transformer, jointly learning a radiance
field as well as depth and light fields with a set of shared latent codes. We
demonstrate that sharing geometric information across tasks is mutually
beneficial, leading to improvements over single-task learning without an
increase in network complexity. Our DeLiRa architecture achieves
state-of-the-art results on the ScanNet benchmark, enabling high quality
volumetric rendering as well as real-time novel view and depth synthesis in the
limited viewpoint diversity setting.
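As a concrete illustration of the paper's central idea, the sketch below shows how a multi-view photometric objective from the self-supervised depth estimation literature can act as a geometric regularizer on depth rendered from a radiance field. This is a minimal, hypothetical PyTorch sketch rather than the authors' implementation: the function names, the single-source-view setup, the plain L1 photometric error (self-supervised depth methods typically also use an SSIM term and masking), and the lambda_photo weighting are illustrative assumptions.

```python
# Hypothetical sketch: multi-view photometric loss as a geometric regularizer
# on depth rendered from a radiance field (names and setup are illustrative).
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift a (B, 1, H, W) depth map to camera-frame 3D points of shape (B, 3, H*W)."""
    B, _, H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1)
    rays = K_inv @ pix                          # K_inv: (B, 3, 3) inverse pinhole intrinsics
    return rays * depth.reshape(B, 1, -1)       # scale unit-plane rays by depth

def warp_source_into_target(src_img, tgt_depth, K, K_inv, T_tgt_to_src):
    """Reproject target pixels into the source view and sample source colors."""
    B, _, H, W = tgt_depth.shape
    pts_tgt = backproject(tgt_depth, K_inv)                                  # (B, 3, HW)
    pts_src = T_tgt_to_src[:, :3, :3] @ pts_tgt + T_tgt_to_src[:, :3, 3:]    # rigid transform
    pix_src = K @ pts_src
    pix_src = pix_src[:, :2] / pix_src[:, 2:].clamp(min=1e-6)                # perspective divide
    u = 2.0 * pix_src[:, 0] / (W - 1) - 1.0                                  # normalize to [-1, 1]
    v = 2.0 * pix_src[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True, padding_mode="border")

def photometric_regularizer(tgt_img, src_img, rendered_depth, K, K_inv, T_tgt_to_src):
    """L1 photometric error; gradients flow through rendered_depth, constraining geometry."""
    warped = warp_source_into_target(src_img, rendered_depth, K, K_inv, T_tgt_to_src)
    return (warped - tgt_img).abs().mean()

# total_loss = rgb_rendering_loss + lambda_photo * photometric_regularizer(...)
```

The warp is differentiable with respect to the rendered depth, so photometric consistency across views constrains the recovered geometry without any additional supervision, which is what makes it usable as a regularizer when viewpoint diversity is limited.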
Related papers
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- Incorporating dense metric depth into neural 3D representations for view synthesis and relighting [25.028859317188395]
In robotic applications, dense metric depth can often be measured directly using stereo, and illumination can be controlled.
In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations.
We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis.
arXiv Detail & Related papers (2024-09-04T20:21:13Z)
- GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis [63.5925701087252]
We propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points.
Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components (a generic Blinn-Phong sketch appears after this list).
To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework.
arXiv Detail & Related papers (2024-05-31T13:48:54Z)
- VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis [73.50359502037232]
VoxNeRF is a novel approach to enhance the quality and efficiency of neural indoor reconstruction and novel view synthesis.
We propose an efficient voxel-guided sampling technique that selectively allocates computational resources to the most relevant segments of rays.
Our approach is validated with extensive experiments on ScanNet and ScanNet++.
arXiv Detail & Related papers (2023-11-09T11:32:49Z)
- Improved Neural Radiance Fields Using Pseudo-depth and Fusion [18.088617888326123]
We propose constructing multi-scale encoding volumes and providing multi-scale geometry information to NeRF models.
To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously.
arXiv Detail & Related papers (2023-07-27T17:01:01Z)
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Depth Field Networks for Generalizable Multi-view Scene Representation [31.090289865520475]
We learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity.
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide margin.
arXiv Detail & Related papers (2022-07-28T17:59:31Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
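For reference on the GS-Phong entry above, the ambient/diffuse/specular split it describes follows the classic Blinn-Phong shading model. The sketch below is a generic, single-point-light Blinn-Phong implementation in NumPy for illustration only; it is not the paper's per-Gaussian, meta-learned formulation, and the coefficient names are assumptions.

```python
# Textbook Blinn-Phong shading: ambient + diffuse + specular for one point light.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(point, normal, view_pos, light_pos, light_color,
                k_a, k_d, k_s, shininess):
    n = normalize(normal)
    l = normalize(light_pos - point)      # direction from surface point to light
    v = normalize(view_pos - point)       # direction from surface point to camera
    h = normalize(l + v)                  # half-vector between light and view directions
    ambient = k_a * light_color
    diffuse = k_d * max(np.dot(n, l), 0.0) * light_color
    specular = k_s * max(np.dot(n, h), 0.0) ** shininess * light_color
    return ambient + diffuse + specular

# Example: color = blinn_phong(p, n, cam, light, np.ones(3), 0.1, 0.7, 0.2, 32.0)
```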