DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
- URL: http://arxiv.org/abs/2304.02797v1
- Date: Thu, 6 Apr 2023 00:16:25 GMT
- Title: DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
- Authors: Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Sergey
Zakharov, Vincent Sitzmann, Adrien Gaidon
- Abstract summary: Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis.
Standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity.
In this work, we propose to use the multi-view photometric objective as a geometric regularizer for volumetric rendering.
- Score: 32.350984950639656
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Differentiable volumetric rendering is a powerful paradigm for 3D
reconstruction and novel view synthesis. However, standard volume rendering
approaches struggle with degenerate geometries in the case of limited viewpoint
diversity, a common scenario in robotics applications. In this work, we propose
to use the multi-view photometric objective from the self-supervised depth
estimation literature as a geometric regularizer for volumetric rendering,
significantly improving novel view synthesis without requiring additional
information. Building upon this insight, we explore the explicit modeling of
scene geometry using a generalist Transformer, jointly learning a radiance
field as well as depth and light fields with a set of shared latent codes. We
demonstrate that sharing geometric information across tasks is mutually
beneficial, leading to improvements over single-task learning without an
increase in network complexity. Our DeLiRa architecture achieves
state-of-the-art results on the ScanNet benchmark, enabling high quality
volumetric rendering as well as real-time novel view and depth synthesis in the
limited viewpoint diversity setting.
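The geometric regularizer described above is the standard multi-view photometric objective from self-supervised depth estimation: a source image is warped into the target view using the predicted depth and the relative camera pose, and the photometric difference penalizes depths that produce inconsistent reprojections. Below is a minimal PyTorch sketch of that objective, assuming known intrinsics and relative pose and using a plain L1 error for brevity (the objective in this literature typically also includes an SSIM term and auto-masking); it is an illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift target-view pixels to 3D points in the target camera frame.
    depth: (B,1,H,W), K_inv: (B,3,3)."""
    B, _, H, W = depth.shape
    ys, xs = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).reshape(1, 3, -1)
    rays = K_inv @ pix                       # (B,3,H*W) viewing rays
    return rays * depth.reshape(B, 1, -1)    # scale rays by predicted depth

def photometric_loss(target, source, depth, K, K_inv, T_src_tgt):
    """L1 error between the target image and the source image warped into the
    target view. target/source: (B,3,H,W); T_src_tgt: (B,4,4) target->source."""
    B, _, H, W = target.shape
    pts = backproject(depth, K_inv)                               # (B,3,H*W)
    ones = torch.ones(B, 1, H * W, dtype=pts.dtype, device=pts.device)
    cam_src = (T_src_tgt @ torch.cat([pts, ones], dim=1))[:, :3]  # (B,3,H*W)
    proj = K @ cam_src
    xy = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)               # pixel coords
    # Normalize to [-1, 1] for grid_sample (x indexes width, y indexes height).
    gx = 2.0 * xy[:, 0] / (W - 1) - 1.0
    gy = 2.0 * xy[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1).reshape(B, H, W, 2)
    warped = F.grid_sample(source, grid, padding_mode="border", align_corners=True)
    return (warped - target).abs().mean()
```

In this setting the loss acts on depths rendered by the volumetric model, so gradients flow back into the radiance field and discourage degenerate geometry even when viewpoint diversity is limited.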
Related papers
- Incorporating dense metric depth into neural 3D representations for view synthesis and relighting [25.028859317188395]
In robotic applications, dense metric depth can often be measured directly using stereo, and illumination can be controlled.
In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations.
We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis.
arXiv Detail & Related papers (2024-09-04T20:21:13Z)
- GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis [63.5925701087252]
We propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points.
Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components.
To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework.
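For context, the Blinn-Phong decomposition referenced in this entry splits shading into ambient, diffuse, and specular terms. The sketch below is a generic textbook formulation with illustrative coefficients and a single point light, not GS-Phong's per-Gaussian implementation.

```python
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(normal, light_dir, view_dir, base_color,
                k_ambient=0.1, k_diffuse=0.7, k_specular=0.2, shininess=32.0):
    """Return the ambient, diffuse, and specular contributions for one point."""
    n = normalize(normal)
    l = normalize(light_dir)
    v = normalize(view_dir)
    h = normalize(l + v)                                   # half-vector
    ambient = k_ambient * base_color
    diffuse = k_diffuse * max(np.dot(n, l), 0.0) * base_color
    specular = k_specular * max(np.dot(n, h), 0.0) ** shininess * np.ones(3)
    return ambient, diffuse, specular

# The final color is the sum of the three components.
a, d, s = blinn_phong(np.array([0.0, 0.0, 1.0]),
                      np.array([0.3, 0.3, 1.0]),
                      np.array([0.0, 0.0, 1.0]),
                      base_color=np.array([0.8, 0.2, 0.2]))
color = np.clip(a + d + s, 0.0, 1.0)
```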
arXiv Detail & Related papers (2024-05-31T13:48:54Z)
- Depth Reconstruction with Neural Signed Distance Fields in Structured Light Systems [15.603880588503355]
We introduce a novel depth estimation technique for multi-frame structured light setups using neural implicit representations of 3D space.
Our approach employs a neural signed distance field (SDF), trained through self-supervised differentiable rendering.
arXiv Detail & Related papers (2024-05-20T13:24:35Z)
- Improved Neural Radiance Fields Using Pseudo-depth and Fusion [18.088617888326123]
We propose constructing multi-scale encoding volumes and providing multi-scale geometry information to NeRF models.
To bring the constructed volumes as close as possible to object surfaces in the scene and make the rendered depth more accurate, we propose performing depth prediction and radiance field reconstruction simultaneously.
arXiv Detail & Related papers (2023-07-27T17:01:01Z)
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields, to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Depth Field Networks for Generalizable Multi-view Scene Representation [31.090289865520475]
We learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity.
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide margin.
arXiv Detail & Related papers (2022-07-28T17:59:31Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
- Extracting Triangular 3D Models, Materials, and Lighting From Images [59.33666140713829]
We present an efficient method for joint optimization of materials and lighting from multi-view image observations.
We leverage meshes with spatially-varying materials and environment lighting that can be deployed in any traditional graphics engine.
arXiv Detail & Related papers (2021-11-24T13:58:20Z)
- Deep 3D Capture: Geometry and Reflectance from Sparse Multi-View Images [59.906948203578544]
We introduce a novel learning-based method to reconstruct the high-quality geometry and complex, spatially-varying BRDF of an arbitrary object.
We first estimate per-view depth maps using a deep multi-view stereo network.
These depth maps are used to coarsely align the different views.
We propose a novel multi-view reflectance estimation network architecture.
arXiv Detail & Related papers (2020-03-27T21:28:54Z)