DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
- URL: http://arxiv.org/abs/2304.02797v1
- Date: Thu, 6 Apr 2023 00:16:25 GMT
- Title: DeLiRa: Self-Supervised Depth, Light, and Radiance Fields
- Authors: Vitor Guizilini, Igor Vasiljevic, Jiading Fang, Rares Ambrus, Sergey
Zakharov, Vincent Sitzmann, Adrien Gaidon
- Abstract summary: Differentiable volumetric rendering is a powerful paradigm for 3D reconstruction and novel view synthesis.
Standard volume rendering approaches struggle with degenerate geometries in the case of limited viewpoint diversity.
In this work, we propose to use the multi-view photometric objective as a geometric regularizer for volumetric rendering.
- Score: 32.350984950639656
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Differentiable volumetric rendering is a powerful paradigm for 3D
reconstruction and novel view synthesis. However, standard volume rendering
approaches struggle with degenerate geometries in the case of limited viewpoint
diversity, a common scenario in robotics applications. In this work, we propose
to use the multi-view photometric objective from the self-supervised depth
estimation literature as a geometric regularizer for volumetric rendering,
significantly improving novel view synthesis without requiring additional
information. Building upon this insight, we explore the explicit modeling of
scene geometry using a generalist Transformer, jointly learning a radiance
field as well as depth and light fields with a set of shared latent codes. We
demonstrate that sharing geometric information across tasks is mutually
beneficial, leading to improvements over single-task learning without an
increase in network complexity. Our DeLiRa architecture achieves
state-of-the-art results on the ScanNet benchmark, enabling high quality
volumetric rendering as well as real-time novel view and depth synthesis in the
limited viewpoint diversity setting.
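As a concrete illustration of the paper's central idea, the sketch below shows how a multi-view photometric objective from the self-supervised depth estimation literature can act as a geometric regularizer on depth rendered from a radiance field. This is a minimal, hypothetical PyTorch sketch rather than the authors' implementation: the function names, the single-source-view setup, the plain L1 photometric error (self-supervised depth methods typically also use an SSIM term and masking), and the lambda_photo weighting are illustrative assumptions.

```python
# Hypothetical sketch: multi-view photometric loss as a geometric regularizer
# on depth rendered from a radiance field (names and setup are illustrative).
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift a (B, 1, H, W) depth map to camera-frame 3D points of shape (B, 3, H*W)."""
    B, _, H, W = depth.shape
    v, u = torch.meshgrid(
        torch.arange(H, dtype=depth.dtype, device=depth.device),
        torch.arange(W, dtype=depth.dtype, device=depth.device),
        indexing="ij",
    )
    pix = torch.stack([u, v, torch.ones_like(u)], dim=0).reshape(1, 3, -1)
    rays = K_inv @ pix                          # K_inv: (B, 3, 3) inverse pinhole intrinsics
    return rays * depth.reshape(B, 1, -1)       # scale unit-plane rays by depth

def warp_source_into_target(src_img, tgt_depth, K, K_inv, T_tgt_to_src):
    """Reproject target pixels into the source view and sample source colors."""
    B, _, H, W = tgt_depth.shape
    pts_tgt = backproject(tgt_depth, K_inv)                                  # (B, 3, HW)
    pts_src = T_tgt_to_src[:, :3, :3] @ pts_tgt + T_tgt_to_src[:, :3, 3:]    # rigid transform
    pix_src = K @ pts_src
    pix_src = pix_src[:, :2] / pix_src[:, 2:].clamp(min=1e-6)                # perspective divide
    u = 2.0 * pix_src[:, 0] / (W - 1) - 1.0                                  # normalize to [-1, 1]
    v = 2.0 * pix_src[:, 1] / (H - 1) - 1.0
    grid = torch.stack([u, v], dim=-1).reshape(B, H, W, 2)
    return F.grid_sample(src_img, grid, align_corners=True, padding_mode="border")

def photometric_regularizer(tgt_img, src_img, rendered_depth, K, K_inv, T_tgt_to_src):
    """L1 photometric error; gradients flow through rendered_depth, constraining geometry."""
    warped = warp_source_into_target(src_img, rendered_depth, K, K_inv, T_tgt_to_src)
    return (warped - tgt_img).abs().mean()

# total_loss = rgb_rendering_loss + lambda_photo * photometric_regularizer(...)
```

The warp is differentiable with respect to the rendered depth, so photometric consistency across views constrains the recovered geometry without any additional supervision, which is what makes it usable as a regularizer when viewpoint diversity is limited.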
Related papers
- IDArb: Intrinsic Decomposition for Arbitrary Number of Input Views and Illuminations [64.07859467542664]
Capturing geometric and material information from images remains a fundamental challenge in computer vision and graphics.
Traditional optimization-based methods often require hours of computational time to reconstruct geometry, material properties, and environmental lighting from dense multi-view inputs.
We introduce IDArb, a diffusion-based model designed to perform intrinsic decomposition on an arbitrary number of images under varying illuminations.
arXiv Detail & Related papers (2024-12-16T18:52:56Z)
- Incorporating dense metric depth into neural 3D representations for view synthesis and relighting [25.028859317188395]
In robotic applications, dense metric depth can often be measured directly using stereo, and illumination can be controlled.
In this work we demonstrate a method to incorporate dense metric depth into the training of neural 3D representations.
We also discuss a multi-flash stereo camera system developed to capture the necessary data for our pipeline and show results on relighting and view synthesis.
arXiv Detail & Related papers (2024-09-04T20:21:13Z)
- GS-Phong: Meta-Learned 3D Gaussians for Relightable Novel View Synthesis [63.5925701087252]
We propose a novel method for representing a scene illuminated by a point light using a set of relightable 3D Gaussian points.
Inspired by the Blinn-Phong model, our approach decomposes the scene into ambient, diffuse, and specular components (a generic Blinn-Phong sketch appears after this list).
To facilitate the decomposition of geometric information independent of lighting conditions, we introduce a novel bilevel optimization-based meta-learning framework.
arXiv Detail & Related papers (2024-05-31T13:48:54Z)
- VoxNeRF: Bridging Voxel Representation and Neural Radiance Fields for Enhanced Indoor View Synthesis [73.50359502037232]
VoxNeRF is a novel approach to enhance the quality and efficiency of neural indoor reconstruction and novel view synthesis.
We propose an efficient voxel-guided sampling technique that selectively allocates computational resources to the most relevant segments of rays.
Our approach is validated with extensive experiments on ScanNet and ScanNet++.
arXiv Detail & Related papers (2023-11-09T11:32:49Z)
- Improved Neural Radiance Fields Using Pseudo-depth and Fusion [18.088617888326123]
We propose constructing multi-scale encoding volumes and providing multi-scale geometry information to NeRF models.
To make the constructed volumes as close as possible to the surfaces of objects in the scene and the rendered depth more accurate, we propose to perform depth prediction and radiance field reconstruction simultaneously.
arXiv Detail & Related papers (2023-07-27T17:01:01Z)
- Learning to Render Novel Views from Wide-Baseline Stereo Pairs [26.528667940013598]
We introduce a method for novel view synthesis given only a single wide-baseline stereo image pair.
Existing approaches to novel view synthesis from sparse observations fail because they recover incorrect 3D geometry.
We propose an efficient, image-space epipolar line sampling scheme to assemble image features for a target ray.
arXiv Detail & Related papers (2023-04-17T17:40:52Z)
- GM-NeRF: Learning Generalizable Model-based Neural Radiance Fields from Multi-view Images [79.39247661907397]
We introduce an effective framework, Generalizable Model-based Neural Radiance Fields (GM-NeRF), to synthesize free-viewpoint images.
Specifically, we propose a geometry-guided attention mechanism to register the appearance code from multi-view 2D images to a geometry proxy.
arXiv Detail & Related papers (2023-03-24T03:32:02Z)
- High-fidelity 3D GAN Inversion by Pseudo-multi-view Optimization [51.878078860524795]
We present a high-fidelity 3D generative adversarial network (GAN) inversion framework that can synthesize photo-realistic novel views.
Our approach enables high-fidelity 3D rendering from a single image, which is promising for various applications of AI-generated 3D content.
arXiv Detail & Related papers (2022-11-28T18:59:52Z)
- Depth Field Networks for Generalizable Multi-view Scene Representation [31.090289865520475]
We learn an implicit, multi-view consistent scene representation, introducing a series of 3D data augmentation techniques as a geometric inductive prior to increase view diversity.
Our Depth Field Networks (DeFiNe) achieve state-of-the-art results in stereo and video depth estimation without explicit geometric constraints, and improve on zero-shot domain generalization by a wide margin.
arXiv Detail & Related papers (2022-07-28T17:59:31Z)
- Object Scene Representation Transformer [56.40544849442227]
We introduce Object Scene Representation Transformer (OSRT), a 3D-centric model in which individual object representations naturally emerge through novel view synthesis.
OSRT scales to significantly more complex scenes with larger diversity of objects and backgrounds than existing methods.
It is multiple orders of magnitude faster at compositional rendering thanks to its light field parametrization and the novel Slot Mixer decoder.
arXiv Detail & Related papers (2022-06-14T15:40:47Z)
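For reference on the GS-Phong entry above, the ambient/diffuse/specular split it describes follows the classic Blinn-Phong shading model. The sketch below is a generic, single-point-light Blinn-Phong implementation in NumPy for illustration only; it is not the paper's per-Gaussian, meta-learned formulation, and the coefficient names are assumptions.

```python
# Textbook Blinn-Phong shading: ambient + diffuse + specular for one point light.
import numpy as np

def normalize(v):
    return v / np.linalg.norm(v)

def blinn_phong(point, normal, view_pos, light_pos, light_color,
                k_a, k_d, k_s, shininess):
    n = normalize(normal)
    l = normalize(light_pos - point)      # direction from surface point to light
    v = normalize(view_pos - point)       # direction from surface point to camera
    h = normalize(l + v)                  # half-vector between light and view directions
    ambient = k_a * light_color
    diffuse = k_d * max(np.dot(n, l), 0.0) * light_color
    specular = k_s * max(np.dot(n, h), 0.0) ** shininess * light_color
    return ambient + diffuse + specular

# Example: color = blinn_phong(p, n, cam, light, np.ones(3), 0.1, 0.7, 0.2, 32.0)
```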