ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process
- URL: http://arxiv.org/abs/2401.08140v2
- Date: Thu, 18 Jan 2024 07:01:15 GMT
- Title: ProvNeRF: Modeling per Point Provenance in NeRFs as a Stochastic Process
- Authors: Kiyohiro Nakayama, Mikaela Angelina Uy, Yang You, Ke Li, Leonidas
Guibas
- Abstract summary: Neural radiance fields (NeRFs) have gained popularity across various applications.
They face challenges in the sparse view setting, lacking sufficient constraints from volume rendering.
We introduce ProvNeRF, a model that enriches a traditional NeRF representation by incorporating per-point provenance.
- Score: 12.534255228953741
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Neural radiance fields (NeRFs) have gained popularity across various
applications. However, they face challenges in the sparse view setting, lacking
sufficient constraints from volume rendering. Reconstructing and understanding
a 3D scene from sparse and unconstrained cameras is a long-standing problem in
classical computer vision with diverse applications. While recent works have
explored NeRFs in sparse, unconstrained view scenarios, their focus has been
primarily on enhancing reconstruction and novel view synthesis. Our approach
takes a broader perspective by posing the question: "from where has each point
been seen?" -- which gates how well we can understand and reconstruct it. In
other words, we aim to determine the origin or provenance of each 3D point and
its associated information under sparse, unconstrained views. We introduce
ProvNeRF, a model that enriches a traditional NeRF representation by
incorporating per-point provenance, modeling likely source locations for each
point. We achieve this by extending implicit maximum likelihood estimation
(IMLE) for stochastic processes. Notably, our method is compatible with any
pre-trained NeRF model and the associated training camera poses. We demonstrate
that modeling per-point provenance offers several advantages, including
uncertainty estimation, criteria-based view selection, and improved novel view
synthesis, compared to state-of-the-art methods. Please visit our project page
at https://provnerf.github.io
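The abstract's central mechanism is extending implicit maximum likelihood estimation (IMLE) so that each 3D point carries a distribution over likely source (provenance) directions. The paper's implementation is not reproduced here; the following is a minimal, hypothetical sketch of an IMLE-style update, simplified to a pointwise objective: a latent-conditioned network proposes several candidate "seen-from" directions per point, and only the candidate closest to an observed training-camera direction receives gradient. The class name `ProvenanceField`, layer sizes, and loss form are illustrative assumptions, not the authors' code.

```python
# Hypothetical IMLE-style sketch for per-point provenance (not the authors' implementation).
import torch
import torch.nn as nn
import torch.nn.functional as F

class ProvenanceField(nn.Module):
    """Maps a 3D point and a latent code to a candidate 'seen-from' direction."""
    def __init__(self, latent_dim=16, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + latent_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, points, z):
        d = self.net(torch.cat([points, z], dim=-1))
        return F.normalize(d, dim=-1)  # unit direction toward a likely source view

def imle_step(field, optimizer, points, observed_dirs, latent_dim=16, num_samples=8):
    """One IMLE-style update: only the sample nearest the observation receives gradient."""
    n = points.shape[0]
    # Draw several latent samples per point and predict candidate source directions.
    z = torch.randn(num_samples, n, latent_dim)
    preds = field(points.unsqueeze(0).expand(num_samples, -1, -1), z)  # (S, N, 3)
    # Keep, for each point, the candidate closest to the observed camera direction.
    dists = ((preds - observed_dirs.unsqueeze(0)) ** 2).sum(-1)        # (S, N)
    best = dists.argmin(dim=0)                                         # (N,)
    best_pred = preds[best, torch.arange(n)]                           # (N, 3)
    loss = ((best_pred - observed_dirs) ** 2).sum(-1).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Toy usage with random stand-ins for points and directions toward training cameras.
field = ProvenanceField(latent_dim=16)
opt = torch.optim.Adam(field.parameters(), lr=1e-3)
pts = torch.rand(1024, 3)
dirs = F.normalize(torch.randn(1024, 3), dim=-1)
imle_step(field, opt, pts, dirs)
```
Because only the nearest sample is pulled toward each observation, the remaining samples stay free to cover other plausible source directions, which is what lets the learned provenance stay multimodal.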
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving.
It predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
It is trained self-supervised with differentiable rendering to reconstruct RGB, depth, or feature images.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
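The DistillNeRF entry above describes self-supervised training with differentiable rendering against RGB, depth, or feature images. The snippet below is only a generic sketch of such a multi-target reconstruction objective; the weights, the dictionary interface, and the feature comparison are assumptions, not DistillNeRF's actual loss.

```python
# Generic sketch of a self-supervised reconstruction objective over rendered outputs.
import torch
import torch.nn.functional as F

def reconstruction_loss(rendered, target, w_rgb=1.0, w_depth=0.1, w_feat=0.5):
    """`rendered` and `target` are dicts with optional 'rgb', 'depth', 'feat' tensors."""
    terms = []
    if "rgb" in target:
        terms.append(w_rgb * F.mse_loss(rendered["rgb"], target["rgb"]))
    if "depth" in target:
        terms.append(w_depth * F.l1_loss(rendered["depth"], target["depth"]))
    if "feat" in target:  # e.g. feature maps (B, C, H, W) from a frozen 2D foundation model
        terms.append(w_feat * (1 - F.cosine_similarity(
            rendered["feat"], target["feat"], dim=1)).mean())
    return sum(terms)
```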
- Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z)
- SCADE: NeRFs from Space Carving with Ambiguity-Aware Depth Estimates [16.344734292989504]
SCADE is a novel technique that improves NeRF reconstruction quality on sparse, unconstrained input views.
We propose a new method that learns to predict, for each view, a continuous, multimodal distribution of depth estimates.
Experiments show that our approach enables higher fidelity novel view synthesis from sparse views.
arXiv Detail & Related papers (2023-03-23T18:00:07Z)
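The SCADE entry above hinges on supervising NeRF depth with a continuous, multimodal distribution of depth estimates per view rather than a single value. The snippet below is a simplified, one-sided stand-in for that idea (SCADE's actual space-carving loss compares two sample-based distributions): each ray is penalized only by its distance to the closest depth hypothesis, so ambiguous pixels are not forced toward one mode. The hypothesis format and loss form are assumptions.

```python
# Simplified "ambiguity-aware" depth supervision with a set of hypotheses per ray.
import torch

def multimodal_depth_loss(rendered_depth, depth_hypotheses):
    """
    rendered_depth:   (N,)   depth from volume rendering for N rays
    depth_hypotheses: (N, K) K sampled depth estimates per ray, e.g. drawn from a
                             monocular network that outputs a depth distribution
    """
    err = (depth_hypotheses - rendered_depth.unsqueeze(-1)).abs()  # (N, K)
    return err.min(dim=-1).values.mean()  # penalize only the nearest hypothesis
```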
- NerfDiff: Single-image View Synthesis with NeRF-guided Distillation from 3D-aware Diffusion [107.67277084886929]
Novel view synthesis from a single image requires inferring occluded regions of objects and scenes whilst simultaneously maintaining semantic and physical consistency with the input.
We propose NerfDiff, which addresses this issue by distilling the knowledge of a 3D-aware conditional diffusion model (CDM) into NeRF through synthesizing and refining a set of virtual views at test time.
We further propose a novel NeRF-guided distillation algorithm that simultaneously generates 3D consistent virtual views from the CDM samples, and finetunes the NeRF based on the improved virtual views.
arXiv Detail & Related papers (2023-02-20T17:12:00Z)
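The NerfDiff entry above alternates between synthesizing virtual views, refining them with a conditional diffusion model (CDM), and finetuning the NeRF on the result. The loop below is only a schematic reading of that summary; `nerf`, `render`, and `cdm_refine` are placeholders for the paper's components, and the step count, optimizer, and loss are assumptions.

```python
# Schematic test-time distillation loop in the spirit of the NerfDiff summary.
import torch

def nerf_guided_distillation(nerf, render, cdm_refine, virtual_poses, steps=100, lr=1e-3):
    opt = torch.optim.Adam(nerf.parameters(), lr=lr)
    for _ in range(steps):
        with torch.no_grad():
            # 1) Render the current NeRF at a set of unseen ("virtual") poses.
            coarse = [render(nerf, p) for p in virtual_poses]
            # 2) Let the conditional diffusion model refine each rendering,
            #    conditioned on the NeRF output to keep the views 3D consistent.
            targets = [cdm_refine(img, pose) for img, pose in zip(coarse, virtual_poses)]
        # 3) Finetune the NeRF on the refined virtual views.
        loss = sum(((render(nerf, p) - t) ** 2).mean()
                   for p, t in zip(virtual_poses, targets)) / len(virtual_poses)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return nerf
```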
- Shape, Pose, and Appearance from a Single Image via Bootstrapped Radiance Field Inversion [54.151979979158085]
We introduce a principled end-to-end reconstruction framework for natural images, where accurate ground-truth poses are not available.
We leverage an unconditional 3D-aware generator, to which we apply a hybrid inversion scheme where a model produces a first guess of the solution.
Our framework can de-render an image in as few as 10 steps, enabling its use in practical scenarios.
arXiv Detail & Related papers (2022-11-21T17:42:42Z)
- ActiveNeRF: Learning where to See with Uncertainty Estimation [36.209200774203005]
Recently, Neural Radiance Fields (NeRF) have shown promising performance in reconstructing 3D scenes and synthesizing novel views from a sparse set of 2D images.
We present a novel learning framework, ActiveNeRF, aiming to model a 3D scene with a constrained input budget.
arXiv Detail & Related papers (2022-09-18T12:09:15Z)
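The ActiveNeRF entry above is about spending a constrained input budget where the model is least certain. The snippet below sketches one simple reading of that: score each candidate capture pose by the mean predicted rendering variance and keep the top ones. `render_with_uncertainty` is a placeholder, and using mean variance as the acquisition score is an assumption rather than ActiveNeRF's exact criterion.

```python
# Minimal sketch of uncertainty-driven next-view selection under a capture budget.
import torch

def select_next_views(nerf, render_with_uncertainty, candidate_poses, budget=1):
    scores = []
    with torch.no_grad():
        for pose in candidate_poses:
            rgb, variance = render_with_uncertainty(nerf, pose)  # per-pixel variance map
            scores.append(variance.mean().item())                # proxy for information gain
    order = sorted(range(len(candidate_poses)), key=lambda i: scores[i], reverse=True)
    return [candidate_poses[i] for i in order[:budget]]
```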
- CLONeR: Camera-Lidar Fusion for Occupancy Grid-aided Neural Representations [77.90883737693325]
This paper proposes CLONeR, which significantly improves upon NeRF by allowing it to model large outdoor driving scenes observed from sparse input sensor views.
This is achieved by decoupling occupancy and color learning within the NeRF framework into separate Multi-Layer Perceptrons (MLPs) trained using LiDAR and camera data, respectively.
In addition, this paper proposes a novel method to build differentiable 3D Occupancy Grid Maps (OGM) alongside the NeRF model, and leverage this occupancy grid for improved sampling of points along a ray for rendering in metric space.
arXiv Detail & Related papers (2022-09-02T17:44:50Z)
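The CLONeR entry above decouples geometry and appearance into separate MLPs, one intended for LiDAR supervision and one for camera supervision. The module below is only a rough sketch of that split; layer sizes, activations, and the interface are assumptions, and the occupancy-grid-guided ray sampling from the entry is not shown.

```python
# Rough sketch of decoupled occupancy (LiDAR-supervised) and color (camera-supervised) MLPs.
import torch
import torch.nn as nn

def mlp(d_in, d_out, hidden=128):
    return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                         nn.Linear(hidden, hidden), nn.ReLU(),
                         nn.Linear(hidden, d_out))

class DecoupledField(nn.Module):
    def __init__(self):
        super().__init__()
        self.occupancy_mlp = mlp(3, 1)     # xyz -> occupancy logit (geometry branch)
        self.color_mlp = mlp(3 + 3, 3)     # xyz + view direction -> RGB (appearance branch)

    def forward(self, xyz, view_dir):
        occ = torch.sigmoid(self.occupancy_mlp(xyz))  # occupancy probability in [0, 1]
        rgb = torch.sigmoid(self.color_mlp(torch.cat([xyz, view_dir], dim=-1)))
        return occ, rgb
```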
- Vision Transformer for NeRF-Based View Synthesis from a Single Input Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering.
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
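The entry above conditions a volume-rendering MLP on both global and local image features. The snippet below sketches only that conditioning step: a global feature vector is broadcast to every query point and concatenated with a per-point local feature (e.g. sampled from a 2D feature map at the point's projection, not shown). All dimensions and the output parameterization are assumptions.

```python
# Sketch of a NeRF MLP conditioned on global + per-point local image features.
import torch
import torch.nn as nn

class ConditionedNeRF(nn.Module):
    def __init__(self, global_dim=256, local_dim=64, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3 + global_dim + local_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # density + RGB
        )

    def forward(self, xyz, global_feat, local_feat):
        # xyz: (B, N, 3); global_feat: (B, global_dim); local_feat: (B, N, local_dim)
        g = global_feat.unsqueeze(1).expand(-1, xyz.shape[1], -1)
        out = self.net(torch.cat([xyz, g, local_feat], dim=-1))
        sigma, rgb = out[..., :1], torch.sigmoid(out[..., 1:])
        return sigma, rgb
```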
- ViewFormer: NeRF-free Neural Rendering from Few Images Using Transformers [34.4824364161812]
Novel view synthesis is a problem where we are given only a few context views sparsely covering a scene or an object.
The goal is to predict novel viewpoints in the scene, which requires learning priors.
We propose a 2D-only method that maps multiple context views and a query pose to a new image in a single pass of a neural network.
arXiv Detail & Related papers (2022-03-18T21:08:23Z)
- A-NeRF: Surface-free Human 3D Pose Refinement via Neural Rendering [13.219688351773422]
We propose a test-time optimization approach for monocular motion capture that learns a volumetric body model of the user in a self-supervised manner.
Our approach is self-supervised and does not require any additional ground truth labels for appearance, pose, or 3D shape.
We demonstrate that our novel combination of a discriminative pose estimation technique with surface-free analysis-by-synthesis outperforms purely discriminative monocular pose estimation approaches.
arXiv Detail & Related papers (2021-02-11T18:58:31Z)
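The A-NeRF entry above combines a discriminative pose estimate with analysis-by-synthesis refinement at test time. The loop below is a hedged sketch of that pattern: start from an initial pose, render a differentiable volumetric body model, and minimize photometric error against the observed frame. `render_body`, the pose parameterization, and the optimizer settings are placeholders, not the paper's implementation.

```python
# Sketch of self-supervised, analysis-by-synthesis pose refinement at test time.
import torch

def refine_pose(render_body, image, initial_pose, steps=200, lr=1e-2):
    pose = initial_pose.clone().requires_grad_(True)  # e.g. per-joint parameters
    opt = torch.optim.Adam([pose], lr=lr)
    for _ in range(steps):
        rendered = render_body(pose)                  # differentiable render of the body model
        loss = ((rendered - image) ** 2).mean()       # photometric loss, no pose labels needed
        opt.zero_grad()
        loss.backward()
        opt.step()
    return pose.detach()
```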
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of this information and is not responsible for any consequences of its use.