Entangled View-Epipolar Information Aggregation for Generalizable Neural
Radiance Fields
- URL: http://arxiv.org/abs/2311.11845v2
- Date: Tue, 12 Mar 2024 10:57:53 GMT
- Title: Entangled View-Epipolar Information Aggregation for Generalizable Neural
Radiance Fields
- Authors: Zhiyuan Min, Yawei Luo, Wei Yang, Yuesong Wang, Yi Yang
- Abstract summary: Generalizable NeRF can synthesize novel views across new scenes, eliminating the need for scene-specific retraining in vanilla NeRF.
We propose an Entangled View-Epipolar Information aggregation method dubbed EVE-NeRF.
- Score: 28.549053233615382
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Generalizable NeRF can directly synthesize novel views across new scenes,
eliminating the need for scene-specific retraining in vanilla NeRF. A critical
enabling factor in these approaches is the extraction of a generalizable 3D
representation by aggregating source-view features. In this paper, we propose
an Entangled View-Epipolar Information Aggregation method dubbed EVE-NeRF.
Different from existing methods that consider cross-view and along-epipolar
information independently, EVE-NeRF conducts the view-epipolar feature
aggregation in an entangled manner by injecting the scene-invariant appearance
continuity and geometry consistency priors to the aggregation process. Our
approach effectively mitigates the potential lack of inherent geometric and
appearance constraints resulting from one-dimensional interactions, thus further
boosting the generalizability of the 3D representation. EVE-NeRF attains
state-of-the-art performance across various evaluation scenarios. Extensive
experiments demonstrate that, compared to prevailing single-dimensional
aggregation, the entangled network excels in the accuracy of 3D scene geometry
and appearance reconstruction. Our code is publicly available at
https://github.com/tatakai1/EVENeRF.
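As a rough illustration of the entangled aggregation idea described in the abstract, the sketch below interleaves cross-view attention and along-epipolar attention over the same feature tensor inside a single block, rather than running each axis in an isolated stage. All module names, tensor shapes, and hyperparameters are assumptions made for this example; it is not the official EVE-NeRF implementation, which is available at the repository linked above.

```python
# Hedged sketch of interleaved view/epipolar attention (not the official EVE-NeRF code).
# Assumed feature layout: (rays R, source views V, epipolar samples S, channels C).
import torch
import torch.nn as nn

class InterleavedViewEpipolarBlock(nn.Module):
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.view_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.epi_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        R, V, S, C = x.shape
        # Cross-view step: for each epipolar sample, attend across the source views.
        xv = x.permute(0, 2, 1, 3).reshape(R * S, V, C)
        xv = xv + self.view_attn(self.norm1(xv), self.norm1(xv), self.norm1(xv))[0]
        x = xv.reshape(R, S, V, C).permute(0, 2, 1, 3)
        # Along-epipolar step: for each view, attend across samples on the epipolar line,
        # so the two axes exchange information within one block instead of independently.
        xe = x.reshape(R * V, S, C)
        xe = xe + self.epi_attn(self.norm2(xe), self.norm2(xe), self.norm2(xe))[0]
        return xe.reshape(R, V, S, C)

if __name__ == "__main__":
    feats = torch.randn(2, 4, 8, 64)   # 2 rays, 4 source views, 8 epipolar samples
    block = InterleavedViewEpipolarBlock(64, 4)
    print(block(feats).shape)          # torch.Size([2, 4, 8, 64])
```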
Related papers
- DiHuR: Diffusion-Guided Generalizable Human Reconstruction [51.31232435994026]
We introduce DiHuR, a Diffusion-guided model for generalizable Human 3D Reconstruction and view synthesis from sparse, minimally overlapping images.
Our method integrates two key priors in a coherent manner: the prior from generalizable feed-forward models and the 2D diffusion prior, and it requires only multi-view image training, without 3D supervision.
arXiv Detail & Related papers (2024-11-16T03:52:23Z) - Epipolar-Free 3D Gaussian Splatting for Generalizable Novel View Synthesis [25.924727931514735]
Generalizable 3DGS can reconstruct new scenes from sparse-view observations in a feed-forward inference manner.
Existing methods rely heavily on epipolar priors, which can be unreliable in complex real-world scenes.
We propose eFreeSplat, an efficient feed-forward 3DGS-based model for generalizable novel view synthesis.
arXiv Detail & Related papers (2024-10-30T08:51:29Z) - GOV-NeSF: Generalizable Open-Vocabulary Neural Semantic Fields [50.68719394443926]
Generalizable Open-Vocabulary Neural Semantic Fields (GOV-NeSF) is a novel approach offering a generalizable implicit representation of 3D scenes with open-vocabulary semantics.
GOV-NeSF exhibits state-of-the-art performance in both 2D and 3D open-vocabulary semantic segmentation.
arXiv Detail & Related papers (2024-04-01T05:19:50Z) - GeoWizard: Unleashing the Diffusion Priors for 3D Geometry Estimation from a Single Image [94.56927147492738]
We introduce GeoWizard, a new generative foundation model designed for estimating geometric attributes from single images.
We show that leveraging diffusion priors can markedly improve generalization, detail preservation, and efficiency in resource usage.
We propose a simple yet effective strategy to segregate the complex data distribution of various scenes into distinct sub-distributions.
arXiv Detail & Related papers (2024-03-18T17:50:41Z) - Learning Robust Generalizable Radiance Field with Visibility and Feature
Augmented Point Representation [7.203073346844801]
This paper introduces a novel paradigm for the generalizable neural radiance field (NeRF).
We propose the first paradigm that constructs the generalizable neural field based on point-based rather than image-based rendering.
Our approach explicitly models visibilities by geometric priors and augments them with neural features.
arXiv Detail & Related papers (2024-01-25T17:58:51Z) - MuRF: Multi-Baseline Radiance Fields [117.55811938988256]
We present Multi-Baseline Radiance Fields (MuRF), a feed-forward approach to solving sparse view synthesis.
MuRF achieves state-of-the-art performance across multiple different baseline settings.
We also show promising zero-shot generalization abilities on the Mip-NeRF 360 dataset.
arXiv Detail & Related papers (2023-12-07T18:59:56Z) - Single-Stage Diffusion NeRF: A Unified Approach to 3D Generation and
Reconstruction [77.69363640021503]
3D-aware image synthesis encompasses a variety of tasks, such as scene generation and novel view synthesis from images.
We present SSDNeRF, a unified approach that employs an expressive diffusion model to learn a generalizable prior of neural radiance fields (NeRF) from multi-view images of diverse objects.
arXiv Detail & Related papers (2023-04-13T17:59:01Z) - GARF: Geometry-Aware Generalized Neural Radiance Field [47.76524984421343]
We propose Geometry-Aware Generalized Neural Radiance Field (GARF) with a geometry-aware dynamic sampling (GADS) strategy.
Our framework infers the unseen scenes on both pixel-scale and geometry-scale with only a few input images.
Experiments on indoor and outdoor datasets show that GARF reduces samples by more than 25%, while improving rendering quality and 3D geometry estimation.
arXiv Detail & Related papers (2022-12-05T14:00:59Z) - Neural Capture of Animatable 3D Human from Monocular Video [38.974181971541846]
We present a novel paradigm of building an animatable 3D human representation from a monocular video input, such that it can be rendered in any unseen poses and views.
Our method is based on a dynamic Neural Radiance Field (NeRF) rigged by a mesh-based parametric 3D human model serving as a geometry proxy.
arXiv Detail & Related papers (2022-08-18T09:20:48Z) - Vision Transformer for NeRF-Based View Synthesis from a Single Input
Image [49.956005709863355]
We propose to leverage both the global and local features to form an expressive 3D representation.
To synthesize a novel view, we train a multilayer perceptron (MLP) network conditioned on the learned 3D representation to perform volume rendering (see the sketch after this list).
Our method can render novel views from only a single input image and generalize across multiple object categories using a single model.
arXiv Detail & Related papers (2022-07-12T17:52:04Z)
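Several entries above, such as the single-image Vision Transformer approach, and the generalizable NeRFs discussed in the main abstract share the same final step: an MLP conditioned on aggregated features predicts a density and a color per 3D sample, and these are alpha-composited along each ray. Below is a minimal sketch of that standard volume-rendering step; the class name, layer sizes, and shapes are assumptions for illustration, not any specific paper's implementation.

```python
# Hedged sketch of the shared volume-rendering step in many generalizable NeRFs.
# feats holds per-sample aggregated source-view features; names and shapes are assumed.
import torch
import torch.nn as nn

class RadianceHead(nn.Module):
    def __init__(self, feat_dim: int = 64, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),        # 3 color channels + 1 density
        )

    def forward(self, feats: torch.Tensor, deltas: torch.Tensor) -> torch.Tensor:
        # feats:  (rays, samples, feat_dim) aggregated features per 3D sample
        # deltas: (rays, samples) distances between consecutive samples along the ray
        out = self.mlp(feats)
        rgb = torch.sigmoid(out[..., :3])
        sigma = torch.relu(out[..., 3])
        alpha = 1.0 - torch.exp(-sigma * deltas)              # per-sample opacity
        trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)    # accumulated transmittance
        trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
        weights = alpha * trans
        return (weights[..., None] * rgb).sum(dim=-2)         # (rays, 3) pixel colors

if __name__ == "__main__":
    feats = torch.randn(1024, 8, 64)
    deltas = torch.full((1024, 8), 0.05)
    print(RadianceHead()(feats, deltas).shape)   # torch.Size([1024, 3])
```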