MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without
Camera Pose
- URL: http://arxiv.org/abs/2210.07181v2
- Date: Sun, 4 Jun 2023 07:17:39 GMT
- Title: MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without
Camera Pose
- Authors: Yang Fu, Ishan Misra, Xiaolong Wang
- Abstract summary: We propose generalizable neural radiance fields, MonoNeRF, which can be trained on large-scale monocular videos captured while moving in static scenes.
MonoNeRF follows an autoencoder-based architecture, in which the encoder estimates the monocular depth and the camera pose.
It can be applied to multiple applications including depth estimation, camera pose estimation, and single-image novel view synthesis.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: We propose generalizable neural radiance fields, MonoNeRF, which can be
trained on large-scale monocular videos captured while moving in static scenes,
without any ground-truth annotations of depth or camera poses. MonoNeRF follows
an autoencoder-based architecture: the encoder estimates the monocular depth and
the camera pose, and the decoder constructs a Multiplane NeRF representation
from the depth encoder's features and renders the input frames with the
estimated camera. The learning is supervised by the reconstruction error. Once
the model is learned, it can be applied to multiple applications, including
depth estimation, camera pose estimation, and single-image novel view
synthesis. More qualitative results are available at:
https://oasisyang.github.io/mononerf .
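The pipeline described in the abstract lends itself to a compact sketch. Below is a minimal, illustrative PyTorch rendition of the training loop it implies; the module names (depth_net, pose_net, decoder) and their signatures are assumptions for illustration, not the authors' released code:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MonoNeRFSketch(nn.Module):
    """Illustrative autoencoder loop following the abstract above; module
    names and signatures are placeholders, not the released implementation."""

    def __init__(self, depth_net: nn.Module, pose_net: nn.Module,
                 decoder: nn.Module):
        super().__init__()
        self.depth_net = depth_net  # encoder branch: monocular depth + features
        self.pose_net = pose_net    # encoder branch: relative camera pose
        self.decoder = decoder      # multiplane-NeRF construction + rendering

    def forward(self, frame_src: torch.Tensor, frame_tgt: torch.Tensor):
        # Encoder: estimate depth (with features) for the source frame and
        # the camera motion from the source to the target frame.
        depth, feats = self.depth_net(frame_src)
        pose = self.pose_net(frame_src, frame_tgt)  # e.g. a 6-DoF transform

        # Decoder: build a Multiplane NeRF representation from the depth
        # encoder's features, then render the target view with the
        # estimated camera.
        rendered_tgt = self.decoder(feats, depth, pose)

        # Self-supervision: photometric reconstruction error only; no
        # ground-truth depth or camera poses are needed.
        loss = F.l1_loss(rendered_tgt, frame_tgt)
        return rendered_tgt, loss
```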
Related papers
- DistillNeRF: Perceiving 3D Scenes from Single-Glance Images by Distilling Neural Fields and Foundation Model Features [65.8738034806085]
DistillNeRF is a self-supervised learning framework for understanding 3D environments in autonomous driving scenes.
Our method is a generalizable feedforward model that predicts a rich neural scene representation from sparse, single-frame multi-view camera inputs.
arXiv Detail & Related papers (2024-06-17T21:15:13Z)
- MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular Videos [23.09306118872098]
We propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames.
Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
arXiv Detail & Related papers (2022-12-26T09:20:55Z)
- ViewNeRF: Unsupervised Viewpoint Estimation Using Category-Level Neural Radiance Fields [35.89557494372891]
We introduce ViewNeRF, a Neural Radiance Field-based viewpoint estimation method.
Our method uses an analysis by synthesis approach, combining a conditional NeRF with a viewpoint predictor and a scene encoder.
Our model shows competitive results on synthetic and real datasets.
arXiv Detail & Related papers (2022-12-01T11:16:11Z)
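The analysis-by-synthesis loop ViewNeRF's summary describes can be sketched compactly: predict a scene code and a viewpoint from an image, re-render the image from them, and supervise with the reconstruction error. The tiny stand-in networks below are purely illustrative assumptions, not the paper's models:

```python
import torch
import torch.nn as nn

# Tiny stand-ins for the three components the summary names; the real model
# uses a conditional NeRF with convolutional encoders, not these toy layers.
scene_encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))
viewpoint_net = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 6))
cond_renderer = nn.Linear(64 + 6, 3 * 32 * 32)  # "conditional NeRF" stand-in

def analysis_by_synthesis_loss(image: torch.Tensor) -> torch.Tensor:
    """Encode scene content and viewpoint from an image, re-synthesize the
    image from both codes, and score the estimates by reconstruction error."""
    z = scene_encoder(image)      # scene content code
    view = viewpoint_net(image)   # predicted viewpoint (e.g. 6-DoF)
    recon = cond_renderer(torch.cat([z, view], dim=-1))
    return ((recon - image.flatten(1)) ** 2).mean()

loss = analysis_by_synthesis_loss(torch.rand(4, 3, 32, 32))
loss.backward()  # gradients reach the encoder, viewpoint net, and renderer
```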
- SPARF: Neural Radiance Fields from Sparse and Noisy Poses [58.528358231885846]
We introduce Sparse Pose Adjusting Radiance Field (SPARF) to address the challenge of novel-view synthesis from sparse input views with noisy camera poses.
Our approach exploits multi-view geometry constraints in order to jointly learn the NeRF and refine the camera poses.
arXiv Detail & Related papers (2022-11-21T18:57:47Z)
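A minimal sketch of the kind of multi-view geometry constraint this summary points at: given a pixel match between two views, back-project it with NeRF-rendered depth, re-project it into the other view, and penalize the reprojection error so gradients reach both the NeRF (through depth) and the learnable poses. All shapes and conventions below are assumptions, not SPARF's actual code:

```python
import torch

def reprojection_loss(pix_i, pix_j, depth_i, K, T_i, T_j):
    """Penalize disagreement between a pixel match (pix_i <-> pix_j) and the
    geometry implied by rendered depth and camera poses.

    Assumed conventions: pix_* are (N, 2) pixel coordinates, depth_i is (N,)
    depth rendered by the NeRF, K is a 3x3 intrinsic matrix, and T_* are
    4x4 world-from-camera poses (possibly noisy, kept learnable)."""
    n = pix_i.shape[0]
    ones = torch.ones(n, 1)

    # Back-project pixels of view i into 3D with the rendered depth.
    rays = torch.cat([pix_i, ones], dim=-1) @ torch.inverse(K).T  # (N, 3)
    pts_i = rays * depth_i[:, None]                               # camera frame
    pts_w = (T_i @ torch.cat([pts_i, ones], dim=-1).T).T          # world frame

    # Re-project into view j with its current pose estimate.
    pts_j = (torch.inverse(T_j) @ pts_w.T).T[:, :3]
    proj = pts_j @ K.T
    pix_j_hat = proj[:, :2] / proj[:, 2:3].clamp(min=1e-6)

    # Gradients flow into the NeRF (via depth_i) and into the poses T_i, T_j.
    return (pix_j_hat - pix_j).abs().mean()
```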
- Robustifying the Multi-Scale Representation of Neural Radiance Fields [86.69338893753886]
We present a robust multi-scale neural radiance field representation approach that overcomes two real-world imaging issues at once: our method handles both multi-scale imaging effects and camera-pose estimation problems with NeRF-inspired approaches.
We demonstrate, with examples, that for an accurate neural representation of an object from day-to-day acquired multi-view images, it is crucial to have precise camera-pose estimates.
arXiv Detail & Related papers (2022-10-09T11:46:45Z)
- BARF: Bundle-Adjusting Neural Radiance Fields [104.97810696435766]
We propose Bundle-Adjusting Neural Radiance Fields (BARF) for training NeRF from imperfect camera poses.
BARF can effectively optimize the neural scene representations and resolve large camera pose misalignment at the same time.
This enables view synthesis and localization of video sequences from unknown camera poses, opening up new avenues for visual localization systems.
arXiv Detail & Related papers (2021-04-13T17:59:51Z)
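A minimal sketch of the joint update this entry describes: per-image pose corrections are registered as extra learnable parameters and optimized from the same photometric loss as the NeRF weights. The tiny MLP and fake renderer below exist only so the loop runs; BARF's actual contribution also includes a coarse-to-fine positional-encoding schedule not shown here:

```python
import torch

# Tiny stand-in "NeRF" and renderer so the joint update below actually runs;
# neither resembles BARF's real model, only the optimization structure does.
nerf = torch.nn.Sequential(torch.nn.Linear(9, 64), torch.nn.ReLU(),
                           torch.nn.Linear(64, 3))

def render(rays: torch.Tensor, pose_delta: torch.Tensor) -> torch.Tensor:
    # Placeholder renderer: conditions ray colors on the pose correction so
    # that gradients of the photometric loss reach the pose parameters.
    return nerf(torch.cat([rays, pose_delta.expand(rays.shape[0], 6)], dim=-1))

num_images = 8
pose_deltas = torch.zeros(num_images, 6, requires_grad=True)  # se(3) corrections
opt = torch.optim.Adam([
    {"params": nerf.parameters(), "lr": 1e-3},
    {"params": [pose_deltas], "lr": 1e-4},  # poses typically use a smaller lr
])

for step in range(100):
    i = torch.randint(num_images, ()).item()
    rays = torch.randn(1024, 3)       # stand-in for rays of image i
    target_rgb = torch.rand(1024, 3)  # stand-in for observed pixel colors
    loss = ((render(rays, pose_deltas[i]) - target_rgb) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()  # one step updates the NeRF weights *and* the camera poses
```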
- iNeRF: Inverting Neural Radiance Fields for Pose Estimation [68.91325516370013]
We present iNeRF, a framework that performs mesh-free pose estimation by "inverting" a Neural Radiance Field (NeRF).
NeRFs have been shown to be remarkably effective for the task of view synthesis.
arXiv Detail & Related papers (2020-12-10T18:36:40Z)
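The inversion this entry refers to can be sketched as test-time optimization: hold a trained NeRF fixed and run gradient descent on the camera pose alone, minimizing photometric error between the rendered and observed image. The toy linear "renderer" below is an assumption standing in for volume rendering; only the optimization structure mirrors the description:

```python
import torch

# Toy differentiable stand-in: maps a 6-DoF pose vector to a rendered image.
# In iNeRF this would be volume rendering through a trained, frozen NeRF.
trained_renderer = torch.nn.Linear(6, 3 * 32 * 32)
for p in trained_renderer.parameters():
    p.requires_grad_(False)  # NeRF weights stay frozen during inversion

observed = torch.rand(3 * 32 * 32)         # image whose camera pose we want
pose = torch.zeros(6, requires_grad=True)  # initial pose estimate
opt = torch.optim.Adam([pose], lr=1e-2)

for step in range(200):
    rendered = trained_renderer(pose)           # render at the current guess
    loss = ((rendered - observed) ** 2).mean()  # photometric error
    opt.zero_grad()
    loss.backward()
    opt.step()  # gradient step on the pose only; the "NeRF" never changes
```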
- Neural Ray Surfaces for Self-Supervised Learning of Depth and Ego-motion [51.19260542887099]
We show that self-supervision can be used to learn accurate depth and ego-motion estimation without prior knowledge of the camera model.
Inspired by the geometric model of Grossberg and Nayar, we introduce Neural Ray Surfaces (NRS), convolutional networks that represent pixel-wise projection rays.
We demonstrate the use of NRS for self-supervised learning of visual odometry and depth estimation from raw videos obtained using a wide variety of camera systems.
arXiv Detail & Related papers (2020-08-15T02:29:13Z)
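A sketch of the idea behind Neural Ray Surfaces, with all shapes and networks assumed for illustration: per-pixel projection rays come from a learned network rather than a fixed pinhole model, and depth plus ego-motion are supervised purely by how well one frame reconstructs its neighbor:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Tiny MLP stand-ins over (u, v) pixel coordinates; the paper instead uses
# convolutional networks that predict full-resolution ray surfaces.
ray_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 3))
depth_net = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))

def lift(pixels: torch.Tensor) -> torch.Tensor:
    """Lift pixels to 3D with a *learned* per-pixel ray surface instead of a
    fixed pinhole model: point = unit_ray(pixel) * depth(pixel)."""
    rays = F.normalize(ray_net(pixels), dim=-1)  # learned projection rays
    depth = F.softplus(depth_net(pixels))        # strictly positive depth
    return rays * depth

# Self-supervision, schematically: lift pixels in frame t, move them with a
# predicted ego-motion, and compare photometrically after re-projecting into
# frame t+1 (the re-projection step is omitted here).
pixels = torch.rand(1024, 2)
R, t = torch.eye(3), torch.zeros(3)  # stand-in for a predicted ego-motion
pts_next = lift(pixels) @ R.T + t    # 3D points expressed in the next frame
```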