MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular
Videos
- URL: http://arxiv.org/abs/2212.13056v3
- Date: Mon, 14 Aug 2023 17:20:43 GMT
- Title: MonoNeRF: Learning a Generalizable Dynamic Radiance Field from Monocular
Videos
- Authors: Fengrui Tian, Shaoyi Du, Yueqi Duan
- Abstract summary: We propose MonoNeRF to simultaneously learn point features and scene flows with point trajectory and feature correspondence constraints across frames.
Experiments show that our MonoNeRF is able to learn from multiple scenes and support new applications such as scene editing, unseen frame synthesis, and fast novel scene adaptation.
- Score: 23.09306118872098
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: In this paper, we target the problem of learning a generalizable dynamic
radiance field from monocular videos. Different from most existing NeRF methods
that are based on multiple views, monocular videos only contain one view at
each timestamp, thereby suffering from ambiguity along the view direction in
estimating point features and scene flows. Previous studies such as DynNeRF
disambiguate point features by positional encoding, which is not transferable
and severely limits the generalization ability. As a result, these methods have
to train one independent model for each scene and suffer from heavy
computational costs when applied to the growing number of monocular videos in
real-world applications. To address this, we propose MonoNeRF to simultaneously learn
point features and scene flows with point trajectory and feature correspondence
constraints across frames. More specifically, we learn an implicit velocity
field to estimate point trajectories from temporal features with a Neural ODE,
which is followed by a flow-based feature aggregation module to obtain spatial
features along the point trajectory. We jointly optimize temporal and spatial
features in an end-to-end manner. Experiments show that our MonoNeRF is able to
learn from multiple scenes and support new applications such as scene editing,
unseen frame synthesis, and fast novel scene adaptation. Codes are available at
https://github.com/tianfr/MonoNeRF.
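The abstract describes two components: an implicit velocity field whose integral over time gives a point trajectory, and a flow-based module that aggregates features along that trajectory. The sketch below illustrates this idea in PyTorch under simplifying assumptions: plain forward-Euler integration stands in for the Neural ODE solver, and the feature lookup and mean aggregation are placeholders for the paper's learned flow-based aggregation module. All names and shapes are illustrative and not taken from the released code.
```python
# Minimal sketch (not the authors' code) of a velocity field integrated into a
# point trajectory, plus feature aggregation along that trajectory.
import torch
import torch.nn as nn


class VelocityField(nn.Module):
    """Maps a 3D point and a time stamp to a 3D velocity."""

    def __init__(self, hidden: int = 128):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),
        )

    def forward(self, x, t):
        # x: (N, 3) points, t: (N, 1) times -> (N, 3) velocities
        return self.mlp(torch.cat([x, t], dim=-1))


def trace_trajectory(vel_field, x0, t0=0.0, t1=1.0, steps=16):
    """Forward-Euler integration of dx/dt = v(x, t); a Neural ODE solver
    (e.g. torchdiffeq.odeint) would take the place of this loop."""
    x, dt = x0, (t1 - t0) / steps
    traj = [x]
    for k in range(steps):
        t = torch.full_like(x[:, :1], t0 + k * dt)  # (N, 1) current time
        x = x + dt * vel_field(x, t)
        traj.append(x)
    return torch.stack(traj)  # (steps + 1, N, 3)


def aggregate_features(traj, sample_frame_features):
    """Average per-frame features sampled along the trajectory.
    `sample_frame_features` is a placeholder callback mapping (N, 3) points
    to (N, C) features, e.g. image features projected onto the points."""
    feats = torch.stack([sample_frame_features(x) for x in traj])
    return feats.mean(dim=0)  # (N, C) spatial feature for the query points


# Toy usage with random points and a synthetic feature callback.
vel = VelocityField()
pts = torch.rand(1024, 3)
trajectory = trace_trajectory(vel, pts)
spatial_feat = aggregate_features(trajectory, lambda p: torch.sin(10.0 * p))
```
In the full method, the aggregated spatial feature, together with the temporal feature of the query frame, would condition a NeRF-style decoder that predicts color and density for volume rendering.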
Related papers
- D-NPC: Dynamic Neural Point Clouds for Non-Rigid View Synthesis from Monocular Video [53.83936023443193]
This paper introduces a new method for dynamic novel view synthesis from monocular video, such as smartphone captures.
Our approach represents the scene as a dynamic neural point cloud, an implicit time-conditioned point cloud that encodes local geometry and appearance in separate hash-encoded neural feature grids.
arXiv Detail & Related papers (2024-06-14T14:35:44Z)
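The D-NPC entry above mentions separate hash-encoded feature grids for local geometry and appearance, conditioned on time. The sketch below shows one plausible reading of that data structure (a single-resolution spatial hash table per attribute, with the time stamp appended for a downstream decoder); it is illustrative only, not the paper's implementation, and every class and parameter name here is invented.
```python
# Illustrative only: a plausible reading of a time-conditioned, hash-encoded
# feature grid, in the spirit of multiresolution hash encodings.
import torch
import torch.nn as nn

PRIMES = (1, 2654435761, 805459861)  # spatial hashing constants


class HashFeatureGrid(nn.Module):
    def __init__(self, table_size=2**16, feat_dim=8, resolution=64):
        super().__init__()
        self.table = nn.Parameter(torch.randn(table_size, feat_dim) * 1e-2)
        self.resolution = resolution
        self.table_size = table_size

    def forward(self, x):
        # x: (N, 3) points in [0, 1]^3 -> (N, feat_dim) features
        idx = (x * self.resolution).long().clamp_(0, self.resolution - 1)
        h = (idx[:, 0] * PRIMES[0]) ^ (idx[:, 1] * PRIMES[1]) ^ (idx[:, 2] * PRIMES[2])
        return self.table[h % self.table_size]


class TimeConditionedPointFeatures(nn.Module):
    """Separate grids for geometry and appearance, queried per point, with the
    time stamp appended so a small decoder can predict time-varying values."""

    def __init__(self, feat_dim=8):
        super().__init__()
        self.geometry = HashFeatureGrid(feat_dim=feat_dim)
        self.appearance = HashFeatureGrid(feat_dim=feat_dim)

    def forward(self, points, t):
        # points: (N, 3), t: (N, 1) -> (N, 2 * feat_dim + 1)
        return torch.cat([self.geometry(points), self.appearance(points), t], dim=-1)
```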
- CTNeRF: Cross-Time Transformer for Dynamic Neural Radiance Field from Monocular Video [25.551944406980297]
We propose a novel approach to generate high-quality novel views from monocular videos of complex and dynamic scenes.
We introduce a module that operates in both the time and frequency domains to aggregate the features of object motion.
Our experiments demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2024-01-10T00:40:05Z)
- Point-DynRF: Point-based Dynamic Radiance Fields from a Monocular Video [19.0733297053322]
We introduce point-based dynamic radiance fields, where the global geometric information and volume rendering process are trained by neural point clouds and dynamic radiance fields, respectively.
Specifically, we reconstruct neural point clouds directly from geometric proxies and optimize both radiance fields and the geometric proxies using our proposed losses.
We validate the effectiveness of our method with experiments on the NVIDIA Dynamic Scenes dataset and several casually captured monocular video clips.
arXiv Detail & Related papers (2023-10-14T19:27:46Z)
- GHuNeRF: Generalizable Human NeRF from a Monocular Video [63.741714198481354]
GHuNeRF learns a generalizable human NeRF model from a monocular video.
We validate our approach on the widely-used ZJU-MoCap dataset.
arXiv Detail & Related papers (2023-08-31T09:19:06Z)
- MonoNeRF: Learning Generalizable NeRFs from Monocular Videos without Camera Pose [29.601253968190306]
We propose MonoNeRF, a generalizable neural radiance field that can be trained on large-scale monocular videos captured by a camera moving in static scenes.
MonoNeRF follows an Autoencoder-based architecture, where the encoder estimates the monocular depth and the camera pose.
It can be applied to multiple applications including depth estimation, camera pose estimation, and single-image novel view synthesis.
arXiv Detail & Related papers (2022-10-13T17:03:22Z)
- Learning Multi-Object Dynamics with Compositional Neural Radiance Fields [63.424469458529906]
We present a method to learn compositional predictive models from image observations based on implicit object encoders, Neural Radiance Fields (NeRFs), and graph neural networks.
NeRFs have become a popular choice for representing scenes due to their strong 3D prior.
For planning, we utilize RRTs in the learned latent space, where we can exploit our model and the implicit object encoder to make sampling the latent space informative and more efficient.
arXiv Detail & Related papers (2022-02-24T01:31:29Z)
- RegNeRF: Regularizing Neural Radiance Fields for View Synthesis from Sparse Inputs [79.00855490550367]
We show that NeRF can produce photorealistic renderings of unseen viewpoints when many input views are available, but rendering quality degrades significantly when only sparse inputs are given.
We address this by regularizing the geometry and appearance of patches rendered from unobserved viewpoints.
Our model outperforms not only other methods that optimize over a single scene, but also conditional models that are extensively pre-trained on large multi-view datasets.
arXiv Detail & Related papers (2021-12-01T18:59:46Z)
- Dynamic View Synthesis from Dynamic Monocular Video [69.80425724448344]
We present an algorithm for generating views at arbitrary viewpoints and any input time step given a monocular video of a dynamic scene.
We show extensive quantitative and qualitative results of dynamic view synthesis from casually captured videos.
arXiv Detail & Related papers (2021-05-13T17:59:50Z)
- pixelNeRF: Neural Radiance Fields from One or Few Images [20.607712035278315]
pixelNeRF is a learning framework that predicts a continuous neural scene representation conditioned on one or few input images.
We conduct experiments on ShapeNet benchmarks for single image novel view synthesis tasks with held-out objects.
In all cases, pixelNeRF outperforms current state-of-the-art baselines for novel view synthesis and single image 3D reconstruction.
arXiv Detail & Related papers (2020-12-03T18:59:54Z)
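pixelNeRF's conditioning step is worth making concrete: each 3D query point is projected into the input view using the camera parameters, CNN features are sampled at that pixel, and the NeRF MLP receives those features alongside the point. The sketch below shows that step under assumed camera conventions; the projection helper, network sizes, and the omission of positional encoding are simplifications, not the paper's exact design.
```python
# Sketch of image-conditioned radiance field queries in the spirit of pixelNeRF.
import torch
import torch.nn as nn
import torch.nn.functional as F


def project_points(points, K, w2c):
    """points: (N, 3) world coords; K: (3, 3) intrinsics; w2c: (4, 4) pose.
    Returns (N, 2) pixel coordinates normalized to [-1, 1] for grid_sample."""
    homo = torch.cat([points, torch.ones_like(points[:, :1])], dim=-1)  # (N, 4)
    cam = (w2c @ homo.T).T[:, :3]                  # points in the camera frame
    pix = (K @ cam.T).T                            # perspective projection
    pix = pix[:, :2] / pix[:, 2:3].clamp(min=1e-6)
    wh = 2.0 * K[:2, 2]                            # assume principal point ~ image center
    return 2.0 * pix / wh - 1.0


class ConditionedNeRF(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3 + feat_dim, 128), nn.ReLU(),
            nn.Linear(128, 4),  # RGB + density
        )

    def forward(self, points, feat_map, K, w2c):
        # feat_map: (1, C, H, W) CNN features of the single input image
        uv = project_points(points, K, w2c).view(1, -1, 1, 2)
        feats = F.grid_sample(feat_map, uv, align_corners=True)  # (1, C, N, 1)
        feats = feats[0, :, :, 0].T                              # (N, C)
        return self.mlp(torch.cat([points, feats], dim=-1))      # (N, 4)
```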
- Neural Sparse Voxel Fields [151.20366604586403]
We introduce Neural Sparse Voxel Fields (NSVF), a new neural scene representation for fast and high-quality free-viewpoint rendering.
NSVF defines a set of voxel-bounded implicit fields organized in a sparse voxel octree to model local properties in each cell.
Our method is typically over 10 times faster than the state-of-the-art (namely, NeRF (Mildenhall et al., 2020)) at inference time while achieving higher-quality results.
arXiv Detail & Related papers (2020-07-22T17:51:31Z)
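The NSVF entry describes voxel-bounded implicit fields organized in a sparse voxel octree, which is what lets rendering skip empty space. The sketch below illustrates only that skip-empty-space idea, with a flat dictionary of occupied voxels standing in for the octree and a small MLP decoding a per-voxel feature; names, sizes, and the per-point loop are illustrative simplifications rather than the paper's implementation.
```python
# Sketch of the "skip empty space" idea behind sparse voxel representations:
# only voxels known to contain geometry carry a learned feature, and samples
# that fall outside them are never evaluated by the decoder.
import torch
import torch.nn as nn


class SparseVoxelField(nn.Module):
    def __init__(self, occupied_voxels, voxel_size=0.1, feat_dim=16):
        super().__init__()
        # occupied_voxels: iterable of integer (i, j, k) voxel indices
        self.voxel_size = voxel_size
        self.index = {v: n for n, v in enumerate(occupied_voxels)}
        self.features = nn.Parameter(torch.randn(len(self.index), feat_dim) * 1e-2)
        self.decoder = nn.Sequential(nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, 4))

    def query(self, points):
        """points: (N, 3). Returns (N, 4) RGB + density; zeros in empty space."""
        out = torch.zeros(points.shape[0], 4)
        ids = torch.div(points, self.voxel_size, rounding_mode="floor").long()
        for n, ijk in enumerate(map(tuple, ids.tolist())):
            slot = self.index.get(ijk)
            if slot is not None:  # only evaluate the field inside occupied voxels
                out[n] = self.decoder(self.features[slot])
        return out
```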