EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via
Self-Supervision
- URL: http://arxiv.org/abs/2311.02077v1
- Date: Fri, 3 Nov 2023 17:59:55 GMT
- Title: EmerNeRF: Emergent Spatial-Temporal Scene Decomposition via
Self-Supervision
- Authors: Jiawei Yang, Boris Ivanovic, Or Litany, Xinshuo Weng, Seung Wook Kim,
Boyi Li, Tong Che, Danfei Xu, Sanja Fidler, Marco Pavone, Yue Wang
- Abstract summary: EmerNeRF is a simple yet powerful approach for learning spatial-temporal representations of dynamic driving scenes.
It simultaneously captures scene geometry, appearance, motion, and semantics via self-bootstrapping.
Our method achieves state-of-the-art performance in sensor simulation.
- Score: 85.17951804790515
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present EmerNeRF, a simple yet powerful approach for learning
spatial-temporal representations of dynamic driving scenes. Grounded in neural
fields, EmerNeRF simultaneously captures scene geometry, appearance, motion,
and semantics via self-bootstrapping. EmerNeRF hinges upon two core components:
First, it stratifies scenes into static and dynamic fields. This decomposition
emerges purely from self-supervision, enabling our model to learn from general,
in-the-wild data sources. Second, EmerNeRF parameterizes an induced flow field
from the dynamic field and uses this flow field to further aggregate
multi-frame features, amplifying the rendering precision of dynamic objects.
Coupling these three fields (static, dynamic, and flow) enables EmerNeRF to
represent highly-dynamic scenes self-sufficiently, without relying on ground
truth object annotations or pre-trained models for dynamic object segmentation
or optical flow estimation. Our method achieves state-of-the-art performance in
sensor simulation, significantly outperforming previous methods when
reconstructing static (+2.93 PSNR) and dynamic (+3.70 PSNR) scenes. In
addition, to bolster EmerNeRF's semantic generalization, we lift 2D visual
foundation model features into 4D space-time and address a general positional
bias in modern Transformers, significantly boosting 3D perception performance
(e.g., 37.50% relative improvement in occupancy prediction accuracy on
average). Finally, we construct a diverse and challenging 120-sequence dataset
to benchmark neural fields under extreme and highly-dynamic settings.
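As a rough illustration of the decomposition described in the abstract, the PyTorch-style sketch below composes a static field, a time-conditioned dynamic field, and an induced flow head that warps query points to neighboring frames for multi-frame feature aggregation. All module names, network sizes, and the density-weighted composition are illustrative assumptions, not EmerNeRF's released implementation.
```python
# Minimal PyTorch-style sketch of a static/dynamic/flow decomposition.
# Names, sizes, and the composition rule are assumptions for illustration only.
import torch
import torch.nn as nn

class Field(nn.Module):
    """Tiny MLP mapping an input vector to (density, feature)."""
    def __init__(self, in_dim, feat_dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 128), nn.ReLU(),
                                 nn.Linear(128, 1 + feat_dim))

    def forward(self, x):
        out = self.net(x)
        density = torch.relu(out[..., :1])   # sigma >= 0
        feature = out[..., 1:]
        return density, feature

class EmergentSceneSketch(nn.Module):
    def __init__(self, feat_dim=64):
        super().__init__()
        self.static_field = Field(in_dim=3, feat_dim=feat_dim)    # f_s(x)
        self.dynamic_field = Field(in_dim=4, feat_dim=feat_dim)   # f_d(x, t)
        self.flow_head = nn.Sequential(nn.Linear(4, 128), nn.ReLU(),
                                       nn.Linear(128, 6))         # forward/backward flow
        self.rgb_head = nn.Linear(feat_dim, 3)

    def forward(self, x, t, dt=0.1):
        xt = torch.cat([x, t], dim=-1)
        sigma_s, feat_s = self.static_field(x)
        sigma_d, feat_d = self.dynamic_field(xt)

        # Induced flow: warp the query point to neighboring frames and
        # aggregate dynamic features across time (multi-frame aggregation).
        flow = self.flow_head(xt)
        fwd, bwd = flow[..., :3], flow[..., 3:]
        feat_fwd = self.dynamic_field(torch.cat([x + fwd * dt, t + dt], dim=-1))[1]
        feat_bwd = self.dynamic_field(torch.cat([x + bwd * dt, t - dt], dim=-1))[1]
        feat_d = (feat_d + feat_fwd + feat_bwd) / 3.0

        # Density-weighted composition of the static and dynamic branches.
        sigma = sigma_s + sigma_d
        w_s = sigma_s / (sigma + 1e-6)
        feat = w_s * feat_s + (1.0 - w_s) * feat_d
        rgb = torch.sigmoid(self.rgb_head(feat))
        return sigma, rgb

# Toy usage: 1024 sampled points with per-point timestamps.
model = EmergentSceneSketch()
x = torch.rand(1024, 3)
t = torch.rand(1024, 1)
sigma, rgb = model(x, t)
```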
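The feature-lifting idea can likewise be sketched as fitting rendered 4D features to frozen 2D foundation-model features while a learnable, image-independent pattern absorbs the shared positional bias mentioned in the abstract. The grid resolution, feature dimension, and loss below are assumptions for illustration, not the paper's exact formulation.
```python
# Hedged sketch: decompose 2D foundation-model features into scene content
# plus a shared positional pattern. Sizes and the loss are illustrative.
import torch
import torch.nn as nn

class SharedPositionalPattern(nn.Module):
    """Learnable per-patch pattern shared across all images, intended to
    absorb the positional-embedding bias of the ViT feature extractor."""
    def __init__(self, h=37, w=37, dim=64):
        super().__init__()
        self.pattern = nn.Parameter(torch.zeros(h, w, dim))

    def forward(self):
        return self.pattern

def feature_loss(rendered_content, shared_pattern, vit_features):
    """Supervise (content + shared pattern) against raw ViT features so the
    positional bias is explained by the pattern rather than the feature field."""
    return ((rendered_content + shared_pattern) - vit_features).pow(2).mean()

# Toy usage: a batch of 4 feature maps from a frozen 2D foundation model.
vit_features = torch.randn(4, 37, 37, 64)                            # stand-in for DINO-style features
rendered_content = torch.randn(4, 37, 37, 64, requires_grad=True)    # stand-in for rendered 4D features
pe = SharedPositionalPattern()
loss = feature_loss(rendered_content, pe(), vit_features)
loss.backward()
```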
Related papers
- SplatFlow: Self-Supervised Dynamic Gaussian Splatting in Neural Motion Flow Field for Autonomous Driving [11.564716761428251]
We introduce SplatFlow, a self-supervised dynamic Gaussian splatting method within Neural Motion Flow Fields (NMFF).
SplatFlow learns 4D space-time representations without requiring tracked 3D bounding boxes, enabling accurate dynamic scene reconstruction and novel-view RGB, depth, and flow synthesis.
arXiv Detail & Related papers (2024-11-23T07:39:30Z)
- MonST3R: A Simple Approach for Estimating Geometry in the Presence of Motion [118.74385965694694]
We present Motion DUSt3R (MonST3R), a novel geometry-first approach that directly estimates per-timestep geometry from dynamic scenes.
By simply estimating a pointmap for each timestep, we can effectively adapt DUSt3R's representation, previously used only for static scenes, to dynamic scenes.
We show that by posing the problem as a fine-tuning task, identifying several suitable datasets, and strategically training the model on this limited data, we can surprisingly enable the model to handle dynamics.
arXiv Detail & Related papers (2024-10-04T18:00:07Z)
- DENSER: 3D Gaussians Splatting for Scene Reconstruction of Dynamic Urban Environments [0.0]
We propose DENSER, a framework that significantly enhances the representation of dynamic objects.
The proposed approach outperforms state-of-the-art methods by a wide margin.
arXiv Detail & Related papers (2024-09-16T07:11:58Z)
- KFD-NeRF: Rethinking Dynamic NeRF with Kalman Filter [49.85369344101118]
We introduce KFD-NeRF, a novel dynamic neural radiance field integrated with an efficient and high-quality motion reconstruction framework based on Kalman filtering.
Our key idea is to model the dynamic radiance field as a dynamic system whose temporally varying states are estimated based on two sources of knowledge: observations and predictions.
Our KFD-NeRF demonstrates similar or even superior performance within comparable computational time, and achieves state-of-the-art view synthesis performance with thorough training; a generic Kalman predict/update sketch illustrating this observation-plus-prediction idea appears after this list.
arXiv Detail & Related papers (2024-07-18T05:48:24Z)
- DynaMoN: Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields [71.94156412354054]
We propose Dynamic Motion-Aware Fast and Robust Camera Localization for Dynamic Neural Radiance Fields (DynaMoN).
DynaMoN handles dynamic content for initial camera pose estimation and statics-focused ray sampling for fast and accurate novel-view synthesis.
We extensively evaluate our approach on two real-world dynamic datasets, the TUM RGB-D dataset and the BONN RGB-D Dynamic dataset.
arXiv Detail & Related papers (2023-09-16T08:46:59Z)
- OD-NeRF: Efficient Training of On-the-Fly Dynamic Neural Radiance Fields [63.04781030984006]
Dynamic neural radiance fields (dynamic NeRFs) have demonstrated impressive results in novel view synthesis on 3D dynamic scenes.
We propose OD-NeRF, which efficiently trains and renders dynamic NeRFs on-the-fly and is thus capable of streaming the dynamic scene.
Our algorithm achieves an interactive speed of 6 FPS for on-the-fly training and rendering on synthetic dynamic scenes, and a significant speed-up over the state of the art on real-world dynamic scenes.
arXiv Detail & Related papers (2023-05-24T07:36:47Z)
- DeVRF: Fast Deformable Voxel Radiance Fields for Dynamic Scenes [27.37830742693236]
We present DeVRF, a novel representation to accelerate learning dynamic radiance fields.
Experiments demonstrate that DeVRF achieves a two-orders-of-magnitude speedup while delivering on-par, high-fidelity results.
arXiv Detail & Related papers (2022-05-31T12:13:54Z)
- Fast Dynamic Radiance Fields with Time-Aware Neural Voxels [106.69049089979433]
We propose a radiance field framework, named TiNeuVox, that represents scenes with time-aware voxel features.
Our framework accelerates the optimization of dynamic radiance fields while maintaining high rendering quality.
Our TiNeuVox completes training in only 8 minutes with an 8 MB storage cost, while showing similar or even better rendering performance than previous dynamic NeRF methods.
arXiv Detail & Related papers (2022-05-30T17:47:31Z)
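Below is a minimal, generic linear Kalman predict/update step, included only to illustrate the observation-plus-prediction estimation idea summarized in the KFD-NeRF entry above; the state, matrices, and dimensions are toy assumptions and do not reflect KFD-NeRF's learned design.
```python
# Generic linear Kalman filter predict/update cycle: fuse a motion-model
# prediction with a noisy observation. Purely illustrative of the idea.
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    """x: state estimate, P: state covariance, z: new observation,
    F: transition model, H: observation model, Q/R: process/observation noise."""
    # Predict: propagate the state with the motion model.
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # Update: correct the prediction with the observation.
    y = z - H @ x_pred                      # innovation
    S = H @ P_pred @ H.T + R                # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)     # Kalman gain
    x_new = x_pred + K @ y
    P_new = (np.eye(len(x)) - K @ H) @ P_pred
    return x_new, P_new

# Toy usage: a 1D position/velocity state observed through noisy positions.
F = np.array([[1.0, 1.0], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 1e-3 * np.eye(2), 1e-1 * np.eye(1)
x, P = np.zeros(2), np.eye(2)
for z in [0.9, 2.1, 2.9, 4.2]:
    x, P = kalman_step(x, P, np.array([z]), F, H, Q, R)
```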