D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
- URL: http://arxiv.org/abs/2205.15838v2
- Date: Wed, 1 Jun 2022 09:40:21 GMT
- Title: D$^2$NeRF: Self-Supervised Decoupling of Dynamic and Static Objects from a Monocular Video
- Authors: Tianhao Wu, Fangcheng Zhong, Andrea Tagliasacchi, Forrester Cole,
Cengiz Oztireli
- Abstract summary: Given a monocular video, segmenting and decoupling dynamic objects while recovering the static environment is a widely studied problem in machine intelligence.
We introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a self-supervised approach that takes a monocular video and learns a 3D scene representation.
- Score: 23.905013304668426
- License: http://creativecommons.org/licenses/by-sa/4.0/
- Abstract: Given a monocular video, segmenting and decoupling dynamic objects while
recovering the static environment is a widely studied problem in machine
intelligence. Existing solutions usually approach this problem in the image
domain, limiting their performance and understanding of the environment. We
introduce Decoupled Dynamic Neural Radiance Field (D$^2$NeRF), a
self-supervised approach that takes a monocular video and learns a 3D scene
representation which decouples moving objects, including their shadows, from
the static background. Our method represents the moving objects and the static
background by two separate neural radiance fields, only one of which allows for
temporal changes. A naive implementation of this approach leads to the dynamic
component taking over the static one, as the representation of the former is
inherently more general and prone to overfitting. To address this, we propose a
novel loss to promote correct separation of phenomena. We further propose a
shadow field network to detect and decouple dynamically moving shadows. We
introduce a new dataset containing various dynamic objects and shadows and
demonstrate that our method can achieve better performance than
state-of-the-art approaches in decoupling dynamic and static 3D objects,
occlusion and shadow removal, and image segmentation for moving objects.
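
The abstract above describes the method entirely in prose: two radiance fields share one volume-rendering integral, only the dynamic field is conditioned on time, a shadow ratio attenuates the static color, and an extra loss discourages the dynamic field from absorbing static content. The sketch below is a minimal illustration of that composition under assumed conventions (standard NeRF volume rendering, a skewed binary-entropy regularizer with skew 2, and placeholder functions `static_field`, `dynamic_field`, and `shadow_field`); it is not the authors' implementation, and the paper's actual loss and shadow formulation may differ.

```python
# Illustrative sketch only: composing a static and a time-conditioned dynamic
# radiance field along one ray, with a shadow ratio darkening the static color.
# All three field functions are hypothetical stand-ins for learned networks.
import numpy as np

def static_field(x):
    """Stand-in static field: (density, rgb) for 3D points x of shape (N, 3)."""
    return np.full(len(x), 0.5), np.full((len(x), 3), 0.6)

def dynamic_field(x, t):
    """Stand-in time-conditioned field: (density, rgb) for points x at time t."""
    sigma = np.exp(-np.linalg.norm(x - np.array([0.0, 0.0, t]), axis=-1))
    return sigma, np.full((len(x), 3), 0.2)

def shadow_field(x, t):
    """Stand-in shadow ratio in [0, 1] that attenuates the static color."""
    return 0.3 * np.exp(-np.linalg.norm(x, axis=-1))

def render_ray(origin, direction, t, n_samples=64, near=0.1, far=4.0, skew=2.0):
    """Volume-render one ray by compositing both fields (generic NeRF math)."""
    z = np.linspace(near, far, n_samples)
    delta = np.diff(z, append=z[-1] + (z[1] - z[0]))
    x = origin + z[:, None] * direction

    sigma_s, rgb_s = static_field(x)
    sigma_d, rgb_d = dynamic_field(x, t)
    rho = shadow_field(x, t)                       # shadow ratio
    rgb_s = (1.0 - rho)[:, None] * rgb_s           # darken static color only

    sigma = sigma_s + sigma_d                      # combined density
    rgb = (sigma_s[:, None] * rgb_s + sigma_d[:, None] * rgb_d) / (sigma[:, None] + 1e-8)

    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate([[1.0], 1.0 - alpha[:-1] + 1e-8]))
    color = ((alpha * trans)[:, None] * rgb).sum(axis=0)

    # Assumed separation regularizer (not necessarily the paper's exact loss):
    # a skewed binary entropy on the dynamic-to-total density ratio. It is low
    # when each point is explained by only one field, and skew > 1 biases
    # ambiguous points toward the static field.
    w = np.clip(sigma_d / (sigma + 1e-8), 1e-6, 1.0 - 1e-6) ** skew
    sep_loss = float(np.mean(-(w * np.log(w) + (1.0 - w) * np.log(1.0 - w))))
    return color, sep_loss

color, sep_loss = render_ray(np.zeros(3), np.array([0.0, 0.0, 1.0]), t=0.5)
print(color, sep_loss)
```

In a full system the three stand-ins would be learned networks, and the regularizer would be averaged over sampled rays and added to the photometric reconstruction loss.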
Related papers
- V3D-SLAM: Robust RGB-D SLAM in Dynamic Environments with 3D Semantic Geometry Voting [1.3493547928462395]
Simultaneous localization and mapping (SLAM) in highly dynamic environments is challenging due to the correlation between moving objects and the camera pose.
We propose a robust method, V3D-SLAM, to remove moving objects via two lightweight re-evaluation stages.
Our experiment on the TUM RGB-D benchmark on dynamic sequences with ground-truth camera trajectories showed that our methods outperform the most recent state-of-the-art SLAM methods.
arXiv Detail & Related papers (2024-10-15T21:08:08Z)
- Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering [57.895846642868904]
We present a 3D generative model named DynaVol-S for dynamic scenes that enables object-centric learning.
Object-centric voxelization infers per-object occupancy probabilities at individual spatial locations.
Our approach integrates 2D semantic features to create 3D semantic grids, representing the scene through multiple disentangled voxel grids.
arXiv Detail & Related papers (2024-07-30T15:33:58Z)
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases (a minimal sketch of this idea appears after this list).
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- BundleSDF: Neural 6-DoF Tracking and 3D Reconstruction of Unknown Objects [89.2314092102403]
We present a near real-time method for 6-DoF tracking of an unknown object from a monocular RGBD video sequence.
Our method works for arbitrary rigid objects, even when visual texture is largely absent.
arXiv Detail & Related papers (2023-03-24T17:13:49Z)
- Discovering Objects that Can Move [55.743225595012966]
We study the problem of object discovery -- separating objects from the background without manual labels.
Existing approaches utilize appearance cues, such as color, texture, and location, to group pixels into object-like regions.
We choose to focus on dynamic objects -- entities that can move independently in the world.
arXiv Detail & Related papers (2022-03-18T21:13:56Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
- STaR: Self-supervised Tracking and Reconstruction of Rigid Objects in Motion with Neural Rendering [9.600908665766465]
We present STaR, a novel method that performs Self-supervised Tracking and Reconstruction of dynamic scenes with rigid motion from multi-view RGB videos without any manual annotation.
We show that our method can render photorealistic novel views, where novelty is measured on both spatial and temporal axes.
arXiv Detail & Related papers (2020-12-22T23:45:28Z)
- Empty Cities: a Dynamic-Object-Invariant Space for Visual SLAM [6.693607456009373]
We present a data-driven approach to obtain the static image of a scene, eliminating dynamic objects that might have been present at the time of traversing the scene with a camera.
We introduce an end-to-end deep learning framework to turn images of an urban environment into realistic static frames suitable for localization and mapping.
arXiv Detail & Related papers (2020-10-15T10:31:12Z)
- Removing Dynamic Objects for Static Scene Reconstruction using Light Fields [2.286041284499166]
Dynamic environments pose challenges to visual simultaneous localization and mapping (SLAM) algorithms.
Light Fields capture a bundle of light rays emerging from a single point in space, allowing us to see through dynamic objects by refocusing past them.
We present a method to synthesize a refocused image of the static background in the presence of dynamic objects.
arXiv Detail & Related papers (2020-03-24T19:05:17Z)
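
The Shape of Motion entry above summarizes its representation as a compact set of SE(3) motion bases. The sketch below only illustrates the general idea under an assumed simplification: each point follows a weighted blend of a few shared rigid transforms, combined linear-blend-skinning style. It is not that paper's exact formulation, and `rot_z` and `blend_bases` are hypothetical helpers.

```python
# Hypothetical illustration of "SE(3) motion bases": every 3D point moves as a
# weighted blend of K shared rigid transforms (one set of transforms per time).
import numpy as np

def rot_z(angle):
    """Rotation about the z-axis, used only to build example basis rotations."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def blend_bases(points, weights, rotations, translations):
    """Move points (N, 3) by blending K rigid transforms with weights (N, K).

    The blend averages the transformed positions (linear-blend-skinning style),
    which is an assumed simplification rather than any specific paper's method.
    """
    # (K, N, 3): each basis transform applied to every point
    moved = np.einsum('kij,nj->kni', rotations, points) + translations[:, None, :]
    # weighted average over the K bases
    return np.einsum('nk,kni->ni', weights, moved)

points = np.random.rand(5, 3)                    # toy point set
weights = np.array([[0.8, 0.2]] * 5)             # per-point basis weights (sum to 1)
rotations = np.stack([rot_z(0.1), rot_z(-0.3)])  # rotation parts of K = 2 bases
translations = np.array([[0.0, 0.0, 0.1], [0.2, 0.0, 0.0]])  # translation parts
print(blend_bases(points, weights, rotations, translations).shape)  # (5, 3)
```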