NeuralDiff: Segmenting 3D objects that move in egocentric videos
- URL: http://arxiv.org/abs/2110.09936v1
- Date: Tue, 19 Oct 2021 12:51:35 GMT
- Title: NeuralDiff: Segmenting 3D objects that move in egocentric videos
- Authors: Vadim Tschernezki, Diane Larlus, Andrea Vedaldi
- Abstract summary: We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
- Score: 92.95176458079047
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Given a raw video sequence taken from a freely-moving camera, we study the
problem of decomposing the observed 3D scene into a static background and a
dynamic foreground containing the objects that move in the video sequence. This
task is reminiscent of the classic background subtraction problem, but is
significantly harder because all parts of the scene, static and dynamic,
generate a large apparent motion due to the camera's large viewpoint change. In
particular, we consider egocentric videos and further separate the dynamic
component into objects and the actor that observes and moves them. We achieve
this factorization by reconstructing the video via a triple-stream neural
rendering network that explains the different motions based on corresponding
inductive biases. We demonstrate that our method can successfully separate the
different types of motion, outperforming recent neural rendering baselines at
this task, and can accurately segment moving objects. We do so by assessing the
method empirically on challenging videos from the EPIC-KITCHENS dataset, which
we augment with appropriate annotations to create a new benchmark for dynamic
object segmentation in unconstrained video sequences and complex 3D environments.
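To make the triple-stream factorization concrete, the snippet below is a minimal sketch in PyTorch, not the authors' released implementation: one field is conditioned on world coordinates only (static background), one additionally on a per-frame latent code (movable objects), and one is queried in camera coordinates plus the time code so that it can follow the actor. The layer sizes, latent-code dimension, and density-weighted compositing rule are illustrative assumptions.

```python
# Minimal sketch of a triple-stream radiance-field decomposition in the spirit of
# NeuralDiff (not the authors' code). Per-stream densities provide the motion
# segmentation; their density-weighted colours give the reconstructed pixel.
import torch
import torch.nn as nn


def mlp(in_dim: int, hidden: int = 128, out_dim: int = 4) -> nn.Sequential:
    """Small MLP emitting (r, g, b, sigma) for a batch of query points."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


class TripleStreamField(nn.Module):
    def __init__(self, num_frames: int, time_dim: int = 16):
        super().__init__()
        self.time_codes = nn.Embedding(num_frames, time_dim)  # one latent per frame
        self.static = mlp(3)                                   # world xyz only
        self.dynamic = mlp(3 + time_dim)                       # world xyz + time code
        self.actor = mlp(3 + time_dim)                         # camera-frame xyz + time code

    def forward(self, x_world, x_cam, frame_idx):
        """x_world, x_cam: (N, 3) sample points; frame_idx: (N,) long frame indices."""
        t = self.time_codes(frame_idx)
        outs = [
            self.static(x_world),
            self.dynamic(torch.cat([x_world, t], dim=-1)),
            self.actor(torch.cat([x_cam, t], dim=-1)),
        ]
        rgb = torch.stack([o[:, :3] for o in outs], dim=0)               # (3, N, 3)
        sigma = torch.stack([torch.relu(o[:, 3]) for o in outs], dim=0)  # (3, N)
        total = sigma.sum(dim=0).clamp_min(1e-8)
        colour = (sigma.unsqueeze(-1) * rgb).sum(dim=0) / total.unsqueeze(-1)
        return colour, total, sigma  # per-stream sigma acts as a soft segmentation


# Example: 1024 points sampled along rays of frame 7 in a 100-frame video.
model = TripleStreamField(num_frames=100)
xw, xc = torch.randn(1024, 3), torch.randn(1024, 3)
colour, density, per_stream = model(xw, xc, torch.full((1024,), 7, dtype=torch.long))
```

In a full pipeline, colour and density would be volume-rendered along camera rays and trained with a photometric reconstruction loss; the inductive bias sketched here is simply that only the dynamic and actor streams are allowed to depend on time.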
Related papers
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE(3) motion bases (a minimal sketch of this representation appears after this list).
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- EgoGaussian: Dynamic Scene Understanding from Egocentric Video with 3D Gaussian Splatting [95.44545809256473]
EgoGaussian is a method capable of simultaneously reconstructing 3D scenes and dynamically tracking 3D object motion from RGB egocentric input alone.
We show significant improvements in terms of both dynamic object and background reconstruction quality compared to the state-of-the-art.
arXiv Detail & Related papers (2024-06-28T10:39:36Z)
- DreamScene4D: Dynamic Multi-Object Scene Generation from Monocular Videos [21.93514516437402]
We present DreamScene4D, the first approach to generate 3D dynamic scenes of multiple objects from monocular videos via novel view synthesis.
Our key insight is a "decompose-recompose" approach that factorizes the video scene into the background and object tracks.
We show extensive results on challenging DAVIS, Kubric, and self-captured videos with quantitative comparisons and a user preference study.
arXiv Detail & Related papers (2024-05-03T17:55:34Z)
- InstMove: Instance Motion for Object-centric Video Segmentation [70.16915119724757]
In this work, we study the instance-level motion and present InstMove, which stands for Instance Motion for Object-centric Video.
In comparison to pixel-wise motion, InstMove mainly relies on instance-level motion information that is free from image feature embeddings.
With only a few lines of code, InstMove can be integrated into current SOTA methods for three different video segmentation tasks.
arXiv Detail & Related papers (2023-03-14T17:58:44Z)
- DynIBaR: Neural Dynamic Image-Based Rendering [79.44655794967741]
We address the problem of synthesizing novel views from a monocular video depicting a complex dynamic scene.
We adopt a volumetric image-based rendering framework that synthesizes new viewpoints by aggregating features from nearby views.
We demonstrate significant improvements over state-of-the-art methods on dynamic scene datasets.
arXiv Detail & Related papers (2022-11-20T20:57:02Z)
- Consistent Depth of Moving Objects in Video [52.72092264848864]
We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera.
We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction over the entire input video.
We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion.
arXiv Detail & Related papers (2021-08-02T20:53:18Z)
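As a companion to the "Shape of Motion" entry above, here is a hedged sketch, not the paper's code, of what a compact set of SE(3) motion bases can look like: each basis is one rigid transform per time step, every 3D point carries a few blending weights, and a point's trajectory is the weighted combination of the basis transforms applied to it. The softmaxed weights, the linear blending of transformed points, and the axis-angle parameterisation are assumptions made for illustration.

```python
# Hedged sketch of SE(3) motion bases (illustrative, not the Shape-of-Motion code):
# K rigid transforms per time step are shared by all points, and each point blends
# them with its own coefficients to obtain a full-sequence trajectory.
import torch


def skew(v: torch.Tensor) -> torch.Tensor:
    """(..., 3) axis-angle vectors -> (..., 3, 3) skew-symmetric matrices."""
    zero = torch.zeros_like(v[..., 0])
    return torch.stack([
        torch.stack([zero, -v[..., 2], v[..., 1]], dim=-1),
        torch.stack([v[..., 2], zero, -v[..., 0]], dim=-1),
        torch.stack([-v[..., 1], v[..., 0], zero], dim=-1),
    ], dim=-2)


def blend_se3_bases(points, weights, rotvecs, trans):
    """
    points:  (N, 3)     canonical 3D points
    weights: (N, K)     per-point coefficients over K motion bases
    rotvecs: (T, K, 3)  per-time, per-basis axis-angle rotations
    trans:   (T, K, 3)  per-time, per-basis translations
    returns: (T, N, 3)  point positions at every time step
    """
    w = torch.softmax(weights, dim=-1)              # rows sum to 1
    R = torch.linalg.matrix_exp(skew(rotvecs))      # (T, K, 3, 3) rotation matrices
    # Apply every basis transform to every point: (T, K, N, 3).
    moved = torch.einsum('tkij,nj->tkni', R, points) + trans[:, :, None, :]
    # Blend the K candidate positions with the per-point weights.
    return torch.einsum('nk,tkni->tni', w, moved)


# Toy example: 500 points, 4 motion bases, 30 time steps.
pts = torch.randn(500, 3)
w = torch.randn(500, 4)
rv = 0.1 * torch.randn(30, 4, 3)
tr = 0.05 * torch.randn(30, 4, 3)
traj = blend_se3_bases(pts, w, rv, tr)              # (30, 500, 3)
```

In an optimization setting, the basis parameters and the per-point weights would be fitted jointly with the reconstruction objectives, which is what keeps the motion representation low-dimensional.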
This list is automatically generated from the titles and abstracts of the papers on this site.