Consistent Depth of Moving Objects in Video
- URL: http://arxiv.org/abs/2108.01166v1
- Date: Mon, 2 Aug 2021 20:53:18 GMT
- Title: Consistent Depth of Moving Objects in Video
- Authors: Zhoutong Zhang, Forrester Cole, Richard Tucker, William T. Freeman,
Tali Dekel
- Abstract summary: We present a method to estimate depth of a dynamic scene, containing arbitrary moving objects, from an ordinary video captured with a moving camera.
We formulate this objective in a new test-time training framework where a depth-prediction CNN is trained in tandem with an auxiliary scene-flow prediction MLP over the entire input video.
We demonstrate accurate and temporally coherent results on a variety of challenging videos containing diverse moving objects (pets, people, cars) as well as camera motion.
- Score: 52.72092264848864
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: We present a method to estimate depth of a dynamic scene, containing
arbitrary moving objects, from an ordinary video captured with a moving camera.
We seek a geometrically and temporally consistent solution to this
underconstrained problem: the depth predictions of corresponding points across
frames should induce plausible, smooth motion in 3D. We formulate this
objective in a new test-time training framework where a depth-prediction CNN is
trained in tandem with an auxiliary scene-flow prediction MLP over the entire
input video. By recursively unrolling the scene-flow prediction MLP over
varying time steps, we compute both short-range scene flow to impose local
smooth motion priors directly in 3D, and long-range scene flow to impose
multi-view consistency constraints with wide baselines. We demonstrate accurate
and temporally coherent results on a variety of challenging videos containing
diverse moving objects (pets, people, cars), as well as camera motion. Our
depth maps give rise to a number of depth-and-motion aware video editing
effects such as object and lighting insertion.
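For intuition, below is a minimal PyTorch sketch of the test-time training idea described in the abstract: a depth CNN and a scene-flow MLP are optimized jointly on a single clip, with the MLP recursively unrolled to form a short-range 3D smoothness loss and a long-range consistency loss. This is not the authors' implementation; the network sizes, pixel sampling, back-projection, and loss forms are all illustrative assumptions.

```python
# Hedged sketch of joint test-time training of a depth CNN and a
# scene-flow MLP on one video clip. All details are assumptions.
import torch
import torch.nn as nn

class DepthCNN(nn.Module):
    """Tiny stand-in for the per-frame depth-prediction CNN."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus())  # positive depth
    def forward(self, frame):           # frame: (B, 3, H, W)
        return self.net(frame)          # depth: (B, 1, H, W)

class SceneFlowMLP(nn.Module):
    """Predicts the 3D displacement of a point over one time step."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(4, 64), nn.ReLU(),
            nn.Linear(64, 64), nn.ReLU(),
            nn.Linear(64, 3))
    def forward(self, points, t):       # points: (N, 3), t: (N, 1)
        return self.net(torch.cat([points, t], dim=-1))

def unroll_flow(mlp, points, t0, n_steps):
    """Recursively apply the one-step scene-flow MLP over n_steps."""
    trajectory = [points]
    for k in range(n_steps):
        t = torch.full_like(points[:, :1], float(t0 + k))
        points = points + mlp(points, t)
        trajectory.append(points)
    return trajectory

# --- test-time training loop on one (toy) video ---
depth_net, flow_mlp = DepthCNN(), SceneFlowMLP()
opt = torch.optim.Adam(list(depth_net.parameters()) +
                       list(flow_mlp.parameters()), lr=1e-4)
video = torch.rand(8, 3, 64, 64)        # 8 random frames as placeholder input

for step in range(100):
    i = torch.randint(0, len(video) - 4, (1,)).item()
    depth = depth_net(video[i:i+1])
    # Back-project a few sampled pixels to 3D (identity intrinsics assumed).
    ys, xs = torch.randint(0, 64, (32,)), torch.randint(0, 64, (32,))
    pts = torch.stack([xs.float(), ys.float(),
                       depth[0, 0, ys, xs]], dim=-1)
    traj = unroll_flow(flow_mlp, pts, t0=i, n_steps=4)
    # Short-range prior: consecutive 3D displacements should vary smoothly.
    vel = torch.stack([b - a for a, b in zip(traj[:-1], traj[1:])])
    loss_smooth = (vel[1:] - vel[:-1]).pow(2).mean()
    # Long-range term: unrolled points should agree with the depth predicted
    # at a distant frame (a crude stand-in for wide-baseline reprojection).
    depth_far = depth_net(video[i+4:i+5])
    loss_consist = (traj[-1][:, 2] - depth_far[0, 0, ys, xs]).pow(2).mean()
    loss = loss_smooth + loss_consist
    opt.zero_grad()
    loss.backward()
    opt.step()
```

In the paper, the long-range scene flow enforces multi-view consistency across wide baselines; the placeholder loss above merely compares depth values at fixed pixel locations to keep the sketch short.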
Related papers
- DepthMOT: Depth Cues Lead to a Strong Multi-Object Tracker [4.65004369765875]
Accurately distinguishing each object is a fundamental goal of multi-object tracking (MOT) algorithms.
In this paper, we propose DepthMOT, which (i) detects objects and estimates the scene depth map end-to-end, and (ii) compensates for irregular camera motion via camera pose estimation.
arXiv Detail & Related papers (2024-04-08T13:39:12Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- FlowCam: Training Generalizable 3D Radiance Fields without Camera Poses via Pixel-Aligned Scene Flow [26.528667940013598]
Reconstruction of 3D neural fields from posed images has emerged as a promising method for self-supervised representation learning.
A key challenge preventing the deployment of these 3D scene learners on large-scale video data is their dependence on precise camera poses from structure-from-motion.
We propose a method that jointly reconstructs camera poses and 3D neural scene representations online and in a single forward pass.
arXiv Detail & Related papers (2023-05-31T20:58:46Z)
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of dynamic objects into object motion and camera motion, regularized respectively by the proposed unsupervised surface-consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z)
- Motion-from-Blur: 3D Shape and Motion Estimation of Motion-blurred Objects in Videos [115.71874459429381]
We propose a method for jointly estimating the 3D motion, 3D shape, and appearance of highly motion-blurred objects from a video.
Experiments on benchmark datasets demonstrate that our method outperforms previous methods for fast moving object deblurring and 3D reconstruction.
arXiv Detail & Related papers (2021-11-29T11:25:14Z)
- NeuralDiff: Segmenting 3D objects that move in egocentric videos [92.95176458079047]
We study the problem of decomposing the observed 3D scene into a static background and a dynamic foreground.
This task is reminiscent of the classic background subtraction problem, but is significantly harder because all parts of the scene, static and dynamic, generate a large apparent motion.
In particular, we consider egocentric videos and further separate the dynamic component into objects and the actor that observes and moves them.
arXiv Detail & Related papers (2021-10-19T12:51:35Z)
- Spatiotemporal Bundle Adjustment for Dynamic 3D Human Reconstruction in the Wild [49.672487902268706]
We present a framework that jointly estimates camera temporal alignment and 3D point triangulation.
We reconstruct 3D motion trajectories of human bodies in events captured by multiple uncalibrated and unsynchronized video cameras.
arXiv Detail & Related papers (2020-07-24T23:50:46Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the generated content (including all information) and is not responsible for any consequences arising from its use.