Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation
- URL: http://arxiv.org/abs/2110.06853v1
- Date: Wed, 13 Oct 2021 16:45:01 GMT
- Title: Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation
- Authors: Seokju Lee, Francois Rameau, Fei Pan, In So Kweon
- Abstract summary: Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
- Score: 76.58256020932312
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Estimating the motion of the camera together with the 3D structure of the
scene from a monocular vision system is a complex task that often relies on the
so-called scene rigidity assumption. When observing a dynamic environment, this
assumption is violated, which leads to an ambiguity between the ego-motion of
the camera and the motion of the objects. To solve this problem, we present a
self-supervised learning framework for 3D object motion field estimation from
monocular videos. Our contributions are two-fold. First, we propose a two-stage
projection pipeline to explicitly disentangle the camera ego-motion and the
object motions with a dynamics attention module, called DAM. Specifically, we
design an integrated motion model that estimates the motion of the camera and
object in the first and second warping stages, respectively, controlled by the
attention module through a shared motion encoder. Second, we propose object
motion field estimation through contrastive sample consensus, called CSAC,
taking advantage of a weak semantic prior (a bounding box from an object detector)
and geometric constraints (each object respects the rigid body motion model).
Experiments on KITTI, Cityscapes, and Waymo Open Dataset demonstrate the
relevance of our approach and show that our method outperforms state-of-the-art
algorithms for the tasks of self-supervised monocular depth estimation, object
motion segmentation, monocular scene flow estimation, and visual odometry.
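As a rough illustration of the two-stage projection pipeline described in the abstract, the NumPy sketch below backprojects pixels using a predicted depth map, warps them first with the camera ego-motion, and then adds a per-pixel 3D object motion field. The function names, the translation-only object motion representation, and the toy inputs are assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def backproject(depth, K):
    """Lift every pixel to a 3D point using its depth and the intrinsics K."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T   # 3 x N
    return np.linalg.inv(K) @ pix * depth.reshape(1, -1)                # 3 x N

def project(pts, K):
    """Project 3D points back to pixel coordinates (N x 2)."""
    uv = K @ pts
    return (uv[:2] / np.clip(uv[2:], 1e-6, None)).T

def two_stage_warp(depth, K, R_cam, t_cam, obj_flow_3d):
    """Stage 1: move all points rigidly with the camera ego-motion.
    Stage 2: add a per-pixel 3D object motion field for dynamic pixels.
    Returns the sampling coordinates produced by each stage."""
    pts = backproject(depth, K)
    pts_static = R_cam @ pts + t_cam.reshape(3, 1)            # ego-motion only
    pts_dynamic = pts_static + obj_flow_3d.reshape(-1, 3).T   # + object motion
    return project(pts_static, K), project(pts_dynamic, K)

# Toy check: identity ego-motion and a zero object motion field give identical warps.
K = np.array([[100.0, 0.0, 2.0], [0.0, 100.0, 2.0], [0.0, 0.0, 1.0]])
depth = np.full((4, 4), 5.0)
static, dynamic = two_stage_warp(depth, K, np.eye(3), np.zeros(3),
                                 np.zeros((4, 4, 3)))
assert np.allclose(static, dynamic)
```

In the full method, the second warp is controlled by the attention module and the per-object motions are further constrained to be rigid within each detected bounding box (the CSAC step); both parts are left out of this sketch for brevity.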
Related papers
- Shape of Motion: 4D Reconstruction from a Single Video [51.04575075620677]
We introduce a method capable of reconstructing generic dynamic scenes, featuring explicit, full-sequence-long 3D motion.
We exploit the low-dimensional structure of 3D motion by representing scene motion with a compact set of SE3 motion bases (see the sketch after this list).
Our method achieves state-of-the-art performance for both long-range 3D/2D motion estimation and novel view synthesis on dynamic scenes.
arXiv Detail & Related papers (2024-07-18T17:59:08Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- Dynamo-Depth: Fixing Unsupervised Depth Estimation for Dynamical Scenes [40.46121828229776]
Dynamo-Depth is an approach that disambiguates dynamical motion by jointly learning monocular depth, 3D independent flow field, and motion segmentation from unlabeled monocular videos.
Our proposed method achieves state-of-the-art performance on monocular depth estimation on Waymo Open and nuScenes with significant improvement in the depth of moving objects.
arXiv Detail & Related papers (2023-10-29T03:24:16Z)
- Motion Segmentation from a Moving Monocular Camera [3.115818438802931]
We take advantage of two popular branches of monocular motion segmentation approaches: point trajectory based and optical flow based methods.
We are able to model various complex object motions in different scene structures at once.
Our method shows state-of-the-art performance on the KT3DMoSeg dataset.
arXiv Detail & Related papers (2023-09-24T22:59:05Z)
- Decoupling Dynamic Monocular Videos for Dynamic View Synthesis [50.93409250217699]
We tackle the challenge of dynamic view synthesis from dynamic monocular videos in an unsupervised fashion.
Specifically, we decouple the motion of the dynamic objects into object motion and camera motion, respectively regularized by proposed unsupervised surface consistency and patch-based multi-view constraints.
arXiv Detail & Related papers (2023-04-04T11:25:44Z)
- 3D Object Aided Self-Supervised Monocular Depth Estimation [5.579605877061333]
We propose a new method to address dynamic object movements through monocular 3D object detection.
Specifically, we first detect 3D objects in the images and build the per-pixel correspondence of the dynamic pixels with the detected object pose.
In this way, the depth of every pixel can be learned via a meaningful geometry model.
arXiv Detail & Related papers (2022-12-04T08:52:33Z)
- Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z)
- Event-based Motion Segmentation with Spatio-Temporal Graph Cuts [51.17064599766138]
We have developed a method to identify independently moving objects acquired with an event-based camera.
The method performs on par with or better than the state of the art without having to predetermine the number of expected moving objects.
arXiv Detail & Related papers (2020-12-16T04:06:02Z)
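The Shape of Motion entry above describes representing scene motion with a compact set of SE3 motion bases. The toy sketch below shows one common way such a representation can be realized, mixing a few rigid transforms with per-point blending weights; the function name, the linear blending scheme, and the toy data are illustrative assumptions, not that paper's actual formulation.

```python
import numpy as np

def blend_se3_bases(points, bases, weights):
    """Move each 3D point by a weighted combination of a few rigid (SE(3)) bases:
    apply every basis (R, t) to the points, then mix the results with the
    per-point, per-basis weights (a simple linear-blend-skinning style mix)."""
    out = np.zeros_like(points)
    for k, (R, t) in enumerate(bases):
        out += weights[:, k:k + 1] * (points @ R.T + t)
    return out

# Toy example: two bases (identity and a unit translation along x), three points.
bases = [(np.eye(3), np.zeros(3)),
         (np.eye(3), np.array([1.0, 0.0, 0.0]))]
points = np.zeros((3, 3))
weights = np.array([[1.0, 0.0],    # stays put
                    [0.0, 1.0],    # follows the translating basis
                    [0.5, 0.5]])   # halfway blend
print(blend_se3_bases(points, bases, weights))
# rows move by 0.0, 1.0, and 0.5 along x, respectively
```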