Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics
- URL: http://arxiv.org/abs/2207.04680v1
- Date: Mon, 11 Jul 2022 07:50:22 GMT
- Title: Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics
- Authors: Sen Zhang, Jing Zhang, and Dacheng Tao
- Abstract summary: Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
- Score: 74.1720528573331
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Unsupervised monocular depth and ego-motion estimation has drawn extensive
research attention in recent years. Although current methods have reached a
high up-to-scale accuracy, they usually fail to learn the true scale metric due
to the inherent scale ambiguity from training with monocular sequences. In this
work, we tackle this problem and propose DynaDepth, a novel scale-aware
framework that integrates information from vision and IMU motion dynamics.
Specifically, we first propose an IMU photometric loss and a cross-sensor
photometric consistency loss to provide dense supervision and absolute scales.
To fully exploit the complementary information from both sensors, we further
derive a differentiable camera-centric extended Kalman filter (EKF) to update
the IMU preintegrated motions when observing visual measurements. In addition,
the EKF formulation enables learning an ego-motion uncertainty measure, which
is non-trivial for unsupervised methods. By leveraging IMU during training,
DynaDepth not only learns an absolute scale, but also provides a better
generalization ability and robustness against vision degradation such as
illumination change and moving objects. We validate the effectiveness of
DynaDepth by conducting extensive experiments and simulations on the KITTI and
Make3D datasets.
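As a rough, hedged illustration of the fusion step described in the abstract, the following PyTorch sketch implements a generic differentiable EKF update that corrects an IMU-predicted motion state with a visual measurement. The camera-centric state definition, Jacobians, and noise models used by DynaDepth are not specified here, so all shapes and names below are assumptions.

```python
import torch

def ekf_update(x_pred, P_pred, z, R, H):
    """Generic differentiable EKF update (illustrative sketch only,
    not DynaDepth's exact camera-centric formulation).

    x_pred: (B, n)    IMU-preintegrated motion state
    P_pred: (B, n, n) predicted state covariance
    z:      (B, m)    visual measurement (e.g., network ego-motion)
    R:      (B, m, m) measurement noise covariance
    H:      (B, m, n) measurement Jacobian
    """
    y = z - torch.einsum('bmn,bn->bm', H, x_pred)           # innovation
    S = H @ P_pred @ H.transpose(-1, -2) + R                # innovation covariance
    K = P_pred @ H.transpose(-1, -2) @ torch.linalg.inv(S)  # Kalman gain
    x_post = x_pred + torch.einsum('bnm,bm->bn', K, y)      # corrected state
    I = torch.eye(P_pred.shape[-1], device=P_pred.device)
    P_post = (I - K @ H) @ P_pred                           # updated covariance
    return x_post, P_post
```

Since every operation above is differentiable, gradients can flow back into both the visual network and any learned noise parameters, and the posterior covariance offers a natural handle for the ego-motion uncertainty measure mentioned in the abstract.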
Related papers
- Masked Video and Body-worn IMU Autoencoder for Egocentric Action Recognition [24.217068565936117]
We present a novel method for action recognition that integrates motion data from body-worn IMUs with egocentric video.
To model the complex relations among multiple IMU devices placed across the body, we exploit their collaborative dynamics.
Experiments show our method can achieve state-of-the-art performance on multiple public datasets.
arXiv Detail & Related papers (2024-07-09T07:53:16Z)
- DO3D: Self-supervised Learning of Decomposed Object-aware 3D Motion and Depth from Monocular Videos [76.01906393673897]
We propose a self-supervised method to jointly learn 3D motion and depth from monocular videos.
Our system contains a depth estimation module to predict depth, and a new decomposed object-wise 3D motion (DO3D) estimation module to predict ego-motion and 3D object motion.
Our model delivers superior performance in all evaluated settings.
arXiv Detail & Related papers (2024-03-09T12:22:46Z)
- Detaching and Boosting: Dual Engine for Scale-Invariant Self-Supervised Monocular Depth Estimation [18.741426143836538]
We present a scale-invariant approach for self-supervised MDE, in which scale-sensitive features (SSFs) are detached.
Specifically, we propose a simple but effective data augmentation that imitates the camera zooming process to detach SSFs, as sketched below.
Our approach achieves new state-of-the-art performance, improving the absolute relative error from 0.097 to 0.090 over existing works.
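A zoom-style augmentation of this kind could plausibly be implemented as in the minimal PyTorch sketch below, which imitates zooming in by center-cropping and upsampling; the paper's exact cropping policy and any camera-intrinsics adjustment are not given in this summary, so this is an assumption-laden illustration rather than the authors' method.

```python
import torch.nn.functional as F

def zoom_in_augment(img, zoom=1.3):
    """Imitate camera zoom-in: center-crop, then resize back up.

    Hypothetical helper, not the paper's exact augmentation.
    img: (B, 3, H, W) image tensor; zoom > 1 zooms in.
    """
    _, _, H, W = img.shape
    ch, cw = int(H / zoom), int(W / zoom)           # crop size for this zoom factor
    top, left = (H - ch) // 2, (W - cw) // 2
    crop = img[:, :, top:top + ch, left:left + cw]  # center crop
    return F.interpolate(crop, size=(H, W), mode='bilinear', align_corners=False)
```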
arXiv Detail & Related papers (2022-10-08T07:38:11Z)
- Transformer Inertial Poser: Attention-based Real-time Human Motion Reconstruction from Sparse IMUs [79.72586714047199]
We propose an attention-based deep learning method to reconstruct full-body motion from six IMU sensors in real-time.
Our method achieves new state-of-the-art results both quantitatively and qualitatively, while being simple to implement and smaller in size.
arXiv Detail & Related papers (2022-03-29T16:24:52Z)
- Disentangling Object Motion and Occlusion for Unsupervised Multi-frame Monocular Depth [37.021579239596164]
Existing dynamic-object-focused methods have only partially solved the mismatch problem at the training-loss level.
We propose a novel multi-frame monocular depth prediction method to solve these problems at both the prediction and supervision loss levels.
Our method, called DynamicDepth, is a new framework trained via a self-supervised cycle consistent learning scheme.
arXiv Detail & Related papers (2022-03-29T01:36:11Z)
- Towards Scale Consistent Monocular Visual Odometry by Learning from the Virtual World [83.36195426897768]
We propose VRVO, a novel framework for retrieving the absolute scale from virtual data.
We first train a scale-aware disparity network using both monocular real images and stereo virtual data.
The resulting scale-consistent disparities are then integrated with a direct VO system.
arXiv Detail & Related papers (2022-03-11T01:51:54Z)
- Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy: Appearance Flow to the Rescue [38.168759071532676]
Self-supervised learning has been applied to estimate depth and ego-motion from monocular videos.
In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem (see the sketch below).
We build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes.
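One plausible reading of the appearance-flow idea is that a network predicts a per-pixel brightness change field that is applied to the warped source image before the photometric comparison. The PyTorch sketch below is illustrative only; the summary does not specify how the field is predicted or regularized, so all names and shapes are assumptions.

```python
import torch

def photometric_loss_with_appearance_flow(i_target, i_warped, brightness_field):
    """L1 photometric loss after brightness compensation (illustrative).

    i_target, i_warped: (B, 3, H, W) images in [0, 1]
    brightness_field:   (B, 1, H, W) predicted per-pixel brightness change
    """
    i_compensated = torch.clamp(i_warped + brightness_field, 0.0, 1.0)
    return (i_target - i_compensated).abs().mean()
```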
arXiv Detail & Related papers (2021-12-15T13:51:10Z)
- Attentive and Contrastive Learning for Joint Depth and Motion Field Estimation [76.58256020932312]
Estimating the motion of the camera together with the 3D structure of the scene from a monocular vision system is a complex task.
We present a self-supervised learning framework for 3D object motion field estimation from monocular videos.
arXiv Detail & Related papers (2021-10-13T16:45:01Z)
- Unsupervised Scale-consistent Depth Learning from Video [131.3074342883371]
We propose a monocular depth estimator SC-Depth, which requires only unlabelled videos for training.
Thanks to the capability of scale-consistent prediction, we show that our monocular-trained deep networks are readily integrated into the ORB-SLAM2 system.
The proposed hybrid Pseudo-RGBD SLAM shows compelling results on KITTI and generalizes well to the KAIST dataset without additional training.
arXiv Detail & Related papers (2021-05-25T02:17:56Z)
- Movement Tracking by Optical Flow Assisted Inertial Navigation [18.67291804847956]
We show how a learning-based optical flow model can be combined with conventional inertial navigation.
We show how ideas from probabilistic deep learning can aid the robustness of the measurement updates.
The practical applicability is demonstrated on real-world data acquired by an iPad.
arXiv Detail & Related papers (2020-06-24T16:36:13Z)