SelfOdom: Self-supervised Egomotion and Depth Learning via
Bi-directional Coarse-to-Fine Scale Recovery
- URL: http://arxiv.org/abs/2211.08904v2
- Date: Sat, 2 Sep 2023 16:57:31 GMT
- Title: SelfOdom: Self-supervised Egomotion and Depth Learning via
Bi-directional Coarse-to-Fine Scale Recovery
- Authors: Hao Qu, Lilian Zhang, Xiaoping Hu, Xiaofeng He, Xianfei Pan, Changhao
Chen
- Abstract summary: SelfOdom is a self-supervised dual-network framework for learning pose and depth estimates from monocular images.
We introduce a novel coarse-to-fine training strategy that enables the metric scale to be recovered in a two-stage process.
Our model excels in both normal and challenging lighting conditions, including difficult night scenes.
- Score: 12.791122117651273
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Accurately perceiving location and scene is crucial for autonomous driving
and mobile robots. Recent advances in deep learning have made it possible to
learn egomotion and depth from monocular images in a self-supervised manner,
without requiring highly precise labels to train the networks. However,
monocular vision methods suffer from a limitation known as scale ambiguity,
which restricts their application when absolute scale is necessary. To address
this, we propose SelfOdom, a self-supervised dual-network framework that can
robustly and consistently learn and generate pose and depth estimates at
global scale from monocular images. In particular, we introduce a novel coarse-to-fine
training strategy that enables the metric scale to be recovered in a two-stage
process. Furthermore, SelfOdom is flexible and can incorporate inertial data
alongside images through an attention-based fusion module, which improves its
robustness in challenging scenarios. Our model excels in both normal and challenging
lighting conditions, including difficult night scenes. Extensive experiments on
public datasets have demonstrated that SelfOdom outperforms representative
traditional and learning-based visual odometry (VO) and visual-inertial
odometry (VIO) models.
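As background on the self-supervised formulation mentioned in the abstract: frameworks of this kind typically couple a depth network and a pose network through differentiable view synthesis, warping a source frame into the target view and using the photometric reconstruction error as the training signal. Below is a minimal sketch of that generic loss in PyTorch; the tensor shapes, intrinsics handling, and plain L1 error are illustrative assumptions, not SelfOdom's published implementation.

```python
import torch
import torch.nn.functional as F

def backproject(depth, K_inv):
    """Lift every pixel to a 3D point using predicted depth.
    depth: (B, 1, H, W); K_inv: (3, 3) inverse camera intrinsics."""
    b, _, h, w = depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float()   # (3, H, W)
    pix = pix.reshape(3, -1).unsqueeze(0).expand(b, -1, -1)           # (B, 3, HW)
    return depth.reshape(b, 1, -1) * (K_inv @ pix)                    # (B, 3, HW)

def project(points, K, T):
    """Apply relative pose T: (B, 4, 4) and project with K: (3, 3)."""
    b, _, n = points.shape
    homo = torch.cat([points, torch.ones(b, 1, n)], dim=1)            # (B, 4, HW)
    cam = (T @ homo)[:, :3]                                           # (B, 3, HW)
    uv = K @ cam
    return uv[:, :2] / uv[:, 2:].clamp(min=1e-6)                      # (B, 2, HW)

def photometric_loss(target, source, depth, T, K, K_inv):
    """Warp `source` into the target view via predicted depth and pose,
    then penalize the L1 difference against `target` (both (B, 3, H, W))."""
    b, _, h, w = target.shape
    uv = project(backproject(depth, K_inv), K, T).reshape(b, 2, h, w)
    grid = torch.stack([uv[:, 0] / (w - 1) * 2 - 1,                   # x to [-1, 1]
                        uv[:, 1] / (h - 1) * 2 - 1], dim=-1)          # y to [-1, 1]
    warped = F.grid_sample(source, grid, align_corners=True,
                           padding_mode="border")
    return (target - warped).abs().mean()
```

In a full system this loss is minimized jointly over the depth and pose networks, typically with SSIM and edge-aware smoothness terms added on top of the plain L1 error shown here.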
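The abstract does not detail the two-stage scale-recovery mechanism, so the following is only a hypothetical illustration of the general coarse-to-fine idea: first recover one robust global factor that aligns up-to-scale predicted translations with a metric reference (for instance, translation magnitudes derived from inertial data), then refine it over the frames that agree with the coarse estimate. All names and tolerances here are invented for illustration.

```python
import torch

def coarse_scale(pred_trans, metric_trans):
    """Stage 1 (coarse): one robust global factor from the median ratio of
    per-frame translation norms. Both inputs are (N, 3) tensors."""
    ratios = metric_trans.norm(dim=-1) / pred_trans.norm(dim=-1).clamp(min=1e-6)
    return ratios.median()

def fine_scale(pred_trans, metric_trans, s0, tol=0.2):
    """Stage 2 (fine): least-squares refit over frames whose per-frame ratio
    agrees with the coarse factor `s0` to within `tol` (outlier rejection)."""
    ratios = metric_trans.norm(dim=-1) / pred_trans.norm(dim=-1).clamp(min=1e-6)
    keep = (ratios / s0 - 1).abs() < tol
    p, m = pred_trans[keep], metric_trans[keep]
    return (p * m).sum() / (p * p).sum().clamp(min=1e-6)

# Usage: s0 = coarse_scale(pred, metric); s = fine_scale(pred, metric, s0)
```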
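The attention-based fusion module is likewise only named in the abstract; a common realization in learned visual-inertial odometry is soft channel-wise gating of concatenated visual and inertial features before pose regression. The sketch below assumes that design, and all dimensions and layer choices are illustrative, not SelfOdom's published architecture.

```python
import torch
import torch.nn as nn

class AttentionFusion(nn.Module):
    """Hypothetical soft-attention fusion of visual and inertial features."""
    def __init__(self, visual_dim=512, inertial_dim=128):
        super().__init__()
        fused = visual_dim + inertial_dim
        # One sigmoid gate per channel, conditioned on both modalities, so the
        # network can down-weight unreliable cues (e.g., images at night).
        self.gate = nn.Sequential(nn.Linear(fused, fused), nn.Sigmoid())
        self.pose_head = nn.Linear(fused, 6)  # 3 translation + 3 rotation params

    def forward(self, visual_feat, inertial_feat):
        x = torch.cat([visual_feat, inertial_feat], dim=-1)  # (B, fused)
        return self.pose_head(self.gate(x) * x)              # (B, 6) relative pose

# Usage: fuse a 512-d image feature with a 128-d IMU feature per frame pair.
pose = AttentionFusion()(torch.randn(4, 512), torch.randn(4, 128))
```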
Related papers
- NimbleD: Enhancing Self-supervised Monocular Depth Estimation with Pseudo-labels and Large-scale Video Pre-training [2.4240014793575138]
We introduce NimbleD, an efficient self-supervised monocular depth estimation learning framework.
This framework does not require camera intrinsics, enabling large-scale pre-training on publicly available videos.
arXiv Detail & Related papers (2024-08-26T10:50:14Z)
- Self-STORM: Deep Unrolled Self-Supervised Learning for Super-Resolution Microscopy [55.2480439325792]
We introduce deep unrolled self-supervised learning, which alleviates the need for such data by training a sequence-specific, model-based autoencoder.
Our proposed method exceeds the performance of its supervised counterparts.
arXiv Detail & Related papers (2024-03-25T17:40:32Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
- Self-Supervised Multi-Object Tracking For Autonomous Driving From Consistency Across Timescales [53.55369862746357]
Self-supervised multi-object trackers have tremendous potential as they enable learning from raw domain-specific data.
However, their re-identification accuracy still falls short compared to their supervised counterparts.
We propose a training objective that enables self-supervised learning of re-identification features from multiple sequential frames.
arXiv Detail & Related papers (2023-04-25T20:47:29Z)
- Towards Scale-Aware, Robust, and Generalizable Unsupervised Monocular Depth Estimation by Integrating IMU Motion Dynamics [74.1720528573331]
Unsupervised monocular depth and ego-motion estimation has drawn extensive research attention in recent years.
We propose DynaDepth, a novel scale-aware framework that integrates information from vision and IMU motion dynamics.
We validate the effectiveness of DynaDepth by conducting extensive experiments and simulations on the KITTI and Make3D datasets.
arXiv Detail & Related papers (2022-07-11T07:50:22Z)
- SelfD: Self-Learning Large-Scale Driving Policies From the Web [13.879536370173506]
SelfD is a framework for learning scalable driving policies by utilizing large amounts of online monocular images.
We employ a large dataset of publicly available YouTube videos to train SelfD and comprehensively analyze its generalization benefits across challenging navigation scenarios.
arXiv Detail & Related papers (2022-04-21T17:58:36Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- Self-Supervised Monocular Depth and Ego-Motion Estimation in Endoscopy: Appearance Flow to the Rescue [38.168759071532676]
Self-supervised learning has been applied to estimate depth and ego-motion from monocular videos.
In this work, we introduce a novel concept referred to as appearance flow to address the brightness inconsistency problem.
We build a unified self-supervised framework to estimate monocular depth and ego-motion simultaneously in endoscopic scenes.
arXiv Detail & Related papers (2021-12-15T13:51:10Z)
- MELD: Meta-Reinforcement Learning from Images via Latent State Models [109.1664295663325]
We develop an algorithm for meta-RL from images that performs inference in a latent state model to quickly acquire new skills.
MELD is the first meta-RL algorithm trained in a real-world robotic control setting from images.
arXiv Detail & Related papers (2020-10-26T23:50:30Z)
- Calibrating Self-supervised Monocular Depth Estimation [77.77696851397539]
In recent years, many methods have demonstrated the ability of neural networks to learn depth and pose changes in a sequence of images, using only self-supervision as the training signal.
We show that, by incorporating prior information about the camera configuration and the environment, we can remove the scale ambiguity and predict depth directly, still using the self-supervised formulation and not relying on any additional sensors (a minimal sketch of this camera-prior idea follows the list below).
arXiv Detail & Related papers (2020-09-16T14:35:45Z)
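Following up on the last entry above: one classical prior that removes monocular scale ambiguity is the known height of the camera above the ground plane. The sketch below is a simplistic, assumption-laden illustration of that idea (the bottom-rows ground heuristic and the level-camera assumption are invented for brevity), not the cited paper's method.

```python
import torch

def metric_scale_from_camera_height(pred_depth, K_inv, real_height_m):
    """pred_depth: (H, W) up-to-scale depth; K_inv: (3, 3) inverse intrinsics;
    real_height_m: known camera height above the ground plane, in meters."""
    h, w = pred_depth.shape
    ys, xs = torch.meshgrid(torch.arange(h), torch.arange(w), indexing="ij")
    pix = torch.stack([xs, ys, torch.ones_like(xs)], dim=0).float().reshape(3, -1)
    pts = pred_depth.reshape(1, -1) * (K_inv @ pix)   # back-projected points (3, HW)
    # Crude ground-pixel heuristic: take the bottom eighth of the image.
    ground = pts[:, -(h * w // 8):]
    # With a roughly level camera (y axis pointing down), the y coordinate of
    # ground points equals the camera height in the prediction's arbitrary units.
    est_height = ground[1].median()
    return real_height_m / est_height.clamp(min=1e-6)  # metric scale factor
```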