GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for
Indoor Scenes
- URL: http://arxiv.org/abs/2309.16019v1
- Date: Tue, 26 Sep 2023 17:59:57 GMT
- Title: GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for
Indoor Scenes
- Authors: Chaoqiang Zhao, Matteo Poggi, Fabio Tosi, Lei Zhou, Qiyu Sun, Yang
Tang, Stefano Mattoccia
- Abstract summary: This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
- Score: 47.76269541664071
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper tackles the challenges of self-supervised monocular depth
estimation in indoor scenes caused by large rotation between frames and low
texture. We ease the learning process by obtaining coarse camera poses from
monocular sequences through multi-view geometry to deal with the former.
However, we found that limited by the scale ambiguity across different scenes
in the training dataset, a na\"ive introduction of geometric coarse poses
cannot play a positive role in performance improvement, which is
counter-intuitive. To address this problem, we propose to refine those poses
during training through rotation and translation/scale optimization. To soften
the effect of the low texture, we combine the global reasoning of vision
transformers with an overfitting-aware, iterative self-distillation mechanism,
providing more accurate depth guidance coming from the network itself.
Experiments on NYUv2, ScanNet, 7scenes, and KITTI datasets support the
effectiveness of each component in our framework, which sets a new
state-of-the-art for indoor self-supervised monocular depth estimation, as well
as outstanding generalization ability. Code and models are available at
https://github.com/zxcqlf/GasMono
Related papers
- Deeper into Self-Supervised Monocular Indoor Depth Estimation [7.30562653023176]
Self-supervised learning of indoor depth from monocular sequences is quite challenging for researchers.
In this work, our proposed method, named IndoorDepth, consists of two innovations.
Experiments on the NYUv2 benchmark demonstrate that our IndoorDepth outperforms the previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-12-03T04:55:32Z) - SC-DepthV3: Robust Self-supervised Monocular Depth Estimation for
Dynamic Scenes [58.89295356901823]
Self-supervised monocular depth estimation has shown impressive results in static scenes.
It relies on the multi-view consistency assumption for training networks, however, that is violated in dynamic object regions.
We introduce an external pretrained monocular depth estimation model for generating single-image depth prior.
Our model can predict sharp and accurate depth maps, even when training from monocular videos of highly-dynamic scenes.
arXiv Detail & Related papers (2022-11-07T16:17:47Z) - Multi-Frame Self-Supervised Depth with Transformers [33.00363651105475]
We propose a novel transformer architecture for cost volume generation.
We use depth-discretized epipolar sampling to select matching candidates.
We refine predictions through a series of self- and cross-attention layers.
arXiv Detail & Related papers (2022-04-15T19:04:57Z) - SelfTune: Metrically Scaled Monocular Depth Estimation through
Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method for the pre-trained supervised monocular depth networks to enable metrically scaled depth estimation.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z) - Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated
Scenes [6.316693022958222]
Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation.
These methods do not match performance of supervised methods on indoor environments with camera rotation.
We propose a novel Filled Disparity Loss term that corrects for ambiguity of image reconstruction error loss in textureless regions.
arXiv Detail & Related papers (2021-06-24T12:27:16Z) - Learning Monocular Depth in Dynamic Scenes via Instance-Aware Projection
Consistency [114.02182755620784]
We present an end-to-end joint training framework that explicitly models 6-DoF motion of multiple dynamic objects, ego-motion and depth in a monocular camera setup without supervision.
Our framework is shown to outperform the state-of-the-art depth and motion estimation methods.
arXiv Detail & Related papers (2021-02-04T14:26:42Z) - SelfDeco: Self-Supervised Monocular Depth Completion in Challenging
Indoor Environments [50.761917113239996]
We present a novel algorithm for self-supervised monocular depth completion.
Our approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels.
Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surface, non-Lambertian surfaces, moving people, longer and diverse depth ranges and scenes captured by complex ego-motions.
arXiv Detail & Related papers (2020-11-10T08:55:07Z) - Unsupervised Monocular Depth Learning with Integrated Intrinsics and
Spatio-Temporal Constraints [61.46323213702369]
This work presents an unsupervised learning framework that is able to predict at-scale depth maps and egomotion.
Our results demonstrate strong performance when compared to the current state-of-the-art on multiple sequences of the KITTI driving dataset.
arXiv Detail & Related papers (2020-11-02T22:26:58Z) - Self-Supervised Joint Learning Framework of Depth Estimation via
Implicit Cues [24.743099160992937]
We propose a novel self-supervised joint learning framework for depth estimation.
The proposed framework outperforms the state-of-the-art(SOTA) on KITTI and Make3D datasets.
arXiv Detail & Related papers (2020-06-17T13:56:59Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.