MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- URL: http://arxiv.org/abs/2107.12429v2
- Date: Wed, 28 Jul 2021 00:32:57 GMT
- Title: MonoIndoor: Towards Good Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments
- Authors: Pan Ji, Runze Li, Bir Bhanu, Yi Xu
- Abstract summary: Self-supervised depth estimation for indoor environments is more challenging than its outdoor counterpart.
The depth range of indoor sequences varies a lot across different frames, making it difficult for the depth network to induce consistent depth cues.
The maximum distance in outdoor scenes mostly stays the same as the camera usually sees the sky.
The motions of outdoor sequences are predominantly translational, especially for driving datasets such as KITTI.
- Score: 55.05401912853467
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Self-supervised depth estimation for indoor environments is more challenging
than its outdoor counterpart in at least the following two aspects: (i) the
depth range of indoor sequences varies a lot across different frames, making it
difficult for the depth network to induce consistent depth cues, whereas the
maximum distance in outdoor scenes mostly stays the same as the camera usually
sees the sky; (ii) the indoor sequences contain much more rotational motions,
which cause difficulties for the pose network, while the motions of outdoor
sequences are predominantly translational, especially for driving datasets
such as KITTI. In this paper, special considerations are given to those
challenges and a set of good practices is consolidated for improving the
performance of self-supervised monocular depth estimation in indoor
environments. The proposed method mainly consists of two novel modules, i.e., a
depth factorization module and a residual pose estimation module, each of which
is designed to respectively tackle the aforementioned challenges. The
effectiveness of each module is shown through a carefully conducted ablation
study and the demonstration of the state-of-the-art performance on three indoor
datasets, i.e., EuRoC, NYUv2, and 7-scenes.
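The abstract names the two modules but not their internals. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation. It assumes (i) depth factorization means predicting a bounded relative depth map plus a separate positive per-frame scale, with depth = scale × relative depth, so that frames with very different depth ranges share a consistent relative-depth target; and (ii) residual pose estimation means refining a coarse camera pose by composing it with one or more small residual rigid transforms. The function names (`factorized_depth`, `pose_z`, `compose_residual_poses`) and the left-multiplication order are hypothetical choices for illustration.

```python
import math

Matrix = list[list[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def factorized_depth(logits: list[list[float]], scale: float) -> list[list[float]]:
    """Assumed factorization: depth = per-frame scale * relative depth in (0, 1).

    `logits` stands in for a depth network's raw per-pixel output and `scale`
    for a separately predicted global scale; both names are hypothetical.
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [[scale * sigmoid(v) for v in row] for row in logits]


def matmul4(a: Matrix, b: Matrix) -> Matrix:
    """Multiply two 4x4 homogeneous transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)] for i in range(4)]


def pose_z(yaw: float, tx: float) -> Matrix:
    """Rigid transform: rotation of `yaw` radians about z, then translation tx along x."""
    c, s = math.cos(yaw), math.sin(yaw)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, 0.0],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]


def compose_residual_poses(initial: Matrix, residuals: list[Matrix]) -> Matrix:
    """Refine a coarse pose by left-multiplying residual corrections in order.

    Computes T = R_n @ ... @ R_1 @ T_0 (an assumed composition order).
    """
    pose = initial
    for r in residuals:
        pose = matmul4(r, pose)
    return pose


if __name__ == "__main__":
    # A zero logit maps to relative depth 0.5, so a scale of 4.0 yields depth 2.0.
    print(factorized_depth([[0.0, 0.0]], 4.0))
    # A coarse 0.3 rad rotation refined by a 0.1 rad residual gives a 0.4 rad rotation.
    refined = compose_residual_poses(pose_z(0.3, 1.0), [pose_z(0.1, 0.0)])
    print(round(math.atan2(refined[1][0], refined[0][0]), 6))
```

Splitting a large, mostly rotational motion into a coarse estimate plus small residuals is one plausible reason such a design helps on indoor sequences; the actual network inputs and losses are described in the paper, not here.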
Related papers
- GAM-Depth: Self-Supervised Indoor Depth Estimation Leveraging a Gradient-Aware Mask and Semantic Constraints [12.426365333096264]
We propose GAM-Depth, developed upon two novel components: gradient-aware mask and semantic constraints.
The gradient-aware mask enables adaptive and robust supervision for both key areas and textureless regions.
The incorporation of semantic constraints for indoor self-supervised depth estimation reduces depth discrepancies at object boundaries.
(2024-02-22T07:53:34Z)
- Deeper into Self-Supervised Monocular Indoor Depth Estimation [7.30562653023176]
Self-supervised learning of indoor depth from monocular sequences is quite challenging for researchers.
In this work, our proposed method, named IndoorDepth, consists of two innovations.
Experiments on the NYUv2 benchmark demonstrate that our IndoorDepth outperforms the previous state-of-the-art methods by a large margin.
(2023-12-03T04:55:32Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
(2023-09-26T17:59:57Z)
- GEDepth: Ground Embedding for Monocular Depth Estimation [4.95394574147086]
This paper proposes a novel ground embedding module to decouple camera parameters from pictorial cues.
A ground attention is designed in the module to optimally combine ground depth with residual depth.
Experiments reveal that our approach achieves state-of-the-art results on popular benchmarks.
(2023-09-18T17:56:06Z)
- The Second Monocular Depth Estimation Challenge [93.1678025923996]
The second edition of the Monocular Depth Estimation Challenge (MDEC) was open to methods using any form of supervision.
The challenge was based around the SYNS-Patches dataset, which features a wide diversity of environments with high-quality dense ground-truth.
The top supervised submission improved relative F-Score by 27.62%, while the top self-supervised improved it by 16.61%.
(2023-04-14T11:10:07Z)
- Multi-Camera Collaborative Depth Prediction via Consistent Structure Estimation [75.99435808648784]
We propose a novel multi-camera collaborative depth prediction method.
It does not require large overlapping areas while maintaining structure consistency between cameras.
Experimental results on DDAD and NuScenes datasets demonstrate the superior performance of our method.
(2022-10-05T03:44:34Z)
- MonoIndoor++: Towards Better Practice of Self-Supervised Monocular Depth Estimation for Indoor Environments [45.89629401768049]
Self-supervised monocular depth estimation has seen significant progress in recent years, especially in outdoor environments.
However, depth prediction results are not satisfying in indoor scenes where most of the existing data are captured with hand-held devices.
We propose a novel framework, MonoIndoor++, to improve the performance of self-supervised monocular depth estimation for indoor environments.
(2022-07-18T21:34:43Z)
- SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments [50.761917113239996]
We present a novel algorithm for self-supervised monocular depth completion.
Our approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels.
Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, longer and more diverse depth ranges, and scenes captured by complex ego-motions.
(2020-11-10T08:55:07Z)