Deeper into Self-Supervised Monocular Indoor Depth Estimation
- URL: http://arxiv.org/abs/2312.01283v1
- Date: Sun, 3 Dec 2023 04:55:32 GMT
- Title: Deeper into Self-Supervised Monocular Indoor Depth Estimation
- Authors: Chao Fan, Zhenyu Yin, Yue Li, Feiqing Zhang
- Abstract summary: Self-supervised learning of indoor depth from monocular sequences remains quite challenging.
In this work, our proposed method, named IndoorDepth, consists of two innovations.
Experiments on the NYUv2 benchmark demonstrate that our IndoorDepth outperforms the previous state-of-the-art methods by a large margin.
- Score: 7.30562653023176
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Monocular depth estimation using Convolutional Neural Networks (CNNs) has
shown impressive performance in outdoor driving scenes. However,
self-supervised learning of indoor depth from monocular sequences is quite
challenging for two main reasons: the large areas of low-texture regions and
the complex ego-motion in indoor training datasets. In this work, our proposed
method, named
IndoorDepth, consists of two innovations. In particular, we first propose a
novel photometric loss with improved structural similarity (SSIM) function to
tackle the challenge from low-texture regions. Moreover, in order to further
mitigate the issue of inaccurate ego-motion prediction, multiple photometric
losses at different stages are used to train a deeper pose network with two
residual pose blocks. An ablation study validates the effectiveness of each
new idea. Experiments on the NYUv2 benchmark demonstrate that our
IndoorDepth outperforms the previous state-of-the-art methods by a large
margin. In addition, we also validate the generalization ability of our method
on the ScanNet dataset. Code is available at https://github.com/fcntes/IndoorDepth.
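The first innovation above builds on the photometric reconstruction loss standard in self-supervised monocular depth training, which mixes an SSIM term with an L1 term (commonly weighted 0.85/0.15, as in Monodepth2); the paper's improved SSIM function is not detailed in this abstract. A minimal sketch of the baseline loss it modifies, assuming single-channel images and a global (unwindowed) SSIM rather than the usual small windowed version:

```python
import numpy as np

def ssim(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Global (unwindowed) structural similarity between two images in [0, 1].
    # Stabilizing constants c1, c2 follow the standard SSIM formulation.
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den

def photometric_loss(target, warped, alpha=0.85):
    # SSIM + L1 mix commonly used as the photometric loss in
    # self-supervised monodepth; alpha = 0.85 is the usual weighting.
    l1 = np.abs(target - warped).mean()
    dssim = (1.0 - ssim(target, warped)) / 2.0
    return alpha * dssim + (1.0 - alpha) * l1
```

For identical target and warped images the loss is zero; in practice the warped image is synthesized from an adjacent frame using the predicted depth and pose, and the loss is computed per pixel with a local SSIM window.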
Related papers
- Towards Cross-View-Consistent Self-Supervised Surround Depth Estimation [9.569646683579899]
Self-Supervised Surround Depth Estimation from consecutive images offers an economical alternative.
Previous SSSDE methods have proposed different mechanisms to fuse information across images, but few of them explicitly consider the cross-view constraints.
This paper proposes an efficient and consistent pose estimation design and two loss functions to enhance cross-view consistency for SSSDE.
arXiv Detail & Related papers (2024-07-04T16:29:05Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
- DevNet: Self-supervised Monocular Depth Learning via Density Volume Construction [51.96971077984869]
Self-supervised depth learning from monocular images normally relies on the 2D pixel-wise photometric relation between temporally adjacent image frames.
This work proposes Density Volume Construction Network (DevNet), a novel self-supervised monocular depth learning framework.
arXiv Detail & Related papers (2022-09-14T00:08:44Z)
- Optimization-Based Separations for Neural Networks [57.875347246373956]
We show that gradient descent can efficiently learn ball indicator functions using a depth 2 neural network with two layers of sigmoidal activations.
This is the first optimization-based separation result where the approximation benefits of the stronger architecture provably manifest in practice.
arXiv Detail & Related papers (2021-12-04T18:07:47Z)
- Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated Scenes [6.316693022958222]
Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation.
These methods do not match the performance of supervised methods in indoor environments with camera rotation.
We propose a novel Filled Disparity Loss term that corrects for ambiguity of image reconstruction error loss in textureless regions.
arXiv Detail & Related papers (2021-06-24T12:27:16Z)
- Sparse Auxiliary Networks for Unified Monocular Depth Prediction and Completion [56.85837052421469]
Estimating scene geometry from data obtained with cost-effective sensors is key for robots and self-driving cars.
In this paper, we study the problem of predicting dense depth from a single RGB image with optional sparse measurements from low-cost active depth sensors.
We introduce Sparse Networks (SANs), a new module enabling monodepth networks to perform both the tasks of depth prediction and completion.
arXiv Detail & Related papers (2021-03-30T21:22:26Z)
- SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments [50.761917113239996]
We present a novel algorithm for self-supervised monocular depth completion.
Our approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels.
Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, long and diverse depth ranges, and scenes captured with complex ego-motion.
arXiv Detail & Related papers (2020-11-10T08:55:07Z)
- Accurate RGB-D Salient Object Detection via Collaborative Learning [101.82654054191443]
RGB-D saliency detection shows impressive performance in some challenging scenarios.
We propose a novel collaborative learning framework where edge, depth and saliency are leveraged in a more efficient way.
arXiv Detail & Related papers (2020-07-23T04:33:36Z)
- Guiding Monocular Depth Estimation Using Depth-Attention Volume [38.92495189498365]
We propose guiding depth estimation to favor planar structures that are ubiquitous especially in indoor environments.
Experiments on two popular indoor datasets, NYU-Depth-v2 and ScanNet, show that our method achieves state-of-the-art depth estimation results.
arXiv Detail & Related papers (2020-04-06T15:45:52Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.