Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated
Scenes
- URL: http://arxiv.org/abs/2106.12958v2
- Date: Fri, 25 Jun 2021 12:11:18 GMT
- Title: Self-Supervised Monocular Depth Estimation of Untextured Indoor Rotated
Scenes
- Authors: Benjamin Keltjens and Tom van Dijk and Guido de Croon
- Abstract summary: Self-supervised deep learning methods have leveraged stereo images for training monocular depth estimation.
These methods do not match the performance of supervised methods in indoor environments with camera rotation.
We propose a novel Filled Disparity Loss term that corrects for ambiguity of image reconstruction error loss in textureless regions.
- Score: 6.316693022958222
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised deep learning methods have leveraged stereo images for
training monocular depth estimation. Although these methods show strong results
on outdoor datasets such as KITTI, they do not match the performance of supervised
methods in indoor environments with camera rotation. Indoor, rotated scenes are
common for less constrained applications and pose problems for two reasons:
abundance of low texture regions and increased complexity of depth cues for
images under rotation. In an effort to extend self-supervised learning to more
generalised environments we propose two additions. First, we propose a novel
Filled Disparity Loss term that corrects for ambiguity of image reconstruction
error loss in textureless regions. Specifically, we interpolate disparity in
untextured regions, using the estimated disparity from surrounding textured
areas, and use L1 loss to correct the original estimation. Our experiments show
that depth estimation is substantially improved on low-texture scenes, without
any loss on textured scenes, when compared to Monodepth by Godard et al.
Secondly, we show that training with an application's representative rotations,
in both pitch and roll, is sufficient to significantly improve performance over
the entire range of expected rotation. We demonstrate that depth estimation is
successfully generalised as performance is not lost when evaluated on test sets
with no camera rotation. Together these developments enable a broader use of
self-supervised learning of monocular depth estimation for complex
environments.
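The Filled Disparity Loss described above can be illustrated with a minimal sketch. This is a hypothetical illustration, not the authors' implementation: the function name, the precomputed texture mask, and the row-wise linear interpolation are all assumptions standing in for whatever interpolation scheme the paper actually uses.

```python
import numpy as np

def filled_disparity_loss(disparity, texture_mask):
    """Sketch of a Filled Disparity Loss (hypothetical implementation).

    disparity    : (H, W) estimated disparity map
    texture_mask : (H, W) bool, True where the image has texture
    """
    filled = disparity.copy()
    h, w = disparity.shape
    cols = np.arange(w)
    for r in range(h):
        textured = texture_mask[r]
        if textured.sum() < 2:  # not enough anchors to interpolate from
            continue
        # Fill untextured pixels by interpolating disparity from the
        # surrounding textured pixels in the same row.
        filled[r, ~textured] = np.interp(
            cols[~textured], cols[textured], disparity[r, textured]
        )
    untextured = ~texture_mask
    if untextured.sum() == 0:
        return 0.0
    # L1 penalty pulls the raw estimate toward the interpolated values,
    # applied only where the image-reconstruction loss is ambiguous.
    return float(np.abs(disparity - filled)[untextured].mean())
```

For example, a single untextured pixel whose textured neighbours imply a disparity of 3 but whose raw estimate is 5 contributes an L1 error of 2; textured pixels contribute nothing, so performance on textured scenes is unaffected.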
Related papers
- Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z)
- Deeper into Self-Supervised Monocular Indoor Depth Estimation [7.30562653023176]
Self-supervised learning of indoor depth from monocular sequences remains quite challenging.
In this work, our proposed method, named IndoorDepth, consists of two innovations.
Experiments on the NYUv2 benchmark demonstrate that our IndoorDepth outperforms the previous state-of-the-art methods by a large margin.
arXiv Detail & Related papers (2023-12-03T04:55:32Z)
- GasMono: Geometry-Aided Self-Supervised Monocular Depth Estimation for Indoor Scenes [47.76269541664071]
This paper tackles the challenges of self-supervised monocular depth estimation in indoor scenes caused by large rotation between frames and low texture.
We obtain coarse camera poses from monocular sequences through multi-view geometry to deal with the former.
To soften the effect of the low texture, we combine the global reasoning of vision transformers with an overfitting-aware, iterative self-distillation mechanism.
arXiv Detail & Related papers (2023-09-26T17:59:57Z)
- DARF: Depth-Aware Generalizable Neural Radiance Field [51.29437249009986]
We propose the Depth-Aware Generalizable Neural Radiance Field (DARF) with a Depth-Aware Dynamic Sampling (DADS) strategy.
Our framework infers the unseen scenes on both pixel level and geometry level with only a few input images.
Compared with state-of-the-art generalizable NeRF methods, DARF reduces samples by 50%, while improving rendering quality and depth estimation.
arXiv Detail & Related papers (2022-12-05T14:00:59Z)
- Towards Accurate Reconstruction of 3D Scene Shape from A Single Monocular Image [91.71077190961688]
We propose a two-stage framework that first predicts depth up to an unknown scale and shift from a single monocular image.
We then exploit 3D point cloud data to predict the depth shift and the camera's focal length, which allow us to recover 3D scene shapes.
We test our depth model on nine unseen datasets and achieve state-of-the-art performance on zero-shot evaluation.
arXiv Detail & Related papers (2022-08-28T16:20:14Z)
- Weakly-Supervised Monocular Depth Estimation with Resolution-Mismatched Data [73.9872931307401]
We propose a novel weakly-supervised framework to train a monocular depth estimation network.
The proposed framework is composed of a sharing weight monocular depth estimation network and a depth reconstruction network for distillation.
Experimental results demonstrate that our method outperforms unsupervised and semi-supervised learning based schemes.
arXiv Detail & Related papers (2021-09-23T18:04:12Z)
- Regularizing Nighttime Weirdness: Efficient Self-supervised Monocular Depth Estimation in the Dark [20.66405067066299]
We introduce Priors-Based Regularization to learn distribution knowledge from unpaired depth maps.
We also leverage Mapping-Consistent Image Enhancement module to enhance image visibility and contrast.
Our framework achieves remarkable improvements and state-of-the-art results on two nighttime datasets.
arXiv Detail & Related papers (2021-08-09T06:24:35Z)
- SelfDeco: Self-Supervised Monocular Depth Completion in Challenging Indoor Environments [50.761917113239996]
We present a novel algorithm for self-supervised monocular depth completion.
Our approach is based on training a neural network that requires only sparse depth measurements and corresponding monocular video sequences without dense depth labels.
Our self-supervised algorithm is designed for challenging indoor environments with textureless regions, glossy and transparent surfaces, non-Lambertian surfaces, moving people, long and diverse depth ranges, and scenes captured by complex ego-motions.
arXiv Detail & Related papers (2020-11-10T08:55:07Z)
- Deep Depth Estimation from Visual-Inertial SLAM [11.814395824799988]
We study the case in which the sparse depth is computed from a visual-inertial simultaneous localization and mapping (VI-SLAM) system.
The resulting point cloud is low-density, noisy, and non-uniformly distributed in space.
We use the available gravity estimate from the VI-SLAM to warp the input image to the orientation prevailing in the training dataset.
arXiv Detail & Related papers (2020-07-31T21:28:25Z)
- Unsupervised Learning of Depth, Optical Flow and Pose with Occlusion from 3D Geometry [29.240108776329045]
In this paper, pixels in the middle frame are modeled into three parts: the rigid region, the non-rigid region, and the occluded region.
In joint unsupervised training of depth and pose, we can segment the occluded region explicitly.
In the occluded region, as depth and camera motion can provide more reliable motion estimation, they can be used to instruct unsupervised learning of optical flow.
arXiv Detail & Related papers (2020-03-02T11:18:13Z)
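The occlusion-handling idea above rests on a standard construction: given depth and camera motion, the optical flow of a static (rigid) scene can be computed analytically and used to supervise flow where photometric matching fails. A minimal sketch of that rigid-flow computation, assuming a simple pinhole camera model (the function name and interface are illustrative, not taken from the paper):

```python
import numpy as np

def rigid_flow(depth, K, R, t):
    """Optical flow induced by camera motion (R, t) over a static scene.

    depth : (H, W) depth map of the first frame
    K     : (3, 3) pinhole intrinsics
    R, t  : rotation (3, 3) and translation (3,) from frame 1 to frame 2
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Homogeneous pixel coordinates, shape (3, H*W)
    pix = np.stack([u, v, np.ones_like(u)], axis=0).reshape(3, -1).astype(float)
    # Back-project pixels to 3D points in the first camera frame
    pts = np.linalg.inv(K) @ pix * depth.reshape(1, -1)
    # Move the points into the second camera frame
    pts2 = R @ pts + t[:, None]
    # Re-project and take the perspective divide
    proj = K @ pts2
    proj = proj[:2] / proj[2:3]
    # Flow is the displacement of each pixel, shape (2, H, W)
    return (proj - pix[:2]).reshape(2, h, w)
```

With identity rotation and zero translation the flow is zero everywhere; a pure sideways translation of a camera viewing a fronto-parallel plane produces a constant flow of magnitude f·t_x / z, which is the kind of reliable motion estimate the paper uses to instruct flow learning in occluded regions.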
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences arising from its use.