Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth
Estimation with Both Implicit and Explicit Semantic Guidance
- URL: http://arxiv.org/abs/2102.06685v1
- Date: Thu, 11 Feb 2021 14:29:51 GMT
- Title: Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth
Estimation with Both Implicit and Explicit Semantic Guidance
- Authors: Rui Li, Xiantuo He, Danna Xue, Shaolin Su, Qing Mao, Yu Zhu, Jinqiu
Sun, Yanning Zhang
- Abstract summary: We propose a Semantic-aware Spatial Feature Alignment scheme to align implicit semantic features with depth features for scene-aware depth estimation.
We also propose a semantic-guided ranking loss to explicitly constrain the estimated depth maps to be consistent with real scene contextual properties.
Our method produces high quality depth maps which are consistently superior either on complex scenes or diverse semantic categories.
- Score: 34.62415122883441
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised depth estimation has made a great success in learning depth
from unlabeled image sequences. While the mappings between image and pixel-wise
depth are well-studied in current methods, the correlation between image, depth
and scene semantics, however, is less considered. This hinders the network to
better understand the real geometry of the scene, since the contextual clues,
contribute not only the latent representations of scene depth, but also the
straight constraints for depth map. In this paper, we leverage the two benefits
by proposing the implicit and explicit semantic guidance for accurate
self-supervised depth estimation. We propose a Semantic-aware Spatial Feature
Alignment (SSFA) scheme to effectively align implicit semantic features with
depth features for scene-aware depth estimation. We also propose a
semantic-guided ranking loss to explicitly constrain the estimated depth maps
to be consistent with real scene contextual properties. Both semantic label
noise and prediction uncertainty is considered to yield reliable depth
supervisions. Extensive experimental results show that our method produces high
quality depth maps which are consistently superior either on complex scenes or
diverse semantic categories, and outperforms the state-of-the-art methods by a
significant margin.
Related papers
- Depth-guided Texture Diffusion for Image Semantic Segmentation [47.46257473475867]
We introduce a Depth-guided Texture Diffusion approach that effectively tackles the outlined challenge.
Our method extracts low-level features from edges and textures to create a texture image.
By integrating this enriched depth map with the original RGB image into a joint feature embedding, our method effectively bridges the disparity between the depth map and the image.
arXiv Detail & Related papers (2024-08-17T04:55:03Z) - ScaleDepth: Decomposing Metric Depth Estimation into Scale Prediction and Relative Depth Estimation [62.600382533322325]
We propose a novel monocular depth estimation method called ScaleDepth.
Our method decomposes metric depth into scene scale and relative depth, and predicts them through a semantic-aware scale prediction module.
Our method achieves metric depth estimation for both indoor and outdoor scenes in a unified framework.
arXiv Detail & Related papers (2024-07-11T05:11:56Z) - Depth-aware Volume Attention for Texture-less Stereo Matching [67.46404479356896]
We propose a lightweight volume refinement scheme to tackle the texture deterioration in practical outdoor scenarios.
We introduce a depth volume supervised by the ground-truth depth map, capturing the relative hierarchy of image texture.
Local fine structure and context are emphasized to mitigate ambiguity and redundancy during volume aggregation.
arXiv Detail & Related papers (2024-02-14T04:07:44Z) - Learning Occlusion-Aware Coarse-to-Fine Depth Map for Self-supervised
Monocular Depth Estimation [11.929584800629673]
We propose a novel network to learn an Occlusion-aware Coarse-to-Fine Depth map for self-supervised monocular depth estimation.
The proposed OCFD-Net does not only employ a discrete depth constraint for learning a coarse-level depth map, but also employ a continuous depth constraint for learning a scene depth residual.
arXiv Detail & Related papers (2022-03-21T12:43:42Z) - X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z) - S2R-DepthNet: Learning a Generalizable Depth-specific Structural
Representation [63.58891781246175]
Human can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that the spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z) - Semantic-Guided Representation Enhancement for Self-supervised Monocular
Trained Depth Estimation [39.845944724079814]
Self-supervised depth estimation has shown its great effectiveness in producing high quality depth maps given only image sequences as input.
However, its performance usually drops when estimating on border areas or objects with thin structures due to the limited depth representation ability.
We propose a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations.
arXiv Detail & Related papers (2020-12-15T02:24:57Z) - SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware
Feature Extraction [27.750031877854717]
We propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on KITTI dataset demonstrate that our methods compete or even outperform the state-of-the-art methods.
arXiv Detail & Related papers (2020-10-06T17:22:25Z) - Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z) - The Edge of Depth: Explicit Constraints between Segmentation and Depth [25.232436455640716]
We study the mutual benefits of two common computer vision tasks, self-supervised depth estimation and semantic segmentation from images.
We propose to explicitly measure the border consistency between segmentation and depth and minimize it.
Through extensive experiments, our proposed approach advances the state of the art on unsupervised monocular depth estimation in the KITTI.
arXiv Detail & Related papers (2020-04-01T00:03:20Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.