SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware
Feature Extraction
- URL: http://arxiv.org/abs/2010.02893v3
- Date: Tue, 29 Dec 2020 07:54:37 GMT
- Title: SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware
Feature Extraction
- Authors: Jaehoon Choi, Dongki Jung, Donghwan Lee, Changick Kim
- Abstract summary: We propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on KITTI dataset demonstrate that our methods compete or even outperform the state-of-the-art methods.
- Score: 27.750031877854717
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised monocular depth estimation has emerged as a promising method
because it does not require groundtruth depth maps during training. As an
alternative for the groundtruth depth map, the photometric loss enables to
provide self-supervision on depth prediction by matching the input image
frames. However, the photometric loss causes various problems, resulting in
less accurate depth values compared with supervised approaches. In this paper,
we propose SAFENet that is designed to leverage semantic information to
overcome the limitations of the photometric loss. Our key idea is to exploit
semantic-aware depth features that integrate the semantic and geometric
knowledge. Therefore, we introduce multi-task learning schemes to incorporate
semantic-awareness into the representation of depth features. Experiments on
KITTI dataset demonstrate that our methods compete or even outperform the
state-of-the-art methods. Furthermore, extensive experiments on different
datasets show its better generalization ability and robustness to various
conditions, such as low-light or adverse weather.
Related papers
- Unveiling the Depths: A Multi-Modal Fusion Framework for Challenging
Scenarios [103.72094710263656]
This paper presents a novel approach that identifies and integrates dominant cross-modality depth features with a learning-based framework.
We propose a novel confidence loss steering a confidence predictor network to yield a confidence map specifying latent potential depth areas.
With the resulting confidence map, we propose a multi-modal fusion network that fuses the final depth in an end-to-end manner.
arXiv Detail & Related papers (2024-02-19T04:39:16Z) - Unsupervised Light Field Depth Estimation via Multi-view Feature
Matching with Occlusion Prediction [15.421219881815956]
It is costly to obtain sufficient depth labels for supervised training.
In this paper, we propose an unsupervised framework to estimate depth from LF images.
arXiv Detail & Related papers (2023-01-20T06:11:17Z) - Robust Depth Completion with Uncertainty-Driven Loss Functions [60.9237639890582]
We introduce uncertainty-driven loss functions to improve the robustness of depth completion and handle the uncertainty in depth completion.
Our method has been tested on KITTI Depth Completion Benchmark and achieved the state-of-the-art robustness performance in terms of MAE, IMAE, and IRMSE metrics.
arXiv Detail & Related papers (2021-12-15T05:22:34Z) - Unsupervised Monocular Depth Perception: Focusing on Moving Objects [5.489557739480878]
In this paper, we show that deliberately manipulating photometric errors can efficiently deal with difficulties better.
We first propose an outlier masking technique that considers the occluded or dynamic pixels as statistical outliers in the photometric error map.
With the outlier masking, the network learns the depth of objects that move in the opposite direction to the camera more accurately.
arXiv Detail & Related papers (2021-08-30T08:45:02Z) - Probabilistic and Geometric Depth: Detecting Objects in Perspective [78.00922683083776]
3D object detection is an important capability needed in various practical applications such as driver assistance systems.
Monocular 3D detection, as an economical solution compared to conventional settings relying on binocular vision or LiDAR, has drawn increasing attention recently but still yields unsatisfactory results.
This paper first presents a systematic study on this problem and observes that the current monocular 3D detection problem can be simplified as an instance depth estimation problem.
arXiv Detail & Related papers (2021-07-29T16:30:33Z) - Progressive Depth Learning for Single Image Dehazing [56.71963910162241]
Existing dehazing methods often ignore the depth cues and fail in distant areas where heavier haze disturbs the visibility.
We propose a deep end-to-end model that iteratively estimates image depths and transmission maps.
Our approach benefits from explicitly modeling the inner relationship of image depth and transmission map, which is especially effective for distant hazy areas.
arXiv Detail & Related papers (2021-02-21T05:24:18Z) - Learning Depth via Leveraging Semantics: Self-supervised Monocular Depth
Estimation with Both Implicit and Explicit Semantic Guidance [34.62415122883441]
We propose a Semantic-aware Spatial Feature Alignment scheme to align implicit semantic features with depth features for scene-aware depth estimation.
We also propose a semantic-guided ranking loss to explicitly constrain the estimated depth maps to be consistent with real scene contextual properties.
Our method produces high quality depth maps which are consistently superior either on complex scenes or diverse semantic categories.
arXiv Detail & Related papers (2021-02-11T14:29:51Z) - Variational Monocular Depth Estimation for Reliability Prediction [12.951621755732544]
Self-supervised learning for monocular depth estimation is widely investigated as an alternative to supervised learning approach.
Previous works have successfully improved the accuracy of depth estimation by modifying the model structure.
In this paper, we theoretically formulate a variational model for the monocular depth estimation to predict the reliability of the estimated depth image.
arXiv Detail & Related papers (2020-11-24T06:23:51Z) - Adaptive confidence thresholding for monocular depth estimation [83.06265443599521]
We propose a new approach to leverage pseudo ground truth depth maps of stereo images generated from self-supervised stereo matching methods.
The confidence map of the pseudo ground truth depth map is estimated to mitigate performance degeneration by inaccurate pseudo depth maps.
Experimental results demonstrate superior performance to state-of-the-art monocular depth estimation methods.
arXiv Detail & Related papers (2020-09-27T13:26:16Z) - DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised
Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z) - DiPE: Deeper into Photometric Errors for Unsupervised Learning of Depth
and Ego-motion from Monocular Videos [9.255509741319583]
This paper shows that carefully manipulating photometric errors can tackle these difficulties better.
The primary improvement is achieved by a statistical technique that can mask out the invisible or nonstationary pixels in the photometric error map.
We also propose an efficient weighted multi-scale scheme to reduce the artifacts in the predicted depth maps.
arXiv Detail & Related papers (2020-03-03T07:05:15Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.