Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation
- URL: http://arxiv.org/abs/2108.08829v1
- Date: Thu, 19 Aug 2021 17:50:51 GMT
- Title: Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation
- Authors: Hyunyoung Jung, Eunhyeok Park, Sungjoo Yoo
- Abstract summary: We propose novel ideas to improve self-supervised monocular depth estimation.
We focus on incorporating implicit semantic knowledge into geometric representation enhancement.
We evaluate our methods on the KITTI dataset and demonstrate that our method outperforms state-of-the-art methods.
- Score: 16.092527463250708
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised monocular depth estimation has been widely studied, owing to
its practical importance and recent promising improvements. However, most works
suffer from limited supervision of photometric consistency, especially in weak
texture regions and at object boundaries. To overcome this weakness, we propose
novel ideas to improve self-supervised monocular depth estimation by leveraging
cross-domain information, especially scene semantics. We focus on incorporating
implicit semantic knowledge into geometric representation enhancement and
suggest two ideas: a metric learning approach that exploits the
semantics-guided local geometry to optimize intermediate depth representations
and a novel feature fusion module that judiciously utilizes cross-modality
between two heterogeneous feature representations. We comprehensively evaluate
our methods on the KITTI dataset and demonstrate that our method outperforms
state-of-the-art methods. The source code is available at
https://github.com/hyBlue/FSRE-Depth.
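The authors' actual implementation is available at the repository linked above; the following is only a minimal PyTorch-style sketch of the two ideas described in the abstract, a semantics-guided metric (triplet) loss on intermediate depth features and a simple cross-modality fusion block. All names, shapes, and hyper-parameters here (e.g. semantics_guided_triplet_loss, CrossModalityFusion, the margin of 0.3) are illustrative assumptions, not the authors' API.

```python
# Minimal illustrative sketch (assumed names and hyper-parameters, not the
# authors' implementation; see https://github.com/hyBlue/FSRE-Depth for that).
import torch
import torch.nn as nn
import torch.nn.functional as F


def semantics_guided_triplet_loss(depth_feat, sem_label, num_triplets=256, margin=0.3):
    """Metric-learning term that uses semantics-guided geometry.

    depth_feat: (B, C, H, W) intermediate depth features.
    sem_label:  (B, H, W) integer semantic labels (e.g., from a segmentation head).
    Features sharing a semantic class are pulled together and features from
    different classes are pushed apart, so depth features respect object boundaries.
    """
    b, c, h, w = depth_feat.shape
    feat = F.normalize(depth_feat, dim=1).permute(0, 2, 3, 1).reshape(b, h * w, c)
    label = sem_label.reshape(b, h * w)

    # Randomly sample (anchor, positive, negative) pixel indices per image.
    idx = torch.randint(0, h * w, (b, num_triplets, 3), device=feat.device)
    anchor = torch.gather(feat, 1, idx[..., 0:1].expand(-1, -1, c))
    pos = torch.gather(feat, 1, idx[..., 1:2].expand(-1, -1, c))
    neg = torch.gather(feat, 1, idx[..., 2:3].expand(-1, -1, c))

    same_ap = torch.gather(label, 1, idx[..., 0]) == torch.gather(label, 1, idx[..., 1])
    diff_an = torch.gather(label, 1, idx[..., 0]) != torch.gather(label, 1, idx[..., 2])
    valid = (same_ap & diff_an).float()  # keep only semantically meaningful triplets

    d_ap = (anchor - pos).pow(2).sum(-1)
    d_an = (anchor - neg).pow(2).sum(-1)
    loss = F.relu(d_ap - d_an + margin) * valid
    return loss.sum() / valid.sum().clamp(min=1.0)


class CrossModalityFusion(nn.Module):
    """Fuses depth features with segmentation features through a learned gate."""

    def __init__(self, depth_ch, sem_ch):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Conv2d(depth_ch + sem_ch, depth_ch, kernel_size=3, padding=1),
            nn.Sigmoid(),
        )
        self.proj = nn.Conv2d(depth_ch + sem_ch, depth_ch, kernel_size=1)

    def forward(self, depth_feat, sem_feat):
        x = torch.cat([depth_feat, sem_feat], dim=1)
        # The gate decides, per pixel and channel, how much cross-modal context to add.
        return depth_feat + self.gate(x) * self.proj(x)
```

The abstract's emphasis on local geometry suggests sampling positives and negatives within local windows rather than across the whole image as done here for simplicity; the sketch only conveys the general shape of the loss and the fusion module.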
Related papers
- Weakly-supervised Contrastive Learning for Unsupervised Object Discovery [52.696041556640516]
Unsupervised object discovery is promising due to its ability to discover objects in a generic manner.
We design a semantic-guided self-supervised learning model to extract high-level semantic features from images.
We introduce Principal Component Analysis (PCA) to localize object regions.
arXiv Detail & Related papers (2023-07-07T04:03:48Z)
- ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation [6.923035780685481]
We propose an efficient local adaptive attention method for geometric aware representation enhancement.
We leverage geometric cues from semantic information to learn local adaptive bounding boxes to guide unsupervised feature aggregation.
Our proposed method establishes a new state-of-the-art in self-supervised monocular depth estimation task.
arXiv Detail & Related papers (2022-12-12T06:38:35Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z)
- Self-Supervised Monocular Depth Estimation with Internal Feature Fusion [12.874712571149725]
Self-supervised learning for depth estimation uses geometry in image sequences for supervision (a minimal sketch of this photometric supervision appears after this list).
We propose a novel depth estimation network, DIFFNet, which can make use of semantic information in the down-sampling and up-sampling procedures.
arXiv Detail & Related papers (2021-10-18T17:31:11Z)
- Towards Interpretable Deep Networks for Monocular Depth Estimation [78.84690613778739]
We quantify the interpretability of a deep MDE network by the depth selectivity of its hidden units.
We propose a method to train interpretable MDE deep networks without changing their original architectures.
Experimental results demonstrate that our method is able to enhance the interpretability of deep MDE networks.
arXiv Detail & Related papers (2021-08-11T16:43:45Z)
- Depth-conditioned Dynamic Message Propagation for Monocular 3D Object Detection [86.25022248968908]
We learn context- and depth-aware feature representation to solve the problem of monocular 3D object detection.
We show state-of-the-art results among the monocular-based approaches on the KITTI benchmark dataset.
arXiv Detail & Related papers (2021-03-30T16:20:24Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from Monocular Images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship of these two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Semantic-Guided Representation Enhancement for Self-supervised Monocular Trained Depth Estimation [39.845944724079814]
Self-supervised depth estimation has shown its great effectiveness in producing high quality depth maps given only image sequences as input.
However, its performance usually drops when estimating on border areas or objects with thin structures due to the limited depth representation ability.
We propose a semantic-guided depth representation enhancement method, which promotes both local and global depth feature representations.
arXiv Detail & Related papers (2020-12-15T02:24:57Z)
- The Edge of Depth: Explicit Constraints between Segmentation and Depth [25.232436455640716]
We study the mutual benefits of two common computer vision tasks, self-supervised depth estimation and semantic segmentation from images.
We propose to explicitly measure the border consistency between segmentation and depth and minimize it.
Through extensive experiments, our proposed approach advances the state of the art in unsupervised monocular depth estimation on the KITTI benchmark.
arXiv Detail & Related papers (2020-04-01T00:03:20Z)
- DeFeat-Net: General Monocular Depth via Simultaneous Unsupervised Representation Learning [65.94499390875046]
DeFeat-Net is an approach to simultaneously learn a cross-domain dense feature representation.
Our technique is able to outperform the current state-of-the-art with around 10% reduction in all error measures.
arXiv Detail & Related papers (2020-03-30T13:10:32Z)
- Semantically-Guided Representation Learning for Self-Supervised Monocular Depth [40.49380547487908]
We propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning.
Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic categories.
arXiv Detail & Related papers (2020-02-27T18:40:10Z)
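Several of the works above, like the main paper, rely on photometric consistency between a target frame and a source frame warped with the predicted depth and camera pose as their self-supervision signal. Below is a minimal sketch of that loss in the common SSIM-plus-L1 form; the function names and the 0.85 weighting are conventional choices assumed here, not details taken from any specific paper in this list.

```python
# Minimal sketch of the photometric-consistency supervision shared by the
# self-supervised methods above: a weighted SSIM + L1 difference between the
# target frame and a source frame warped into the target view using the
# predicted depth and relative camera pose. The warping step is omitted, and
# the 0.85 / 0.15 weighting is a conventional assumption.
import torch
import torch.nn.functional as F


def ssim_distance(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    """Returns (1 - SSIM) / 2 over 3x3 neighborhoods; x, y: (B, 3, H, W) in [0, 1]."""
    mu_x = F.avg_pool2d(x, 3, 1, 1)
    mu_y = F.avg_pool2d(y, 3, 1, 1)
    sigma_x = F.avg_pool2d(x * x, 3, 1, 1) - mu_x ** 2
    sigma_y = F.avg_pool2d(y * y, 3, 1, 1) - mu_y ** 2
    sigma_xy = F.avg_pool2d(x * y, 3, 1, 1) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * sigma_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (sigma_x + sigma_y + c2)
    return torch.clamp((1 - num / den) / 2, 0, 1)


def photometric_loss(target, warped, alpha=0.85):
    """Per-pixel photometric error; in practice the minimum over several warped
    source frames is usually taken before averaging, to handle occlusions."""
    l1 = (target - warped).abs().mean(1, keepdim=True)
    return alpha * ssim_distance(target, warped).mean(1, keepdim=True) + (1 - alpha) * l1
```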
This list is automatically generated from the titles and abstracts of the papers on this site. The site does not guarantee the quality of the listed information and is not responsible for any consequences.