X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation
- URL: http://arxiv.org/abs/2110.12516v1
- Date: Sun, 24 Oct 2021 19:47:14 GMT
- Title: X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task
Distillation
- Authors: Hong Cai, Janarbek Matai, Shubhankar Borse, Yizhe Zhang, Amin Ansari,
Fatih Porikli
- Abstract summary: We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
- Score: 69.9604394044652
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: In this paper, we propose a novel method, X-Distill, to improve the
self-supervised training of monocular depth via cross-task knowledge
distillation from semantic segmentation to depth estimation. More specifically,
during training, we utilize a pretrained semantic segmentation teacher network
and transfer its semantic knowledge to the depth network. In order to enable
such knowledge distillation across two different visual tasks, we introduce a
small, trainable network that translates the predicted depth map to a semantic
segmentation map, which can then be supervised by the teacher network. In this
way, this small network enables the backpropagation from the semantic
segmentation teacher's supervision to the depth network during training. In
addition, since the commonly used object classes in semantic segmentation are
not directly transferable to depth, we study the visual and geometric
characteristics of the objects and design a new way of grouping them that can
be shared by both tasks. It is noteworthy that our approach only modifies the
training process and does not incur additional computation during inference. We
extensively evaluate the efficacy of our proposed approach on the standard
KITTI benchmark and compare it with the latest state of the art. We further
test the generalizability of our approach on Make3D. Overall, the results show
that our approach significantly improves the depth estimation accuracy and
outperforms the state of the art.
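The distillation mechanism lends itself to a short sketch. The following is a minimal, hypothetical PyTorch illustration of the idea rather than the authors' implementation: the translation network's architecture, the number of shared class groups, and the class_to_group mapping are all assumptions.

```python
# Hypothetical sketch of the X-Distill training idea (not the authors' code):
# a small trainable network translates the predicted depth map into
# segmentation logits over depth-compatible class groups, and a frozen
# segmentation teacher supervises it. Gradients flow through the
# translation network back into the depth network.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DepthToSeg(nn.Module):
    """Small trainable depth -> segmentation translation network (assumed)."""
    def __init__(self, num_groups=5):  # number of shared groups is an assumption
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, num_groups, 1),
        )

    def forward(self, depth):          # depth: (B, 1, H, W)
        return self.net(depth)         # per-pixel group logits: (B, G, H, W)

def distillation_loss(depth, image, teacher, translator, class_to_group):
    """Cross-task term added to the usual self-supervised photometric loss.
    Assumes teacher logits and depth share the same spatial resolution."""
    with torch.no_grad():
        teacher_classes = teacher(image).argmax(1)        # (B, H, W) class ids
        teacher_groups = class_to_group[teacher_classes]  # remap to shared groups
    return F.cross_entropy(translator(depth), teacher_groups)
```

Because the teacher and the translation network are used only to compute this training loss, both can be discarded at inference, which is consistent with the abstract's claim of no additional computation at test time.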
Related papers
- Unsupervised Semantic Segmentation Through Depth-Guided Feature Correlation and Sampling [14.88236554564287]
In this work, we build upon advances in unsupervised learning by incorporating information about the structure of a scene into the training process.
We achieve this by learning depth-feature correlation: spatially correlating the feature maps with the depth maps to induce knowledge about the structure of the scene.
We then implement farthest-point sampling to select relevant features more effectively, utilizing 3-D sampling techniques on the scene's depth information.
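Both steps can be illustrated with a hedged sketch (hypothetical PyTorch, not the paper's code): a correlation-style loss that encourages pairwise feature similarity to follow pairwise depth similarity, and a classic farthest-point sampling routine over depth-derived points.

```python
# Illustrative only; function names and the exact loss form are assumptions.
import torch
import torch.nn.functional as F

def depth_feature_correlation(feats, depth):
    """feats: (N, C) sampled features; depth: (N,) matching depth values.
    Penalizes mismatch between standardized feature and depth similarities."""
    f = F.normalize(feats, dim=1)
    feat_sim = f @ f.t()                                      # (N, N) cosine
    depth_sim = -torch.cdist(depth[:, None], depth[:, None])  # closer -> higher
    feat_sim = (feat_sim - feat_sim.mean()) / (feat_sim.std() + 1e-6)
    depth_sim = (depth_sim - depth_sim.mean()) / (depth_sim.std() + 1e-6)
    return ((feat_sim - depth_sim) ** 2).mean()

def farthest_point_sampling(points, k):
    """points: (N, D), e.g. pixels back-projected with depth; returns k indices
    that are mutually far apart (the standard greedy FPS algorithm)."""
    n = points.shape[0]
    idx = torch.zeros(k, dtype=torch.long)
    dist = torch.full((n,), float("inf"))
    idx[0] = torch.randint(n, (1,))
    for i in range(1, k):
        dist = torch.minimum(dist, (points - points[idx[i - 1]]).pow(2).sum(1))
        idx[i] = dist.argmax()
    return idx
```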
arXiv Detail & Related papers (2023-09-21T11:47:01Z)
- Towards Deeply Unified Depth-aware Panoptic Segmentation with
Bi-directional Guidance Learning [63.63516124646916]
We propose a deeply unified framework for depth-aware panoptic segmentation.
We propose a bi-directional guidance learning approach to facilitate cross-task feature learning.
Our method sets the new state of the art for depth-aware panoptic segmentation on both Cityscapes-DVPS and SemKITTI-DVPS datasets.
arXiv Detail & Related papers (2023-07-27T11:28:33Z)
- Self-Supervised Monocular Depth Estimation with Internal Feature Fusion [12.874712571149725]
Self-supervised learning for depth estimation uses geometry in image sequences for supervision.
We propose a novel depth estimation network, DIFFNet, which can make use of semantic information in down- and upsampling procedures.
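One plausible reading of that fusion, sketched below in hypothetical PyTorch (this is not DIFFNet's actual architecture), is to re-inject an encoder feature map at each decoder upsampling step and reweight the fused channels with a lightweight attention gate.

```python
# Hypothetical fusion block; module name and design are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveSkipFusion(nn.Module):
    def __init__(self, enc_ch, dec_ch):
        super().__init__()
        self.proj = nn.Conv2d(enc_ch + dec_ch, dec_ch, 3, padding=1)
        self.gate = nn.Sequential(          # channel attention over fused features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(dec_ch, dec_ch, 1),
            nn.Sigmoid(),
        )

    def forward(self, enc_feat, dec_feat):
        dec_feat = F.interpolate(dec_feat, size=enc_feat.shape[-2:], mode="nearest")
        fused = torch.relu(self.proj(torch.cat([enc_feat, dec_feat], dim=1)))
        return fused * self.gate(fused)     # channel-wise reweighting
```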
arXiv Detail & Related papers (2021-10-18T17:31:11Z)
- Learning to Relate Depth and Semantics for Unsupervised Domain
Adaptation [87.1188556802942]
We present an approach for encoding visual task relationships to improve model performance in an Unsupervised Domain Adaptation (UDA) setting.
We propose a novel Cross-Task Relation Layer (CTRL), which encodes task dependencies between the semantic and depth predictions.
Furthermore, we propose an Iterative Self-Learning (ISL) training scheme, which exploits semantic pseudo-labels to provide extra supervision on the target domain.
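The pseudo-labeling step can be sketched in a few lines (hypothetical code; the confidence threshold is an assumed value, not the paper's): confident predictions from a frozen or earlier-round model become labels, and uncertain pixels are ignored.

```python
import torch
import torch.nn.functional as F

def pseudo_label_loss(student_logits, teacher_logits, conf_thresh=0.9):
    """Extra target-domain supervision from semantic pseudo-labels."""
    with torch.no_grad():
        conf, pseudo = teacher_logits.softmax(dim=1).max(dim=1)  # (B, H, W)
        pseudo[conf < conf_thresh] = 255      # mark low-confidence pixels
    return F.cross_entropy(student_logits, pseudo, ignore_index=255)
```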
arXiv Detail & Related papers (2021-05-17T13:42:09Z)
- Domain Adaptive Semantic Segmentation with Self-Supervised Depth
Estimation [84.34227665232281]
Domain adaptation for semantic segmentation aims to improve the model performance in the presence of a distribution shift between source and target domain.
We leverage the guidance from self-supervised depth estimation, which is available on both domains, to bridge the domain gap.
We demonstrate the effectiveness of our proposed approach on the benchmark tasks SYNTHIA-to-Cityscapes and GTA-to-Cityscapes.
arXiv Detail & Related papers (2021-04-28T07:47:36Z)
- MaAST: Map Attention with Semantic Transformers for Efficient Visual
Navigation [4.127128889779478]
This work aims to perform better than, or comparably to, existing learning-based solutions for visual navigation by autonomous agents.
We propose a method to encode vital scene semantics into a semantically informed, top-down egocentric map representation.
We conduct experiments on PointGoal visual navigation in 3-D reconstructed indoor environments and demonstrate the effectiveness of our approach.
arXiv Detail & Related papers (2021-03-21T12:01:23Z)
- SOSD-Net: Joint Semantic Object Segmentation and Depth Estimation from
Monocular images [94.36401543589523]
We introduce the concept of semantic objectness to exploit the geometric relationship between the two tasks.
We then propose a Semantic Object and Depth Estimation Network (SOSD-Net) based on the objectness assumption.
To the best of our knowledge, SOSD-Net is the first network that exploits the geometry constraint for simultaneous monocular depth estimation and semantic segmentation.
arXiv Detail & Related papers (2021-01-19T02:41:03Z)
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth
Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z)
- Semantics-Driven Unsupervised Learning for Monocular Depth and
Ego-Motion Estimation [33.83396613039467]
We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos.
Recent unsupervised learning methods employ photometric errors between synthetic view and actual image as a supervision signal for training.
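That photometric error is commonly the SSIM-plus-L1 form sketched below (the 3x3 windows and 0.85 weighting are widespread conventions from Monodepth-style training, not values taken from this paper).

```python
import torch.nn.functional as F

def photometric_error(synth, target, alpha=0.85):
    """synth: view synthesized by warping with predicted depth and ego-motion;
    target: the actual image. Both (B, 3, H, W) in [0, 1]; returns a per-pixel map."""
    l1 = (synth - target).abs().mean(1, keepdim=True)
    mu_x, mu_y = F.avg_pool2d(synth, 3, 1, 1), F.avg_pool2d(target, 3, 1, 1)
    sig_x = F.avg_pool2d(synth ** 2, 3, 1, 1) - mu_x ** 2
    sig_y = F.avg_pool2d(target ** 2, 3, 1, 1) - mu_y ** 2
    sig_xy = F.avg_pool2d(synth * target, 3, 1, 1) - mu_x * mu_y
    c1, c2 = 0.01 ** 2, 0.03 ** 2
    ssim = ((2 * mu_x * mu_y + c1) * (2 * sig_xy + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (sig_x + sig_y + c2))
    ssim_err = ((1 - ssim) / 2).clamp(0, 1).mean(1, keepdim=True)
    return alpha * ssim_err + (1 - alpha) * l1
```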
arXiv Detail & Related papers (2020-06-08T05:55:07Z)
- Semantically-Guided Representation Learning for Self-Supervised
Monocular Depth [40.49380547487908]
We propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning.
Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic categories.
arXiv Detail & Related papers (2020-02-27T18:40:10Z)