Semantically-Guided Representation Learning for Self-Supervised
Monocular Depth
- URL: http://arxiv.org/abs/2002.12319v1
- Date: Thu, 27 Feb 2020 18:40:10 GMT
- Title: Semantically-Guided Representation Learning for Self-Supervised
Monocular Depth
- Authors: Vitor Guizilini, Rui Hou, Jie Li, Rares Ambrus, Adrien Gaidon
- Abstract summary: We propose a new architecture leveraging fixed pretrained semantic segmentation networks to guide self-supervised representation learning.
Our method improves upon the state of the art for self-supervised monocular depth prediction over all pixels, fine-grained details, and per semantic categories.
- Score: 40.49380547487908
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Self-supervised learning is showing great promise for monocular depth
estimation, using geometry as the only source of supervision. Depth networks
are indeed capable of learning representations that relate visual appearance to
3D properties by implicitly leveraging category-level patterns. In this work we
investigate how to leverage this semantic structure more directly to guide
geometric representation learning, while remaining in the self-supervised
regime. Instead of using semantic labels and proxy losses in a multi-task
approach, we propose a new architecture leveraging fixed pretrained semantic
segmentation networks to guide self-supervised representation learning via
pixel-adaptive convolutions. Furthermore, we propose a two-stage training
process to overcome a common semantic bias on dynamic objects via resampling.
Our method improves upon the state of the art for self-supervised monocular
depth prediction over all pixels, fine-grained details, and per semantic
categories.
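As a rough illustration of the pixel-adaptive convolution mentioned in the abstract, the sketch below filters a single-channel feature map with a spatial filter whose taps are re-weighted per pixel by the similarity of guidance features. It is not the authors' implementation: the function name `pixel_adaptive_conv`, the Gaussian similarity kernel, the single-channel layout, and the zero padding are all assumptions chosen for readability. In the setting described above, the guidance features would come from the fixed pretrained semantic segmentation network.

```python
import math


def pixel_adaptive_conv(values, guidance, weights, bias=0.0):
    """Single-channel pixel-adaptive convolution (illustrative sketch).

    values:   H x W nested list of floats (features to be filtered)
    guidance: H x W nested list of feature vectors (assumed to come from
              a frozen semantic network; any per-pixel vectors work here)
    weights:  k x k nested list of floats, k odd (spatial filter shared
              by all pixels)
    """
    height, width = len(values), len(values[0])
    radius = len(weights) // 2
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            acc = bias
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if not (0 <= ny < height and 0 <= nx < width):
                        continue  # zero padding: out-of-bounds taps add nothing
                    # Assumed Gaussian kernel on the guidance difference:
                    # close to 1 for similar features, close to 0 otherwise.
                    dist2 = sum(
                        (a - b) ** 2
                        for a, b in zip(guidance[y][x], guidance[ny][nx])
                    )
                    adapt = math.exp(-0.5 * dist2)
                    acc += adapt * weights[dy + radius][dx + radius] * values[ny][nx]
            out[y][x] = acc
    return out


# Toy usage: a 3x3 map of ones filtered with a 3x3 box filter.
ones = [[1.0] * 3 for _ in range(3)]
box = [[1.0] * 3 for _ in range(3)]
# Guidance with a sharp boundary between the first two columns and the third.
split = [[[0.0], [0.0], [100.0]] for _ in range(3)]
filtered = pixel_adaptive_conv(ones, split, box)
```

With constant guidance the operation reduces to an ordinary convolution. With the split guidance above, pixels stop aggregating values from across the boundary, which is the intuition behind using semantic features to keep depth edges aligned with object boundaries.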
Related papers
- S3PT: Scene Semantics and Structure Guided Clustering to Boost Self-Supervised Pre-Training for Autonomous Driving [12.406655155106424]
We propose S3PT, a novel scene semantics and structure guided clustering method that provides more scene-consistent objectives for self-supervised training.
Our contributions are threefold: First, we incorporate semantic distribution consistent clustering to encourage better representation of rare classes such as motorcycles or animals.
Second, we introduce object diversity consistent spatial clustering, to handle imbalanced and diverse object sizes, ranging from large background areas to small objects such as pedestrians and traffic signs.
Third, we propose a depth-guided spatial clustering to regularize learning based on geometric information of the scene, thus further refining region separation on the feature level.
arXiv Detail & Related papers (2024-10-30T15:00:06Z)
- ALSO: Automotive Lidar Self-supervision by Occupancy estimation [70.70557577874155]
We propose a new self-supervised method for pre-training the backbone of deep perception models operating on point clouds.
The core idea is to train the model on a pretext task which is the reconstruction of the surface on which the 3D points are sampled.
The intuition is that if the network is able to reconstruct the scene surface, given only sparse input points, then it probably also captures some fragments of semantic information.
arXiv Detail & Related papers (2022-12-12T13:10:19Z)
- Learning Invariant World State Representations with Predictive Coding [1.8963850600275547]
We develop a new predictive coding-based architecture and a hybrid fully-supervised/self-supervised learning method.
We evaluate the robustness of our model on a new synthetic dataset.
arXiv Detail & Related papers (2022-07-06T21:08:30Z)
- SelfTune: Metrically Scaled Monocular Depth Estimation through Self-Supervised Learning [53.78813049373321]
We propose a self-supervised learning method that enables pre-trained supervised monocular depth networks to produce metrically scaled depth estimates.
Our approach is useful for various applications such as mobile robot navigation and is applicable to diverse environments.
arXiv Detail & Related papers (2022-03-10T12:28:42Z)
- X-Distill: Improving Self-Supervised Monocular Depth via Cross-Task Distillation [69.9604394044652]
We propose a novel method to improve the self-supervised training of monocular depth via cross-task knowledge distillation.
During training, we utilize a pretrained semantic segmentation teacher network and transfer its semantic knowledge to the depth network.
We extensively evaluate the efficacy of our proposed approach on the KITTI benchmark and compare it with the latest state of the art.
arXiv Detail & Related papers (2021-10-24T19:47:14Z)
- Fine-grained Semantics-aware Representation Enhancement for Self-supervised Monocular Depth Estimation [16.092527463250708]
We propose novel ideas to improve self-supervised monocular depth estimation.
We focus on incorporating implicit semantic knowledge into geometric representation enhancement.
We evaluate our methods on the KITTI dataset and demonstrate that our method outperforms state-of-the-art methods.
arXiv Detail & Related papers (2021-08-19T17:50:51Z)
- S2R-DepthNet: Learning a Generalizable Depth-specific Structural Representation [63.58891781246175]
Humans can infer the 3D geometry of a scene from a sketch instead of a realistic image, which indicates that spatial structure plays a fundamental role in understanding the depth of scenes.
We are the first to explore the learning of a depth-specific structural representation, which captures the essential feature for depth estimation and ignores irrelevant style information.
Our S2R-DepthNet can be well generalized to unseen real-world data directly even though it is only trained on synthetic data.
arXiv Detail & Related papers (2021-04-02T03:55:41Z)
- Three Ways to Improve Semantic Segmentation with Self-Supervised Depth Estimation [90.87105131054419]
We present a framework for semi-supervised semantic segmentation, which is enhanced by self-supervised monocular depth estimation from unlabeled image sequences.
We validate the proposed model on the Cityscapes dataset, where all three modules demonstrate significant performance gains.
arXiv Detail & Related papers (2020-12-19T21:18:03Z)
- Semantics-Driven Unsupervised Learning for Monocular Depth and Ego-Motion Estimation [33.83396613039467]
We propose a semantics-driven unsupervised learning approach for monocular depth and ego-motion estimation from videos.
Recent unsupervised learning methods employ the photometric error between a synthesized view and the actual image as a supervision signal for training.
arXiv Detail & Related papers (2020-06-08T05:55:07Z)
This list is automatically generated from the titles and abstracts of the papers in this site.