Feature-metric Loss for Self-supervised Learning of Depth and Egomotion
- URL: http://arxiv.org/abs/2007.10603v1
- Date: Tue, 21 Jul 2020 05:19:07 GMT
- Title: Feature-metric Loss for Self-supervised Learning of Depth and Egomotion
- Authors: Chang Shu, Kun Yu, Zhixiang Duan, and Kuiyuan Yang
- Abstract summary: Photometric loss is widely used for self-supervised depth and egomotion estimation.
In this work, feature-metric loss is proposed and defined on feature representation.
Comprehensive experiments and detailed analysis via visualization demonstrate the effectiveness of the proposed feature-metric loss.
- Score: 13.995413542601472
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Photometric loss is widely used for self-supervised depth and egomotion
estimation. However, the loss landscapes induced by photometric differences are
often problematic for optimization, caused by plateau landscapes for pixels in
textureless regions or multiple local minima for less discriminative pixels. In
this work, feature-metric loss is proposed and defined on feature
representation, where the feature representation is also learned in a
self-supervised manner and regularized by both first-order and second-order
derivatives to constrain the loss landscapes to form proper convergence basins.
Comprehensive experiments and detailed analysis via visualization demonstrate
the effectiveness of the proposed feature-metric loss. In particular, our
method improves state-of-the-art methods on KITTI from 0.885 to 0.925 measured
by $\delta_1$ for depth estimation, and significantly outperforms previous
methods for visual odometry.
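The abstract gives no formulas, so the following is only a minimal NumPy sketch of the idea it describes, not the authors' implementation. It assumes an L1 difference for both losses, finite differences for the derivatives, and feature maps of shape (C, H, W). All function names are hypothetical. The first-order term rewards large feature gradients (so textureless regions do not produce a plateau), and the second-order term penalizes curvature (so the landscape stays smooth).

```python
import numpy as np


def photometric_loss(target, warped):
    """Mean absolute intensity difference between a target image and a warped source image."""
    return float(np.mean(np.abs(target - warped)))


def feature_metric_loss(feat_target, feat_warped):
    """Same comparison, computed on learned feature maps of shape (C, H, W).

    Assumption: an L1 difference; the abstract does not specify the distance.
    """
    return float(np.mean(np.abs(feat_target - feat_warped)))


def first_order_penalty(feat):
    """Negative mean absolute first derivative (finite differences).

    Minimizing this term encourages large feature gradients.
    """
    dx = feat[:, :, 1:] - feat[:, :, :-1]
    dy = feat[:, 1:, :] - feat[:, :-1, :]
    return -float(np.mean(np.abs(dx)) + np.mean(np.abs(dy)))


def second_order_penalty(feat):
    """Mean absolute second derivative (finite differences).

    Minimizing this term encourages smooth feature gradients.
    """
    dxx = feat[:, :, 2:] - 2.0 * feat[:, :, 1:-1] + feat[:, :, :-2]
    dyy = feat[:, 2:, :] - 2.0 * feat[:, 1:-1, :] + feat[:, :-2, :]
    return float(np.mean(np.abs(dxx)) + np.mean(np.abs(dyy)))


# A textureless image: any horizontal shift gives zero photometric loss,
# so the loss carries no signal about the correct alignment (a plateau).
flat = np.zeros((1, 8, 8))
plateau = photometric_loss(flat, np.roll(flat, 1, axis=-1))

# A linear ramp as a toy "feature map": constant non-zero gradient, zero curvature.
ramp = np.tile(np.arange(8, dtype=float), (1, 8, 1))
shifted = feature_metric_loss(ramp[:, :, 1:], ramp[:, :, :-1])
```

In this toy case the ramp yields a non-zero feature-metric loss for a one-pixel shift, while the flat image yields none, which is the contrast the abstract draws between textureless pixels and regularized features.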
Related papers
- Single Image Depth Prediction Made Better: A Multivariate Gaussian Take [163.14849753700682]
We introduce an approach that performs continuous modeling of per-pixel depth.
Our method's accuracy (named MG) is among the top on the KITTI depth-prediction benchmark leaderboard.
arXiv Detail & Related papers (2023-03-31T16:01:03Z)
- CbwLoss: Constrained Bidirectional Weighted Loss for Self-supervised Learning of Depth and Pose [13.581694284209885]
Photometric differences are used to train neural networks for estimating depth and camera pose from unlabeled monocular videos.
In this paper, we deal with moving objects and occlusions utilizing the difference of the flow fields and depth structure generated by affine transformation and view synthesis.
We mitigate the effect of textureless regions on model optimization by measuring differences between features with more semantic and contextual information without adding networks.
arXiv Detail & Related papers (2022-12-12T12:18:24Z)
- Frequency-Aware Self-Supervised Monocular Depth Estimation [41.97188738587212]
We present two versatile methods to enhance self-supervised monocular depth estimation models.
The high generalizability of our methods is achieved by solving the fundamental and ubiquitous problems in photometric loss function.
We are the first to propose blurring images to improve depth estimators with an interpretable analysis.
arXiv Detail & Related papers (2022-10-11T14:30:26Z)
- DeepWSD: Projecting Degradations in Perceptual Space to Wasserstein Distance in Deep Feature Space [67.07476542850566]
We propose to model the quality degradation in perceptual space from a statistical distribution perspective.
The quality is measured based upon the Wasserstein distance in the deep feature domain.
The deep Wasserstein distance (DeepWSD) performed on features from neural networks enjoys better interpretability of the quality contamination.
arXiv Detail & Related papers (2022-08-05T02:46:12Z)
- RA-Depth: Resolution Adaptive Self-Supervised Monocular Depth Estimation [27.679479140943503]
We propose a resolution adaptive self-supervised monocular depth estimation method (RA-Depth) by learning the scale invariance of the scene depth.
RA-Depth achieves state-of-the-art performance, and also exhibits a good ability of resolution adaptation.
arXiv Detail & Related papers (2022-07-25T08:49:59Z)
- Degradation-agnostic Correspondence from Resolution-asymmetric Stereo [96.03964515969652]
We study the problem of stereo matching from a pair of images with different resolutions, e.g., those acquired with a tele-wide camera system.
We propose to impose the consistency between two views in a feature space instead of the image space, named feature-metric consistency.
We find that, although a stereo matching network trained with the photometric loss is not optimal, its feature extractor can produce degradation-agnostic and matching-specific features.
arXiv Detail & Related papers (2022-04-04T12:24:34Z)
- Leveraging Spatial and Photometric Context for Calibrated Non-Lambertian Photometric Stereo [61.6260594326246]
We introduce an efficient fully-convolutional architecture that can leverage both spatial and photometric context simultaneously.
Using separable 4D convolutions and 2D heat-maps reduces the model size and makes it more efficient.
arXiv Detail & Related papers (2021-03-22T18:06:58Z)
- Uncalibrated Neural Inverse Rendering for Photometric Stereo of General Surfaces [103.08512487830669]
This paper presents an uncalibrated deep neural network framework for the photometric stereo problem.
Existing neural network-based methods either require exact light directions or ground-truth surface normals of the object or both.
We propose an uncalibrated neural inverse rendering approach to this problem.
arXiv Detail & Related papers (2020-12-12T10:33:08Z)
- SAFENet: Self-Supervised Monocular Depth Estimation with Semantic-Aware Feature Extraction [27.750031877854717]
We propose SAFENet that is designed to leverage semantic information to overcome the limitations of the photometric loss.
Our key idea is to exploit semantic-aware depth features that integrate the semantic and geometric knowledge.
Experiments on the KITTI dataset demonstrate that our methods compete with or even outperform the state-of-the-art methods.
arXiv Detail & Related papers (2020-10-06T17:22:25Z)
- Deep Dimension Reduction for Supervised Representation Learning [51.10448064423656]
We propose a deep dimension reduction approach to learning representations with essential characteristics.
The proposed approach is a nonparametric generalization of the sufficient dimension reduction method.
We show that the estimated deep nonparametric representation is consistent in the sense that its excess risk converges to zero.
arXiv Detail & Related papers (2020-06-10T14:47:43Z)
This list is automatically generated from the titles and abstracts of the papers in this site.