Towards Keypoint Guided Self-Supervised Depth Estimation
- URL: http://arxiv.org/abs/2011.03091v1
- Date: Thu, 5 Nov 2020 20:45:03 GMT
- Title: Towards Keypoint Guided Self-Supervised Depth Estimation
- Authors: Kristijan Bartol and David Bojanic and Tomislav Petkovic and Tomislav Pribanic and Yago Diez Donoso
- Abstract summary: We use keypoints as a self-supervision cue for learning depth map estimation from a collection of input images.
By learning a deep model with and without the keypoint extraction technique, we show that using keypoints improves depth estimation learning.
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: This paper proposes to use keypoints as a self-supervision cue for learning depth map estimation from a collection of input images. Because ground-truth depth for real images is difficult to obtain, many unsupervised and self-supervised approaches to depth estimation have been proposed. Most of these unsupervised approaches use depth map and ego-motion estimates to reproject the pixels of the current image into an adjacent image from the collection. The depth and ego-motion estimates are then evaluated based on the pixel intensity differences between the corresponding original and reprojected pixels. Instead of reprojecting individual pixels, we propose to first select keypoints in both images and then reproject and compare the corresponding keypoints of the two images. The keypoints should describe distinctive image features well. By learning a deep model with and without the keypoint extraction technique, we show that using keypoints improves depth estimation learning. We also propose some future directions for keypoint-guided learning of structure-from-motion problems.
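The abstract contrasts dense pixel reprojection with keypoint reprojection. The NumPy sketch that follows illustrates both under stated assumptions: the standard pinhole reprojection p_s ~ K (R D(p_t) K^-1 p_t + t) driven by a predicted depth map D and relative pose (R, t), a dense L1 photometric loss with nearest-neighbour sampling, and a keypoint loss that measures the pixel distance between reprojected target keypoints and their matched source keypoints. The function names, the shared intrinsics K, the 4x4 pose convention, integer keypoint coordinates, and the choice of a geometric (distance-based) keypoint error are all assumptions for illustration; this is not the authors' implementation, and the paper's keypoint detector and loss may differ.

```python
import numpy as np


def reproject(pixels, depth, K, T):
    """Map integer target-image pixels (u, v) into the source image.

    pixels: (N, 2) int array of (u, v) coordinates in the target image.
    depth:  (H, W) predicted depth map of the target image.
    K:      (3, 3) pinhole intrinsics, assumed shared by both views.
    T:      (4, 4) predicted relative pose mapping target-frame points
            into the source camera frame (assumed convention).
    Returns (N, 2) float source coordinates and an (N,) bool mask of
    points that land in front of the source camera.
    """
    u, v = pixels[:, 0], pixels[:, 1]
    z = depth[v, u]                                  # predicted depth per pixel
    homog = np.stack([u, v, np.ones_like(u)]).astype(float)  # (3, N)
    points = (np.linalg.inv(K) @ homog) * z          # back-project to 3D
    points = T[:3, :3] @ points + T[:3, 3:4]         # rigid motion to source frame
    proj = K @ points                                # project with intrinsics
    in_front = proj[2] > 1e-6
    safe_z = np.where(in_front, proj[2], 1.0)        # avoid division by zero
    return (proj[:2] / safe_z).T, in_front


def photometric_loss(target, source, depth, K, T):
    """Dense baseline: L1 intensity difference over all reprojected pixels."""
    H, W = target.shape
    vs, us = np.mgrid[0:H, 0:W]
    pixels = np.stack([us.ravel(), vs.ravel()], axis=1)
    coords, valid = reproject(pixels, depth, K, T)
    # Nearest-neighbour lookup keeps the sketch short; training code would
    # use differentiable bilinear sampling instead.
    su = np.rint(coords[:, 0]).astype(int)
    sv = np.rint(coords[:, 1]).astype(int)
    valid &= (su >= 0) & (su < W) & (sv >= 0) & (sv < H)
    if not valid.any():
        return 0.0
    diff = target[pixels[valid, 1], pixels[valid, 0]] - source[sv[valid], su[valid]]
    return float(np.mean(np.abs(diff)))


def keypoint_loss(kp_target, kp_source, depth, K, T):
    """Keypoint variant: reproject only matched keypoints and measure the
    pixel distance to their counterparts in the source image.

    kp_target[i] and kp_source[i] are assumed to be the same scene point,
    e.g. matched by a detector/descriptor run on both images.
    """
    coords, valid = reproject(kp_target, depth, K, T)
    if not valid.any():
        return 0.0
    err = np.linalg.norm(coords[valid] - kp_source[valid], axis=1)
    return float(np.mean(err))


if __name__ == "__main__":
    K = np.array([[4.0, 0.0, 4.0], [0.0, 4.0, 4.0], [0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[0, 3] = 1.0                    # translate points +1 along x
    depth = np.full((8, 8), 2.0)     # fronto-parallel plane at depth 2
    kp = np.array([[2, 3]])
    # Expected shift in u is f * t_x / Z = 4 * 1 / 2 = 2 pixels.
    print(reproject(kp, depth, K, T)[0])
    print(keypoint_loss(kp, np.array([[4, 3]]), depth, K, T))
```

The dense loss touches every pixel, including textureless regions where intensity differences say little about depth, whereas the keypoint loss only evaluates distinctive, matched points; that difference is the intuition the abstract gives for keypoint guidance.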
Related papers
- Self-Supervised Keypoint Detection with Distilled Depth Keypoint Representation [0.8136541584281987]
Distill-DKP is a novel cross-modal knowledge distillation framework for keypoint detection in a self-supervised setting.
During training, Distill-DKP extracts embedding-level knowledge from a depth-based teacher model to guide an image-based student model.
Experiments show that Distill-DKP significantly outperforms previous unsupervised methods.
(2024-10-04T22:14:08Z)
- GMM-IKRS: Gaussian Mixture Models for Interpretable Keypoint Refinement and Scoring [9.322937309882022]
Keypoints come with a score that permits ranking them according to their quality.
While learned keypoints often exhibit better properties than handcrafted ones, their scores are not easily interpretable.
We propose a framework that can refine, and at the same time characterize with an interpretable score, the keypoints extracted by any method.
(2024-08-30T09:39:59Z)
- Pixel-level Correspondence for Self-Supervised Learning from Video [56.24439897867531]
Pixel-level Correspondence (PiCo) is a method for dense contrastive learning from video.
We validate PiCo on standard benchmarks, outperforming self-supervised baselines on multiple dense prediction tasks.
(2022-07-08T12:50:13Z)
- Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
We introduce a self-supervised learning framework using rotation-equivariant CNNs to learn to detect robust oriented keypoints.
We propose a dense orientation alignment loss, computed on image pairs generated by synthetic transformations, for training a histogram-based orientation map.
Our method outperforms previous methods on an image matching benchmark and a camera pose estimation benchmark.
(2022-04-19T02:26:07Z)
- Weakly Supervised Keypoint Discovery [27.750244813890262]
We propose a method for keypoint discovery from a 2D image using image-level supervision.
Motivated by weakly supervised learning approaches, our method exploits image-level supervision to identify discriminative parts.
Our approach achieves state-of-the-art keypoint estimation performance in limited-supervision scenarios.
(2021-09-28T01:26:53Z)
- Pixel-Perfect Structure-from-Motion with Featuremetric Refinement [96.73365545609191]
We refine two key steps of structure-from-motion by a direct alignment of low-level image information from multiple views.
This significantly improves the accuracy of camera poses and scene geometry for a wide range of keypoint detectors.
Our system easily scales to large image collections, enabling pixel-perfect crowd-sourced localization at scale.
(2021-08-18T17:58:55Z)
- End-to-End Learning of Keypoint Representations for Continuous Control from Images [84.8536730437934]
We show that it is possible to learn efficient keypoint representations end-to-end, without the need for unsupervised pre-training, decoders, or additional losses.
Our proposed architecture consists of a differentiable keypoint extractor that feeds the coordinates directly to a soft actor-critic agent.
(2021-06-15T09:17:06Z)
- Bottom-Up Human Pose Estimation by Ranking Heatmap-Guided Adaptive Keypoint Estimates [76.51095823248104]
We present several schemes, rarely or only superficially studied before, for improving keypoint detection and grouping (keypoint regression) performance.
First, to improve keypoint regression, we exploit the keypoint heatmaps for pixel-wise keypoint regression instead of treating heatmaps and regression separately.
Second, we adopt a pixel-wise spatial transformer network to learn adaptive representations for handling the scale and orientation variance.
Third, we present a joint shape and heatvalue scoring scheme to promote the estimated poses that are more likely to be true poses.
(2020-06-28T01:14:59Z)
- Distilling Localization for Self-Supervised Representation Learning [82.79808902674282]
Contrastive learning has revolutionized unsupervised representation learning.
Current contrastive models are ineffective at localizing the foreground object.
We propose a data-driven approach for learning invariance to backgrounds.
(2020-04-14T16:29:42Z)
- Simple Multi-Resolution Representation Learning for Human Pose Estimation [2.1904965822605433]
The accuracy of human keypoint prediction has steadily improved thanks to advances in deep learning.
We introduce novel network structures, referred to as multi-resolution representation learning, for human keypoint prediction.
Our architectures are simple yet effective, achieving good performance.
(2020-04-14T09:03:16Z)
- Single Image Depth Estimation Trained via Depth from Defocus Cues [105.67073923825842]
Estimating depth from a single RGB image is a fundamental task in computer vision.
In this work, instead of relying on different views, we rely on depth-from-focus cues.
We present results that are on par with supervised methods on the KITTI and Make3D datasets and that outperform unsupervised learning approaches.
(2020-01-14T20:22:54Z)