Learning Feature Descriptors using Camera Pose Supervision
- URL: http://arxiv.org/abs/2004.13324v3
- Date: Mon, 29 Jan 2024 06:01:18 GMT
- Title: Learning Feature Descriptors using Camera Pose Supervision
- Authors: Qianqian Wang, Xiaowei Zhou, Bharath Hariharan, Noah Snavely
- Abstract summary: We propose a novel weakly-supervised framework that can learn feature descriptors solely from relative camera poses between images.
Because we no longer need pixel-level ground-truth correspondences, our framework opens up the possibility of training on much larger and more diverse datasets for better and unbiased descriptors.
- Score: 101.56783569070221
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent research on learned visual descriptors has shown promising
improvements in correspondence estimation, a key component of many 3D vision
tasks. However, existing descriptor learning frameworks typically require
ground-truth correspondences between feature points for training, which are
challenging to acquire at scale. In this paper we propose a novel
weakly-supervised framework that can learn feature descriptors solely from
relative camera poses between images. To do so, we devise both a new loss
function that exploits the epipolar constraint given by camera poses, and a new
model architecture that makes the whole pipeline differentiable and efficient.
Because we no longer need pixel-level ground-truth correspondences, our
framework opens up the possibility of training on much larger and more diverse
datasets for better and unbiased descriptors. We call the resulting descriptors
CAmera Pose Supervised, or CAPS, descriptors. Though trained with weak
supervision, CAPS descriptors outperform even prior fully-supervised
descriptors and achieve state-of-the-art performance on a variety of geometric
tasks. Project Page: https://qianqianwang68.github.io/CAPS/
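At the heart of the paper is the epipolar constraint: given the relative pose between two images, the correct match for a query pixel must lie on its epipolar line in the other image, so the distance to that line can supervise a matcher without correspondence labels. The following is a minimal PyTorch sketch of such a loss, assuming a known pose (R, t) and intrinsics K1, K2; the function names and the exact point-to-line distance form are our illustration, not the paper's formulation.

```python
import torch

def skew(t):
    """Skew-symmetric [t]_x, so that skew(t) @ v == cross(t, v)."""
    m = torch.zeros(3, 3)
    m[0, 1], m[0, 2] = -t[2], t[1]
    m[1, 0], m[1, 2] = t[2], -t[0]
    m[2, 0], m[2, 1] = -t[1], t[0]
    return m

def fundamental_from_pose(R, t, K1, K2):
    """F maps pixels in image 1 to epipolar lines in image 2."""
    E = skew(t) @ R                                    # essential matrix
    return torch.inverse(K2).T @ E @ torch.inverse(K1)

def epipolar_loss(x1, x2_pred, F):
    """Mean distance of predicted matches x2_pred (N, 2) from the
    epipolar lines of their query points x1 (N, 2)."""
    ones = torch.ones(x1.shape[0], 1)
    lines = (F @ torch.cat([x1, ones], dim=1).T).T     # (N, 3): a, b, c
    num = (lines * torch.cat([x2_pred, ones], dim=1)).sum(dim=1).abs()
    den = lines[:, :2].norm(dim=1).clamp(min=1e-8)
    return (num / den).mean()                          # point-to-line dist
```

Because the loss depends only on camera geometry, any differentiable module that predicts x2_pred can be trained this way, which is what removes the need for pixel-level ground truth.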
Related papers
- Self-supervised Learning of LiDAR 3D Point Clouds via 2D-3D Neural Calibration [107.61458720202984] (arXiv 2024-01-23)
This paper introduces a novel self-supervised learning framework for enhancing 3D perception in autonomous driving scenes.
We propose a learnable transformation alignment to bridge the domain gap between image and point cloud data.
We establish dense 2D-3D correspondences to estimate the rigid pose.
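For context, recovering a rigid pose from 2D-3D correspondences is classically a Perspective-n-Point (PnP) problem. The snippet below is a generic OpenCV sketch with placeholder data, not this paper's calibration pipeline.

```python
import cv2
import numpy as np

# Placeholder correspondences: N object points (3-D), their image
# projections (2-D), and a made-up pinhole intrinsics matrix K.
pts3d = np.random.rand(20, 3).astype(np.float32)
pts2d = np.random.rand(20, 2).astype(np.float32) * 100
K = np.array([[500., 0., 320.],
              [0., 500., 240.],
              [0., 0., 1.]])

# Robust rigid pose (R, t) from 2D-3D correspondences via PnP + RANSAC
ok, rvec, tvec, inliers = cv2.solvePnPRansac(pts3d, pts2d, K, None)
R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
```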
- Residual Learning for Image Point Descriptors [56.917951170421894] (arXiv 2023-12-24)
We propose a very simple and effective approach to learning local image descriptors by using a hand-crafted detector and descriptor.
We optimize the final descriptor by leveraging the knowledge already present in the hand-crafted descriptor (see the sketch below).
Our approach has potential applications in ensemble learning and learning with non-differentiable functions.
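A hypothetical sketch of the residual idea, assuming a 128-D hand-crafted descriptor such as SIFT as the starting point; the architecture below is ours, not the paper's.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualDescriptor(nn.Module):
    """Refine a hand-crafted descriptor by learning only an
    additive correction to it (residual connection)."""
    def __init__(self, dim=128):
        super().__init__()
        self.refine = nn.Sequential(
            nn.Linear(dim, 256), nn.ReLU(),
            nn.Linear(256, dim))

    def forward(self, hand_desc):                  # hand_desc: (N, dim)
        out = hand_desc + self.refine(hand_desc)   # hand-crafted + residual
        return F.normalize(out, dim=-1)            # unit-norm descriptor
```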
- Sim2Real Object-Centric Keypoint Detection and Description [40.58367357980036] (arXiv 2022-02-01)
Keypoint detection and description play a central role in computer vision.
We propose an object-centric formulation, which further requires identifying which object each interest point belongs to.
We develop a sim2real contrastive learning mechanism that can generalize the model trained in simulation to real-world applications.
- Domain Adaptation of Networks for Camera Pose Estimation: Learning Camera Pose Estimation Without Pose Labels [8.409695277909421] (arXiv 2021-11-29)
One of the key criticisms of deep learning is that large amounts of expensive and difficult-to-acquire training data are required to train models.
DANCE enables the training of models without access to any labels on the target task.
It renders labeled synthetic images from a 3D model and bridges the inevitable domain gap between synthetic and real images.
- Digging Into Self-Supervised Learning of Feature Descriptors [14.47046413243358] (arXiv 2021-10-10)
We propose a set of improvements that combined lead to powerful feature descriptors.
We show that increasing the search space from in-pair to in-batch for hard negative mining brings consistent improvement (see the sketch below).
We demonstrate that a combination of synthetic homography transformation, color augmentation, and photorealistic image stylization produces useful representations.
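A generic sketch of in-batch hard-negative mining in the style of HardNet-like triplet losses; the function name and margin value are our assumptions, not the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def in_batch_hard_triplet(anchors, positives, margin=0.2):
    """For each anchor, mine the hardest negative from the whole
    batch rather than only from its own pair."""
    d = torch.cdist(anchors, positives)     # (B, B) descriptor distances
    pos = d.diagonal()                      # matching pairs on diagonal
    d = d + torch.eye(d.shape[0]) * 1e6     # mask the positives out
    hardest = torch.min(d.min(dim=0).values, d.min(dim=1).values)
    return F.relu(pos - hardest + margin).mean()
```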
- UPDesc: Unsupervised Point Descriptor Learning for Robust Registration [54.95201961399334] (arXiv 2021-08-05)
UPDesc is an unsupervised method to learn point descriptors for robust point cloud registration.
We show that our learned descriptors yield superior performance over existing unsupervised methods.
- Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207] (arXiv 2020-12-09)
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation.
We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths.
In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
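As a hypothetical illustration of images-as-nodes message passing (our own sketch, not the paper's GNN), one aggregation round over a fully connected group could look like:

```python
import torch
import torch.nn as nn

class GroupMessagePassing(nn.Module):
    """One round of message passing over a group of images, each
    represented by a single node feature vector."""
    def __init__(self, dim):
        super().__init__()
        self.msg = nn.Linear(dim, dim)
        self.upd = nn.Linear(2 * dim, dim)

    def forward(self, feats):               # feats: (N, dim), N >= 2
        m = self.msg(feats)
        # mean of messages from all *other* nodes (fully connected group)
        agg = (m.sum(dim=0, keepdim=True) - m) / (feats.shape[0] - 1)
        return torch.relu(self.upd(torch.cat([feats, agg], dim=-1)))
```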
- Shape and Viewpoint without Keypoints [63.26977130704171] (arXiv 2020-07-21)
We present a framework that learns to recover 3D shape, pose, and texture from a single image.
It is trained on an image collection without any ground-truth 3D shape, multi-view, camera viewpoint, or keypoint supervision.
We obtain state-of-the-art camera prediction results and show that we can learn to predict diverse shapes and textures across objects.
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences of its use.