SiLK -- Simple Learned Keypoints
- URL: http://arxiv.org/abs/2304.06194v1
- Date: Wed, 12 Apr 2023 23:56:00 GMT
- Title: SiLK -- Simple Learned Keypoints
- Authors: Pierre Gleize, Weiyao Wang, Matt Feiszli
- Abstract summary: Keypoint detection & descriptors are foundational technologies for computer vision tasks like image matching, 3D reconstruction and visual odometry.
Recent learning-based methods employ a vast diversity of experimental setups and design choices.
We re-design each component from first-principle and propose Simple Learned Keypoints (SiLK) that is fully-differentiable, lightweight, and flexible.
- Score: 11.208547877814574
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keypoint detection & descriptors are foundational technologies for computer
vision tasks like image matching, 3D reconstruction and visual odometry.
Hand-engineered methods like Harris corners, SIFT, and HOG descriptors have
been used for decades; more recently, there has been a trend to introduce
learning in an attempt to improve keypoint detectors. On inspection however,
the results are difficult to interpret; recent learning-based methods employ a
vast diversity of experimental setups and design choices: empirical results are
often reported using different backbones, protocols, datasets, types of
supervisions or tasks. Since these differences are often coupled together, it
raises a natural question on what makes a good learned keypoint detector. In
this work, we revisit the design of existing keypoint detectors by
deconstructing their methodologies and identifying the key components. We
re-design each component from first-principle and propose Simple Learned
Keypoints (SiLK) that is fully-differentiable, lightweight, and flexible.
Despite its simplicity, SiLK advances new state-of-the-art on Detection
Repeatability and Homography Estimation tasks on HPatches and 3D Point-Cloud
Registration task on ScanNet, and achieves competitive performance to
state-of-the-art on camera pose estimation in 2022 Image Matching Challenge and
ScanNet.
Related papers
- Open-Vocabulary Animal Keypoint Detection with Semantic-feature Matching [74.75284453828017]
Open-Vocabulary Keypoint Detection (OVKD) task is innovatively designed to use text prompts for identifying arbitrary keypoints across any species.
We have developed a novel framework named Open-Vocabulary Keypoint Detection with Semantic-feature Matching (KDSM).
This framework combines vision and language models, creating an interplay between language features and local keypoint visual features.
arXiv Detail & Related papers (2023-10-08T07:42:41Z)
- Enhancing Deformable Local Features by Jointly Learning to Detect and Describe Keypoints [8.390939268280235]
Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval.
We propose DALF, a novel deformation-aware network for jointly detecting and describing keypoints.
Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration.
arXiv Detail & Related papers (2023-04-02T18:01:51Z)
- Learning Common Rationale to Improve Self-Supervised Representation for Fine-Grained Visual Recognition Problems [61.11799513362704]
We propose learning an additional screening mechanism to identify discriminative clues commonly seen across instances and classes.
We show that a common rationale detector can be learned by simply exploiting the GradCAM induced from the SSL objective.
arXiv Detail & Related papers (2023-03-03T02:07:40Z)
- Paint and Distill: Boosting 3D Object Detection with Semantic Passing Network [70.53093934205057]
3D object detection task from lidar or camera sensors is essential for autonomous driving.
We propose a novel semantic passing framework, named SPNet, to boost the performance of existing lidar-based 3D detection models.
arXiv Detail & Related papers (2022-07-12T12:35:34Z)
- Continual Object Detection via Prototypical Task Correlation Guided Gating Mechanism [120.1998866178014]
We present a flexible framework for continual object detection via pRotOtypical taSk corrElaTion guided gaTing mechAnism (ROSETTA).
Concretely, a unified framework is shared by all tasks while task-aware gates are introduced to automatically select sub-models for specific tasks.
Experiments on COCO-VOC, KITTI-Kitchen, class-incremental detection on VOC and sequential learning of four tasks show that ROSETTA yields state-of-the-art performance.
arXiv Detail & Related papers (2022-05-06T07:31:28Z)
- Self-Supervised Equivariant Learning for Oriented Keypoint Detection [35.94215211409985]
We introduce a self-supervised learning framework using rotation-equivariant CNNs to learn to detect robust oriented keypoints.
We propose a dense orientation alignment loss by an image pair generated by synthetic transformations for training a histogram-based orientation map.
Our method outperforms the previous methods on an image matching benchmark and a camera pose estimation benchmark.
arXiv Detail & Related papers (2022-04-19T02:26:07Z)
- Unsupervised Part Discovery from Contrastive Reconstruction [90.88501867321573]
The goal of self-supervised visual representation learning is to learn strong, transferable image representations.
We propose an unsupervised approach to object part discovery and segmentation.
Our method yields semantic parts consistent across fine-grained but visually distinct categories.
arXiv Detail & Related papers (2021-11-11T17:59:42Z)
- PP-ShiTu: A Practical Lightweight Image Recognition System [5.400569330093269]
We propose a practical lightweight image recognition system, named PP-ShiTu, consisting of the following 3 modules.
We introduce popular strategies including metric learning, deep hash, knowledge distillation and model quantization to improve accuracy and inference speed.
Experiments on different datasets and benchmarks show that the system is widely effective in different domains of image recognition.
arXiv Detail & Related papers (2021-11-01T09:04:54Z)
- Towards High Performance Human Keypoint Detection [87.1034745775229]
We find that context information plays an important role in reasoning human body configuration and invisible keypoints.
Inspired by this, we propose a cascaded context mixer (CCM) which efficiently integrates spatial and channel context information.
To maximize CCM's representation capability, we develop a hard-negative person detection mining strategy and a joint-training strategy.
We present several sub-pixel refinement techniques for postprocessing keypoint predictions to improve detection accuracy.
arXiv Detail & Related papers (2020-02-03T02:24:51Z)
This list is automatically generated from the titles and abstracts of the papers in this site.