DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local
Feature Matching
- URL: http://arxiv.org/abs/2308.08479v3
- Date: Mon, 11 Dec 2023 14:16:50 GMT
- Title: DeDoDe: Detect, Don't Describe -- Describe, Don't Detect for Local
Feature Matching
- Authors: Johan Edstedt, Georg Bökman, Mårten Wadenbäck, Michael
Felsberg
- Abstract summary: Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of (up to) K points are detected in each view of a scene.
Previous learning-based methods typically learn descriptors jointly with keypoints, and treat keypoint detection as a binary classification task on mutual nearest neighbours.
In this work, we learn keypoints directly from 3D consistency by training the detector on tracks from large-scale SfM. Because these tracks are often overly sparse, we derive a semi-supervised two-view detection objective to expand the set to a desired number of detections.
Results show that our approach, DeDoDe, achieves significant gains on multiple geometry benchmarks.
- Score: 14.837075102089
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Keypoint detection is a pivotal step in 3D reconstruction, whereby sets of
(up to) K points are detected in each view of a scene. Crucially, the detected
points need to be consistent between views, i.e., correspond to the same 3D
point in the scene. One of the main challenges with keypoint detection is the
formulation of the learning objective. Previous learning-based methods
typically jointly learn descriptors with keypoints, and treat the keypoint
detection as a binary classification task on mutual nearest neighbours.
However, basing keypoint detection on descriptor nearest neighbours is a proxy
task, which is not guaranteed to produce 3D-consistent keypoints. Furthermore,
this ties the keypoints to a specific descriptor, complicating downstream
usage. In this work, we instead learn keypoints directly from 3D consistency.
To this end, we train the detector to detect tracks from large-scale SfM. As
these points are often overly sparse, we derive a semi-supervised two-view
detection objective to expand this set to a desired number of detections. To
train a descriptor, we maximize the mutual nearest neighbour objective over the
keypoints with a separate network. Results show that our approach, DeDoDe,
achieves significant gains on multiple geometry benchmarks. Code is provided at
https://github.com/Parskatt/DeDoDe
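The abstract refers to mutual nearest neighbours between descriptors in two views. The sketch below illustrates only that matching notion, not the paper's training objective or the released DeDoDe code. The function name `mutual_nearest_neighbours`, the use of cosine similarity, and the array shapes are assumptions made for illustration.

```python
import numpy as np


def mutual_nearest_neighbours(desc_a, desc_b):
    """Return an (M, 2) array of index pairs (i, j) such that desc_a[i] and
    desc_b[j] are each other's nearest neighbour.

    Assumption: similarity is cosine similarity between descriptor rows.
    desc_a has shape (N_a, D), desc_b has shape (N_b, D).
    """
    # L2-normalise each descriptor so the dot product is cosine similarity.
    a = desc_a / np.linalg.norm(desc_a, axis=1, keepdims=True)
    b = desc_b / np.linalg.norm(desc_b, axis=1, keepdims=True)

    sim = a @ b.T  # shape (N_a, N_b)

    nn_ab = sim.argmax(axis=1)  # best match in B for every row of A
    nn_ba = sim.argmax(axis=0)  # best match in A for every row of B

    # A pair is mutual when following A -> B -> A returns to the start index.
    idx_a = np.arange(sim.shape[0])
    mutual = nn_ba[nn_ab] == idx_a
    return np.stack([idx_a[mutual], nn_ab[mutual]], axis=1)


# Toy usage: the second descriptor in view A has no mutual partner in view B.
view_a = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
view_b = np.array([[1.0, 0.0], [0.0, 1.0]])
matches = mutual_nearest_neighbours(view_a, view_b)
```

Because the keypoints in DeDoDe are detected independently of the descriptor, a matcher of this kind could in principle be run over any descriptor evaluated at the detected keypoints; the exact loss used for training is described in the paper, not here.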
Related papers
- SC3K: Self-supervised and Coherent 3D Keypoints Estimation from Rotated,
Noisy, and Decimated Point Cloud Data [17.471342278936365]
We propose a new method to infer keypoints from arbitrary object categories in practical scenarios where point cloud data (PCD) are noisy, down-sampled and arbitrarily rotated.
We achieve these desiderata by proposing a new self-supervised training strategy for keypoints estimation.
We compare the keypoints estimated by the proposed approach with those of the state-of-the-art unsupervised approaches.
arXiv Detail & Related papers (2023-08-10T08:10:01Z)
- V-DETR: DETR with Vertex Relative Position Encoding for 3D Object
Detection [73.37781484123536]
We introduce a highly performant 3D object detector for point clouds using the DETR framework.
We introduce a novel 3D Vertex Relative Position Encoding (3DV-RPE) method.
We show exceptional results on the challenging ScanNetV2 benchmark.
arXiv Detail & Related papers (2023-08-08T17:14:14Z)
- TUSK: Task-Agnostic Unsupervised Keypoints [21.777256048659165]
We propose a novel method to learn Task-agnostic, UnSupervised Keypoints (TUSK) which can deal with multiple instances.
Specifically, we encode semantics into the keypoints by teaching them to reconstruct images from a sparse set of keypoints and their descriptors.
This makes our approach amenable to a wider range of tasks than any previous unsupervised keypoint method.
arXiv Detail & Related papers (2022-06-16T21:56:17Z)
- CenterNet++ for Object Detection [174.59360147041673]
Bottom-up approaches are as competitive as top-down ones and enjoy higher recall.
Our approach, named CenterNet, detects each object as a triplet of keypoints (the top-left and bottom-right corners and the center keypoint).
On the MS-COCO dataset, CenterNet with Res2Net-101 and Swin-Transformer achieves APs of 53.7% and 57.1%, respectively.
arXiv Detail & Related papers (2022-04-18T16:45:53Z)
- SASA: Semantics-Augmented Set Abstraction for Point-based 3D Object
Detection [78.90102636266276]
We propose a novel set abstraction method named Semantics-Augmented Set Abstraction (SASA).
Based on the estimated point-wise foreground scores, we then propose a semantics-guided point sampling algorithm to help retain more important foreground points during down-sampling.
In practice, SASA proves effective at identifying valuable points related to foreground objects and at improving feature learning for point-based 3D detection.
arXiv Detail & Related papers (2022-01-06T08:54:47Z)
- Soft Expectation and Deep Maximization for Image Feature Detection [68.8204255655161]
We propose SEDM, an iterative semi-supervised learning process that flips the question and first looks for repeatable 3D points, then trains a detector to localize them in image space.
Our results show that this new model trained using SEDM is able to better localize the underlying 3D points in a scene.
arXiv Detail & Related papers (2021-04-21T00:35:32Z)
- Skeleton Merger: an Unsupervised Aligned Keypoint Detector [44.983569951041]
Skeleton Merger is an unsupervised aligned keypoint detector based on an Autoencoder architecture.
It is capable of detecting semantically-rich salient keypoints with good alignment and shows comparable performance to supervised methods on the KeypointNet dataset.
arXiv Detail & Related papers (2021-03-19T14:00:39Z)
- UKPGAN: A General Self-Supervised Keypoint Detector [43.35270822722044]
UKPGAN is a general self-supervised 3D keypoint detector.
Our keypoints align well with human annotated keypoint labels.
Our model is stable under both rigid and non-rigid transformations.
arXiv Detail & Related papers (2020-11-24T09:08:21Z)
- KeypointNet: A Large-scale 3D Keypoint Dataset Aggregated from Numerous
Human Annotations [56.34297279246823]
KeypointNet is the first large-scale and diverse 3D keypoint dataset.
It contains 103,450 keypoints and 8,234 3D models from 16 object categories.
Ten state-of-the-art methods are benchmarked on our proposed dataset.
arXiv Detail & Related papers (2020-02-28T12:58:56Z)
- PPDM: Parallel Point Detection and Matching for Real-time Human-Object
Interaction Detection [85.75935399090379]
We propose a single-stage Human-Object Interaction (HOI) detection method that outperforms all existing methods on the HICO-DET dataset at 37 fps.
It is the first real-time HOI detection method.
arXiv Detail & Related papers (2019-12-30T12:00:55Z)