Enhancing Deformable Local Features by Jointly Learning to Detect and
Describe Keypoints
- URL: http://arxiv.org/abs/2304.00583v1
- Date: Sun, 2 Apr 2023 18:01:51 GMT
- Title: Enhancing Deformable Local Features by Jointly Learning to Detect and
Describe Keypoints
- Authors: Guilherme Potje, Felipe Cadar, Andre Araujo, Renato Martins, Erickson
R. Nascimento
- Abstract summary: Local feature extraction is a standard approach in computer vision for tackling important tasks such as image matching and retrieval.
We propose DALF, a novel deformation-aware network for jointly detecting and describing keypoints.
Our approach also enhances the performance of two real-world applications: deformable object retrieval and non-rigid 3D surface registration.
- Score: 8.390939268280235
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Local feature extraction is a standard approach in computer vision for
tackling important tasks such as image matching and retrieval. The core
assumption of most methods is that images undergo affine transformations,
disregarding more complicated effects such as non-rigid deformations.
Furthermore, incipient works tailored for non-rigid correspondence still rely
on keypoint detectors designed for rigid transformations, hindering performance
due to the limitations of the detector. We propose DALF (Deformation-Aware
Local Features), a novel deformation-aware network for jointly detecting and
describing keypoints, to handle the challenging problem of matching deformable
surfaces. All network components work cooperatively through a feature fusion
approach that enforces the descriptors' distinctiveness and invariance.
Experiments using real deforming objects showcase the superiority of our
method, where it delivers an 8% improvement in matching scores compared to the
previous best results. Our approach also enhances the performance of two
real-world applications: deformable object retrieval and non-rigid 3D surface
registration. Code for training, inference, and applications is publicly
available at https://verlab.dcc.ufmg.br/descriptors/dalf_cvpr23.
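To make the joint detect-and-describe formulation concrete, below is a minimal sketch of a network with a shared encoder, a detection head, and a description head whose outputs are fused. All module names, channel sizes, and the fusion rule are illustrative assumptions for exposition, not DALF's actual architecture.

```python
# Minimal sketch of a joint detect-and-describe network in the spirit of
# DALF. Module names, channel sizes, and the fusion rule are illustrative
# assumptions, not the paper's actual architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class JointDetectDescribe(nn.Module):
    def __init__(self, desc_dim: int = 128):
        super().__init__()
        # Shared encoder: both heads reuse the same features, so detector
        # and descriptor are trained cooperatively rather than in isolation.
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
        )
        self.det_head = nn.Conv2d(64, 1, 1)          # per-pixel keypoint score
        self.desc_head = nn.Conv2d(64, desc_dim, 1)  # dense descriptor map

    def forward(self, image: torch.Tensor):
        feats = self.encoder(image)
        scores = torch.sigmoid(self.det_head(feats))      # (B, 1, H, W)
        desc = F.normalize(self.desc_head(feats), dim=1)  # (B, D, H, W)
        # Feature fusion (illustrative): weight descriptors by detection
        # confidence so both heads shape the final representation.
        fused = desc * scores
        return scores, fused

if __name__ == "__main__":
    net = JointDetectDescribe()
    img = torch.rand(1, 1, 240, 320)  # dummy grayscale image
    scores, desc = net(img)
    print(scores.shape, desc.shape)
```

Sharing the encoder is what allows the heads to be optimized cooperatively; DALF's actual fusion and deformation-aware components are detailed in the paper and the released code.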
Related papers
- RADA: Robust and Accurate Feature Learning with Domain Adaptation [7.905594146253435]
We introduce a multi-level feature aggregation network that incorporates two pivotal components to facilitate the learning of robust and accurate features.
Our method, RADA, achieves excellent results in image matching, camera pose estimation, and visual localization tasks.
arXiv Detail & Related papers (2024-07-22T16:49:58Z)
- RGBD-Glue: General Feature Combination for Robust RGB-D Point Cloud Registration [0.0]
We propose a new feature combination framework, which applies a looser but more effective combination.
An explicit filter based on transformation consistency is designed for the combination framework, allowing it to compensate for each feature's weaknesses.
Experiments on ScanNet and 3DMatch show that our method achieves a state-of-the-art performance.
arXiv Detail & Related papers (2024-05-13T09:56:28Z)
- KP-RED: Exploiting Semantic Keypoints for Joint 3D Shape Retrieval and Deformation [87.23575166061413]
KP-RED is a unified KeyPoint-driven REtrieval and Deformation framework.
It takes object scans as input and jointly retrieves and deforms the most geometrically similar CAD models.
arXiv Detail & Related papers (2024-03-15T08:44:56Z)
- Learning-based Relational Object Matching Across Views [63.63338392484501]
We propose a learning-based approach which combines local keypoints with novel object-level features for matching object detections between RGB images.
We train our object-level matching features based on appearance and inter-frame and cross-frame spatial relations between objects in an associative graph neural network.
arXiv Detail & Related papers (2023-05-03T19:36:51Z)
- Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z)
- Part-guided Relational Transformers for Fine-grained Visual Recognition [59.20531172172135]
We propose a framework to learn the discriminative part features and explore correlations with a feature transformation module.
Our proposed approach does not rely on additional part branches and reaches state-of-the-art performance on three fine-grained object recognition benchmarks.
arXiv Detail & Related papers (2022-12-28T03:45:56Z)
- Learning to Detect Good Keypoints to Match Non-Rigid Objects in RGB Images [7.428474910083337]
We present a novel learned keypoint detection method designed to maximize the number of correct matches for the task of non-rigid image correspondence.
Our training framework uses true correspondences, obtained by matching annotated image pairs with a predefined descriptor extractor, as ground truth to train a convolutional neural network (CNN); a minimal sketch of this supervision scheme follows the list below.
Experiments show that our method outperforms the state-of-the-art keypoint detector on real images of non-rigid objects by 20 p.p. in Mean Matching Accuracy.
arXiv Detail & Related papers (2022-12-13T11:59:09Z)
- UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection [52.91782218300844]
We propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT.
Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the vision Transformer suitable for the consistency representation learning.
arXiv Detail & Related papers (2022-10-23T15:24:47Z)
- RoRD: Rotation-Robust Descriptors and Orthographic Views for Local Feature Matching [32.10261486751993]
We present a novel framework that combines learning of invariant descriptors through data augmentation and viewpoint projection.
We evaluate the effectiveness of the proposed approach on key tasks including pose estimation and visual place recognition.
arXiv Detail & Related papers (2021-03-15T17:40:25Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
- HDD-Net: Hybrid Detector Descriptor with Mutual Interactive Learning [24.13425816781179]
Local feature extraction remains an active research area due to the advances in fields such as SLAM, 3D reconstructions, or AR applications.
We propose a method that treats both extractions independently and focuses on their interaction in the learning process.
We show improvements over the state of the art in image matching on HPatches and in 3D reconstruction quality, while remaining on par in camera localisation tasks.
arXiv Detail & Related papers (2020-05-12T13:55:04Z)
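As referenced above, here is a hedged sketch of the correspondence-based supervision described in "Learning to Detect Good Keypoints to Match Non-Rigid Objects in RGB Images": true matches produced by a predefined descriptor extractor are rendered into a soft target heatmap that a detector CNN learns to regress. The Gaussian rendering, radius, and loss choice are assumptions, not the paper's exact recipe.

```python
# Hedged sketch: turning ground-truth correspondences into a training
# target for a learned keypoint detector. The Gaussian radius and the
# suggested losses are assumptions, not the paper's actual settings.
import numpy as np

def heatmap_from_matches(matches_xy: np.ndarray, h: int, w: int,
                         sigma: float = 2.0) -> np.ndarray:
    """Render matched keypoint locations (N, 2) as a soft target heatmap."""
    ys, xs = np.mgrid[0:h, 0:w]
    heat = np.zeros((h, w), dtype=np.float32)
    for x, y in matches_xy:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))
        heat = np.maximum(heat, g.astype(np.float32))  # one peak per keypoint
    return heat

# Usage: matches come from running a predefined descriptor extractor on an
# annotated image pair and keeping only the verified true correspondences.
matches = np.array([[40.0, 30.0], [120.0, 85.0]])
target = heatmap_from_matches(matches, h=240, w=320)
# A detector CNN is then trained to regress `target`, e.g. with an L2 or
# focal loss between its predicted score map and this heatmap.
```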
This list is automatically generated from the titles and abstracts of the papers on this site.