DeepMatcher: A Deep Transformer-based Network for Robust and Accurate
Local Feature Matching
- URL: http://arxiv.org/abs/2301.02993v1
- Date: Sun, 8 Jan 2023 07:15:09 GMT
- Title: DeepMatcher: A Deep Transformer-based Network for Robust and Accurate
Local Feature Matching
- Authors: Tao Xie, Kun Dai, Ke Wang, Ruifeng Li, Lijun Zhao
- Abstract summary: We propose a deep Transformer-based network built upon our investigation of local feature matching in detector-free methods.
DeepMatcher captures more human-intuitive and simpler-to-match features.
We show that DeepMatcher significantly outperforms the state-of-the-art methods on several benchmarks.
- Score: 9.662752427139496
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Local feature matching between images remains a challenging task, especially
in the presence of significant appearance variations, e.g., extreme viewpoint
changes. In this work, we propose DeepMatcher, a deep Transformer-based network
built upon our investigation of local feature matching in detector-free
methods. The key insight is that a local feature matcher with deep layers can
capture more human-intuitive and simpler-to-match features. Based on this, we
propose a Slimming Transformer (SlimFormer) dedicated to DeepMatcher, which
leverages vector-based attention to model relevance among all keypoints and
achieves long-range context aggregation in an efficient and effective manner. A
relative position encoding is applied in each SlimFormer to explicitly
encode relative distance information, further improving the representation of
keypoints. A layer-scale strategy is also employed in each SlimFormer to enable
the network to assimilate message exchange from the residual block adaptively,
thus mimicking the way humans acquire different matching cues each time
they scan an image pair. To facilitate a
better adaptation of the SlimFormer, we introduce a Feature Transition Module
(FTM) to ensure a smooth transition in feature scopes with different receptive
fields. By interleaving the self- and cross-SlimFormer multiple times,
DeepMatcher can easily establish pixel-wise dense matches at the coarse level.
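The interleaved self- and cross-attention with vector-based updates and layer scaling can be sketched in NumPy roughly as follows. This is a hedged illustration: the pooling scheme, shapes, shared weights, and function names are assumptions in the spirit of efficient-attention variants, not the authors' SlimFormer implementation, and the relative position encoding is omitted for brevity.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def slim_attention(x, source, Wq, Wk, Wv, layer_scale):
    """Linear-complexity 'vector-based' attention update (sketch only).

    Keys from `source` are pooled into a single global context vector
    that modulates every query in `x`; a layer-scale factor gates the
    residual so the network can adaptively assimilate each exchange.
    """
    q, k, v = x @ Wq, source @ Wk, source @ Wv      # (N, d) each
    alpha = softmax(k.sum(axis=-1), axis=0)         # one weight per keypoint
    context = (alpha[:, None] * v).sum(axis=0)      # (d,) global context
    msg = softmax(q) * context                      # broadcast to all queries
    return x + layer_scale * msg                    # layer-scale gated residual

def interleave(feat_a, feat_b, weights, layer_scale=0.1, layers=2):
    """Alternate self- and cross-attention over an image pair."""
    for _ in range(layers):
        feat_a = slim_attention(feat_a, feat_a, *weights, layer_scale)  # self, A
        feat_b = slim_attention(feat_b, feat_b, *weights, layer_scale)  # self, B
        feat_a, feat_b = (slim_attention(feat_a, feat_b, *weights, layer_scale),
                          slim_attention(feat_b, feat_a, *weights, layer_scale))  # cross
    return feat_a, feat_b
```

Because the keys are pooled into one context vector, each update costs O(N·d) rather than the O(N²·d) of full query-key attention, which is what makes long-range aggregation over all keypoints tractable.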
Finally, we perceive the match refinement as a combination of classification
and regression problems and design a Fine Matches Module to predict confidence
and offset concurrently, thereby generating robust and accurate matches.
Experimentally, we show that DeepMatcher significantly outperforms the
state-of-the-art methods on several benchmarks, demonstrating the superior
matching capability of DeepMatcher.
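The classification-plus-regression view of match refinement can be illustrated with a small sketch: correlating a reference descriptor against a w×w window of target descriptors gives a softmax distribution whose peak acts as a match confidence (classification) and whose expectation gives a sub-pixel offset (regression). All names, shapes, and the temperature here are hypothetical illustrations of the general idea, not the paper's Fine Matches Module API.

```python
import numpy as np

def fine_match(desc_window, desc_center, temperature=0.1):
    """Sketch of confidence-and-offset prediction for one coarse match.

    desc_window: (w*w, d) descriptors from a local window in the target.
    desc_center: (d,) descriptor of the matched point in the reference.
    Returns a scalar confidence and a 2-D expected sub-pixel offset.
    """
    w2, d = desc_window.shape
    w = int(np.sqrt(w2))
    scores = desc_window @ desc_center / temperature   # (w*w,) correlation
    prob = np.exp(scores - scores.max())
    prob /= prob.sum()                                 # softmax over the window
    confidence = prob.max()                            # classification-style score
    ys, xs = np.divmod(np.arange(w2), w)               # grid coordinates
    center = (w - 1) / 2
    offset = np.array([(prob * (xs - center)).sum(),   # regression: expected
                       (prob * (ys - center)).sum()])  # offset from the center
    return confidence, offset
```

Taking the expectation over the softmax, rather than the argmax alone, is what makes the predicted offset differentiable and sub-pixel accurate.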
Related papers
- GSSF: Generalized Structural Sparse Function for Deep Cross-modal Metric Learning [51.677086019209554]
We propose a Generalized Structural Sparse function to capture powerful relationships across modalities for pair-wise similarity learning.
The distance metric delicately encapsulates two formats of diagonal and block-diagonal terms.
Experiments on cross-modal and two extra uni-modal retrieval tasks have validated its superiority and flexibility.
arXiv Detail & Related papers (2024-10-20T03:45:50Z)
- Adaptive Spot-Guided Transformer for Consistent Local Feature Matching [64.30749838423922]
We propose Adaptive Spot-Guided Transformer (ASTR) for local feature matching.
ASTR models the local consistency and scale variations in a unified coarse-to-fine architecture.
arXiv Detail & Related papers (2023-03-29T12:28:01Z)
- OAMatcher: An Overlapping Areas-based Network for Accurate Local Feature Matching [9.006654114778073]
We propose OAMatcher, a detector-free method that imitates human behavior to generate dense and accurate matches.
OAMatcher predicts overlapping areas to promote effective and clean global context aggregation.
Comprehensive experiments demonstrate that OAMatcher outperforms the state-of-the-art methods on several benchmarks.
arXiv Detail & Related papers (2023-02-12T03:32:45Z)
- Adaptive Assignment for Geometry Aware Local Feature Matching [22.818457285745733]
Detector-free feature matching approaches are currently attracting great attention thanks to their excellent performance.
We introduce AdaMatcher, which accomplishes the feature correlation and co-visible area estimation through an elaborate feature interaction module.
AdaMatcher then performs adaptive assignment on patch-level matching while estimating the scales between images, and finally refines the co-visible matches through scale alignment and sub-pixel regression module.
arXiv Detail & Related papers (2022-07-18T08:22:18Z)
- TransforMatcher: Match-to-Match Attention for Semantic Correspondence [48.25709192748133]
We introduce a strong semantic image matching learner, dubbed TransforMatcher, which builds on the success of transformer networks in vision domains.
Unlike existing convolution- or attention-based schemes for correspondence, TransforMatcher performs global match-to-match attention for precise match localization and dynamic refinement.
In experiments, TransforMatcher sets a new state of the art on SPair-71k while performing on par with existing SOTA methods on the PF-PASCAL dataset.
arXiv Detail & Related papers (2022-05-23T21:02:01Z)
- REGTR: End-to-end Point Cloud Correspondences with Transformers [79.52112840465558]
We conjecture that attention mechanisms can replace the role of explicit feature matching and RANSAC.
We propose an end-to-end framework to directly predict the final set of correspondences.
Our approach achieves state-of-the-art performance on 3DMatch and ModelNet benchmarks.
arXiv Detail & Related papers (2022-03-28T06:01:00Z)
- MatchFormer: Interleaving Attention in Transformers for Feature Matching [31.175513306917654]
We propose a novel hierarchical extract-and-match transformer, termed as MatchFormer.
We interleave self-attention for feature extraction and cross-attention for feature matching, enabling a human-intuitive extract-and-match scheme.
Thanks to such a strategy, MatchFormer is a multi-win solution in efficiency, robustness, and precision.
arXiv Detail & Related papers (2022-03-17T22:49:14Z)
- GOCor: Bringing Globally Optimized Correspondence Volumes into Your Neural Network [176.3781969089004]
Feature correlation layer serves as a key neural network module in computer vision problems that involve dense correspondences between image pairs.
We propose GOCor, a fully differentiable dense matching module, acting as a direct replacement to the feature correlation layer.
Our approach significantly outperforms the feature correlation layer for the tasks of geometric matching, optical flow, and dense semantic matching.
arXiv Detail & Related papers (2020-09-16T17:33:01Z)
- Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences arising from its use.