AMatFormer: Efficient Feature Matching via Anchor Matching Transformer
- URL: http://arxiv.org/abs/2305.19205v1
- Date: Tue, 30 May 2023 16:54:10 GMT
- Title: AMatFormer: Efficient Feature Matching via Anchor Matching Transformer
- Authors: Bo Jiang, Shuxian Luo, Xiao Wang, Chuanfu Li, Jin Tang
- Abstract summary: We propose a novel efficient Anchor Matching Transformer (AMatFormer) for the feature matching problem.
AMatFormer has two main aspects: First, it mainly conducts self-/cross-attention on some anchor features and leverages these anchor features as message bottleneck.
Experiments on several benchmarks demonstrate the effectiveness and efficiency of the proposed AMatFormer matching approach.
- Score: 34.08771267009364
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Learning based feature matching methods have been commonly studied in recent
years. The core issue for learning feature matching is to how to learn (1)
discriminative representations for feature points (or regions) within each
intra-image and (2) consensus representations for feature points across
inter-images. Recently, self- and cross-attention models have been exploited to
address this issue. However, in many scenes, features are coming with
large-scale, redundant and outliers contaminated. Previous
self-/cross-attention models generally conduct message passing on all primal
features which thus lead to redundant learning and high computational cost. To
mitigate limitations, inspired by recent seed matching methods, in this paper,
we propose a novel efficient Anchor Matching Transformer (AMatFormer) for the
feature matching problem. AMatFormer has two main aspects: First, it mainly
conducts self-/cross-attention on some anchor features and leverages these
anchor features as message bottleneck to learn the representations for all
primal features. Thus, it can be implemented efficiently and compactly. Second,
AMatFormer adopts a shared FFN module to further embed the features of two
images into the common domain and thus learn the consensus feature
representations for the matching problem. Experiments on several benchmarks
demonstrate the effectiveness and efficiency of the proposed AMatFormer
matching approach.
Related papers
- PRISM: PRogressive dependency maxImization for Scale-invariant image Matching [4.9521269535586185]
We propose PRogressive dependency maxImization for Scale-invariant image Matching (PRISM)
Our method's superior matching performance and generalization capability are confirmed by leading accuracy across various evaluation benchmarks and downstream tasks.
arXiv Detail & Related papers (2024-08-07T07:35:17Z) - Dual Feature Augmentation Network for Generalized Zero-shot Learning [14.410978100610489]
Zero-shot learning (ZSL) aims to infer novel classes without training samples by transferring knowledge from seen classes.
Existing embedding-based approaches for ZSL typically employ attention mechanisms to locate attributes on an image.
We propose a novel Dual Feature Augmentation Network (DFAN), which comprises two feature augmentation modules.
arXiv Detail & Related papers (2023-09-25T02:37:52Z) - Exploring Fine-Grained Representation and Recomposition for Cloth-Changing Person Re-Identification [78.52704557647438]
We propose a novel FIne-grained Representation and Recomposition (FIRe$2$) framework to tackle both limitations without any auxiliary annotation or data.
Experiments demonstrate that FIRe$2$ can achieve state-of-the-art performance on five widely-used cloth-changing person Re-ID benchmarks.
arXiv Detail & Related papers (2023-08-21T12:59:48Z) - DOAD: Decoupled One Stage Action Detection Network [77.14883592642782]
Localizing people and recognizing their actions from videos is a challenging task towards high-level video understanding.
Existing methods are mostly two-stage based, with one stage for person bounding box generation and the other stage for action recognition.
We present a decoupled one-stage network dubbed DOAD, to improve the efficiency for-temporal action detection.
arXiv Detail & Related papers (2023-04-01T08:06:43Z) - HyRSM++: Hybrid Relation Guided Temporal Set Matching for Few-shot
Action Recognition [51.2715005161475]
We propose a novel Hybrid Relation guided temporal Set Matching approach for few-shot action recognition.
The core idea of HyRSM++ is to integrate all videos within the task to learn discriminative representations.
We show that our method achieves state-of-the-art performance under various few-shot settings.
arXiv Detail & Related papers (2023-01-09T13:32:50Z) - CAD: Co-Adapting Discriminative Features for Improved Few-Shot
Classification [11.894289991529496]
Few-shot classification is a challenging problem that aims to learn a model that can adapt to unseen classes given a few labeled samples.
Recent approaches pre-train a feature extractor, and then fine-tune for episodic meta-learning.
We propose a strategy to cross-attend and re-weight discriminative features for few-shot classification.
arXiv Detail & Related papers (2022-03-25T06:14:51Z) - APANet: Adaptive Prototypes Alignment Network for Few-Shot Semantic
Segmentation [56.387647750094466]
Few-shot semantic segmentation aims to segment novel-class objects in a given query image with only a few labeled support images.
Most advanced solutions exploit a metric learning framework that performs segmentation through matching each query feature to a learned class-specific prototype.
We present an adaptive prototype representation by introducing class-specific and class-agnostic prototypes.
arXiv Detail & Related papers (2021-11-24T04:38:37Z) - Learning to Focus: Cascaded Feature Matching Network for Few-shot Image
Recognition [38.49419948988415]
Deep networks can learn to accurately recognize objects of a category by training on a large number of images.
A meta-learning challenge known as a low-shot image recognition task comes when only a few images with annotations are available for learning a recognition model for one category.
Our method, called Cascaded Feature Matching Network (CFMN), is proposed to solve this problem.
Experiments for few-shot learning on two standard datasets, emphminiImageNet and Omniglot, have confirmed the effectiveness of our method.
arXiv Detail & Related papers (2021-01-13T11:37:28Z) - Symbiotic Adversarial Learning for Attribute-based Person Search [86.7506832053208]
We present a symbiotic adversarial learning framework, called SAL.Two GANs sit at the base of the framework in a symbiotic learning scheme.
Specifically, two different types of generative adversarial networks learn collaboratively throughout the training process.
arXiv Detail & Related papers (2020-07-19T07:24:45Z) - Multi-scale Interactive Network for Salient Object Detection [91.43066633305662]
We propose the aggregate interaction modules to integrate the features from adjacent levels.
To obtain more efficient multi-scale features, the self-interaction modules are embedded in each decoder unit.
Experimental results on five benchmark datasets demonstrate that the proposed method without any post-processing performs favorably against 23 state-of-the-art approaches.
arXiv Detail & Related papers (2020-07-17T15:41:37Z) - Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF)
Key regions of foreground object can be accurately responded via semantic features, while apparent features provide richer detailed expression.
By fusing semantic and apparent features, as well as cascading the modules of intra-image adaptive feature weight learning and inter-image common feature learning, the research achieves performance that significantly exceeds baselines.
arXiv Detail & Related papers (2020-05-21T08:28:49Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.