Focus What Matters: Matchability-Based Reweighting for Local Feature Matching
- URL: http://arxiv.org/abs/2505.02161v1
- Date: Sun, 04 May 2025 15:50:28 GMT
- Title: Focus What Matters: Matchability-Based Reweighting for Local Feature Matching
- Authors: Dongyue Li,
- Abstract summary: We propose a novel attention reweighting mechanism that simultaneously incorporates a learnable bias term into the attention logits and applies a matchability-informed rescaling to the input value features. Experiments conducted on three benchmark datasets validate the effectiveness of our method.
- Score: 6.361840891399624
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Since the rise of Transformers, many semi-dense matching methods have adopted attention mechanisms to extract feature descriptors. However, the attention weights, which capture dependencies between pixels or keypoints, are often learned from scratch. This approach can introduce redundancy and noisy interactions from irrelevant regions, as it treats all pixels or keypoints equally. Drawing inspiration from keypoint selection processes, we propose to first classify all pixels into two categories: matchable and non-matchable. Matchable pixels are expected to receive higher attention weights, while non-matchable ones are down-weighted. In this work, we propose a novel attention reweighting mechanism that simultaneously incorporates a learnable bias term into the attention logits and applies a matchability-informed rescaling to the input value features. The bias term, injected prior to the softmax operation, selectively adjusts attention scores based on the confidence of query-key interactions. Concurrently, the feature rescaling acts post-attention by modulating the influence of each value vector in the final output. This dual design allows the attention mechanism to dynamically adjust both its internal weighting scheme and the magnitude of its output representations. Extensive experiments conducted on three benchmark datasets validate the effectiveness of our method, consistently outperforming existing state-of-the-art approaches.
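To make the dual design concrete, here is a minimal PyTorch sketch of an attention block with both ingredients: a matchability-derived bias added to the logits before the softmax, and a rescaling of the value vectors by the same matchability scores. This is not the authors' released implementation; the matchability head, the per-head bias scale, and all shapes below are assumptions made only for illustration.

```python
import torch
import torch.nn as nn

class MatchabilityReweightedAttention(nn.Module):
    """Illustrative sketch (not the paper's code): attention whose logits are
    biased and whose values are rescaled by per-token matchability scores."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Hypothetical matchability head: one score in [0, 1] per token.
        self.matchability = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        # Learnable per-head scale that turns matchability into a logit bias.
        self.bias_scale = nn.Parameter(torch.zeros(num_heads))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        m = self.matchability(x)                              # (B, N, 1)
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                  # each (B, H, N, hd)

        # Pre-softmax bias: matchable keys get their logits pushed up.
        logits = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5
        bias = self.bias_scale.view(1, -1, 1, 1) * m.view(B, 1, 1, N)
        attn = (logits + bias).softmax(dim=-1)

        # Value rescaling: matchable tokens contribute more to the output.
        v = v * m.view(B, 1, N, 1)
        out = (attn @ v).transpose(1, 2).reshape(B, N, C)
        return self.proj(out)
```

In this reading, the bias term steers where each query attends, while the value rescaling bounds how much a non-matchable token can contribute to the output even when it does receive attention weight.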
Related papers
- Efficient Leaf Disease Classification and Segmentation using Midpoint Normalization Technique and Attention Mechanism [0.0]
We introduce a transformative two-stage methodology, Mid Point Normalization (MPN), for intelligent image preprocessing.
Our classification pipeline achieves 93% accuracy while maintaining exceptional class-wise balance.
For segmentation tasks, we seamlessly integrate identical attention blocks within the U-Net architecture using MPN-enhanced inputs.
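The blurb does not define Mid Point Normalization, so the snippet below is only a guessed reading for illustration: each image is shifted so the midpoint of its intensity range sits at zero and scaled to [-1, 1]. The function name and the per-image range are assumptions, not the paper's definition.

```python
import torch

def midpoint_normalize(img: torch.Tensor) -> torch.Tensor:
    # Guessed reading of "Mid Point Normalization" (NOT from the paper).
    # img: (B, C, H, W) pixel intensities.
    lo = img.amin(dim=(-2, -1), keepdim=True)
    hi = img.amax(dim=(-2, -1), keepdim=True)
    mid = (lo + hi) / 2                              # midpoint of the intensity range
    half_range = ((hi - lo) / 2).clamp_min(1e-6)
    return (img - mid) / half_range                  # values in [-1, 1]
```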
arXiv Detail & Related papers (2025-05-27T15:14:04Z)
- CoMatch: Dynamic Covisibility-Aware Transformer for Bilateral Subpixel-Level Semi-Dense Image Matching [31.42896369011162]
CoMatch is a novel semi-dense image matcher with dynamic covisibility awareness and bilateral subpixel accuracy.
A covisibility-guided token condenser is introduced to adaptively aggregate tokens in light of their covisibility scores.
A fine correlation module is developed to refine the matching candidates in both source and target views to subpixel level.
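As a rough illustration of what a covisibility-guided token condenser could look like, the sketch below pools N tokens into a smaller set of cluster tokens, weighting each token's contribution by a predicted covisibility score. The module name, the covisibility head, and the cluster count are assumptions; CoMatch's actual design may differ.

```python
import torch
import torch.nn as nn

class CovisibilityTokenCondenser(nn.Module):
    """Hypothetical sketch: condense N tokens into M << N cluster tokens,
    weighting each token's contribution by a predicted covisibility score."""

    def __init__(self, dim: int, num_clusters: int = 64):
        super().__init__()
        self.covis_head = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.cluster_assign = nn.Linear(dim, num_clusters)

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        # tokens: (B, N, C)
        covis = self.covis_head(tokens)                       # (B, N, 1)
        assign = self.cluster_assign(tokens).softmax(dim=1)   # (B, N, M), soft assignment over tokens
        weights = assign * covis                              # low-covisibility tokens are down-weighted
        weights = weights / weights.sum(dim=1, keepdim=True).clamp_min(1e-6)
        return torch.einsum('bnm,bnc->bmc', weights, tokens)  # (B, M, C) condensed tokens
```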
arXiv Detail & Related papers (2025-03-31T10:17:01Z)
- DAPE V2: Process Attention Score as Feature Map for Length Extrapolation [63.87956583202729]
We conceptualize attention as a feature map and apply the convolution operator to mimic the processing methods in computer vision.
The novel insight, which can be adapted to various attention-related models, reveals that the current Transformer architecture has the potential for further evolution.
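The stated idea translates naturally into code: treat the (batch, heads, N, N) attention-score tensor as an H-channel feature map and run a convolution over it before the softmax. The depthwise kernel and the residual placement below are assumptions, not DAPE V2's exact configuration.

```python
import torch
import torch.nn as nn

class ConvProcessedAttention(nn.Module):
    """Sketch of 'attention scores as a feature map': a depthwise conv is
    applied to the (heads x N x N) score tensor before the softmax."""

    def __init__(self, dim: int, num_heads: int = 8, kernel_size: int = 3):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        # Depthwise conv over the score "image"; one channel per head.
        self.score_conv = nn.Conv2d(num_heads, num_heads, kernel_size,
                                    padding=kernel_size // 2, groups=num_heads)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, N, C = x.shape
        qkv = self.qkv(x).reshape(B, N, 3, self.num_heads, self.head_dim)
        q, k, v = qkv.permute(2, 0, 3, 1, 4)                         # each (B, H, N, hd)
        scores = (q @ k.transpose(-2, -1)) / self.head_dim ** 0.5    # (B, H, N, N)
        scores = scores + self.score_conv(scores)                    # residual conv processing of the score map
        out = scores.softmax(dim=-1) @ v                             # (B, H, N, hd)
        return self.proj(out.transpose(1, 2).reshape(B, N, C))
```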
arXiv Detail & Related papers (2024-10-07T07:21:49Z)
- Unifying Feature and Cost Aggregation with Transformers for Semantic and Visual Correspondence [51.54175067684008]
This paper introduces a Transformer-based integrative feature and cost aggregation network designed for dense matching tasks.
We first show that feature aggregation and cost aggregation exhibit distinct characteristics and reveal the potential for substantial benefits stemming from the judicious use of both aggregation processes.
Our framework is evaluated on standard benchmarks for semantic matching, and also applied to geometric matching, where we show that our approach achieves significant improvements compared to existing methods.
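A very loose sketch of the two aggregation stages, under the assumption that feature aggregation means self-attention over descriptor tokens and cost aggregation means attention over rows of the source-target correlation volume; the paper's actual architecture is certainly richer than this.

```python
import torch
import torch.nn as nn

class FeatureAndCostAggregation(nn.Module):
    """Loose illustrative sketch: self-attention refines descriptors (feature
    aggregation), then attention over cost-volume rows refines the matching
    costs (cost aggregation)."""

    def __init__(self, dim: int, num_tgt_tokens: int, num_heads: int = 4):
        super().__init__()
        # num_tgt_tokens must be divisible by num_heads, since each cost-volume
        # row (length num_tgt_tokens) is treated as a token embedding.
        self.feat_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.cost_attn = nn.MultiheadAttention(num_tgt_tokens, num_heads, batch_first=True)

    def forward(self, f_src: torch.Tensor, f_tgt: torch.Tensor):
        # f_src: (B, N, C) source descriptors, f_tgt: (B, M, C) target descriptors
        f_src = f_src + self.feat_attn(f_src, f_src, f_src)[0]   # feature aggregation
        cost = torch.einsum('bnc,bmc->bnm', f_src, f_tgt)        # cost volume (B, N, M)
        cost = cost + self.cost_attn(cost, cost, cost)[0]        # cost aggregation
        return f_src, cost
```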
arXiv Detail & Related papers (2024-03-17T07:02:55Z)
- ResMatch: Residual Attention Learning for Local Feature Matching [51.07496081296863]
We rethink cross- and self-attention from the viewpoint of traditional feature matching and filtering.
We inject the similarity of descriptors and relative positions into the cross- and self-attention scores.
We mine intra- and inter-neighbors according to the similarity of descriptors and relative positions.
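One way to read "injecting similarity and relative positions into the attention score" is as additive biases on the logits, as sketched below. The per-head weights, the cosine-similarity cue, and the negative-distance positional cue are assumptions rather than ResMatch's exact formulation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimilarityBiasedCrossAttention(nn.Module):
    """Sketch (not ResMatch's code): cross-attention whose logits receive
    additive biases from descriptor similarity and relative keypoint positions."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.h, self.hd = num_heads, dim // num_heads
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.proj = nn.Linear(dim, dim)
        self.sim_w = nn.Parameter(torch.zeros(num_heads))   # per-head weight for the similarity cue
        self.pos_w = nn.Parameter(torch.zeros(num_heads))   # per-head weight for the position cue

    def forward(self, desc_a, desc_b, kpts_a, kpts_b):
        # desc_a: (B, N, C), desc_b: (B, M, C); kpts_*: (B, N|M, 2) normalized coordinates
        B, N, C = desc_a.shape
        M = desc_b.shape[1]
        q = self.q(desc_a).view(B, N, self.h, self.hd).transpose(1, 2)
        k = self.k(desc_b).view(B, M, self.h, self.hd).transpose(1, 2)
        v = self.v(desc_b).view(B, M, self.h, self.hd).transpose(1, 2)

        sim = torch.einsum('bnc,bmc->bnm',
                           F.normalize(desc_a, dim=-1),
                           F.normalize(desc_b, dim=-1))      # cosine similarity of descriptors
        pos = -torch.cdist(kpts_a, kpts_b)                   # nearer keypoints get a larger bias

        logits = (q @ k.transpose(-2, -1)) / self.hd ** 0.5
        logits = logits + self.sim_w.view(1, -1, 1, 1) * sim.unsqueeze(1)
        logits = logits + self.pos_w.view(1, -1, 1, 1) * pos.unsqueeze(1)
        out = logits.softmax(dim=-1) @ v
        return self.proj(out.transpose(1, 2).reshape(B, N, C))
```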
arXiv Detail & Related papers (2023-07-11T11:32:12Z)
- Learning Feature Matching via Matchable Keypoint-Assisted Graph Neural Network [52.29330138835208]
Accurately matching local features between a pair of images is a challenging computer vision task.
Previous studies typically use attention-based graph neural networks (GNNs) with fully-connected graphs over keypoints within/across images.
We propose MaKeGNN, a sparse attention-based GNN architecture which bypasses non-repeatable keypoints and leverages matchable ones to guide message passing.
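A hedged sketch of sparse, matchability-guided message passing: a score head picks the top-k most matchable keypoints, and every node attends only to that subset. The keep ratio and the score head are assumptions; MaKeGNN's graph construction is likely more involved.

```python
import torch
import torch.nn as nn

class MatchableGuidedMessagePassing(nn.Module):
    """Sketch of sparse, matchability-guided message passing (not MaKeGNN's code):
    every node attends only to the top-k most matchable keypoints."""

    def __init__(self, dim: int, num_heads: int = 4, keep_ratio: float = 0.25):
        super().__init__()
        self.keep_ratio = keep_ratio
        self.score_head = nn.Sequential(nn.Linear(dim, 1), nn.Sigmoid())
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, nodes: torch.Tensor) -> torch.Tensor:
        # nodes: (B, N, C) keypoint descriptors
        B, N, C = nodes.shape
        k = max(1, int(N * self.keep_ratio))
        scores = self.score_head(nodes).squeeze(-1)           # (B, N) matchability scores
        idx = scores.topk(k, dim=1).indices                   # indices of the most matchable keypoints
        hubs = torch.gather(nodes, 1, idx.unsqueeze(-1).expand(-1, -1, C))
        # Messages flow only through the matchable subset.
        msg, _ = self.attn(nodes, hubs, hubs)
        return nodes + msg
```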
arXiv Detail & Related papers (2023-07-04T02:50:44Z)
- Rethinking Query-Key Pairwise Interactions in Vision Transformers [5.141895475956681]
We propose key-only attention, which excludes query-key pairwise interactions and uses a compute-efficient saliency-gate to obtain attention weights.
We develop a new self-attention model family, LinGlos, which reaches state-of-the-art accuracy in the parameter-limited setting of the ImageNet classification benchmark.
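A minimal sketch of key-only attention with a saliency gate, assuming the gate produces one logit per token and the gated values are pooled into a global context vector; the exact LinGlo block is not specified in the blurb, so treat all names here as placeholders.

```python
import torch
import torch.nn as nn

class KeyOnlyAttention(nn.Module):
    """Sketch of key-only attention: weights come from a saliency gate over the
    keys alone, so there is no N x N query-key interaction."""

    def __init__(self, dim: int):
        super().__init__()
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.saliency_gate = nn.Linear(dim, 1)   # one saliency logit per token
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C)
        keys = self.k(x)
        weights = self.saliency_gate(keys).softmax(dim=1)          # (B, N, 1), O(N) instead of O(N^2)
        context = (weights * self.v(x)).sum(dim=1, keepdim=True)   # global context vector (B, 1, C)
        return self.proj(x + context)                              # broadcast context back to all tokens
```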
arXiv Detail & Related papers (2022-07-01T03:36:49Z)
- Beyond Self-attention: External Attention using Two Linear Layers for Visual Tasks [34.32609892928909]
We propose a novel attention mechanism which we call external attention, based on two external, small, learnable, and shared memories.
Our method provides comparable or superior performance to the self-attention mechanism and some of its variants, with much lower computational and memory costs.
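External attention is simple enough to show directly: two linear layers act as shared external key and value memories, and the attention map is double-normalized. This follows the published formulation closely, though the memory size below is an arbitrary choice.

```python
import torch
import torch.nn as nn

class ExternalAttention(nn.Module):
    """External attention with two linear layers acting as shared external
    memories; complexity is linear in the number of tokens."""

    def __init__(self, dim: int, memory_size: int = 64):
        super().__init__()
        self.mk = nn.Linear(dim, memory_size, bias=False)   # external key memory
        self.mv = nn.Linear(memory_size, dim, bias=False)   # external value memory

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, N, C)
        attn = self.mk(x)                                    # (B, N, S)
        attn = attn.softmax(dim=1)                           # normalize over tokens
        attn = attn / attn.sum(dim=-1, keepdim=True).clamp_min(1e-6)  # double normalization
        return self.mv(attn)                                 # (B, N, C)
```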
arXiv Detail & Related papers (2021-05-05T22:29:52Z)
- Generalizing Face Forgery Detection with High-frequency Features [63.33397573649408]
Current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize.
We propose to utilize high-frequency noise for face forgery detection.
One component is a multi-scale high-frequency feature extraction module that extracts high-frequency noise at multiple scales.
Another is a residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate on forgery traces from a new perspective.
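A sketch of the two described modules under simplifying assumptions: high-frequency residuals are obtained by subtracting blurred copies of the image at several scales (the paper may use learned or SRM-style filters instead), and a spatial attention map derived from those residuals reweights the low-level RGB features. Kernel sizes and module names are illustrative only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualGuidedSpatialAttention(nn.Module):
    """Sketch (not the paper's code): multi-scale high-frequency residuals are
    extracted by subtracting blurred copies of the image, then turned into a
    spatial attention map that reweights the RGB feature map."""

    def __init__(self, in_ch: int = 3, scales=(3, 5, 7)):
        super().__init__()
        self.scales = scales
        self.attn_conv = nn.Conv2d(in_ch * len(scales), 1, kernel_size=3, padding=1)

    def high_freq(self, img: torch.Tensor) -> torch.Tensor:
        residuals = []
        for k in self.scales:
            blurred = F.avg_pool2d(img, k, stride=1, padding=k // 2)
            residuals.append(img - blurred)           # high-frequency residual at this scale
        return torch.cat(residuals, dim=1)

    def forward(self, img: torch.Tensor, rgb_feat: torch.Tensor) -> torch.Tensor:
        # img: (B, 3, H, W); rgb_feat: (B, C, H, W) low-level RGB features at the same resolution
        hf = self.high_freq(img)
        attn = torch.sigmoid(self.attn_conv(hf))      # (B, 1, H, W) spatial attention map
        return rgb_feat * attn                        # focus the RGB stream on forgery traces
```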
arXiv Detail & Related papers (2021-03-23T08:19:21Z)
- Pose-guided Visible Part Matching for Occluded Person ReID [80.81748252960843]
We propose a Pose-guided Visible Part Matching (PVPM) method that jointly learns the discriminative features with pose-guided attention and self-mines the part visibility.
Experimental results on three occluded benchmarks show that the proposed method achieves performance competitive with state-of-the-art methods.
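A hedged sketch of pose-guided part pooling with self-mined visibility: pose keypoint heatmaps act as spatial attention to pool per-part features, and a small head scores each part's visibility. All shapes and heads are assumptions, not PVPM's exact design.

```python
import torch
import torch.nn as nn

class PoseGuidedPartPooling(nn.Module):
    """Sketch (not PVPM's code): pose heatmaps act as spatial attention to pool
    per-part features, and a small head estimates each part's visibility."""

    def __init__(self, feat_ch: int):
        super().__init__()
        self.visibility = nn.Sequential(nn.Linear(feat_ch, 1), nn.Sigmoid())

    def forward(self, feat: torch.Tensor, heatmaps: torch.Tensor):
        # feat: (B, C, H, W) backbone features; heatmaps: (B, P, H, W) pose keypoint heatmaps
        B, C, H, W = feat.shape
        attn = heatmaps.flatten(2).softmax(dim=-1).view(B, -1, H, W)   # normalize each heatmap spatially
        parts = torch.einsum('bphw,bchw->bpc', attn, feat)             # pose-guided pooled part features
        vis = self.visibility(parts).squeeze(-1)                        # (B, P) part visibility scores
        return parts, vis
```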
arXiv Detail & Related papers (2020-04-01T04:36:51Z)
This list is automatically generated from the titles and abstracts of the papers on this site.