Scene-Aware Feature Matching
- URL: http://arxiv.org/abs/2308.09949v2
- Date: Tue, 22 Aug 2023 01:21:48 GMT
- Title: Scene-Aware Feature Matching
- Authors: Xiaoyong Lu, Yaping Yan, Tong Wei, Songlin Du
- Abstract summary: We propose a novel model named SAM, which applies attentional grouping to guide Scene-Aware feature Matching.
With the scene-aware grouping guidance, SAM is not only more accurate and robust but also more interpretable than conventional feature matching models.
- Score: 13.014369025829598
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Current feature matching methods focus on point-level matching, pursuing
better representation learning of individual features, but lacking further
understanding of the scene. This results in significant performance degradation
when handling challenging scenes, such as those with large viewpoint and
illumination changes. To tackle this problem, we propose a novel model named
SAM, which applies attentional grouping to guide Scene-Aware feature Matching.
SAM handles multi-level features, i.e., image tokens and group tokens, with
attention layers, and groups the image tokens with the proposed token grouping
module. Our model can be trained by ground-truth matches only and produce
reasonable grouping results. With the scene-aware grouping guidance, SAM is not
only more accurate and robust but also more interpretable than conventional
feature matching models. Extensive experiments on various applications,
including homography estimation, pose estimation, and image matching,
demonstrate that our model achieves state-of-the-art performance.
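The abstract describes SAM's core mechanism as attention over multi-level tokens, with a token grouping module that softly assigns image tokens to group tokens. Below is a minimal sketch of that kind of attentional grouping, assuming a PyTorch-style implementation; the class name, dimensions, and single-head similarity are illustrative assumptions, not the authors' released code.

```python
# Minimal sketch of attentional token grouping in the spirit of the abstract.
# TokenGrouping, the dimensions, and the single-head similarity are assumptions
# for illustration, not the authors' implementation.
import torch
import torch.nn as nn

class TokenGrouping(nn.Module):
    """Softly assigns image tokens to a small set of learnable group (scene) tokens."""
    def __init__(self, dim: int = 256, num_groups: int = 8):
        super().__init__()
        self.group_tokens = nn.Parameter(torch.randn(num_groups, dim))  # learnable group tokens
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)

    def forward(self, image_tokens: torch.Tensor):
        # image_tokens: (B, N, dim) local features extracted from one image
        B, _, dim = image_tokens.shape
        q = self.to_q(self.group_tokens).unsqueeze(0).expand(B, -1, -1)  # (B, G, dim)
        k = self.to_k(image_tokens)                                      # (B, N, dim)
        sim = torch.einsum("bgd,bnd->bgn", q, k) / dim ** 0.5            # group-to-token similarity
        assign = sim.softmax(dim=1)                                      # soft assignment of each token to a group
        grouped = torch.einsum("bgn,bnd->bgd", assign, image_tokens)     # pooled group features
        return grouped, assign

# Example: two images, 1024 tokens each.
tokens = torch.randn(2, 1024, 256)
grouped, assign = TokenGrouping()(tokens)
print(grouped.shape, assign.shape)  # torch.Size([2, 8, 256]) torch.Size([2, 8, 1024])
```

One way to read the abstract's robustness and interpretability claims is that point-level matching can then be restricted to tokens whose group assignments agree across the two images.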
Related papers
- Adaptive Prompt Learning with SAM for Few-shot Scanning Probe Microscope Image Segmentation [11.882111844381098]
Segment Anything Model (SAM) has demonstrated strong performance in image segmentation of natural scene images.
SAM's effectiveness diminishes markedly when applied to specific scientific domains, such as Scanning Probe Microscope (SPM) images.
We propose an Adaptive Prompt Learning with SAM framework tailored for few-shot SPM image segmentation.
arXiv Detail & Related papers (2024-10-16T13:38:01Z)
- Intra-task Mutual Attention based Vision Transformer for Few-Shot Learning [12.5354658533836]
Humans possess a remarkable ability to accurately classify new, unseen images after being exposed to only a few examples.
For artificial neural network models, determining the most relevant features for distinguishing between two images with limited samples presents a challenge.
We propose an intra-task mutual attention method for few-shot learning that involves splitting the support and query samples into patches.
arXiv Detail & Related papers (2024-05-06T02:02:57Z)
- RefSAM: Efficiently Adapting Segmenting Anything Model for Referring Video Object Segmentation [53.4319652364256]
This paper presents the RefSAM model, which explores the potential of SAM for referring video object segmentation.
Our proposed approach adapts the original SAM model to enhance cross-modality learning by employing a lightweight cross-modal module.
We employ a parameter-efficient tuning strategy to align and fuse the language and vision features effectively.
arXiv Detail & Related papers (2023-07-03T13:21:58Z)
- Weakly-Supervised Concealed Object Segmentation with SAM-based Pseudo Labeling and Multi-scale Feature Grouping [40.07070188661184]
Weakly-Supervised Concealed Object Segmentation (WSCOS) aims to segment objects that are well blended with their surrounding environments.
It is hard to distinguish concealed objects from the background due to their intrinsic similarity.
We propose a new WSCOS method to address these challenges.
arXiv Detail & Related papers (2023-05-18T14:31:34Z)
- LEAD: Self-Supervised Landmark Estimation by Aligning Distributions of Feature Similarity [49.84167231111667]
Existing works in self-supervised landmark detection are based on learning dense (pixel-level) feature representations from an image.
We introduce an approach to enhance the learning of dense equivariant representations in a self-supervised fashion.
We show that having such a prior in the feature extractor helps in landmark detection, even with a drastically limited number of annotations.
arXiv Detail & Related papers (2022-04-06T17:48:18Z)
- Learning Contrastive Representation for Semantic Correspondence [150.29135856909477]
We propose a multi-level contrastive learning approach for semantic matching.
We show that image-level contrastive learning is a key component to encourage the convolutional features to find correspondence between similar objects.
arXiv Detail & Related papers (2021-09-22T18:34:14Z)
- Learning to Compose Hypercolumns for Visual Correspondence [57.93635236871264]
We introduce a novel approach to visual correspondence that dynamically composes effective features by leveraging relevant layers conditioned on the images to match.
The proposed method, dubbed Dynamic Hyperpixel Flow, learns to compose hypercolumn features on the fly by selecting a small number of relevant layers from a deep convolutional neural network.
arXiv Detail & Related papers (2020-07-21T04:03:22Z)
- Co-Attention for Conditioned Image Matching [91.43244337264454]
We propose a new approach to determine correspondences between image pairs in the wild under large changes in illumination, viewpoint, context, and material.
While other approaches find correspondences between pairs of images by treating the images independently, we instead condition on both images to implicitly take account of the differences between them.
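(A minimal sketch of this pair-conditioning idea appears after this list.)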
arXiv Detail & Related papers (2020-07-16T17:32:00Z)
- Improving Few-shot Learning by Spatially-aware Matching and CrossTransformer [116.46533207849619]
We study the impact of scale and location mismatch in the few-shot learning scenario.
We propose a novel Spatially-aware Matching scheme to effectively perform matching across multiple scales and locations.
arXiv Detail & Related papers (2020-01-06T14:10:20Z)
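As referenced in the "Co-Attention for Conditioned Image Matching" entry above, the following is a minimal sketch of conditioning each image's descriptors on the other before matching. It assumes a PyTorch-style cross-attention layer; the module name, sizes, and residual update are illustrative assumptions, not that paper's implementation.

```python
# Minimal sketch of pair-conditioned descriptors via cross-attention, in the
# spirit of "Co-Attention for Conditioned Image Matching" above. The module
# name, sizes, and residual update are illustrative assumptions.
import torch
import torch.nn as nn

class PairConditioning(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 4):
        super().__init__()
        self.cross = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, feat_a: torch.Tensor, feat_b: torch.Tensor):
        # Tokens of each image attend to the other image, so descriptors reflect
        # the specific pair rather than each image in isolation.
        cond_a, _ = self.cross(feat_a, feat_b, feat_b)  # A queries B
        cond_b, _ = self.cross(feat_b, feat_a, feat_a)  # B queries A
        return feat_a + cond_a, feat_b + cond_b

# Example: two images with different numbers of local features.
a, b = torch.randn(1, 500, 256), torch.randn(1, 600, 256)
a_cond, b_cond = PairConditioning()(a, b)
print(a_cond.shape, b_cond.shape)  # torch.Size([1, 500, 256]) torch.Size([1, 600, 256])
```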
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information presented and is not responsible for any consequences of its use.