Related papers: Semantic-aware Representation Learning for Homography Estimation

Semantic-aware Representation Learning for Homography Estimation

URL: http://arxiv.org/abs/2407.13284v4
Date: Sat, 12 Oct 2024 08:17:41 GMT
Title: Semantic-aware Representation Learning for Homography Estimation
Authors: Yuhan Liu, Qianxin Huang, Siqi Hui, Jingwen Fu, Sanping Zhou, Kangyi Wu, Pengna Li, Jinjun Wang
Abstract summary: We propose SRMatcher, a detector-free feature matching method, which encourages the network to learn integrated semantic feature representation. By reducing errors stemming from semantic inconsistencies in matching pairs, our proposed SRMatcher is able to deliver more accurate and realistic outcomes.
Score: 28.70450397793246
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Homography estimation is the task of determining the projective transformation between the two images of an image pair. Our approach employs detector-free feature matching methods to address this problem. Previous work has underscored the importance of incorporating semantic information; however, an efficient way to utilize it is still lacking. Previous methods treat semantics as a pre-processing step, which makes their use of semantics overly coarse-grained and leaves them lacking adaptability when dealing with different tasks. In our work, we seek another way to use semantic information: a semantic-aware feature representation learning framework. Based on this, we propose SRMatcher, a new detector-free feature matching method, which encourages the network to learn integrated semantic feature representations. Specifically, to capture precise and rich semantics, we leverage the capabilities of recently popularized vision foundation models (VFMs) trained on extensive datasets. Then, a cross-image Semantic-aware Fusion Block (SFB) is proposed to integrate their fine-grained semantic features into the feature representation space. In this way, by reducing errors stemming from semantic inconsistencies in matching pairs, our proposed SRMatcher is able to deliver more accurate and realistic outcomes. Extensive experiments show that SRMatcher surpasses solid baselines and attains SOTA results on multiple real-world datasets. Compared to the previous SOTA approach GeoFormer, SRMatcher increases the area under the cumulative curve (AUC) by about 11% on HPatches. Additionally, SRMatcher can serve as a plug-and-play framework for other matching methods such as LoFTR, yielding substantial precision improvements.
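The abstract scores homography estimates by the AUC of a cumulative error curve on HPatches but does not spell out the protocol. The sketch below assumes the convention common in detector-free matching work such as LoFTR: fit a homography to matched points with the normalized Direct Linear Transform (DLT), measure the mean distance between the four image corners warped by the estimated and ground-truth homographies, and integrate the cumulative distribution of those corner errors up to a pixel threshold. The function names (`estimate_homography_dlt`, `corner_error`, `error_auc`) are hypothetical, this is not SRMatcher's evaluation code, and the plain DLT has no outlier rejection; a real pipeline would typically wrap it in RANSAC (for example OpenCV's `cv2.findHomography` with `cv2.RANSAC`).

```python
import numpy as np

# Hypothetical helpers for illustration; not SRMatcher's or LoFTR's released code.


def to_homogeneous(pts):
    """Append a column of ones to an (N, 2) array of pixel coordinates."""
    return np.hstack([pts, np.ones((len(pts), 1))])


def warp_points(H, pts):
    """Apply a 3x3 homography to (N, 2) points and dehomogenize."""
    out = to_homogeneous(pts) @ H.T
    return out[:, :2] / out[:, 2:3]


def normalize_points(pts):
    """Hartley normalization: centroid at the origin, mean distance sqrt(2)."""
    centroid = pts.mean(axis=0)
    scale = np.sqrt(2) / np.linalg.norm(pts - centroid, axis=1).mean()
    T = np.array([[scale, 0.0, -scale * centroid[0]],
                  [0.0, scale, -scale * centroid[1]],
                  [0.0, 0.0, 1.0]])
    return to_homogeneous(pts) @ T.T, T


def estimate_homography_dlt(src, dst):
    """Normalized DLT: least-squares H with dst ~ H @ src from >= 4 matches.

    No outlier rejection; wrap in RANSAC for real matcher output.
    """
    src_n, T_src = normalize_points(src)
    dst_n, T_dst = normalize_points(dst)
    rows = []
    for (x, y, _), (u, v, _) in zip(src_n, dst_n):
        # Each correspondence gives two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1.0, 0.0, 0.0, 0.0, u * x, u * y, u])
        rows.append([0.0, 0.0, 0.0, -x, -y, -1.0, v * x, v * y, v])
    # Solution: right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows))
    H_n = vt[-1].reshape(3, 3)
    # Undo the normalization on both sides.
    H = np.linalg.inv(T_dst) @ H_n @ T_src
    return H / H[2, 2]


def corner_error(H_est, H_gt, width, height):
    """Mean pixel distance between image corners warped by H_est and by H_gt."""
    corners = np.array([[0, 0], [width - 1, 0],
                        [width - 1, height - 1], [0, height - 1]], dtype=float)
    diff = warp_points(H_est, corners) - warp_points(H_gt, corners)
    return np.linalg.norm(diff, axis=1).mean()


def error_auc(errors, threshold):
    """Area under the cumulative error curve up to `threshold`, normalized to [0, 1]."""
    errors = np.concatenate([[0.0], np.sort(np.asarray(errors, dtype=float))])
    recall = np.linspace(0.0, 1.0, len(errors))
    last = np.searchsorted(errors, threshold)
    # Clip the curve at the threshold, holding the recall reached so far.
    x = np.concatenate([errors[:last], [threshold]])
    y = np.concatenate([recall[:last], [recall[last - 1]]])
    # Trapezoidal rule written out (np.trapz is deprecated in NumPy 2.0).
    area = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2.0)
    return area / threshold
```

With one corner error collected per image pair, `error_auc(errors, t)` at assumed thresholds such as 3, 5 and 10 px yields figures of the kind reported on HPatches; a perfect estimator scores 1.0 at every threshold.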

Related papers

Ambiguity-Aware and High-Order Relation Learning for Multi-Grained Image-Text Matching [6.633576185707164]
This paper proposes the Ambiguity-Aware and High-order Relation learning framework (AAHR) to address these issues. The framework introduces global and local feature extraction mechanisms and an adaptive aggregation network, significantly enhancing full-grained semantic understanding capabilities. Experimental results demonstrate that AAHR outperforms existing state-of-the-art methods on the Flickr30K, MSCOCO, and ECCV Caption datasets.
arXiv Detail & Related papers (2025-07-12T11:30:32Z)
Semantic-Spatial Feature Fusion with Dynamic Graph Refinement for Remote Sensing Image Captioning [11.015244501780078]
This paper presents a semantic-spatial feature fusion with dynamic graph refinement (SFDR) method. The proposed SFDR method significantly enhances the quality of the generated descriptions. Experimental results on three benchmark datasets demonstrate the effectiveness of the proposed method.
arXiv Detail & Related papers (2025-03-30T14:14:41Z)
CLIP-Clique: Graph-based Correspondence Matching Augmented by Vision Language Models for Object-based Global Localization [0.0]
One of the most promising approaches for localization on object maps is to use semantic graph matching. To address the former issue, we augment the correspondence matching using Vision Language Models. In addition, inliers are estimated deterministically using a graph-theoretic approach.
arXiv Detail & Related papers (2024-10-04T00:23:20Z)
Sharing Key Semantics in Transformer Makes Efficient Image Restoration [148.22790334216117]
The self-attention mechanism, a cornerstone of Vision Transformers (ViTs), tends to encompass all global cues. Small segments of a degraded image, particularly those closely aligned semantically, provide especially relevant information to aid the restoration process. In this paper, we propose boosting image restoration (IR) performance by sharing the key semantics via Transformer for IR (i.e., SemanIR).
arXiv Detail & Related papers (2024-05-30T12:45:34Z)
Spatial Semantic Recurrent Mining for Referring Image Segmentation [63.34997546393106]
We propose S²RM to achieve high-quality cross-modality fusion. It follows a three-stage working strategy: distributing language features, spatial semantic recurrent coparsing, and parsed-semantic balancing. Our proposed method performs favorably against other state-of-the-art algorithms.
arXiv Detail & Related papers (2024-05-15T00:17:48Z)
SEER-ZSL: Semantic Encoder-Enhanced Representations for Generalized Zero-Shot Learning [0.6792605600335813]
Zero-Shot Learning (ZSL) presents the challenge of identifying categories not seen during training. We introduce Semantic Encoder-Enhanced Representations for Zero-Shot Learning (SEER-ZSL). First, we aim to distill meaningful semantic information using a probabilistic encoder, enhancing semantic consistency and robustness. Second, we distill the visual space by exploiting the learned data distribution through an adversarially trained generator. Third, we align the distilled information, enabling a mapping of unseen categories onto the true data manifold.
arXiv Detail & Related papers (2023-12-20T15:18:51Z)
Beyond Prototypes: Semantic Anchor Regularization for Better Representation Learning [82.29761875805369]
One of the ultimate goals of representation learning is to achieve compactness within a class and good separability between classes. We propose a novel perspective that uses pre-defined class anchors, serving as feature centroids, to unidirectionally guide feature learning. The proposed Semantic Anchor Regularization (SAR) can be used in a plug-and-play manner in existing models.
arXiv Detail & Related papers (2023-12-19T05:52:38Z)
FECANet: Boosting Few-Shot Semantic Segmentation with Feature-Enhanced Context-Aware Network [48.912196729711624]
Few-shot semantic segmentation is the task of learning to locate each pixel of a novel class in a query image with only a few annotated support images. We propose a Feature-Enhanced Context-Aware Network (FECANet) to suppress the matching noise caused by inter-class local similarity. In addition, we propose a novel correlation reconstruction module that encodes extra correspondence relations between foreground and background and multi-scale context semantic features.
arXiv Detail & Related papers (2023-01-19T16:31:13Z)
Scale-Semantic Joint Decoupling Network for Image-text Retrieval in Remote Sensing [23.598273691455503]
We propose a novel Scale-Semantic Joint Decoupling Network (SSJDN) for remote sensing image-text retrieval. Our proposed SSJDN outperforms state-of-the-art approaches in numerical experiments conducted on four benchmark remote sensing datasets.
arXiv Detail & Related papers (2022-12-12T08:02:35Z)
Semantic SuperPoint: A Deep Semantic Descriptor [2.1362576987263955]
We propose that adding a semantic segmentation decoder in a shared encoder architecture would help the descriptor decoder learn semantic information. The proposed models are evaluated according to detection and matching metrics on the HPatches dataset.
arXiv Detail & Related papers (2022-11-02T13:17:04Z)
Hybrid Routing Transformer for Zero-Shot Learning [83.64532548391]
This paper presents a novel transformer encoder-decoder model, called the hybrid routing transformer (HRT). In the HRT encoder, we embed an active attention, constructed from both bottom-up and top-down dynamic routing pathways, to generate the attribute-aligned visual feature. In the HRT decoder, we use static routing to calculate the correlation among the attribute-aligned visual features, the corresponding attribute semantics, and the class attribute vectors to generate the final class label predictions.
arXiv Detail & Related papers (2022-03-29T07:55:08Z)
Group-Wise Semantic Mining for Weakly Supervised Semantic Segmentation [49.90178055521207]
This work addresses weakly supervised semantic segmentation (WSSS), with the goal of bridging the gap between image-level annotations and pixel-level segmentation. We formulate WSSS as a novel group-wise learning task that explicitly models semantic dependencies in a group of images to estimate more reliable pseudo ground-truths. In particular, we devise a graph neural network (GNN) for group-wise semantic mining, wherein input images are represented as graph nodes.
arXiv Detail & Related papers (2020-12-09T12:40:13Z)
Unsupervised segmentation via semantic-apparent feature fusion [21.75371777263847]
This research proposes an unsupervised foreground segmentation method based on semantic-apparent feature fusion (SAFF). Semantic features respond accurately to the key regions of the foreground object, while apparent features provide richer detail. By fusing semantic and apparent features, and by cascading modules for intra-image adaptive feature weight learning and inter-image common feature learning, the method achieves performance that significantly exceeds the baselines.
arXiv Detail & Related papers (2020-05-21T08:28:49Z)
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework that learns high-order relation and topology information for discriminative features and robust alignment. Our framework outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.