MFAF: An EVA02-Based Multi-scale Frequency Attention Fusion Method for Cross-View Geo-Localization
- URL: http://arxiv.org/abs/2509.12673v1
- Date: Tue, 16 Sep 2025 04:51:52 GMT
- Authors: YiTong Liu, TianZhu Liu, YanFeng GU
- Abstract summary: Cross-view geo-localization aims to determine the geographical location of a query image by matching it against a gallery of images. This task is challenging due to the significant appearance variations of objects observed from variable views, along with the difficulty in extracting discriminative features. Existing approaches often rely on extracting features through feature map segmentation while neglecting spatial and semantic information.
- Score: 6.027431240137503
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view geo-localization aims to determine the geographical location of a query image by matching it against a gallery of images. This task is challenging because objects look very different when observed from different viewpoints, which makes discriminative features hard to extract. Existing approaches often extract features by segmenting the feature map while neglecting spatial and semantic information. To address these issues, we propose the EVA02-based Multi-scale Frequency Attention Fusion (MFAF) method. MFAF consists of the Multi-Frequency Branch-wise Block (MFB) and the Frequency-aware Spatial Attention (FSA) module. The MFB captures both low-frequency structural features and high-frequency edge details across multiple scales, improving the consistency and robustness of feature representations across viewpoints. The FSA module adaptively focuses on the key regions of the frequency features, mitigating the interference caused by background noise and viewpoint variability. Extensive experiments on widely recognized benchmarks, including University-1652, SUES-200, and Dense-UAV, demonstrate that MFAF achieves competitive performance in both drone localization and drone navigation tasks.
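As a rough illustration of the low-frequency/high-frequency distinction the abstract relies on, the sketch below splits a 2-D array into a smooth low-frequency part and a residual high-frequency part using an ideal FFT mask. This is not the MFB from the paper, whose internals are not described here; the function name `split_frequencies` and the `cutoff` parameter are hypothetical, and the FFT mask is a swapped-in, generic technique chosen only because it is easy to verify.

```python
import numpy as np


def split_frequencies(image, cutoff=0.1):
    """Split a 2-D array into low- and high-frequency parts.

    `cutoff` is an illustrative radius in cycles per pixel (Nyquist is 0.5).
    This is a generic ideal low-pass split, not the paper's MFB.
    """
    h, w = image.shape
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequencies, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequencies, cycles/pixel
    low_mask = np.sqrt(fx**2 + fy**2) <= cutoff  # keep only small radii

    spectrum = np.fft.fft2(image)
    # The mask is symmetric in frequency, so the inverse transform is real
    # up to rounding error; .real drops the negligible imaginary part.
    low = np.fft.ifft2(spectrum * low_mask).real
    high = image - low  # residual: edges and fine texture
    return low, high


# Random data stands in for a feature map or grayscale image.
rng = np.random.default_rng(0)
image = rng.random((64, 64))
low, high = split_frequencies(image)
```

By construction the two parts sum back to the input, and the mean (the zero-frequency component) stays entirely in the low-frequency part. A learned module would typically replace the hard mask with trainable filters at several scales.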
Related papers
- FOCA: Frequency-Oriented Cross-Domain Forgery Detection, Localization and Explanation via Multi-Modal Large Language Model [11.08248067961235]
FOCA is a large language model-based framework that integrates discriminative features from both the RGB spatial and frequency domains. FSE-Set is a large-scale dataset with diverse authentic and tampered images, pixel-level masks, and dual-domain annotations.
arXiv Detail & Related papers (2026-02-21T15:53:44Z)
- NAF: Zero-Shot Feature Upsampling via Neighborhood Attention Filtering [80.55691420311616]
Neighborhood Attention Filtering (NAF) learns adaptive spatial-and-content weights through Cross-Scale Neighborhood Attention and Rotary Position Embeddings (RoPE). NAF operates zero-shot: it upsamples features from any Vision Foundation Model (VFM) without retraining. It maintains high efficiency, scaling to 2K feature maps and reconstructing intermediate-resolution maps at 18 FPS.
arXiv Detail & Related papers (2025-11-23T13:43:52Z)
- Frequency-Domain Decomposition and Recomposition for Robust Audio-Visual Segmentation [60.9960601057956]
We introduce the Frequency-Aware Audio-Visual Segmentation (FAVS) framework, which consists of two key modules. The FAVS framework achieves state-of-the-art performance on three benchmark datasets.
arXiv Detail & Related papers (2025-09-23T12:33:48Z)
- Wavelet-Guided Dual-Frequency Encoding for Remote Sensing Change Detection [67.84730634802204]
Change detection in remote sensing imagery plays a vital role in various engineering applications, such as natural disaster monitoring, urban expansion tracking, and infrastructure management. Most existing methods still rely on spatial-domain modeling, where the limited diversity of feature representations hinders the detection of subtle change regions. We observe that frequency-domain feature modeling, particularly in the wavelet domain, amplifies fine-grained differences in frequency components, enhancing the perception of edge changes that are challenging to capture in the spatial domain.
arXiv Detail & Related papers (2025-08-07T11:14:16Z)
- Generalizable Multispectral Land Cover Classification via Frequency-Aware Mixture of Low-Rank Token Experts [22.75047167955269]
We introduce Land-MoE, a novel approach for multispectral land cover classification (MLCC). Land-MoE comprises two key modules: the mixture of low-rank token experts (MoLTE) and frequency-aware filters (FAF).
arXiv Detail & Related papers (2025-05-20T08:52:28Z)
- Adaptive Frequency Enhancement Network for Remote Sensing Image Semantic Segmentation [33.49405456617909]
We propose the Adaptive Frequency Enhancement Network (AFENet), which integrates two key components: the Adaptive Frequency and Spatial feature Interaction Module (AFSIM) and the Selective feature Fusion Module (SFM). AFSIM dynamically separates and modulates high- and low-frequency features according to the content of the input image. SFM selectively fuses global context and local detailed features to enhance the network's representation capability.
arXiv Detail & Related papers (2025-04-03T14:42:49Z)
- FMNet: Frequency-Assisted Mamba-Like Linear Attention Network for Camouflaged Object Detection [7.246630480680039]
Camouflaged Object Detection (COD) is challenging due to the strong similarity between camouflaged objects and their surroundings. Existing methods mainly rely on spatial local features, failing to capture global information. Frequency-Assisted Mamba-Like Linear Attention Network (FMNet) is proposed to efficiently capture global features.
arXiv Detail & Related papers (2025-03-14T02:55:19Z)
- Frequency-Spatial Entanglement Learning for Camouflaged Object Detection [34.426297468968485]
Existing methods attempt to reduce the impact of pixel similarity by maximizing the distinguishing ability of spatial features with complicated design.
We propose a new approach to address this issue by jointly exploring the representation in the frequency and spatial domains, introducing the Frequency-Spatial Entanglement Learning (FSEL) method.
Our experiments demonstrate the superiority of our FSEL over 21 state-of-the-art methods, through comprehensive quantitative and qualitative comparisons in three widely-used datasets.
arXiv Detail & Related papers (2024-09-03T07:58:47Z)
- Frequency Domain Modality-invariant Feature Learning for Visible-infrared Person Re-Identification [79.9402521412239]
We propose a novel Frequency Domain modality-invariant feature learning framework (FDMNet) to reduce modality discrepancy from the frequency domain perspective.
Our framework introduces two novel modules, namely the Instance-Adaptive Amplitude Filter (IAF) and the Phrase-Preserving Normalization (PPNorm).
arXiv Detail & Related papers (2024-01-03T17:11:27Z)
- Unified Frequency-Assisted Transformer Framework for Detecting and Grounding Multi-Modal Manipulation [109.1912721224697]
We present the Unified Frequency-Assisted transFormer framework, named UFAFormer, to address the DGM4 problem.
By leveraging the discrete wavelet transform, we decompose images into several frequency sub-bands, capturing rich face forgery artifacts.
Our proposed frequency encoder, incorporating intra-band and inter-band self-attentions, explicitly aggregates forgery features within and across diverse sub-bands.
arXiv Detail & Related papers (2023-09-18T11:06:42Z)
- Dense Affinity Matching for Few-Shot Segmentation [83.65203917246745]
Few-Shot Segmentation (FSS) aims to segment images of novel classes given only a few samples.
We propose a dense affinity matching framework to exploit the support-query interaction.
We show that our framework performs very competitively under different settings with only 0.68M parameters.
arXiv Detail & Related papers (2023-07-17T12:27:15Z)
- Adaptive Frequency Learning in Two-branch Face Forgery Detection [66.91715092251258]
We propose to Adaptively learn Frequency information in a two-branch Detection framework, dubbed AFD.
We liberate our network from the fixed frequency transforms, and achieve better performance with our data- and task-dependent transform layers.
arXiv Detail & Related papers (2022-03-27T14:25:52Z)
This list is automatically generated from the titles and abstracts of the papers on this site.