Related papers: Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement

URL: http://arxiv.org/abs/2603.02726v1
Date: Tue, 03 Mar 2026 08:25:35 GMT
Title: Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement
Authors: Hongying Zhang, ShuaiShuai Ma
Abstract summary: Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints. CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. This paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from spatial and frequency domains.
Score: 1.6686955491488273
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints and constitutes a fundamental technique for visual localization in GNSS-denied environments. Nevertheless, CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. Existing methods predominantly rely on spatial domain feature alignment, which is inherently sensitive to large-scale viewpoint variations and local disturbances. To alleviate these limitations, this paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from spatial and frequency domains. SFDE adopts a three-branch parallel architecture to model global semantic context, local geometric structure, and statistical stability in the frequency domain, respectively, thereby characterizing consistency across domains from the perspectives of scene topology, multiscale structural patterns, and frequency invariance. The resulting complementary features are jointly optimized in a unified embedding space via progressive enhancement and coupled constraints, enabling the learning of cross-view representations with consistency across multiple granularities. Comprehensive experiments show that SFDE achieves competitive performance and in many cases even surpasses state-of-the-art methods, while maintaining a lightweight and computationally efficient design. Our code is available at https://github.com/Mashuaishuai669/SFDE

Related papers

OCTOPUS: Enhancing the Spatial-Awareness of Vision SSMs with Multi-Dimensional Scans and Traversal Selection [20.717476762904038]
We introduce OCTOPUS, a novel architecture that preserves both global context and local spatial structure within images. OCTOPUS performs discrete reoccurrence along eight principal orientations, going forward or backward in the horizontal, vertical, and diagonal directions. In our classification and segmentation benchmarks, OCTOPUS demonstrates notable improvements in boundary preservation and region consistency.
arXiv Detail & Related papers (2026-01-31T21:12:59Z)
JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization [26.250213248316342]
Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. Existing methods predominantly rely on semantic features from RGB images. We introduce a Joint perception network to integrate RGB and Normal images.
arXiv Detail & Related papers (2025-09-06T12:11:51Z)
GCRPNet: Graph-Enhanced Contextual and Regional Perception Network for Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet). It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z)
Hierarchical Graph Attention Network for No-Reference Omnidirectional Image Quality Assessment [21.897948374713163]
Current Omnidirectional Image Quality Assessment (OIQA) methods struggle to evaluate locally non-uniform distortions. We propose a graph neural network-based OIQA framework that explicitly models structural relationships between viewports.
arXiv Detail & Related papers (2025-08-13T14:25:24Z)
HSRMamba: Contextual Spatial-Spectral State Space Model for Single Image Hyperspectral Super-Resolution [41.93421212397078]
Mamba has demonstrated exceptional performance in visual tasks due to its powerful global modeling capabilities and linear computational complexity. HSRMamba is a contextual spatial-spectral modeling state space model for hyperspectral image super-resolution (HSISR).
arXiv Detail & Related papers (2025-01-30T17:10:53Z)
Exploiting Aggregation and Segregation of Representations for Domain Adaptive Human Pose Estimation [50.31351006532924]
Human pose estimation (HPE) has received increasing attention recently due to its wide application in motion analysis, virtual reality, healthcare, etc. It suffers from the lack of labeled diverse real-world datasets due to the time- and labor-intensive annotation. We introduce a novel framework that capitalizes on both representation aggregation and segregation for domain adaptive human pose estimation.
arXiv Detail & Related papers (2024-12-29T17:59:45Z)
Multisource Collaborative Domain Generalization for Cross-Scene Remote Sensing Image Classification [57.945437355714155]
Cross-scene image classification aims to transfer prior knowledge of ground materials to annotate regions with different distributions. Existing approaches focus on single-source domain generalization to unseen target domains. We propose a novel multi-source collaborative domain generalization framework (MS-CDG) based on homogeneity and heterogeneity characteristics of multi-source remote sensing data.
arXiv Detail & Related papers (2024-12-05T06:15:08Z)
Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries. We propose a novel Relational Priors Distillation (RPD) method to extract priors from the well-trained transformers on massive images. Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z)
Decomposition-based Unsupervised Domain Adaptation for Remote Sensing Image Semantic Segmentation [30.606689882397223]
Unsupervised domain adaptation (UDA) techniques are vital for semantic segmentation in geosciences. Most existing UDA methods, which focus on domain alignment at the high-level feature space, struggle to simultaneously retain local spatial details and global contextual semantics. A novel decomposition scheme is proposed to guide domain-invariant representation learning.
arXiv Detail & Related papers (2024-04-06T07:13:49Z)
Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method. We embed multi-scale complementary features from the same view position into a set of nodes. By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z)
DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions. We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images. Our proposed method outperforms state-of-the-art methods for HSI classification.
arXiv Detail & Related papers (2023-04-19T18:32:52Z)

This list is automatically generated from the titles and abstracts of the papers in this site.