JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization
- URL: http://arxiv.org/abs/2509.05696v1
- Date: Sat, 06 Sep 2025 12:11:51 GMT
- Title: JRN-Geo: A Joint Perception Network based on RGB and Normal images for Cross-view Geo-localization
- Authors: Hongyu Zhou, Yunzhou Zhang, Tingsong Huang, Fawei Ge, Man Qi, Xichen Zhang, Yizhong Zhang
- Abstract summary: Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. Existing methods predominantly rely on semantic features from RGB images. We introduce a Joint perception network to integrate RGB and Normal images.
- Score: 26.250213248316342
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Cross-view geo-localization plays a critical role in Unmanned Aerial Vehicle (UAV) localization and navigation. However, significant challenges arise from the drastic viewpoint differences and appearance variations between images. Existing methods predominantly rely on semantic features from RGB images, often neglecting the importance of spatial structural information in capturing viewpoint-invariant features. To address this issue, we incorporate geometric structural information from normal images and introduce a Joint perception network to integrate RGB and Normal images (JRN-Geo). Our approach utilizes a dual-branch feature extraction framework, leveraging a Difference-Aware Fusion Module (DAFM) and Joint-Constrained Interaction Aggregation (JCIA) strategy to enable deep fusion and joint-constrained semantic and structural information representation. Furthermore, we propose a 3D geographic augmentation technique to generate potential viewpoint variation samples, enhancing the network's ability to learn viewpoint-invariant features. Extensive experiments on the University-1652 and SUES-200 datasets validate the robustness of our method against complex viewpoint variations, achieving state-of-the-art performance.
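The abstract describes a dual-branch design in which semantic features from RGB images are fused with structural features from normal images via a Difference-Aware Fusion Module. The paper's DAFM is a learned module; the sketch below is only a hypothetical stand-in that illustrates the underlying idea of gating the fusion by where the two feature streams disagree (the function name and gating rule are assumptions, not the paper's method):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def difference_aware_fuse(f_rgb, f_normal):
    """Illustrative difference-gated fusion of RGB and normal-map features.

    The element-wise difference decides how much each stream contributes:
    where the streams agree, they are averaged; where they disagree, the
    gate shifts weight toward one side. Hypothetical stand-in for DAFM.
    """
    diff = f_rgb - f_normal            # where semantic and structural cues disagree
    gate = sigmoid(diff)               # per-element mixing weight in (0, 1)
    return gate * f_rgb + (1.0 - gate) * f_normal

# toy 4-dimensional descriptors for one image pair
f_rgb = np.array([0.9, 0.1, 0.5, 0.3])
f_normal = np.array([0.2, 0.4, 0.5, 0.8])
fused = difference_aware_fuse(f_rgb, f_normal)
print(fused.shape)  # (4,)
```

Because the gate is a convex weight, each fused element stays between the corresponding RGB and normal values; where the two streams agree exactly, the fusion leaves the value unchanged.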
Related papers
- Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement [1.6686955491488273]
Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints. CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. This paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from spatial and frequency domains.
arXiv Detail & Related papers (2026-03-03T08:25:35Z) - SegMASt3R: Geometry Grounded Segment Matching [23.257530861472656]
We leverage the spatial understanding of 3D foundation models to tackle wide-baseline segment matching. We propose an architecture that uses the inductive bias of these 3D foundation models to match segments across image pairs with up to 180-degree viewpoint rotation.
arXiv Detail & Related papers (2025-10-06T17:31:32Z) - GCRPNet: Graph-Enhanced Contextual and Regional Perception Network For Salient Object Detection in Optical Remote Sensing Images [68.33481681452675]
We propose a graph-enhanced contextual and regional perception network (GCRPNet). It builds upon the Mamba architecture to simultaneously capture long-range dependencies and enhance regional feature representation. It performs adaptive patch scanning on feature maps processed via multi-scale convolutions, thereby capturing rich local region information.
arXiv Detail & Related papers (2025-08-14T11:31:43Z) - Boosting Cross-Domain Point Classification via Distilling Relational Priors from 2D Transformers [59.0181939916084]
Traditional 3D networks mainly focus on local geometric details and ignore the topological structure between local geometries.
We propose a novel Relational Priors Distillation (RPD) method to extract relational priors from transformers well-trained on massive images.
Experiments on the PointDA-10 and the Sim-to-Real datasets verify that the proposed method consistently achieves the state-of-the-art performance of UDA for point cloud classification.
arXiv Detail & Related papers (2024-07-26T06:29:09Z) - Multi-Spectral Image Stitching via Spatial Graph Reasoning [52.27796682972484]
We propose a spatial graph reasoning based multi-spectral image stitching method.
We embed multi-scale complementary features from the same view position into a set of nodes.
By introducing long-range coherence along spatial and channel dimensions, the complementarity of pixel relations and channel interdependencies aids in the reconstruction of aligned multi-view features.
arXiv Detail & Related papers (2023-07-31T15:04:52Z) - An Interactively Reinforced Paradigm for Joint Infrared-Visible Image Fusion and Saliency Object Detection [59.02821429555375]
This research focuses on the discovery and localization of hidden objects in the wild and serves unmanned systems.
Through empirical analysis, infrared and visible image fusion (IVIF) makes hard-to-find objects apparent.
Multimodal salient object detection (SOD) accurately delineates the precise spatial location of objects within the picture.
arXiv Detail & Related papers (2023-05-17T06:48:35Z) - DCN-T: Dual Context Network with Transformer for Hyperspectral Image Classification [109.09061514799413]
Hyperspectral image (HSI) classification is challenging due to spatial variability caused by complex imaging conditions.
We propose a tri-spectral image generation pipeline that transforms HSI into high-quality tri-spectral images.
Our proposed method outperforms state-of-the-art methods for HSI classification.
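The DCN-T abstract mentions a pipeline that transforms a hyperspectral image into high-quality tri-spectral images. The paper's actual pipeline is not described in detail here; the sketch below is only a crude illustrative stand-in that groups the spectral bands into three ranges and averages each group into one channel (the function name and the band-grouping rule are assumptions):

```python
import numpy as np

def hsi_to_trispectral(cube):
    """Crude stand-in for tri-spectral image generation: split the
    hyperspectral bands into three contiguous groups and average each
    group into a single channel, yielding a 3-channel image.
    """
    h, w, bands = cube.shape
    band_groups = np.array_split(np.arange(bands), 3)
    channels = [cube[:, :, idx].mean(axis=2) for idx in band_groups]
    return np.stack(channels, axis=2)  # (h, w, 3)

# toy 30-band HSI patch with values in [0, 1)
cube = np.random.default_rng(1).random((4, 5, 30))
img = hsi_to_trispectral(cube)
print(img.shape)  # (4, 5, 3)
```

Averaging within each band group keeps the output in the same value range as the input, so the resulting 3-channel image can be fed directly to networks pretrained on RGB-like inputs.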
arXiv Detail & Related papers (2023-04-19T18:32:52Z) - Geometric-aware Pretraining for Vision-centric 3D Object Detection [77.7979088689944]
We propose a novel geometric-aware pretraining framework called GAPretrain.
GAPretrain serves as a plug-and-play solution that can be flexibly applied to multiple state-of-the-art detectors.
We achieve 46.2 mAP and 55.5 NDS on the nuScenes val set using the BEVFormer method, with a gain of 2.7 and 2.1 points, respectively.
arXiv Detail & Related papers (2023-04-06T14:33:05Z) - Simple, Effective and General: A New Backbone for Cross-view Image Geo-localization [9.687328460113832]
We propose a new backbone network, named Simple Attention-based Image Geo-localization network (SAIG).
The proposed SAIG effectively represents long-range interactions among patches as well as cross-view correspondence with multi-head self-attention layers.
Our SAIG achieves state-of-the-art results on cross-view geo-localization, while being far simpler than previous works.
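SAIG is described as representing long-range interactions among patches via multi-head self-attention. The sketch below shows a generic single-layer multi-head self-attention over patch tokens, with random matrices standing in for learned projections; it is a minimal illustration of the mechanism, not SAIG's actual layer:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_self_attention(x, num_heads, rng):
    """Minimal multi-head self-attention over patch tokens.

    x: (num_patches, dim). Each head projects tokens to queries, keys,
    and values, computes patch-to-patch attention weights, and mixes
    values accordingly, so every patch can attend to every other patch.
    """
    n, d = x.shape
    dh = d // num_heads
    out = np.empty_like(x)
    for h in range(num_heads):
        Wq, Wk, Wv = (rng.standard_normal((d, dh)) / np.sqrt(d) for _ in range(3))
        q, k, v = x @ Wq, x @ Wk, x @ Wv           # (n, dh) each
        attn = softmax(q @ k.T / np.sqrt(dh))      # (n, n) patch-to-patch weights
        out[:, h * dh:(h + 1) * dh] = attn @ v     # long-range mixing of patches
    return out

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))  # 16 patches, 8-dim embeddings
y = multi_head_self_attention(tokens, num_heads=2, rng=rng)
print(y.shape)  # (16, 8)
```

Because the attention matrix is dense over all patch pairs, every output token is a weighted mixture of all input tokens, which is what lets attention-based backbones model long-range interactions in a single layer.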
arXiv Detail & Related papers (2023-02-03T06:50:51Z) - Co-visual pattern augmented generative transformer learning for automobile geo-localization [12.449657263683337]
Cross-view geo-localization (CVGL) aims to estimate the geographical location of the ground-level camera by matching against enormous geo-tagged aerial images.
We present a novel approach using cross-view knowledge generative techniques in combination with transformers, namely mutual generative transformer learning (MGTL) for CVGL.
arXiv Detail & Related papers (2022-03-17T07:29:02Z) - Cross-view Geo-localization with Evolving Transformer [7.5800316275498645]
Cross-view geo-localization is challenging due to drastic appearance and geometry differences across views.
We devise a novel Evolving geo-localization Transformer (EgoTR) that utilizes the self-attention properties of Transformers to model global dependencies.
Our EgoTR performs favorably against state-of-the-art methods on standard, fine-grained and cross-dataset cross-view geo-localization tasks.
arXiv Detail & Related papers (2021-07-02T05:33:14Z) - Spatial-spectral FFPNet: Attention-Based Pyramid Network for Segmentation and Classification of Remote Sensing Images [12.320585790097415]
In this study, we develop an attention-based pyramid network for segmentation and classification of remote sensing datasets.
Experiments conducted on ISPRS Vaihingen and ISPRS Potsdam high-resolution datasets demonstrate the competitive segmentation accuracy achieved by the proposed heavy-weight spatial FFPNet.
arXiv Detail & Related papers (2020-08-20T04:55:34Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.