Related papers: Revisiting Cross-View Localization from Image Matching

URL: http://arxiv.org/abs/2508.10716v1
Date: Thu, 14 Aug 2025 14:57:31 GMT
Title: Revisiting Cross-View Localization from Image Matching
Authors: Panwang Xia, Qiong Wu, Lei Yu, Yi Liu, Mingtao Xiong, Lei Liang, Yongjun Zhang, Yi Wan
Abstract summary: Cross-view localization aims to estimate the 3 degrees of freedom pose of a ground-view image by registering it to aerial or satellite imagery. Existing methods either regress poses directly or align features in a shared bird's-eye view (BEV) space. We propose a novel framework that improves both matching and localization.
Score: 12.411420734642988
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-view localization aims to estimate the 3 degrees of freedom pose of a ground-view image by registering it to aerial or satellite imagery. It is essential in GNSS-denied environments such as urban canyons and disaster zones. Existing methods either regress poses directly or align features in a shared bird's-eye view (BEV) space, both built upon accurate spatial correspondences between perspectives. However, these methods fail to establish strict cross-view correspondences, yielding only coarse or geometrically inconsistent matches. Consequently, fine-grained image matching between ground and aerial views remains an unsolved problem, which in turn constrains the interpretability of localization results. In this paper, we revisit cross-view localization from the perspective of cross-view image matching and propose a novel framework that improves both matching and localization. Specifically, we introduce a Surface Model to model visible regions for accurate BEV projection, and a SimRefiner module to refine the similarity matrix through local-global residual correction, eliminating the reliance on post-processing like RANSAC. To further support research in this area, we introduce CVFM, the first benchmark with 32,509 cross-view image pairs annotated with pixel-level correspondences. Extensive experiments demonstrate that our approach substantially improves both localization accuracy and image matching quality, setting new baselines under extreme viewpoint disparity.

Related papers

CLNet: Cross-View Correspondence Makes a Stronger Geo-Localizationer [48.52152634356309]
We propose a correspondence-aware feature refinement framework, termed CLNet, that explicitly bridges the semantic and geometric gaps between different views. CLNet decomposes the view alignment process into three learnable and complementary modules. Our proposed CLNet achieves state-of-the-art performance while offering better interpretability and generalizability.
arXiv Detail & Related papers (2025-12-16T16:31:41Z)
Loc$^2$: Interpretable Cross-View Localization via Depth-Lifted Local Feature Matching [80.57282092735991]
We propose an accurate and interpretable fine-grained cross-view localization method. It estimates the 3 Degrees of Freedom (DoF) pose of a ground-level image by matching its local features with a reference aerial image. Experiments show state-of-the-art accuracy in challenging scenarios such as cross-area testing and unknown orientation.
arXiv Detail & Related papers (2025-09-11T18:52:16Z)
FG$^2$: Fine-Grained Cross-View Localization by Fine-Grained Feature Matching [69.81167130510333]
We propose a novel fine-grained cross-view localization method that estimates the 3 Degrees of Freedom pose of a ground-level image in an aerial image of the surroundings. The pose is estimated by aligning a point plane generated from the ground image with a point plane sampled from the aerial image. Compared to the previous state-of-the-art, our method reduces the mean localization error by 28% on the VIGOR cross-area test set.
arXiv Detail & Related papers (2025-03-24T14:34:20Z)
BevSplat: Resolving Height Ambiguity via Feature-Based Gaussian Primitives for Weakly-Supervised Cross-View Localization [11.50186721264038]
This paper addresses the problem of weakly supervised cross-view localization. The goal is to estimate the pose of a ground camera relative to a satellite image with noisy ground truth annotations. We propose BevSplat, a novel method that resolves height ambiguity by using feature-based Gaussian primitives.
arXiv Detail & Related papers (2025-02-13T08:54:04Z)
Cross-View Geo-Localization with Street-View and VHR Satellite Imagery in Decentrality Settings [39.252555758596706]
Cross-View Geo-Localization matches street-view query images with geo-tagged aerial-view reference images. Decentrality is a critical factor warranting deeper investigation, as larger decentrality can substantially improve localization efficiency but comes at the cost of declines in localization accuracy. We introduce DReSS, a novel dataset designed to evaluate cross-view geo-localization with a large geographic scope and diverse landscapes.
arXiv Detail & Related papers (2024-12-16T08:07:53Z)
Unsupervised Multi-view UAV Image Geo-localization via Iterative Rendering [31.716967688739036]
Unmanned Aerial Vehicle (UAV) Cross-View Geo-Localization (CVGL) presents significant challenges. Existing methods rely on the supervision of labeled datasets to extract viewpoint-invariant features for cross-view retrieval. We propose an unsupervised solution that lifts the scene representation to 3D space from UAV observations for satellite image generation.
arXiv Detail & Related papers (2024-11-22T09:22:39Z)
Breaking the Frame: Visual Place Recognition by Overlap Prediction [53.17564423756082]
We propose a novel visual place recognition approach based on overlap prediction, called VOP. VOP processes co-visible image sections by obtaining patch-level embeddings using a Vision Transformer backbone. Our approach uses a voting mechanism to assess overlap scores for potential database images.
arXiv Detail & Related papers (2024-06-23T20:00:20Z)
Fine-Grained Cross-View Geo-Localization Using a Correlation-Aware Homography Estimator [12.415973198004169]
We introduce a novel approach to fine-grained cross-view geo-localization. Our method aligns a warped ground image with a corresponding GPS-tagged satellite image covering the same area. Operating at a speed of 30 FPS, our method outperforms state-of-the-art techniques.
arXiv Detail & Related papers (2023-08-31T17:59:24Z)
View Consistent Purification for Accurate Cross-View Localization [59.48131378244399]
This paper proposes a fine-grained self-localization method for outdoor robotics. The proposed method addresses limitations in existing cross-view localization methods. It is the first sparse visual-only method that enhances perception in dynamic environments.
arXiv Detail & Related papers (2023-08-16T02:51:52Z)
CVLNet: Cross-View Semantic Correspondence Learning for Video-based Camera Localization [89.69214577915959]
This paper tackles the problem of Cross-view Video-based camera localization. We propose estimating the query camera's relative displacement to a satellite image before similarity matching. Experiments have demonstrated the effectiveness of video-based localization over single image-based localization.
arXiv Detail & Related papers (2022-08-07T07:35:17Z)
Beyond Cross-view Image Retrieval: Highly Accurate Vehicle Localization Using Satellite Image [91.29546868637911]
This paper addresses the problem of vehicle-mounted camera localization by matching a ground-level image with an overhead-view satellite map. The key idea is to formulate the task as pose estimation and solve it by neural-net based optimization. Experiments on standard autonomous vehicle localization datasets have confirmed the superiority of the proposed method.
arXiv Detail & Related papers (2022-04-10T19:16:58Z)
Where am I looking at? Joint Location and Orientation Estimation by Cross-View Matching [95.64702426906466]
Cross-view geo-localization is the problem of estimating the position and orientation of a ground-level camera given a large-scale database of geo-tagged aerial images. Knowing orientation between ground and aerial images can significantly reduce matching ambiguity between these two views. We design a Dynamic Similarity Matching network to estimate cross-view orientation alignment during localization.
arXiv Detail & Related papers (2020-05-08T05:21:16Z)

This list is automatically generated from the titles and abstracts of the papers on this site.