Related papers: (MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization

URL: http://arxiv.org/abs/2602.10704v1
Date: Wed, 11 Feb 2026 10:03:31 GMT
Title: (MGS)$^2$-Net: Unifying Micro-Geometric Scale and Macro-Geometric Structure for Cross-View Geo-Localization
Authors: Minglei Li, Mengfan He, Chao Chen, Ziyang Meng
Abstract summary: Cross-view geo-localization (CVGL) is pivotal for UAV navigation but remains brittle under the drastic geometric misalignment between oblique aerial views and orthographic satellite references. We propose (MGS)$^2$, a geometry-grounded framework to bridge this gap. Experiments demonstrate that (MGS)$^2$ achieves state-of-the-art performance, recording a Recall@1 of 97.5% on University-1652 and 97.02% on SUES-200.
Score: 6.842471990535349
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-view geo-localization (CVGL) is pivotal for GNSS-denied UAV navigation but remains brittle under the drastic geometric misalignment between oblique aerial views and orthographic satellite references. Existing methods predominantly operate within a 2D manifold, neglecting the underlying 3D geometry where view-dependent vertical facades (macro-structure) and scale variations (micro-scale) severely corrupt feature alignment. To bridge this gap, we propose (MGS)$^2$, a geometry-grounded framework. The core of our innovation is the Macro-Geometric Structure Filtering (MGSF) module. Unlike pixel-wise matching sensitive to noise, MGSF leverages dilated geometric gradients to physically filter out high-frequency facade artifacts while enhancing the view-invariant horizontal plane, directly addressing the domain shift. To guarantee robust input for this structural filtering, we explicitly incorporate a Micro-Geometric Scale Adaptation (MGSA) module. MGSA utilizes depth priors to dynamically rectify scale discrepancies via multi-branch feature fusion. Furthermore, a Geometric-Appearance Contrastive Distillation (GACD) loss is designed to strictly discriminate against oblique occlusions. Extensive experiments demonstrate that (MGS)$^2$ achieves state-of-the-art performance, recording a Recall@1 of 97.5% on University-1652 and 97.02% on SUES-200. Furthermore, the framework exhibits superior cross-dataset generalization against geometric ambiguity. The code is available at: https://github.com/GabrielLi1473/MGS-Net.

Related papers

Geometry OR Tracker: Universal Geometric Operating Room Tracking [61.399734016038614]
In operating rooms (OR), world-scale multi-view 3D tracking supports downstream applications such as surgeon behavior recognition. Camera calibration and RGB-D registration are always unreliable, leading to cross-view geometric inconsistency. We introduce Geometry OR Tracker, a two-stage pipeline that rectifies imprecise calibration into a scale-consistent and geometrically consistent camera setup.
arXiv Detail & Related papers (2026-02-28T09:21:21Z)
Understanding and Improving UMAP with Geometric and Topological Priors: The JORC-UMAP Algorithm [1.7484982792736636]
Dimensionality reduction techniques, particularly UMAP, are widely used for visualizing high-dimensional data. We introduce Ollivier-Ricci curvature as a geometric prior, reinforcing edges at geometric bottlenecks and reducing redundant links. Experiments on synthetic and real-world datasets show that JORC-UMAP reduces tearing and collapse more effectively than standard UMAP and other DR methods.
arXiv Detail & Related papers (2026-01-23T08:42:56Z)
MVGGT: Multimodal Visual Geometry Grounded Transformer for Multiview 3D Referring Expression Segmentation [59.75554954111619]
We introduce Multi-view 3D Referring Expression (MV-3DRES), where the model must recover scene structure and segment the referred object directly from sparse multi-view images. We propose the Multimodal Visual Geometry Grounded Transformer (MVGGT), an efficient end-to-end framework that integrates language information into sparse-view geometric reasoning. Experiments show that MVGGT establishes the first strong baseline and achieves both high accuracy and fast inference, outperforming existing alternatives.
arXiv Detail & Related papers (2026-01-11T11:44:07Z)
Geodiffussr: Generative Terrain Texturing with Elevation Fidelity [48.82552523546255]
We introduce Geodiffussr, a flow-matching pipeline that synthesizes text-guided texture maps. The core mechanism is multi-scale content aggregation (MCA): DEM features are injected into UNet blocks at multiple resolutions to enforce global-to-local elevation consistency. To train and evaluate Geodiffussr, we assemble a globally distributed, biome- and climate-stratified corpus of triplets pairing SRTM-derived DEMs with Sentinel-2 imagery and vision-grounded natural-appearance captions.
arXiv Detail & Related papers (2025-11-28T09:52:44Z)
VA-GS: Enhancing the Geometric Representation of Gaussian Splatting via View Alignment [48.147381011235446]
3D Gaussian Splatting has recently emerged as an efficient solution for real-time novel view synthesis. We propose a novel method that enhances the geometric representation of 3D Gaussians through view alignment. Our method achieves state-of-the-art performance in both surface reconstruction and novel view synthesis.
arXiv Detail & Related papers (2025-10-13T14:44:50Z)
StableGS: A Floater-Free Framework for 3D Gaussian Splatting [9.935869165752283]
3D Gaussian Splatting (3DGS) reconstructions are plagued by stubborn "floater" artifacts that degrade their geometric and visual fidelity. We propose StableGS, a novel framework that decouples geometric regularization from final appearance rendering. Experiments on multiple benchmarks show StableGS not only eliminates floaters but also resolves the common blur-artifact trade-off.
arXiv Detail & Related papers (2025-03-24T09:02:51Z)
Point Cloud Denoising With Fine-Granularity Dynamic Graph Convolutional Networks [58.050130177241186]
Noise perturbations often corrupt 3-D point clouds, hindering downstream tasks such as surface reconstruction, rendering, and further processing. This paper introduces fine-granularity dynamic graph convolutional networks called GDGCN, a novel approach to denoising in 3-D point clouds.
arXiv Detail & Related papers (2024-11-21T14:19:32Z)
GeoDTR+: Toward generic cross-view geolocalization via geometric disentanglement [20.346145927174373]
Cross-View Geo-Localization (CVGL) estimates the location of a ground image by matching it to a geo-tagged aerial image in a database. Existing methods still suffer from poor performance in cross-area evaluation, in which the training and testing data are captured from completely distinct areas. We attribute this deficiency to the lack of ability to extract the geometric layout of visual features and models' overfitting to low-level details. In this work, we propose GeoDTR+ with an enhanced GLE module that better models the correlations among visual features.
arXiv Detail & Related papers (2023-08-18T15:32:01Z)
Geometry Constrained Weakly Supervised Object Localization [55.17224813345206]
We propose a geometry constrained network, termed GC-Net, for weakly supervised object localization. The detector predicts the object location defined by a set of coefficients describing a geometric shape. The generator takes the resulting masked images as input and performs two complementary classification tasks for the object and background. In contrast to previous approaches, GC-Net is trained end-to-end and predicts object location without any post-processing.
arXiv Detail & Related papers (2020-07-19T17:33:42Z)

This list is automatically generated from the titles and abstracts of the papers on this site.