Related papers: Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms

URL: http://arxiv.org/abs/2509.21573v1
Date: Thu, 25 Sep 2025 20:53:06 GMT
Title: Enhancing Contrastive Learning for Geolocalization by Discovering Hard Negatives on Semivariograms
Authors: Boyi Chen, Zhangyu Wang, Fabian Deuser, Johann Maximilian Zollner, Martin Werner
Abstract summary: We propose a novel spatially regularized contrastive learning strategy that integrates a semivariogram. We show that explicitly modeling spatial priors improves image-based geo-localization performance.
Score: 7.1220661738937325
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Accurate and robust image-based geo-localization at a global scale is challenging due to diverse environments, visually ambiguous scenes, and the lack of distinctive landmarks in many regions. While contrastive learning methods show promising performance by aligning features between street-view images and corresponding locations, they neglect the underlying spatial dependency in the geographic space. As a result, they fail to address the issue of false negatives -- image pairs that are both visually and geographically similar but labeled as negatives, and struggle to effectively distinguish hard negatives, which are visually similar but geographically distant. To address this issue, we propose a novel spatially regularized contrastive learning strategy that integrates a semivariogram, which is a geostatistical tool for modeling how spatial correlation changes with distance. We fit the semivariogram by relating the distance of images in feature space to their geographical distance, capturing the expected visual content in a spatial correlation. With the fitted semivariogram, we define the expected visual dissimilarity at a given spatial distance as reference to identify hard negatives and false negatives. We integrate this strategy into GeoCLIP and evaluate it on the OSV5M dataset, demonstrating that explicitly modeling spatial priors improves image-based geo-localization performance, particularly at finer granularity.
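
Illustrative sketch (not the authors' implementation): the abstract describes relating feature-space distance to geographic distance with a semivariogram, then using the expected dissimilarity at a given distance as a reference for flagging negatives. The Python below shows one way that idea could look, under several assumptions. It uses an empirical, binned (Matheron-style) semivariogram instead of a fitted parametric model, and the labeling rule with its `near_km` and `margin` parameters is hypothetical. All function names are made up for illustration.

```python
import math
from itertools import combinations


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def sq_feature_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))


def empirical_semivariogram(coords, feats, bin_edges_km):
    """Half the mean squared feature distance of all pairs in each distance bin.

    Returns one value per bin, or None for bins that contain no pairs.
    """
    sums = [0.0] * (len(bin_edges_km) - 1)
    counts = [0] * (len(bin_edges_km) - 1)
    for i, j in combinations(range(len(coords)), 2):
        d = haversine_km(coords[i], coords[j])
        for b in range(len(sums)):
            if bin_edges_km[b] <= d < bin_edges_km[b + 1]:
                sums[b] += sq_feature_dist(feats[i], feats[j])
                counts[b] += 1
                break
    return [s / (2 * c) if c else None for s, c in zip(sums, counts)]


def label_negative(geo_km, sq_dist, gamma, bin_edges_km, near_km=25.0, margin=0.5):
    """Hypothetical rule: flag a negative pair that is visually closer than expected.

    A pair counts as "closer than expected" when its own semivariance
    (sq_dist / 2) falls below margin * gamma for its distance bin.
    Nearby pairs become false-negative candidates, distant ones hard negatives.
    """
    for b in range(len(gamma)):
        if bin_edges_km[b] <= geo_km < bin_edges_km[b + 1] and gamma[b] is not None:
            expected = gamma[b]
            break
    else:
        return "ordinary"  # no reference value available for this distance
    if sq_dist / 2 >= margin * expected:
        return "ordinary"
    return "false_negative" if geo_km < near_km else "hard_negative"
```

In a training loop, labels like these could be used to down-weight false-negative candidates and up-weight hard negatives in the contrastive loss. How the paper actually does this is not stated in the abstract.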

Related papers

HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation [12.392226207474662]
We introduce an entity-centric formulation of geolocation that replaces image-to-image retrieval with a compact hierarchy of geographic entities embedded in Hyperbolic space. Images are aligned directly to country, region, subregion, and city entities through Geo-Weighted Hyperbolic contrastive learning by directly incorporating haversine distance into the contrastive objective. Compared to the current methods in the literature, it reduces mean geodesic error by 19.5%, while improving the fine-grained subregion accuracy by 43%.
arXiv Detail & Related papers (2026-01-30T15:16:07Z)
Object Detection as an Optional Basis: A Graph Matching Network for Cross-View UAV Localization [17.908597896653045]
This paper presents a cross-view UAV localization framework that performs map matching via object detection. In typical pipelines, UAV visual localization is formulated as an image-retrieval problem. Our method achieves strong retrieval and localization performance using a fine-grained, graph-based node-similarity metric.
arXiv Detail & Related papers (2025-11-04T11:25:31Z)
GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization [70.65458151146767]
Cross-view localization is crucial for large-scale outdoor applications like autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. We propose GeoDistill, a framework that uses teacher-student learning with Field-of-View (FoV)-based masking.
arXiv Detail & Related papers (2025-07-15T03:00:15Z)
ConGeo: Robust Cross-view Geo-localization across Ground View Variations [34.192775134189965]
Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view. Existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations. We propose ConGeo, a single- and cross-view Contrastive method for Geo-localization.
arXiv Detail & Related papers (2024-03-20T20:37:13Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
Sample4Geo: Hard Negative Sampling For Cross-View Geo-Localisation [2.3020018305241337]
We present a simplified but effective architecture based on contrastive learning with symmetric InfoNCE loss. Our framework consists of a narrow training pipeline that eliminates the need of using aggregation modules. Our work shows excellent performance on common cross-view datasets like CVUSA, CVACT, University-1652 and VIGOR.
arXiv Detail & Related papers (2023-03-21T13:49:49Z)
Cross-view Geo-localization via Learning Disentangled Geometric Layout Correspondence [11.823147814005411]
Cross-view geo-localization aims to estimate the location of a query ground image by matching it to a reference geo-tagged aerial images database. Recent works achieve outstanding progress on cross-view geo-localization benchmarks. However, existing methods still suffer from poor performance on the cross-area benchmarks.
arXiv Detail & Related papers (2022-12-08T04:54:01Z)
Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images. Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
Geography-Aware Self-Supervised Learning [79.4009241781968]
We show that due to their different characteristics, a non-trivial gap persists between contrastive and supervised learning on standard benchmarks. We propose novel training methods that exploit the spatially aligned structure of remote sensing data. Our experiments show that our proposed method closes the gap between contrastive and supervised learning on image classification, object detection and semantic segmentation for remote sensing.
arXiv Detail & Related papers (2020-11-19T17:29:13Z)
Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization aims to spot images of the same geographic target from different platforms. Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center. We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)
High-Order Information Matters: Learning Relation and Topology for Occluded Person Re-Identification [84.43394420267794]
We propose a novel framework that learns high-order relation and topology information for discriminative features and robust alignment. Our framework significantly outperforms the state of the art by 6.5% mAP on the Occluded-Duke dataset.
arXiv Detail & Related papers (2020-03-18T12:18:35Z)

This list is automatically generated from the titles and abstracts of the papers on this site.