Related papers: HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation

HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation

URL: http://arxiv.org/abs/2601.23064v1
Date: Fri, 30 Jan 2026 15:16:07 GMT
Title: HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation
Authors: Hari Krishna Gadi, Daniel Matos, Hongyi Luo, Lu Liu, Yongliang Wang, Yanfeng Zhang, Liqiu Meng,
Abstract summary: We introduce an entity-centric formulation of geolocation that replaces image-to-image retrieval with a compact hierarchy of geographic entities embedded in Hyperbolic space.<n>Images are aligned directly to country, region, subregion, and city entities through Geo-Weighted Hyperbolic contrastive learning by directly incorporating haversine distance into the contrastive objective.<n>Compared to the current methods in the literature, it reduces mean geodesic error by 19.5%, while improving the fine-grained subregion accuracy by 43%.
Score: 12.392226207474662
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Visual geolocalization, the task of predicting where an image was taken, remains challenging due to global scale, visual ambiguity, and the inherently hierarchical structure of geography. Existing paradigms rely on either large-scale retrieval, which requires storing a large number of image embeddings, grid-based classifiers that ignore geographic continuity, or generative models that diffuse over space but struggle with fine detail. We introduce an entity-centric formulation of geolocation that replaces image-to-image retrieval with a compact hierarchy of geographic entities embedded in Hyperbolic space. Images are aligned directly to country, region, subregion, and city entities through Geo-Weighted Hyperbolic contrastive learning by directly incorporating haversine distance into the contrastive objective. This hierarchical design enables interpretable predictions and efficient inference with 240k entity embeddings instead of over 5 million image embeddings on the OSV5M benchmark, on which our method establishes a new state-of-the-art performance. Compared to the current methods in the literature, it reduces mean geodesic error by 19.5\%, while improving the fine-grained subregion accuracy by 43%. These results demonstrate that geometry-aware hierarchical embeddings provide a scalable and conceptually new alternative for global image geolocation.

Related papers

Scaling Image Geo-Localization to Continent Level [48.7766435870634]
This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent.<n>We leverage a proxy classification task during training to learn rich feature representations that implicitly encode precise location information.<n>Our evaluation demonstrates that our approach can localize within 200m more than 68% of queries of a dataset covering a large part of Europe.
arXiv Detail & Related papers (2025-10-30T17:59:35Z)
GeoSURGE: Geo-localization using Semantic Fusion with Hierarchy of Geographic Embeddings [3.43519422766841]
We formulate geo-localization as aligning the visual representation of the query image with a learned geographic representation.<n>Our main experiments demonstrate improved all-time bests in 22 out of 25 metrics measured across five benchmark datasets.
arXiv Detail & Related papers (2025-10-01T20:39:48Z)
Towards Interpretable Geo-localization: a Concept-Aware Global Image-GPS Alignment Framework [9.31168320050859]
Geo-localization involves determining the exact geographic location of images captured globally.<n>Current concept-based interpretability methods fail to align effectively with Geo-alignment image-location embedding objectives.<n>To our knowledge, this is the first work to introduce interpretability into geo-localization.
arXiv Detail & Related papers (2025-09-02T03:07:26Z)
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation.<n>We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities.<n> experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
LocDiff: Identifying Locations on Earth by Diffusing in the Hilbert Space [20.664043071378273]
LocDiff is the first image geolocalization model that performs latent diffusion in a multi-scale location encoding space.<n>We show that LocDiff can outperform all state-of-the-art grid-based, retrieval-based, and diffusion-based baselines across 5 challenging global-scale image geolocalization datasets.
arXiv Detail & Related papers (2025-03-23T17:15:26Z)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components.<n>GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric.<n>We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images. Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z)
Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms. Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center. We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)

This list is automatically generated from the titles and abstracts of the papers in this site.