LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space
- URL: http://arxiv.org/abs/2503.18142v1
- Date: Sun, 23 Mar 2025 17:15:26 GMT
- Title: LocDiffusion: Identifying Locations on Earth by Diffusing in the Hilbert Space
- Authors: Zhangyu Wang, Jielu Zhang, Zhongliang Zhou, Qian Cao, Nemin Wu, Zeping Liu, Lan Mu, Yang Song, Yiqun Xie, Ni Lao, Gengchen Mai
- Abstract summary: We propose to leverage diffusion as a mechanism for image geolocalization. To avoid the problematic manifold reprojection step in diffusion, we developed a novel spherical positional encoding-decoding framework. We train a conditional latent diffusion model called LocDiffusion that generates geolocations under the guidance of images.
- Score: 10.342723428164412
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Image geolocalization is a fundamental yet challenging task, aiming at inferring the geolocation on Earth where an image is taken. Existing methods approach it either via grid-based classification or via image retrieval. Their performance significantly suffers when the spatial distribution of test images does not align with such choices. To address these limitations, we propose to leverage diffusion as a mechanism for image geolocalization. To avoid the problematic manifold reprojection step in diffusion, we developed a novel spherical positional encoding-decoding framework, which encodes points on a spherical surface (e.g., geolocations on Earth) into a Hilbert space of Spherical Harmonics coefficients and decodes points (geolocations) by mode-seeking. We call this type of position encoding Spherical Harmonics Dirac Delta (SHDD) Representation. We also propose a novel SirenNet-based architecture called CS-UNet to learn the conditional backward process in the latent SHDD space by minimizing a latent KL-divergence loss. We train a conditional latent diffusion model called LocDiffusion that generates geolocations under the guidance of images -- to the best of our knowledge, the first generative model for image geolocalization by diffusing geolocation information in a hidden location embedding space. We evaluate our method against SOTA image geolocalization baselines. LocDiffusion achieves competitive geolocalization performance and demonstrates significantly stronger generalizability to unseen geolocations.
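The encode/decode idea in the abstract (represent a geolocation as spherical harmonics coefficients of a Dirac delta, then recover it by mode-seeking) can be illustrated with a small sketch. This is not the authors' implementation: the truncation at degree 2, the hand-written real spherical harmonics, the 1-degree search grid, and the function names `real_sh_deg2`, `encode`, and `decode` are all assumptions made for illustration, and the grid argmax stands in for whatever mode-seeking procedure the paper actually uses.

```python
import numpy as np

def real_sh_deg2(lat_deg, lon_deg):
    """Real spherical harmonics up to degree 2 at lat/lon in degrees.

    Returns an array of shape (..., 9). Hypothetical helper for illustration.
    """
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    # Unit vector on the sphere.
    x = np.cos(lat) * np.cos(lon)
    y = np.cos(lat) * np.sin(lon)
    z = np.sin(lat)
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = 0.5 * np.sqrt(15.0 / np.pi)
    c3 = 0.25 * np.sqrt(5.0 / np.pi)
    c4 = 0.25 * np.sqrt(15.0 / np.pi)
    return np.stack([
        c0 * np.ones_like(x),                  # degree 0
        c1 * y, c1 * z, c1 * x,                # degree 1
        c2 * x * y, c2 * y * z,                # degree 2
        c3 * (3.0 * z**2 - 1.0),
        c2 * x * z, c4 * (x**2 - y**2),
    ], axis=-1)

def encode(lat_deg, lon_deg):
    """Coefficients of a Dirac delta at the point: c_lm = Y_lm(point)."""
    return real_sh_deg2(lat_deg, lon_deg)

def decode(coeffs, step=1.0):
    """Mode-seeking by brute force: evaluate the truncated expansion on a
    lat/lon grid and return the grid point where it is largest."""
    lats = np.arange(-90.0, 90.0 + step, step)
    lons = np.arange(-180.0, 180.0, step)
    grid_lat, grid_lon = np.meshgrid(lats, lons, indexing="ij")
    density = real_sh_deg2(grid_lat, grid_lon) @ coeffs
    i, j = np.unravel_index(np.argmax(density), density.shape)
    return float(grid_lat[i, j]), float(grid_lon[i, j])

if __name__ == "__main__":
    coeffs = encode(48.0, 2.0)
    print(coeffs.shape, decode(coeffs))
```

Because the truncated delta peaks where the query direction aligns with the encoded point, the decoded location is only as precise as the search grid. A latent diffusion model would operate on the coefficient vector rather than on latitude and longitude directly, which is the motivation the abstract gives for avoiding a manifold reprojection step.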
Related papers
- Cross-view geo-localization, Image retrieval, Multiscale geometric modeling, Frequency domain enhancement [1.6686955491488273]
Cross-view geo-localization (CVGL) aims to establish spatial correspondences between images captured from significantly different viewpoints. CVGL remains challenging due to severe geometric asymmetry, texture inconsistency across imaging domains, and the progressive degradation of discriminative local information. This paper proposes the Spatial and Frequency Domain Enhancement Network (SFDE), which leverages complementary representations from spatial and frequency domains.
arXiv Detail & Related papers (2026-03-03T08:25:35Z) - HierLoc: Hyperbolic Entity Embeddings for Hierarchical Visual Geolocation [12.392226207474662]
We introduce an entity-centric formulation of geolocation that replaces image-to-image retrieval with a compact hierarchy of geographic entities embedded in Hyperbolic space. Images are aligned directly to country, region, subregion, and city entities through Geo-Weighted Hyperbolic contrastive learning by directly incorporating haversine distance into the contrastive objective. Compared to the current methods in the literature, it reduces mean geodesic error by 19.5%, while improving the fine-grained subregion accuracy by 43%.
arXiv Detail & Related papers (2026-01-30T15:16:07Z) - Scaling Image Geo-Localization to Continent Level [48.7766435870634]
This paper introduces a hybrid approach that achieves fine-grained geo-localization across a large geographic expanse the size of a continent. We leverage a proxy classification task during training to learn rich feature representations that implicitly encode precise location information. Our evaluation demonstrates that our approach can localize within 200m more than 68% of queries of a dataset covering a large part of Europe.
arXiv Detail & Related papers (2025-10-30T17:59:35Z) - Towards Interpretable Geo-localization: a Concept-Aware Global Image-GPS Alignment Framework [9.31168320050859]
Geo-localization involves determining the exact geographic location of images captured globally. Current concept-based interpretability methods fail to align effectively with Geo-alignment image-location embedding objectives. To our knowledge, this is the first work to introduce interpretability into geo-localization.
arXiv Detail & Related papers (2025-09-02T03:07:26Z) - GeoDistill: Geometry-Guided Self-Distillation for Weakly Supervised Cross-View Localization [70.65458151146767]
Cross-view localization is crucial for large-scale outdoor applications like autonomous navigation and augmented reality. Existing methods often rely on fully supervised learning, which requires costly ground-truth pose annotations. We propose GeoDistill, a framework that uses teacher-student learning with Field-of-View (FoV)-based masking.
arXiv Detail & Related papers (2025-07-15T03:00:15Z) - Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation [19.028122299569052]
Global visual geolocation predicts where an image was captured on Earth. In this paper, we aim to close the gap between traditional geolocalization and modern generative methods. Our model achieves state-of-the-art performance on three visual geolocation benchmarks.
arXiv Detail & Related papers (2024-12-09T18:59:04Z) - Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
Through inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - Enhancing Worldwide Image Geolocation by Ensembling Satellite-Based Ground-Level Attribute Predictors [4.415977307120618]
We examine the challenge of estimating the location of a single ground-level image in the absence of GPS or other location metadata.
We introduce a novel metric, Recall vs Area, which measures the accuracy of estimated distributions of locations.
We then examine an ensembling approach to global-scale image geolocation, which incorporates information from multiple sources.
arXiv Detail & Related papers (2024-07-18T19:15:52Z) - GeoCLIP: Clip-Inspired Alignment between Locations and Images for
Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z) - G^3: Geolocation via Guidebook Grounding [92.46774241823562]
We study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation.
We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations.
Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in Top-1 accuracy.
arXiv Detail & Related papers (2022-11-28T16:34:40Z) - Accurate 3-DoF Camera Geo-Localization via Ground-to-Satellite Image
Matching [102.39635336450262]
We address the problem of ground-to-satellite image geo-localization by matching a query image captured at the ground level against a large-scale database with geotagged satellite images.
Our new method is able to achieve the fine-grained location of a query image, up to pixel size precision of the satellite image.
arXiv Detail & Related papers (2022-03-26T20:10:38Z) - Low-Rank Subspaces in GANs [101.48350547067628]
This work introduces low-rank subspaces that enable more precise control of GAN generation.
LowRankGAN is able to find the low-dimensional representation of the attribute manifold.
Experiments on state-of-the-art GAN models (including StyleGAN2 and BigGAN) trained on various datasets demonstrate the effectiveness of our LowRankGAN.
arXiv Detail & Related papers (2021-06-08T16:16:32Z) - Hierarchical Attention Fusion for Geo-Localization [7.544917072241684]
We introduce a hierarchical attention fusion network using multi-scale features for geo-localization.
We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations.
Our training is self-supervised using adaptive weights to control the attention of feature emphasis from each hierarchical level.
arXiv Detail & Related papers (2021-02-18T07:07:03Z) - Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization aims to spot images of the same geographic target from different platforms.
Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z) - Zero-Shot Multi-View Indoor Localization via Graph Location Networks [66.05980368549928]
Indoor localization is a fundamental problem in location-based applications.
We propose a novel neural-network-based architecture, Graph Location Networks (GLN), to perform infrastructure-free, multi-view image-based indoor localization.
GLN makes location predictions based on robust location representations extracted from images through message-passing networks.
We introduce a novel zero-shot indoor localization setting and tackle it by extending the proposed GLN to a dedicated zero-shot version.
arXiv Detail & Related papers (2020-08-06T07:36:55Z)
This list is automatically generated from the titles and abstracts of the papers on this site.