Related papers: CV-Cities: Advancing Cross-View Geo-Localization in Global Cities

URL: http://arxiv.org/abs/2411.12431v1
Date: Tue, 19 Nov 2024 11:41:22 GMT
Title: CV-Cities: Advancing Cross-View Geo-Localization in Global Cities
Authors: Gaoshuang Huang, Yang Zhou, Luying Zhao, Wenjian Gan,
Abstract summary: Cross-view geo-localization (CVGL) involves matching and retrieving satellite images to determine the geographic location of a ground image. This task faces significant challenges due to substantial viewpoint discrepancies, the complexity of localization scenarios, and the need for global localization. We propose a novel CVGL framework that integrates the foundational model DINOv2 with an advanced feature mixer.
Score: 3.074201632920997
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Cross-view geo-localization (CVGL), which involves matching and retrieving satellite images to determine the geographic location of a ground image, is crucial in GNSS-constrained scenarios. However, this task faces significant challenges due to substantial viewpoint discrepancies, the complexity of localization scenarios, and the need for global localization. To address these issues, we propose a novel CVGL framework that integrates the vision foundational model DINOv2 with an advanced feature mixer. Our framework introduces the symmetric InfoNCE loss and incorporates near-neighbor sampling and dynamic similarity sampling strategies, significantly enhancing localization accuracy. Experimental results show that our framework surpasses existing methods across multiple public and self-built datasets. To further improve global-scale performance, we have developed CV-Cities, a novel dataset for global CVGL. CV-Cities includes 223,736 ground-satellite image pairs with geolocation data, spanning sixteen cities across six continents and covering a wide range of complex scenarios, providing a challenging benchmark for CVGL. The framework trained with CV-Cities demonstrates high localization accuracy in various test cities, highlighting its strong globalization and generalization capabilities. Our datasets and codes are available at https://github.com/GaoShuang98/CVCities.
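
The abstract names a symmetric InfoNCE loss but does not spell it out. The following is a minimal sketch of one common formulation, not the authors' implementation: ground and satellite embeddings are L2-normalized, a temperature-scaled cosine-similarity matrix is built, and cross-entropy is averaged over both retrieval directions (ground to satellite and satellite to ground). The function names, the toy 2-D embeddings, and the temperature of 0.1 are all assumptions chosen for illustration.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def diagonal_cross_entropy(logits):
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_info_nce(ground, satellite, temperature=0.1):
    """Average of the ground->satellite and satellite->ground InfoNCE terms.

    ground[i] and satellite[i] are assumed to be embeddings of a matching pair;
    every other item in the batch acts as a negative. The temperature value is
    an illustrative assumption, not a setting taken from the paper.
    """
    g = [l2_normalize(v) for v in ground]
    s = [l2_normalize(v) for v in satellite]
    logits = [
        [sum(a * b for a, b in zip(gi, sj)) / temperature for sj in s]
        for gi in g
    ]
    logits_t = [list(col) for col in zip(*logits)]  # satellite->ground direction
    return 0.5 * (diagonal_cross_entropy(logits) + diagonal_cross_entropy(logits_t))


# Toy batch of two pairs with hypothetical 2-D embeddings.
ground_batch = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = symmetric_info_nce(ground_batch, [[1.0, 0.0], [0.0, 1.0]])
shuffled_loss = symmetric_info_nce(ground_batch, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy batch the correctly paired embeddings give a loss close to zero, while swapping the satellite embeddings gives a much larger one. That gap is what pushes matching ground and satellite views together during training.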

Related papers

VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization [24.433604332415204]
We propose a novel hybrid geo-localization framework that combines the strengths of vision-language models and visual place recognition. We evaluate our approach on multiple geo-localization benchmarks and show that it consistently outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2025-07-23T12:23:03Z)
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation [50.433911327489554]
We introduce EarthMapper, a novel framework for controllable satellite-map translation. We also contribute CNSatMap, a large-scale dataset comprising 302,132 precisely aligned satellite-map pairs across 38 Chinese cities. Experiments on CNSatMap and the New York dataset demonstrate EarthMapper's superior performance.
arXiv Detail & Related papers (2025-04-28T02:41:12Z)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components: GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
Cross-View Geo-Localization with Street-View and VHR Satellite Imagery in Decentrality Settings [39.252555758596706]
Cross-View Geo-Localization matches street-view query images with geo-tagged aerial-view reference images. Decentrality is a critical factor warranting deeper investigation, as larger decentrality can substantially improve localization efficiency but comes at the cost of declines in localization accuracy. We introduce DReSS, a novel dataset designed to evaluate cross-view geo-localization with a large geographic scope and diverse landscapes.
arXiv Detail & Related papers (2024-12-16T08:07:53Z)
World-Consistent Data Generation for Vision-and-Language Navigation [52.08816337783936]
Vision-and-Language Navigation (VLN) is a challenging task that requires an agent to navigate through photorealistic environments following natural-language instructions. One main obstacle existing in VLN is data scarcity, leading to poor generalization performance over unseen environments. We propose the world-consistent data generation (WCGEN), an efficacious data-augmentation framework satisfying both diversity and world-consistency.
arXiv Detail & Related papers (2024-12-09T11:40:54Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
AGL-NET: Aerial-Ground Cross-Modal Global Localization with Varying Scales [45.315661330785275]
We present AGL-NET, a novel learning-based method for global localization using LiDAR point clouds and satellite maps. We tackle two critical challenges: bridging the representation gap between image and points modalities for robust feature matching, and handling inherent scale discrepancies between global view and local view.
arXiv Detail & Related papers (2024-04-04T04:12:30Z)
Unleashing Unlabeled Data: A Paradigm for Cross-View Geo-Localization [28.941724648519102]
This paper investigates the effective utilization of unlabeled data for large-area cross-view geo-localization (CVGL). Common approaches to CVGL rely on ground-satellite image pairs and employ label-driven supervised training. We propose an unsupervised framework including a cross-view projection to guide the model for retrieving initial pseudo-labels.
arXiv Detail & Related papers (2024-03-21T07:48:35Z)
SatCLIP: Global, General-Purpose Location Embeddings with Satellite Imagery [22.716322265391852]
We introduce Satellite Contrastive Location-Image Pretraining (SatCLIP). SatCLIP learns an implicit representation of locations by matching CNN and ViT inferred visual patterns of openly available satellite imagery with their geographic coordinates. In experiments, we use SatCLIP embeddings to improve prediction performance on nine diverse location-dependent tasks including temperature prediction, animal recognition, and population density estimation.
arXiv Detail & Related papers (2023-11-28T19:14:40Z)
CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images taken at some unknown location to a set of geo-tagged reference images. We develop CurriculumLoc, a novel keypoint detection and description method with global semantic awareness and local geometric verification. We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO, with two different distance metrics, respectively.
arXiv Detail & Related papers (2023-11-20T08:40:01Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
GeoNet: Benchmarking Unsupervised Adaptation across Geographies [71.23141626803287]
We study the problem of geographic robustness and make three main contributions. First, we introduce a large-scale dataset GeoNet for geographic adaptation. Second, we hypothesize that the major source of domain shifts arises from significant variations in scene context. Third, we conduct an extensive evaluation of several state-of-the-art unsupervised domain adaptation algorithms and architectures.
arXiv Detail & Related papers (2023-03-27T17:59:34Z)
Think Global, Act Local: Dual-scale Graph Transformer for Vision-and-Language Navigation [87.03299519917019]
We propose a dual-scale graph transformer (DUET) for joint long-term action planning and fine-grained cross-modal understanding. We build a topological map on-the-fly to enable efficient exploration in global action space. The proposed approach, DUET, significantly outperforms state-of-the-art methods on goal-oriented vision-and-language navigation benchmarks.
arXiv Detail & Related papers (2022-02-23T19:06:53Z)
PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns the region-wise local consistency in the latent feature space. Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
arXiv Detail & Related papers (2020-11-25T11:03:11Z)
Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization is to spot images of the same geographic target from different platforms. Existing methods usually concentrate on mining the fine-grained feature of the geographic target in the image center. We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
arXiv Detail & Related papers (2020-08-26T16:06:11Z)

This list is automatically generated from the titles and abstracts of the papers on this site.