Related papers: Cross-View Image Set Geo-Localization

Cross-View Image Set Geo-Localization

URL: http://arxiv.org/abs/2412.18852v1
Date: Wed, 25 Dec 2024 09:46:14 GMT
Title: Cross-View Image Set Geo-Localization
Authors: Qiong Wu, Panwang Xia, Lei Yu, Yi Liu, Mingtao Xiong, Liheng Zhong, Jingdong Chen, Ming Yang, Yongjun Zhang, Yi Wan,
Abstract summary: Cross-view geo-localization (CVGL) has been widely applied in fields such as robotic navigation and augmented reality.<n>We propose a novel task: Cross-View Image Set Geo-Localization (Set-CVGL), which gathers multiple images with diverse perspectives as a query set for localization.
Score: 29.13525096798705
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Cross-view geo-localization (CVGL) has been widely applied in fields such as robotic navigation and augmented reality. Existing approaches primarily use single images or fixed-view image sequences as queries, which limits perspective diversity. In contrast, when humans determine their location visually, they typically move around to gather multiple perspectives. This behavior suggests that integrating diverse visual cues can improve geo-localization reliability. Therefore, we propose a novel task: Cross-View Image Set Geo-Localization (Set-CVGL), which gathers multiple images with diverse perspectives as a query set for localization. To support this task, we introduce SetVL-480K, a benchmark comprising 480,000 ground images captured worldwide and their corresponding satellite images, with each satellite image corresponds to an average of 40 ground images from varied perspectives and locations. Furthermore, we propose FlexGeo, a flexible method designed for Set-CVGL that can also adapt to single-image and image-sequence inputs. FlexGeo includes two key modules: the Similarity-guided Feature Fuser (SFF), which adaptively fuses image features without prior content dependency, and the Individual-level Attributes Learner (IAL), leveraging geo-attributes of each image for comprehensive scene perception. FlexGeo consistently outperforms existing methods on SetVL-480K and two public datasets, SeqGeo and KITTI-CVL, achieving a localization accuracy improvement of over 22% on SetVL-480K.

Related papers

GeoVLM: Improving Automated Vehicle Geolocalisation Using Vision-Language Matching [6.8045687415659275]
Cross-view geo-localisation identifies coarse geographical position of an automated vehicle by matching a ground-level image to a geo-tagged satellite image from a database.<n>Existing approaches reach high recall rates but they still fail to rank the correct image as the top match.<n>This paper proposes GeoVLM, a novel approach which uses the zero-shot capabilities of vision language models to enable cross-view geo-localisation.
arXiv Detail & Related papers (2025-05-19T19:17:06Z)
VAGeo: View-specific Attention for Cross-View Object Geo-Localization [19.4845592498138]
Cross-view object geo-localization (CVOGL) aims to locate an object of interest in a captured ground- or drone-view image within the satellite image. This paper proposes a novel View-specific Attention Geo-localization method (VAGeo) for accurate CVOGL.
arXiv Detail & Related papers (2025-01-13T10:42:18Z)
CV-Cities: Advancing Cross-View Geo-Localization in Global Cities [3.074201632920997]
Cross-view geo-localization (CVGL) involves matching and retrieving satellite images to determine the geographic location of a ground image. This task faces significant challenges due to substantial viewpoint discrepancies, the complexity of localization scenarios, and the need for global localization. We propose a novel CVGL framework that integrates the foundational model DINOv2 with an advanced feature mixer.
arXiv Detail & Related papers (2024-11-19T11:41:22Z)
Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework. By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information. Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models [40.69217368870192]
We propose a novel framework for worldwide geolocalization based on Retrieval-Augmented Generation (RAG) G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification. Experiments on two well-established datasets verify the superiority of G3 compared to other state-of-the-art methods.
arXiv Detail & Related papers (2024-05-23T15:37:06Z)
ConGeo: Robust Cross-view Geo-localization across Ground View Variations [34.192775134189965]
Cross-view geo-localization aims at localizing a ground-level query image by matching it to its corresponding geo-referenced aerial view. Existing learning pipelines are orientation-specific or FoV-specific, demanding separate model training for different ground view variations. We propose ConGeo, a single- and cross-view Contrastive method for Geo-localization.
arXiv Detail & Related papers (2024-03-20T20:37:13Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
CSP: Self-Supervised Contrastive Spatial Pre-Training for Geospatial-Visual Representations [90.50864830038202]
We present Contrastive Spatial Pre-Training (CSP), a self-supervised learning framework for geo-tagged images. We use a dual-encoder to separately encode the images and their corresponding geo-locations, and use contrastive objectives to learn effective location representations from images. CSP significantly boosts the model performance with 10-34% relative improvement with various labeled training data sampling ratios.
arXiv Detail & Related papers (2023-05-01T23:11:18Z)
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels. We achieve state of the art street level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
GAMa: Cross-view Video Geo-localization [68.33955764543465]
We focus on ground videos instead of images which provides contextual cues. At clip-level, a short video clip is matched with corresponding aerial image and is later used to get video-level geo-localization of a long video. Our proposed method achieves a Top-1 recall rate of 19.4% and 45.1% @1.0mile.
arXiv Detail & Related papers (2022-07-06T04:25:51Z)
Where in the World is this Image? Transformer-based Geo-localization in the Wild [48.69031054573838]
Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image. We evaluate TransLocator on four benchmark datasets - Im2GPS, Im2GPS3k, YFCC4k, YFCC26k and obtain 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy improvement.
arXiv Detail & Related papers (2022-04-29T03:27:23Z)

This list is automatically generated from the titles and abstracts of the papers in this site.