Related papers: G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models

URL: http://arxiv.org/abs/2405.14702v1
Date: Thu, 23 May 2024 15:37:06 GMT
Title: G3: An Effective and Adaptive Framework for Worldwide Geolocalization Using Large Multi-Modality Models
Authors: Pengyue Jia, Yiding Liu, Xiaopeng Li, Xiangyu Zhao, Yuhao Wang, Yantong Du, Xiao Han, Xuetao Wei, Shuaiqiang Wang, Dawei Yin,
Abstract summary: We propose a novel framework for worldwide geolocalization based on Retrieval-Augmented Generation (RAG) G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification. Experiments on two well-established datasets verify the superiority of G3 compared to other state-of-the-art methods.
Score: 40.692173688701914
License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
Abstract: Worldwide geolocalization aims to locate the precise location at the coordinate level of photos taken anywhere on the Earth. It is very challenging due to 1) the difficulty of capturing subtle location-aware visual semantics, and 2) the heterogeneous geographical distribution of image data. As a result, existing studies have clear limitations when scaled to a worldwide context. They may easily confuse distant images with similar visual contents, or cannot adapt to various locations worldwide with different amounts of relevant data. To resolve these limitations, we propose G3, a novel framework based on Retrieval-Augmented Generation (RAG). In particular, G3 consists of three steps, i.e., Geo-alignment, Geo-diversification, and Geo-verification to optimize both retrieval and generation phases of worldwide geolocalization. During Geo-alignment, our solution jointly learns expressive multi-modal representations for images, GPS and textual descriptions, which allows us to capture location-aware semantics for retrieving nearby images for a given query. During Geo-diversification, we leverage a prompt ensembling method that is robust to inconsistent retrieval performance for different image queries. Finally, we combine both retrieved and generated GPS candidates in Geo-verification for location prediction. Experiments on two well-established datasets IM2GPS3k and YFCC4k verify the superiority of G3 compared to other state-of-the-art methods.

Related papers

GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization [30.983556433953076]
We propose GeoRanker, a distance-aware ranking framework for image geolocalization.<n>We introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships.<n>GeoRanker achieves state-of-the-art results on two well-established benchmarks.
arXiv Detail & Related papers (2025-05-19T21:04:46Z)
Geolocation with Real Human Gameplay Data: A Large-Scale Dataset and Human-Like Reasoning Framework [59.42946541163632]
We introduce a comprehensive geolocation framework with three key components. GeoComp, a large-scale dataset; GeoCoT, a novel reasoning method; and GeoEval, an evaluation metric. We demonstrate that GeoCoT significantly boosts geolocation accuracy by up to 25% while enhancing interpretability.
arXiv Detail & Related papers (2025-02-19T14:21:25Z)
GaGA: Towards Interactive Global Geolocation Assistant [20.342366228855735]
GaGA is an interactive global geolocation assistant built upon the flourishing large vision-language models (LVLMs) It uncovers geographical clues within images and combines them with the extensive world knowledge embedded in LVLMs to determine the geolocations. GaGA achieves state-of-the-art performance on the GWS15k dataset, improving accuracy by 4.57% at the country level and 2.92% at the city level.
arXiv Detail & Related papers (2024-12-12T03:39:44Z)
GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth. Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task. We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
GeoGLUE: A GeoGraphic Language Understanding Evaluation Benchmark [56.08664336835741]
We propose a GeoGraphic Language Understanding Evaluation benchmark, named GeoGLUE. We collect data from open-released geographic resources and introduce six natural language understanding tasks. We pro vide evaluation experiments and analysis of general baselines, indicating the effectiveness and significance of the GeoGLUE benchmark.
arXiv Detail & Related papers (2023-05-11T03:21:56Z)
Where We Are and What We're Looking At: Query Based Worldwide Image Geo-localization Using Hierarchies and Scenes [53.53712888703834]
We introduce an end-to-end transformer-based architecture that exploits the relationship between different geographic levels. We achieve state of the art street level accuracy on 4 standard geo-localization datasets.
arXiv Detail & Related papers (2023-03-07T21:47:58Z)
G^3: Geolocation via Guidebook Grounding [92.46774241823562]
We study explicit knowledge from human-written guidebooks that describe the salient and class-discriminative visual features humans use for geolocation. We propose the task of Geolocation via Guidebook Grounding that uses a dataset of StreetView images from a diverse set of locations. Our approach substantially outperforms a state-of-the-art image-only geolocation method, with an improvement of over 5% in Top-1 accuracy.
arXiv Detail & Related papers (2022-11-28T16:34:40Z)
Where in the World is this Image? Transformer-based Geo-localization in the Wild [48.69031054573838]
Predicting the geographic location (geo-localization) from a single ground-level RGB image taken anywhere in the world is a very challenging problem. We propose TransLocator, a unified dual-branch transformer network that attends to tiny details over the entire image. We evaluate TransLocator on four benchmark datasets - Im2GPS, Im2GPS3k, YFCC4k, YFCC26k and obtain 5.5%, 14.1%, 4.9%, 9.9% continent-level accuracy improvement.
arXiv Detail & Related papers (2022-04-29T03:27:23Z)
Visual and Object Geo-localization: A Comprehensive Survey [11.120155713865918]
Geo-localization refers to the process of determining where on earth some entity' is located. This paper provides a comprehensive survey of geo-localization involving images, which involves either determining from where an image has been captured (Image geo-localization) or geo-locating objects within an image (Object geo-localization) We will provide an in-depth study, including a summary of popular algorithms, a description of proposed datasets, and an analysis of performance results to illustrate the current state of each field.
arXiv Detail & Related papers (2021-12-30T20:46:53Z)
Hierarchical Attention Fusion for Geo-Localization [7.544917072241684]
We introduce a hierarchical attention fusion network using multi-scale features for geo-localization. We extract the hierarchical feature maps from a convolutional neural network (CNN) and organically fuse the extracted features for image representations. Our training is self-supervised using adaptive weights to control the attention of feature emphasis from each hierarchical level.
arXiv Detail & Related papers (2021-02-18T07:07:03Z)

This list is automatically generated from the titles and abstracts of the papers in this site.