Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models
- URL: http://arxiv.org/abs/2508.19967v1
- Date: Wed, 27 Aug 2025 15:21:31 GMT
- Title: Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models
- Authors: Oliver Grainge, Sania Waheed, Jack Stilgoe, Michael Milford, Shoaib Ehsan
- Abstract summary: Geo-localization is the task of identifying the location of an image using visual cues alone. Vision-Language Models (VLMs) are increasingly demonstrating capabilities as accurate image geo-locators. This brings significant privacy risks, including those related to stalking and surveillance.
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Geo-localization is the task of identifying the location of an image using visual cues alone. It has beneficial applications, such as improving disaster response, enhancing navigation, and supporting geography education. Recently, Vision-Language Models (VLMs) have increasingly demonstrated capabilities as accurate image geo-locators. This brings significant privacy risks, including those related to stalking and surveillance, given the widespread use of AI models and the sharing of photos on social media. The precision of these models is likely to improve in the future. Despite these risks, there is little work on systematically evaluating the geolocation precision of generative VLMs, their limits, and their potential for unintended inferences. To bridge this gap, we conduct a comprehensive assessment of the geolocation capabilities of 25 state-of-the-art VLMs on four benchmark image datasets captured in diverse environments. Our results offer insight into the internal reasoning of VLMs and highlight their strengths, limitations, and potential societal risks. Our findings indicate that current VLMs perform poorly on generic street-level images yet achieve notably high accuracy (61%) on images resembling social media content, raising significant and urgent privacy concerns.
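The accuracy figures quoted throughout these abstracts (e.g. "61% on images resembling social media content", "accuracy within a 1-kilometer radius") are typically computed from great-circle distance between predicted and ground-truth coordinates. The sketch below is illustrative only, not code from any of the listed papers; the function names and the threshold-based metric are assumptions about the standard evaluation setup.

```python
# Illustrative sketch (not from the paper): scoring geolocation predictions
# with great-circle (haversine) distance, then "accuracy within a radius".
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2)
    return 2 * r * math.asin(math.sqrt(a))

def accuracy_at_radius(preds, truths, radius_km):
    """Fraction of predictions falling within radius_km of the ground truth."""
    hits = sum(haversine_km(*p, *t) <= radius_km for p, t in zip(preds, truths))
    return hits / len(preds)
```

A benchmark would report, say, `accuracy_at_radius(preds, truths, 1.0)` for street-level precision and larger radii (25 km, 200 km, 750 km) for city-, region-, and country-level accuracy.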
Related papers
- Do Vision-Language Models Respect Contextual Integrity in Location Disclosure? [35.91273000038155]
Vision-language models (VLMs) have demonstrated strong performance in image geolocation. This poses a significant privacy risk, as they can be exploited to infer sensitive locations from casually shared photos. We introduce VLM-GEOPRIVACY, a benchmark that challenges VLMs to interpret latent social norms and contextual cues in real-world images.
arXiv Detail & Related papers (2026-02-04T20:24:14Z) - Where on Earth? A Vision-Language Benchmark for Probing Model Geolocation Skills Across Scales [61.03549470159347]
Vision-language models (VLMs) have advanced rapidly, yet their capacity for image-grounded geolocation in open-world conditions has not been comprehensively evaluated. We present EarthWhere, a comprehensive benchmark for VLM image geolocation that evaluates visual recognition, step-by-step reasoning, and evidence use.
arXiv Detail & Related papers (2025-10-13T01:12:21Z) - GeoShield: Safeguarding Geolocation Privacy from Vision-Language Models via Adversarial Perturbations [48.78781663571235]
Vision-Language Models (VLMs) can infer users' locations from publicly shared images, posing a substantial risk to geoprivacy. We propose GeoShield, a novel adversarial framework designed for robust geoprivacy protection in real-world scenarios.
arXiv Detail & Related papers (2025-08-05T08:37:06Z) - Evaluation of Geolocation Capabilities of Multimodal Large Language Models and Analysis of Associated Privacy Risks [9.003350058345442]
MLLMs are capable of inferring the geographic location of images based solely on visual content. This poses serious risks of privacy invasion, including doxxing, surveillance, and other security threats. The most advanced visual models can successfully localize the origin of street-level imagery with up to 49% accuracy within a 1-kilometer radius.
arXiv Detail & Related papers (2025-06-30T03:05:30Z) - Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models [47.98900725310249]
A new pipeline constructs a reasoning-oriented geo-localization dataset, MP16-Reason, using diverse social media images. GLOBE incorporates task-specific rewards that jointly enhance localizability assessment, visual-cue reasoning, and geolocation accuracy. Results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks.
arXiv Detail & Related papers (2025-06-17T16:07:58Z) - Evaluating Precise Geolocation Inference Capabilities of Vision Language Models [0.0]
This paper introduces a benchmark dataset collected from Google Street View that represents its global distribution of coverage. Foundation models are evaluated on single-image geolocation inference, with many achieving median distance errors of 300 km. We further evaluate VLM "agents" with access to supplemental tools, observing up to a 30.6% decrease in distance error.
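The "median distance error" and the "30.6% decrease" quoted above are simple aggregates over per-image errors. A minimal sketch, with helper names of my own choosing (not taken from the paper):

```python
# Illustrative sketch: aggregating per-image distance errors into the
# benchmark-style summary statistics mentioned in the abstract.
import statistics

def median_distance_error_km(errors_km):
    """Median of per-image distance errors; robust to a few extreme misses."""
    return statistics.median(errors_km)

def relative_error_decrease(base_km, improved_km):
    """Fractional decrease in median error (0.306 would be a 30.6% drop)."""
    return (base_km - improved_km) / base_km
```

The median (rather than the mean) is the usual choice here because a handful of wrong-continent predictions would otherwise dominate the average.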
arXiv Detail & Related papers (2025-02-20T09:59:28Z) - GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks. Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales. We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
arXiv Detail & Related papers (2024-11-28T18:59:56Z) - Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
Through inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - Image-Based Geolocation Using Large Vision-Language Models [19.071551941682063]
We introduce tool, an innovative framework that significantly enhances image-based geolocation accuracy.
tool employs a systematic chain-of-thought (CoT) approach, mimicking human geoguessing strategies.
It achieves an impressive average score of 4550.5 in the GeoGuessr game, with an 85.37% win rate, and delivers highly precise geolocation predictions.
arXiv Detail & Related papers (2024-08-18T13:39:43Z) - Granular Privacy Control for Geolocation with Vision Language Models [36.3455665044992]
We develop a new benchmark, GPTGeoChat, to test the ability of Vision Language Models to moderate geolocation dialogues with users.
We collect a set of 1,000 image geolocation conversations between in-house annotators and GPT-4v.
We evaluate the ability of various VLMs to moderate GPT-4v geolocation conversations by determining when too much location information has been revealed.
arXiv Detail & Related papers (2024-07-06T04:06:55Z) - Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey [63.4581186135101]
Large language models (LLMs) have made remarkable advancements in natural language processing. Privacy and security issues have been revealed throughout their life cycle. This survey outlines and analyzes potential countermeasures.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.