GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents
- URL: http://arxiv.org/abs/2511.22441v1
- Date: Thu, 27 Nov 2025 13:27:26 GMT
- Title: GEO-Detective: Unveiling Location Privacy Risks in Images with LLM Agents
- Authors: Xinyu Zhang, Yixin Wu, Boyang Zhang, Chenhao Lin, Chao Shen, Michael Backes, Yang Zhang,
- Abstract summary: We present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty, and it is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues.
- Score: 40.59860671244798
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Images shared on social media often expose geographic cues. While early geolocation methods required expert effort and lacked generalization, the rise of Large Vision-Language Models (LVLMs) now enables accurate geolocation even for ordinary users. However, existing approaches are not optimized for this task. To explore the full potential and the associated privacy risks, we present GEO-Detective, an agent that mimics human reasoning and tool use for image geolocation inference. It follows a four-step procedure that adaptively selects strategies based on image difficulty, and it is equipped with specialized tools such as visual reverse search, which emulates how humans gather external geographic clues. Experimental results show that GEO-Detective outperforms baseline LVLMs overall, particularly on images lacking visible geographic features. In country-level geolocation tasks it achieves an improvement of over 11.1% compared to baseline LVLMs, and even at finer-grained levels it still provides around a 5.2% performance gain. Meanwhile, when equipped with external clues, GEO-Detective becomes more likely to produce accurate predictions, reducing the "unknown" prediction rate by more than 50.6%. We further explore multiple defense strategies and find that GEO-Detective exhibits stronger robustness, highlighting the need for more effective privacy safeguards.
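The four-step, difficulty-adaptive loop described in the abstract can be sketched roughly as follows. This is a minimal illustration only: the function names, the difficulty heuristic, and the stubbed tool and model behavior are all hypothetical, not the authors' actual implementation.

```python
# Hypothetical sketch of a GEO-Detective-style geolocation agent loop.
# All names, heuristics, and stub behaviors are illustrative, not from the paper.

def estimate_difficulty(image_clues):
    """Fewer visible geographic clues -> harder image."""
    return "hard" if len(image_clues) < 2 else "easy"

def visual_reverse_search(image_id):
    """Stub for an external reverse-image-search tool that returns
    extra geographic clues (e.g., a matching landmark)."""
    return ["matched_landmark: Eiffel Tower"] if image_id == "paris.jpg" else []

def predict_location(clues):
    """Stub LVLM prediction: map known clues to a country."""
    if any("Eiffel Tower" in c for c in clues):
        return "France"
    return "unknown"

def geo_detective(image_id, image_clues):
    # Step 1: collect visible clues from the image (here passed in directly).
    clues = list(image_clues)
    # Step 2: adapt the strategy to the estimated image difficulty.
    if estimate_difficulty(clues) == "hard":
        # Step 3: gather external clues via tools, as a human would.
        clues += visual_reverse_search(image_id)
    # Step 4: make the final prediction from all accumulated clues.
    return predict_location(clues)

print(geo_detective("paris.jpg", []))  # a hard image rescued by reverse search
```

The point of the sketch is the control flow: tool calls are invoked only when the image looks hard, which mirrors how the abstract describes reducing "unknown" predictions on clue-poor images.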
Related papers
- GeoAgent: Learning to Geolocate Everywhere with Reinforced Geographic Characteristics [91.17301794848025]
This paper presents GeoAgent, a model capable of reasoning closely with humans and deriving fine-grained address conclusions. Previous RL-based methods have achieved breakthroughs in performance and interpretability but still raise concerns because of their reliance on AI-generated chain-of-thought (CoT) data and training strategies.
arXiv Detail & Related papers (2026-02-13T04:48:05Z) - GeoVista: Web-Augmented Agentic Visual Reasoning for Geolocalization [53.080882980294795]
Current research on agentic visual reasoning enables deep multimodal understanding but primarily focuses on image manipulation tools. In this work, we revisit the geolocalization task, which requires not only nuanced visual grounding but also web search to confirm or refine hypotheses. Since existing geolocalization benchmarks fail to meet the need for high-resolution imagery and the localization challenge for deep agentic reasoning, we curate GeoBench. We propose GeoVista, an agentic model that seamlessly integrates tool invocation within the reasoning loop, including an image-zoom-in tool to magnify regions of interest and a web-search tool to retrieve related information.
arXiv Detail & Related papers (2025-11-19T18:59:22Z) - Assessing the Geolocation Capabilities, Limitations and Societal Risks of Generative Vision-Language Models [11.444835352261002]
Geo-localization is the task of identifying the location of an image using visual cues alone. Vision-Language Models (VLMs) are increasingly demonstrating capabilities as accurate image geo-locators. This brings significant privacy risks, including those related to stalking and surveillance.
arXiv Detail & Related papers (2025-08-27T15:21:31Z) - VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization [24.433604332415204]
We propose a novel hybrid geo-localization framework that combines the strengths of vision-language models and visual place recognition. We evaluate our approach on multiple geo-localization benchmarks and show that it consistently outperforms prior state-of-the-art methods.
arXiv Detail & Related papers (2025-07-23T12:23:03Z) - GaGA: Towards Interactive Global Geolocation Assistant [20.342366228855735]
GaGA is an interactive global geolocation assistant built upon the flourishing large vision-language models (LVLMs). It uncovers geographical clues within images and combines them with the extensive world knowledge embedded in LVLMs to determine the geolocations. GaGA achieves state-of-the-art performance on the GWS15k dataset, improving accuracy by 4.57% at the country level and 2.92% at the city level.
arXiv Detail & Related papers (2024-12-12T03:39:44Z) - Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z) - Image-Based Geolocation Using Large Vision-Language Models [19.071551941682063]
We introduce tool, an innovative framework that significantly enhances image-based geolocation accuracy.
tool employs a systematic chain-of-thought (CoT) approach, mimicking human geoguessing strategies.
It achieves an impressive average score of 4550.5 in the GeoGuessr game, with an 85.37% win rate, and delivers highly precise geolocation predictions.
arXiv Detail & Related papers (2024-08-18T13:39:43Z) - GeoLLM: Extracting Geospatial Knowledge from Large Language Models [49.20315582673223]
We present GeoLLM, a novel method that can effectively extract geospatial knowledge from large language models.
We demonstrate the utility of our approach across multiple tasks of central interest to the international community, including the measurement of population density and economic livelihoods.
Our experiments reveal that LLMs are remarkably sample-efficient, rich in geospatial information, and robust across the globe.
arXiv Detail & Related papers (2023-10-10T00:03:23Z) - GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
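The image-to-GPS retrieval idea behind GeoCLIP can be illustrated with a toy cosine-similarity lookup. The embeddings, gallery coordinates, and "encoder" below are random stand-ins, not GeoCLIP's learned encoders; the sketch only shows the retrieval step, where the query image embedding is matched against a gallery of GPS location embeddings in a shared space.

```python
import numpy as np

# Toy illustration of retrieval-style geo-localization: embed a query image
# and a gallery of GPS locations into a shared space, then return the
# location whose embedding is most similar to the image embedding.
# The embeddings here are random stand-ins for learned encoders.

rng = np.random.default_rng(0)

gps_gallery = [(48.8584, 2.2945), (40.6892, -74.0445), (35.6586, 139.7454)]
loc_emb = rng.normal(size=(3, 8))
loc_emb /= np.linalg.norm(loc_emb, axis=1, keepdims=True)

# Pretend the image encoder maps the query close to gallery entry 1.
img_emb = loc_emb[1] + 0.05 * rng.normal(size=8)
img_emb /= np.linalg.norm(img_emb)

scores = loc_emb @ img_emb          # cosine similarity (all vectors are unit-norm)
best = int(np.argmax(scores))
print(gps_gallery[best])            # -> (40.6892, -74.0445)
```

Framing geo-localization as retrieval over continuous GPS embeddings, rather than classification over discrete cells, is the design choice the abstract highlights.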
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the information provided and is not responsible for any consequences.