Granular Privacy Control for Geolocation with Vision Language Models
- URL: http://arxiv.org/abs/2407.04952v2
- Date: Thu, 17 Oct 2024 14:58:53 GMT
- Title: Granular Privacy Control for Geolocation with Vision Language Models
- Authors: Ethan Mendes, Yang Chen, James Hays, Sauvik Das, Wei Xu, Alan Ritter
- Abstract summary: We develop a new benchmark, GPTGeoChat, to test the ability of Vision Language Models to moderate geolocation dialogues with users.
We collect a set of 1,000 image geolocation conversations between in-house annotators and GPT-4v.
We evaluate the ability of various VLMs to moderate GPT-4v geolocation conversations by determining when too much location information has been revealed.
- Score: 36.3455665044992
- License: http://creativecommons.org/licenses/by-nc-sa/4.0/
- Abstract: Vision Language Models (VLMs) are rapidly advancing in their capability to answer information-seeking questions. As these models are widely deployed in consumer applications, they could lead to new privacy risks due to emergent abilities to identify people in photos, geolocate images, etc. As we demonstrate, somewhat surprisingly, current open-source and proprietary VLMs are very capable image geolocators, making widespread geolocation with VLMs an immediate privacy risk, rather than merely a theoretical future concern. As a first step to address this challenge, we develop a new benchmark, GPTGeoChat, to test the ability of VLMs to moderate geolocation dialogues with users. We collect a set of 1,000 image geolocation conversations between in-house annotators and GPT-4v, which are annotated with the granularity of location information revealed at each turn. Using this new dataset, we evaluate the ability of various VLMs to moderate GPT-4v geolocation conversations by determining when too much location information has been revealed. We find that custom fine-tuned models perform on par with prompted API-based models when identifying leaked location information at the country or city level; however, fine-tuning on supervised data appears to be needed to accurately moderate finer granularities, such as the name of a restaurant or building.
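To make the moderation task concrete, below is a minimal sketch of the turn-level decision described in the abstract, assuming a hypothetical ordinal granularity scale and a per-user threshold (the names are illustrative, not the dataset's actual labels):

```python
from enum import IntEnum

# Hypothetical granularity scale; the actual GPTGeoChat labels may differ.
class Granularity(IntEnum):
    NONE = 0
    COUNTRY = 1
    CITY = 2
    NEIGHBORHOOD = 3
    EXACT_LOCATION = 4  # e.g., a specific restaurant or building

def should_withhold(revealed: Granularity, user_threshold: Granularity) -> bool:
    """Moderate a dialogue turn: withhold the response once the location
    information it would reveal is finer than the user's allowed granularity."""
    return revealed > user_threshold

# Example: the user permits city-level answers, but the model's draft reply
# names the exact building shown in the photo.
print(should_withhold(Granularity.EXACT_LOCATION, Granularity.CITY))  # True
```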
Related papers
- Evaluating Precise Geolocation Inference Capabilities of Vision Language Models [0.0]
This paper introduces a benchmark dataset collected from Google Street View that represents its global distribution of coverage.
Foundation models are evaluated on single-image geolocation inference, with many achieving median distance errors of 300 km.
We further evaluate VLM "agents" with access to supplemental tools, observing up to a 30.6% decrease in distance error.
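For context, a median distance error like the 300 km figure above is typically computed as the great-circle (haversine) distance between predicted and ground-truth coordinates; the sketch below uses illustrative coordinates, not benchmark data:

```python
import math
from statistics import median

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Illustrative (prediction, ground truth) pairs in decimal degrees.
pairs = [((48.85, 2.35), (48.86, 2.34)),      # near-exact hit in Paris
         ((40.71, -74.01), (42.36, -71.06)),  # NYC predicted, Boston true
         ((35.68, 139.69), (34.69, 135.50))]  # Tokyo predicted, Osaka true
errors = [haversine_km(p[0], p[1], t[0], t[1]) for p, t in pairs]
print(f"median distance error: {median(errors):.1f} km")
```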
arXiv Detail & Related papers (2025-02-20T09:59:28Z)
- Image-based Geo-localization for Robotics: Are Black-box Vision-Language Models there yet? [25.419763184667985]
Vision-Language models (VLMs) offer exciting opportunities for robotic applications involving image geo-localization.
Recent research has focused on using VLMs as embedding extractors for geo-localization.
This paper investigates the potential of some of the state-of-the-art VLMs as stand-alone, zero-shot geo-localization systems.
arXiv Detail & Related papers (2025-01-28T13:46:01Z)
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks.
Our benchmark features over 10,000 manually verified instructions and covers a diverse set of variations in visual conditions, object type, and scale.
We evaluate several state-of-the-art VLMs to assess their accuracy within the geospatial context.
arXiv Detail & Related papers (2024-11-28T18:59:56Z)
- Swarm Intelligence in Geo-Localization: A Multi-Agent Large Vision-Language Model Collaborative Framework [51.26566634946208]
We introduce smileGeo, a novel visual geo-localization framework.
By inter-agent communication, smileGeo integrates the inherent knowledge of these agents with additional retrieved information.
Results show that our approach significantly outperforms current state-of-the-art methods.
arXiv Detail & Related papers (2024-08-21T03:31:30Z)
- Image-Based Geolocation Using Large Vision-Language Models [19.071551941682063]
We introduce an innovative framework that significantly enhances image-based geolocation accuracy.
It employs a systematic chain-of-thought (CoT) approach, mimicking human geoguessing strategies.
It achieves an impressive average score of 4550.5 in the GeoGuessr game, with an 85.37% win rate, and delivers highly precise geolocation predictions.
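As an illustration only, a chain-of-thought geolocation prompt in the spirit described above might look like the following; the wording is hypothetical and the paper's actual prompts and pipeline are not specified in this summary:

```python
# Hypothetical chain-of-thought geolocation prompt template; not the paper's exact prompt.
COT_GEOLOCATION_PROMPT = """You are playing a geo-guessing game.
Reason step by step before answering:
1. List visible clues: language on signs, driving side, vegetation, architecture.
2. Narrow down to a region, then a country, then a city.
3. Output your best guess as latitude and longitude.
Image: {image_reference}
"""

def build_prompt(image_reference: str) -> str:
    """Fill in the prompt template for a single image."""
    return COT_GEOLOCATION_PROMPT.format(image_reference=image_reference)

print(build_prompt("street_scene_001.jpg"))
```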
arXiv Detail & Related papers (2024-08-18T13:39:43Z)
- Towards Vision-Language Geo-Foundation Model: A Survey [65.70547895998541]
Vision-Language Foundation Models (VLFMs) have made remarkable progress on various multimodal tasks.
This paper thoroughly reviews VLGFMs, summarizing and analyzing recent developments in the field.
arXiv Detail & Related papers (2024-06-13T17:57:30Z)
- PIN: Positional Insert Unlocks Object Localisation Abilities in VLMs [55.8550939439138]
Vision-Language Models (VLMs) have shown immense potential by integrating large language models with vision systems.
These models face challenges in the fundamental computer vision task of object localisation, due to their training on multimodal data containing mostly captions.
We introduce an input-agnostic Positional Insert (PIN), a learnable spatial prompt, containing a minimal set of parameters that are slid inside the frozen VLM.
Our PIN module is trained with a simple next-token prediction task on synthetic data without requiring the introduction of new output heads.
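A minimal PyTorch-style sketch of the general idea of an input-agnostic learnable prompt paired with a frozen model follows; the tensor shapes and insertion point are assumptions, not the paper's exact PIN design:

```python
import torch
import torch.nn as nn

class PositionalInsert(nn.Module):
    """Input-agnostic learnable spatial prompt added to frozen visual features.
    Shapes and placement are illustrative; the paper's PIN module may differ."""
    def __init__(self, num_patches: int = 196, dim: int = 768):
        super().__init__()
        # The only trainable parameters: one spatial prompt shared across all inputs.
        self.prompt = nn.Parameter(torch.zeros(1, num_patches, dim))

    def forward(self, visual_features: torch.Tensor) -> torch.Tensor:
        # visual_features: (batch, num_patches, dim) from a frozen vision encoder
        return visual_features + self.prompt

# Training sketch: only the prompt receives gradients; the VLM stays frozen and
# the objective is ordinary next-token prediction on synthetic localisation data.
pin = PositionalInsert()
optimizer = torch.optim.AdamW(pin.parameters(), lr=1e-3)
```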
arXiv Detail & Related papers (2024-02-13T18:39:18Z)
- GeoLocator: a location-integrated large multimodal model for inferring geo-privacy [6.7452045691798945]
This study develops a location-integrated GPT-4 based model named GeoLocator.
Experiments reveal that GeoLocator generates specific geographic details with high accuracy.
We conclude with the broader implications of GeoLocator and our findings for individuals and the community at large.
arXiv Detail & Related papers (2023-11-21T21:48:51Z)
- GeoCLIP: Clip-Inspired Alignment between Locations and Images for Effective Worldwide Geo-localization [61.10806364001535]
Worldwide Geo-localization aims to pinpoint the precise location of images taken anywhere on Earth.
Existing approaches divide the globe into discrete geographic cells, transforming the problem into a classification task.
We propose GeoCLIP, a novel CLIP-inspired Image-to-GPS retrieval approach that enforces alignment between the image and its corresponding GPS locations.
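A rough sketch of CLIP-style image-to-GPS alignment along the lines described above; the encoders and loss details are assumptions, not GeoCLIP's exact formulation:

```python
import torch
import torch.nn.functional as F

def clip_style_alignment_loss(img_emb: torch.Tensor, gps_emb: torch.Tensor,
                              temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss aligning image embeddings with embeddings of their
    paired GPS coordinates, in the spirit of CLIP (details are illustrative)."""
    img_emb = F.normalize(img_emb, dim=-1)
    gps_emb = F.normalize(gps_emb, dim=-1)
    logits = img_emb @ gps_emb.t() / temperature  # (batch, batch) similarity matrix
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2

# At inference, geolocation becomes retrieval: embed the query image and pick the
# nearest GPS embedding from a gallery of candidate coordinates.
```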
arXiv Detail & Related papers (2023-09-27T20:54:56Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
The site does not guarantee the quality of the listed information and is not responsible for any consequences of its use.