Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views
- URL: http://arxiv.org/abs/2506.03371v1
- Date: Tue, 03 Jun 2025 20:28:55 GMT
- Title: Toward Reliable VLM: A Fine-Grained Benchmark and Framework for Exposure, Bias, and Inference in Korean Street Views
- Authors: Xiaonan Wang, Bo Shao, Hansaem Kim
- Abstract summary: We present KoreaGEO Bench, the first fine-grained, multimodal geolocation benchmark for Korean street views. Our dataset comprises 1,080 high-resolution images sampled across four urban clusters and nine place types. Results reveal modality-driven shifts in localization precision and highlight structural prediction biases toward core cities.
- Score: 3.611742324688716
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent advances in vision-language models (VLMs) have enabled accurate image-based geolocation, raising serious concerns about location privacy risks in everyday social media posts. However, current benchmarks remain coarse-grained, linguistically biased, and lack multimodal and privacy-aware evaluations. To address these gaps, we present KoreaGEO Bench, the first fine-grained, multimodal geolocation benchmark for Korean street views. Our dataset comprises 1,080 high-resolution images sampled across four urban clusters and nine place types, enriched with multi-contextual annotations and two styles of Korean captions simulating real-world privacy exposure. We introduce a three-path evaluation protocol to assess ten mainstream VLMs under varying input modalities and analyze their accuracy, spatial bias, and reasoning behavior. Results reveal modality-driven shifts in localization precision and highlight structural prediction biases toward core cities.
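The abstract does not say how localization precision or spatial bias is scored. A common convention in image-geolocation benchmarks is to compute the great-circle (haversine) distance between predicted and true coordinates and report the share of images that fall within a set of distance thresholds. The Python sketch below assumes that convention. The helper names (`haversine_km`, `accuracy_at_thresholds`, `share_ratio`), the threshold values, and the region labels are hypothetical and are not KoreaGEO Bench's actual protocol. `share_ratio` is one possible way to express the kind of over-prediction of core cities the abstract reports.

```python
import math
from typing import Dict, Sequence, Tuple

# Mean Earth radius in kilometres, the usual constant for haversine distance.
EARTH_RADIUS_KM = 6371.0

LatLon = Tuple[float, float]  # (latitude, longitude) in decimal degrees


def haversine_km(a: LatLon, b: LatLon) -> float:
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(h)))


def accuracy_at_thresholds(
    preds: Sequence[LatLon],
    truths: Sequence[LatLon],
    thresholds_km: Sequence[float],
) -> Dict[float, float]:
    """Fraction of predictions within each distance threshold of the ground truth."""
    if not preds or len(preds) != len(truths):
        raise ValueError("preds and truths must be non-empty and equally long")
    errors = [haversine_km(p, t) for p, t in zip(preds, truths)]
    return {t: sum(e <= t for e in errors) / len(errors) for t in thresholds_km}


def share_ratio(
    pred_regions: Sequence[str], true_regions: Sequence[str], region: str
) -> float:
    """Predicted share of `region` divided by its ground-truth share.

    A value above 1.0 means the model assigns images to `region` more often
    than the data warrants (a hypothetical way to express a bias toward it).
    """
    if not pred_regions or len(pred_regions) != len(true_regions):
        raise ValueError("region lists must be non-empty and equally long")
    true_share = sum(r == region for r in true_regions) / len(true_regions)
    if true_share == 0:
        raise ValueError(f"{region!r} never occurs in the ground truth")
    pred_share = sum(r == region for r in pred_regions) / len(pred_regions)
    return pred_share / true_share


if __name__ == "__main__":
    seoul = (37.5665, 126.9780)  # approximate Seoul City Hall
    busan = (35.1796, 129.0756)  # approximate central Busan
    print(haversine_km(seoul, busan))
    # Hypothetical thresholds in km; the benchmark's actual cut-offs are not given here.
    print(accuracy_at_thresholds([seoul, seoul], [seoul, busan], [1, 25, 400]))
    # Hypothetical region labels, not the paper's four urban clusters.
    print(share_ratio(["Seoul", "Seoul", "Seoul", "Busan"],
                      ["Seoul", "Busan", "Daegu", "Busan"], "Seoul"))
```

Because errors are computed per image, the same lists could be grouped by input modality or place type before scoring, which is one plausible way to examine the modality-driven shifts in precision that the abstract describes.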
Related papers
- From Pixels to Places: A Systematic Benchmark for Evaluating Image Geolocalization Ability in Large Language Models [14.178064117544082]
Image geolocalization is important for applications in crisis response, digital forensics, and location-based intelligence.
Recent advances in large language models (LLMs) offer new opportunities for visual reasoning.
We introduce a benchmark called IMAGEO-Bench that systematically evaluates accuracy, distance error, geospatial bias, and reasoning process.
Date: 2025-08-03T06:04:33Z
- VLM-Guided Visual Place Recognition for Planet-Scale Geo-Localization [24.433604332415204]
We propose a novel hybrid geo-localization framework that combines the strengths of vision-language models and visual place recognition.
We evaluate our approach on multiple geo-localization benchmarks and show that it consistently outperforms prior state-of-the-art methods.
Date: 2025-07-23T12:23:03Z
- Towards Explainable Bilingual Multimodal Misinformation Detection and Localization [64.37162720126194]
BiMi is a framework that jointly performs region-level localization, cross-modal and cross-lingual consistency detection, and natural language explanation for misinformation analysis.
BiMiBench is a benchmark constructed by systematically editing real news images and subtitles.
BiMi outperforms strong baselines by up to +8.9 in classification accuracy, +15.9 in localization accuracy, and +2.5 in explanation BERTScore.
Date: 2025-06-28T15:43:06Z
- Recognition through Reasoning: Reinforcing Image Geo-localization with Large Vision-Language Models [27.848962405476108]
A new pipeline constructs a reasoning-oriented geo-localization dataset, MP16-Reason, from diverse social media images.
We introduce GLOBE, Group-relative policy optimization for Locatability assessment and optimized visual-clue reasoning.
Results demonstrate that GLOBE outperforms state-of-the-art open-source LVLMs on geo-localization tasks.
Date: 2025-06-17T16:07:58Z
- GeoRanker: Distance-Aware Ranking for Worldwide Image Geolocalization [30.983556433953076]
We propose GeoRanker, a distance-aware ranking framework for image geolocalization.
We introduce a multi-order distance loss that ranks both absolute and relative distances, enabling the model to reason over structured spatial relationships.
GeoRanker achieves state-of-the-art results on two well-established benchmarks.
Date: 2025-05-19T21:04:46Z
- VLMs as GeoGuessr Masters: Exceptional Performance, Hidden Biases, and Privacy Risks [44.69950059419091]
We introduce a benchmark consisting of 1,200 images paired with detailed geographic metadata.
We find that while these models demonstrate the ability to recognize geographic information from images, they exhibit significant biases.
Specifically, performance is substantially higher for economically developed and densely populated regions.
Date: 2025-02-16T15:28:34Z
- Around the World in 80 Timesteps: A Generative Approach to Global Visual Geolocation [19.028122299569052]
Global visual geolocation predicts where an image was captured on Earth.
In this paper, we aim to close the gap between traditional geolocalization and modern generative methods.
Our model achieves state-of-the-art performance on three visual geolocation benchmarks.
Date: 2024-12-09T18:59:04Z
- GEOBench-VLM: Benchmarking Vision-Language Models for Geospatial Tasks [84.86699025256705]
We present GEOBench-VLM, a benchmark specifically designed to evaluate Vision-Language Models (VLMs) on geospatial tasks.
Our benchmark features over 10,000 manually verified instructions spanning diverse visual conditions, object types, and scales.
We evaluate several state-of-the-art VLMs to assess performance on geospatial-specific challenges.
Date: 2024-11-28T18:59:56Z
- CurriculumLoc: Enhancing Cross-Domain Geolocalization through Multi-Stage Refinement [11.108860387261508]
Visual geolocalization is a cost-effective and scalable task that involves matching one or more query images, taken at some unknown location, to a set of geo-tagged reference images.
We develop CurriculumLoc, a novel keypoint detection and description method with global semantic awareness and local geometric verification.
We achieve new high recall@1 scores of 62.6% and 94.5% on ALTO under two different distance metrics, respectively.
Date: 2023-11-20T08:40:01Z
- Global-Local Context Network for Person Search [125.51080862575326]
Person search aims to jointly localize and identify a query person from natural, uncropped images.
We exploit rich context information globally and locally surrounding the target person, which we refer to as scene and group context, respectively.
We propose a unified global-local context network (GLCNet) with the intuitive aim of feature enhancement.
Date: 2021-12-05T07:38:53Z
- PGL: Prior-Guided Local Self-supervised Learning for 3D Medical Image Segmentation [87.50205728818601]
We propose a Prior-Guided Local (PGL) self-supervised model that learns region-wise local consistency in the latent feature space.
Our PGL model learns the distinctive representations of local regions, and hence is able to retain structural information.
Date: 2020-11-25T11:03:11Z
- Each Part Matters: Local Patterns Facilitate Cross-view Geo-localization [54.00111565818903]
Cross-view geo-localization aims to retrieve images of the same geographic target captured from different platforms.
Existing methods usually concentrate on mining fine-grained features of the geographic target in the image center.
We introduce a simple and effective deep neural network, called Local Pattern Network (LPN), to take advantage of contextual information.
Date: 2020-08-26T16:06:11Z
- Predicting Livelihood Indicators from Community-Generated Street-Level Imagery [70.5081240396352]
We propose an inexpensive, scalable, and interpretable approach to predict key livelihood indicators from public crowd-sourced street-level imagery.
By comparing our results against ground data collected in nationally-representative household surveys, we demonstrate the performance of our approach in accurately predicting indicators of poverty, population, and health.
Date: 2020-06-15T18:12:12Z
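The CurriculumLoc entry reports recall@1 without defining it. In visual place recognition, recall@K is commonly the fraction of queries for which at least one of the K most similar reference images lies within a fixed distance of the query's true position. A minimal Python sketch under that assumption follows; the function name `recall_at_k`, the use of projected (x, y) coordinates in metres, and the 25 m threshold in the example are illustrative assumptions, not the paper's settings.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed projected (x, y) coordinates in metres


def recall_at_k(
    similarity: Sequence[Sequence[float]],
    query_xy: Sequence[Point],
    ref_xy: Sequence[Point],
    k: int,
    threshold_m: float,
) -> float:
    """Fraction of queries with a correct match among their top-k references.

    similarity[q][r] is the score between query q and reference r (higher
    means more similar). A retrieved reference counts as correct when it lies
    within threshold_m of the query's ground-truth position.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    if not similarity or len(similarity) != len(query_xy):
        raise ValueError("need one similarity row per query")
    hits = 0
    for q, scores in enumerate(similarity):
        # Indices of the k highest-scoring references for this query.
        top_k = sorted(range(len(scores)), key=lambda r: scores[r], reverse=True)[:k]
        if any(math.dist(query_xy[q], ref_xy[r]) <= threshold_m for r in top_k):
            hits += 1
    return hits / len(similarity)


if __name__ == "__main__":
    refs = [(0.0, 0.0), (100.0, 0.0), (0.0, 500.0)]
    queries = [(5.0, 0.0), (0.0, 498.0)]
    sims = [[0.9, 0.1, 0.2],   # best match is the nearby reference 0
            [0.8, 0.1, 0.7]]   # nearby reference 2 only ranks second
    print(recall_at_k(sims, queries, refs, k=1, threshold_m=25.0))
    print(recall_at_k(sims, queries, refs, k=2, threshold_m=25.0))
```

Sorting each row costs O(R log R) per query; with large reference sets, a partial selection such as `heapq.nlargest` would be the more usual choice.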
This list is automatically generated from the titles and abstracts of the papers on this site.