PII-VisBench: Evaluating Personally Identifiable Information Safety in Vision Language Models Along a Continuum of Visibility
- URL: http://arxiv.org/abs/2601.05739v1
- Date: Fri, 09 Jan 2026 11:40:56 GMT
- Title: PII-VisBench: Evaluating Personally Identifiable Information Safety in Vision Language Models Along a Continuum of Visibility
- Authors: G M Shahariar, Zabir Al Nazi, Md Olid Hasan Bhuiyan, Zhouxing Shi,
- Abstract summary: PII-VisBench is a novel benchmark containing 4000 probes designed to evaluate VLM safety through the continuum of online presence. The benchmark stratifies 200 subjects into four visibility categories (high, medium, low, and zero) based on the extent and nature of their information available online. Across models, we observe a consistent pattern: refusals increase and PII disclosures decrease (9.10% high to 5.34% low) as subject visibility drops.
- Score: 4.603440637344069
- License: http://creativecommons.org/licenses/by-nc-nd/4.0/
- Abstract: Vision Language Models (VLMs) are increasingly integrated into privacy-critical domains, yet existing evaluations of personally identifiable information (PII) leakage largely treat privacy as a static extraction task and ignore how a subject's online presence (the volume of their data available online) influences privacy alignment. We introduce PII-VisBench, a novel benchmark containing 4000 unique probes designed to evaluate VLM safety through the continuum of online presence. The benchmark stratifies 200 subjects into four visibility categories (high, medium, low, and zero) based on the extent and nature of their information available online. We evaluate 18 open-source VLMs (0.3B-32B) on two key metrics: the percentage of PII probing queries refused (Refusal Rate) and the fraction of non-refusal responses flagged for containing PII (Conditional PII Disclosure Rate). Across models, we observe a consistent pattern: refusals increase and PII disclosures decrease (9.10% high to 5.34% low) as subject visibility drops. We identify that models are more likely to disclose PII for high-visibility subjects, alongside substantial model-family heterogeneity and PII-type disparities. Finally, paraphrasing and jailbreak-style prompts expose attack- and model-dependent failures, motivating visibility-aware safety evaluation and training interventions.
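The two headline metrics are simple ratios over per-probe judgments. A minimal sketch of how they could be computed is shown below; the `ProbeResult` structure and its field names are illustrative assumptions, not the paper's actual data schema.

```python
from dataclasses import dataclass

@dataclass
class ProbeResult:
    """Per-probe judgment (illustrative fields, not the paper's actual schema)."""
    refused: bool      # model declined to answer the PII probe
    pii_flagged: bool  # non-refusal response was flagged as containing PII

def refusal_rate(results: list[ProbeResult]) -> float:
    """Refusal Rate: fraction of all PII probing queries the model refused."""
    if not results:
        return 0.0
    return sum(r.refused for r in results) / len(results)

def conditional_pii_disclosure_rate(results: list[ProbeResult]) -> float:
    """Conditional PII Disclosure Rate: fraction of non-refusal responses flagged for PII."""
    answered = [r for r in results if not r.refused]
    if not answered:
        return 0.0
    return sum(r.pii_flagged for r in answered) / len(answered)

# Toy example: 4 probes for one subject; 2 refused, 1 of the 2 answered probes leaked PII.
results = [
    ProbeResult(refused=True,  pii_flagged=False),
    ProbeResult(refused=True,  pii_flagged=False),
    ProbeResult(refused=False, pii_flagged=True),
    ProbeResult(refused=False, pii_flagged=False),
]
print(refusal_rate(results))                     # 0.5
print(conditional_pii_disclosure_rate(results))  # 0.5
```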
Related papers
- Unintended Memorization of Sensitive Information in Fine-Tuned Language Models [24.228889351240838]
Fine-tuning Large Language Models (LLMs) on sensitive datasets carries a substantial risk of unintended memorization and leakage of Personally Identifiable Information (PII). We design controlled extraction probes to quantify unintended PII memorization and study how factors such as language, PII frequency, task type, and model size influence memorization behavior.
arXiv Detail & Related papers (2026-01-24T15:08:45Z) - UnPII: Unlearning Personally Identifiable Information with Quantifiable Exposure Risk [1.7825339856352196]
UnPII is the first PII-centric unlearning approach that prioritizes forgetting based on the risk of individual or combined PII attributes. To support realistic assessment, we systematically construct a synthetic PII dataset that simulates realistic exposure scenarios.
arXiv Detail & Related papers (2026-01-05T04:45:04Z) - MultiPriv: Benchmarking Individual-Level Privacy Reasoning in Vision-Language Models [14.942122955210436]
Modern Vision-Language Models (VLMs) demonstrate sophisticated reasoning, escalating privacy risks. Current privacy benchmarks are structurally insufficient for this new threat. We propose MultiPriv, the first benchmark designed to systematically evaluate individual-level privacy reasoning.
arXiv Detail & Related papers (2025-11-21T04:33:11Z) - On the MIA Vulnerability Gap Between Private GANs and Diffusion Models [51.53790101362898]
Generative Adversarial Networks (GANs) and diffusion models have emerged as leading approaches for high-quality image synthesis. We present the first unified theoretical and empirical analysis of the privacy risks faced by differentially private generative models.
arXiv Detail & Related papers (2025-09-03T14:18:22Z) - REVAL: A Comprehension Evaluation on Reliability and Values of Large Vision-Language Models [59.445672459851274]
REVAL is a comprehensive benchmark designed to evaluate the REliability and VALue of Large Vision-Language Models. REVAL encompasses over 144K image-text Visual Question Answering (VQA) samples, structured into two primary sections: Reliability and Values. We evaluate 26 models, including mainstream open-source LVLMs and prominent closed-source models like GPT-4o and Gemini-1.5-Pro.
arXiv Detail & Related papers (2025-03-20T07:54:35Z) - PrivacyScalpel: Enhancing LLM Privacy via Interpretable Feature Intervention with Sparse Autoencoders [8.483679748399037]
Large Language Models (LLMs) have demonstrated remarkable capabilities in natural language processing but pose privacy risks by memorizing and leaking Personally Identifiable Information (PII). Existing mitigation strategies, such as differential privacy and neuron-level interventions, often degrade model utility or fail to effectively prevent leakage. We introduce PrivacyScalpel, a novel privacy-preserving framework that leverages interpretability techniques to identify and mitigate PII leakage while maintaining performance.
arXiv Detail & Related papers (2025-03-14T09:31:01Z) - PII-Bench: Evaluating Query-Aware Privacy Protection Systems [10.52362814808073]
We propose a query-unrelated PII masking strategy and introduce PII-Bench, the first comprehensive evaluation framework for assessing privacy protection systems. PII-Bench comprises 2,842 test samples across 55 fine-grained PII categories, featuring diverse scenarios from single-subject descriptions to complex multi-party interactions.
arXiv Detail & Related papers (2025-02-25T14:49:08Z) - Multi-P$^2$A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models [65.2761254581209]
Based on Multi-P$^2$A, we evaluate the privacy preservation capabilities of 21 open-source and 2 closed-source Large Vision-Language Models (LVLMs). Our results reveal that current LVLMs generally pose a high risk of facilitating privacy breaches.
arXiv Detail & Related papers (2024-12-27T07:33:39Z) - Benchmarking Vision Language Model Unlearning via Fictitious Facial Identity Dataset [92.99416966226724]
We introduce the Facial Identity Unlearning Benchmark (FIUBench), a novel VLM unlearning benchmark designed to robustly evaluate the effectiveness of unlearning algorithms. We apply a two-stage evaluation pipeline that is designed to precisely control the sources of information and their exposure levels. Through the evaluation of four baseline VLM unlearning algorithms within FIUBench, we find that all methods remain limited in their unlearning performance.
arXiv Detail & Related papers (2024-11-05T23:26:10Z) - VHELM: A Holistic Evaluation of Vision Language Models [75.88987277686914]
We present the Holistic Evaluation of Vision Language Models (VHELM).
VHELM aggregates various datasets to cover one or more of the 9 aspects: visual perception, knowledge, reasoning, bias, fairness, multilinguality, robustness, toxicity, and safety.
Our framework is designed to be lightweight and automatic so that evaluation runs are cheap and fast.
arXiv Detail & Related papers (2024-10-09T17:46:34Z) - DePrompt: Desensitization and Evaluation of Personal Identifiable Information in Large Language Model Prompts [11.883785681042593]
DePrompt is a desensitization protection and effectiveness evaluation framework for prompts.
We integrate contextual attributes to define privacy types, achieving high-precision PII entity identification.
Our framework is adaptable to prompts and can be extended to text usability-dependent scenarios.
arXiv Detail & Related papers (2024-08-16T02:38:25Z) - VALOR-EVAL: Holistic Coverage and Faithfulness Evaluation of Large Vision-Language Models [57.43276586087863]
Large Vision-Language Models (LVLMs) suffer from hallucination issues, wherein the models generate plausible-sounding but factually incorrect outputs.
Existing benchmarks are often limited in scope, focusing mainly on object hallucinations.
We introduce a multi-dimensional benchmark covering objects, attributes, and relations, with challenging images selected based on associative biases.
arXiv Detail & Related papers (2024-04-22T04:49:22Z)