The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework
- URL: http://arxiv.org/abs/2505.19139v1
- Date: Sun, 25 May 2025 13:22:10 GMT
- Title: The Eye of Sherlock Holmes: Uncovering User Private Attribute Profiling via Vision-Language Model Agentic Framework
- Authors: Feiran Liu, Yuzhe Zhang, Xinyi Huang, Yinan Peng, Xinfeng Li, Lixu Wang, Yutong Shen, Ranjie Duan, Simeng Qin, Xiaojun Jia, Qingsong Wen, Wei Dong
- Abstract summary: A new privacy risk is associated with the ability to infer sensitive attributes from personal images. This threat is particularly severe given that modern apps can easily access users' photo albums. In this work, we construct PAPI, the largest dataset for studying private attribute profiling in personal images. We also propose HolmesEye, a hybrid agentic framework that combines VLMs and LLMs to enhance privacy inference.
- Score: 28.25933078258213
- License: http://creativecommons.org/publicdomain/zero/1.0/
- Abstract: Our research reveals a new privacy risk associated with the vision-language model (VLM) agentic framework: the ability to infer sensitive attributes (e.g., age and health information) and even abstract ones (e.g., personality and social traits) from a set of personal images, which we term "image private attribute profiling." This threat is particularly severe given that modern apps can easily access users' photo albums, and inference from image sets enables models to exploit inter-image relations for more sophisticated profiling. However, two main challenges hinder our understanding of how well VLMs can profile an individual from a few personal photos: (1) the lack of benchmark datasets with multi-image annotations for private attributes, and (2) the limited ability of current multimodal large language models (MLLMs) to infer abstract attributes from large image collections. In this work, we construct PAPI, the largest dataset for studying private attribute profiling in personal images, comprising 2,510 images from 251 individuals with 3,012 annotated privacy attributes. We also propose HolmesEye, a hybrid agentic framework that combines VLMs and LLMs to enhance privacy inference. HolmesEye uses VLMs to extract both intra-image and inter-image information and LLMs to guide the inference process as well as consolidate the results through forensic analysis, overcoming existing limitations in long-context visual reasoning. Experiments reveal that HolmesEye achieves a 10.8% improvement in average accuracy over state-of-the-art baselines and surpasses human-level performance by 15.0% in predicting abstract attributes. This work highlights the urgency of addressing privacy risks in image-based profiling and offers both a new dataset and an advanced framework to guide future research in this area.
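To make the described pipeline concrete, below is a minimal Python sketch of a HolmesEye-style VLM+LLM agentic loop. The function names, prompts, and two-round questioning budget are illustrative assumptions, not the paper's actual implementation; the VLM and LLM are passed in as opaque callables so the sketch stays model-agnostic.

```python
from dataclasses import dataclass
from typing import Callable, List

# Minimal sketch of a HolmesEye-style profiling loop (illustrative only):
# a VLM handles perception over the photo album, while an LLM steers the
# inquiry and consolidates the evidence into per-attribute verdicts.

VLM = Callable[[str, List[str]], str]  # (prompt, image paths) -> observation text
LLM = Callable[[str], str]             # prompt -> reasoning text

@dataclass
class AttributeVerdict:
    attribute: str
    inference: str

def profile_album(images: List[str], attributes: List[str],
                  vlm: VLM, llm: LLM, rounds: int = 2) -> List[AttributeVerdict]:
    # Intra-image pass: describe each photo in isolation.
    notes = [vlm("Describe the people, objects, and setting.", [img]) for img in images]
    # Inter-image pass: surface relations that single-image views miss.
    notes.append(vlm("What recurring people, places, or patterns link these photos?", images))

    verdicts = []
    for attr in attributes:
        evidence = "\n".join(notes)
        # LLM-guided inquiry: the LLM poses follow-up questions that the
        # VLM answers against the full image set.
        for _ in range(rounds):
            question = llm(
                f"Evidence so far:\n{evidence}\n"
                f"Ask one question whose answer would help infer the subject's {attr}."
            )
            evidence += "\n" + vlm(question, images)
        # Forensic consolidation: weigh all evidence into a final inference.
        verdict = llm(
            f"Acting as a forensic analyst, infer the subject's {attr} "
            f"from this evidence and justify briefly:\n{evidence}"
        )
        verdicts.append(AttributeVerdict(attribute=attr, inference=verdict))
    return verdicts
```

Wiring in real models would mean implementing `vlm` and `llm` over an actual multimodal API; the separation mirrors the paper's division of labor between VLM-driven intra- and inter-image extraction and LLM-driven guidance and consolidation.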
Related papers
- The Man Behind the Sound: Demystifying Audio Private Attribute Profiling via Multimodal Large Language Model Agents [21.736748922886555]
This research uncovers a novel privacy risk associated with multimodal large language models (MLLMs).
The ability to infer sensitive personal attributes from audio data -- a technique we term audio private attribute profiling -- poses a significant threat.
We propose Gifts, a hybrid multi-agent framework that leverages the complementary strengths of audio-language models (ALMs) and large language models (LLMs) to enhance inference capabilities.
arXiv Detail & Related papers (2025-07-14T07:51:56Z) - MAGPIE: A dataset for Multi-AGent contextual PrIvacy Evaluation [54.410825977390274]
Existing benchmarks to evaluate contextual privacy in LLM agents primarily assess single-turn, low-complexity tasks.
We first present MAGPIE, a benchmark comprising 158 real-life high-stakes scenarios across 15 domains.
We then evaluate current state-of-the-art LLMs on their understanding of contextually private data and their ability to collaborate without violating user privacy.
arXiv Detail & Related papers (2025-06-25T18:04:25Z) - Multi-P$^2$A: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models [65.2761254581209]
Based on Multi-P$^2$A, we evaluate the privacy preservation capabilities of 21 open-source and 2 closed-source Large Vision-Language Models (LVLMs).
Our results reveal that current LVLMs generally pose a high risk of facilitating privacy breaches.
arXiv Detail & Related papers (2024-12-27T07:33:39Z) - Enhancing User-Centric Privacy Protection: An Interactive Framework through Diffusion Models and Machine Unlearning [54.30994558765057]
The study pioneers a comprehensive privacy protection framework that safeguards image data privacy during both data sharing and model publication.
We propose an interactive image privacy protection framework that utilizes generative machine learning models to modify image information at the attribute level.
Within this framework, we instantiate two modules: a differential privacy diffusion model for protecting attribute information in images and a feature unlearning algorithm for efficient updates of the trained model on the revised image dataset.
arXiv Detail & Related papers (2024-09-05T07:55:55Z) - Private Attribute Inference from Images with Vision-Language Models [2.9373912230684565]
Vision-language models (VLMs) are capable of understanding both images and text.
We evaluate 7 state-of-the-art VLMs, finding that they can infer various personal attributes at up to 77.6% accuracy.
We observe that accuracy scales with the general capabilities of the models, implying that future models can be misused as stronger inferential adversaries.
arXiv Detail & Related papers (2024-04-16T14:42:49Z) - Diff-Privacy: Diffusion-based Face Privacy Protection [58.1021066224765]
In this paper, we propose a novel face privacy protection method based on diffusion models, dubbed Diff-Privacy.
Specifically, we train our proposed multi-scale image inversion module (MSI) to obtain a set of SDM-format conditional embeddings of the original image.
Based on the conditional embeddings, we design corresponding embedding scheduling strategies and construct different energy functions during the denoising process to achieve anonymization and visual identity information hiding.
arXiv Detail & Related papers (2023-09-11T09:26:07Z) - Content-based Graph Privacy Advisor [38.733077459065704]
We present an image privacy classifier that uses scene information and object cardinality as cues for the prediction of image privacy.
Our Graph Privacy Advisor (GPA) model simplifies a state-of-the-art graph model and improves its performance.
arXiv Detail & Related papers (2022-10-20T11:12:42Z) - OPOM: Customized Invisible Cloak towards Face Privacy Protection [58.07786010689529]
We investigate face privacy protection from a technology standpoint, based on a new type of customized cloak.
We propose a new method, named one person one mask (OPOM), to generate person-specific (class-wise) universal masks.
The effectiveness of the proposed method is evaluated on both common and celebrity datasets.
arXiv Detail & Related papers (2022-05-24T11:29:37Z) - InfoScrub: Towards Attribute Privacy by Targeted Obfuscation [77.49428268918703]
We study techniques that allow individuals to limit the private information leaked in visual data.
We tackle this problem in a novel image obfuscation framework.
We find our approach generates obfuscated images faithful to the original input images, and additionally increases uncertainty by 6.2$\times$ (or up to 0.85 bits) over the non-obfuscated counterparts.
arXiv Detail & Related papers (2020-05-20T19:48:04Z) - PrivacyNet: Semi-Adversarial Networks for Multi-attribute Face Privacy [15.301150389512744]
We develop a technique for imparting soft biometric privacy to face images via an image perturbation methodology.
The image perturbation is undertaken using a GAN-based Semi-Adversarial Network (SAN).
PrivacyNet allows a person to choose attributes that have to be obfuscated in the input face images.
arXiv Detail & Related papers (2020-01-02T18:53:31Z)