Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
- URL: http://arxiv.org/abs/2509.23827v1
- Date: Sun, 28 Sep 2025 12:04:54 GMT
- Title: Assessing Visual Privacy Risks in Multimodal AI: A Novel Taxonomy-Grounded Evaluation of Vision-Language Models
- Authors: Efthymios Tsaprazlis, Tiantian Feng, Anil Ramakrishna, Rahul Gupta, Shrikanth Narayanan
- Abstract summary: We introduce a comprehensive, multi-level Visual Privacy taxonomy. We evaluate the capabilities of several state-of-the-art Vision-Language Models.
- Score: 55.23884055923282
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Artificial Intelligence has profoundly transformed the technological landscape in recent years. Large Language Models (LLMs) have demonstrated impressive abilities in reasoning, text comprehension, contextual pattern recognition, and integrating language with visual understanding. While these advances offer significant benefits, they also reveal critical limitations in the models' ability to grasp the notion of privacy. There is hence substantial interest in determining if and how these models can understand and enforce privacy principles, particularly given the lack of supporting resources to test such a task. In this work, we address these challenges by examining how legal frameworks can inform the capabilities of these emerging technologies. To this end, we introduce a comprehensive, multi-level Visual Privacy Taxonomy that captures a wide range of privacy issues, designed to be scalable and adaptable to existing and future research needs. Furthermore, we evaluate the capabilities of several state-of-the-art Vision-Language Models (VLMs), revealing significant inconsistencies in their understanding of contextual privacy. Our work contributes both a foundational taxonomy for future research and a critical benchmark of current model limitations, demonstrating the urgent need for more robust, privacy-aware AI systems.
Related papers
- A Systematic Survey of Model Extraction Attacks and Defenses: State-of-the-Art and Perspectives [65.3369988566853]
Recent studies have demonstrated that adversaries can replicate a target model's functionality.
Model Extraction Attacks pose threats to intellectual property, privacy, and system security.
We propose a novel taxonomy that classifies MEAs according to attack mechanisms, defense approaches, and computing environments.
arXiv Detail & Related papers (2025-08-20T19:49:59Z)
- Evaluating Multimodal Language Models as Visual Assistants for Visually Impaired Users [42.132487737233845]
This paper explores the effectiveness of Multimodal Large Language models (MLLMs) as assistive technologies for visually impaired individuals.
We conduct a user survey to identify adoption patterns and key challenges users face with such technologies.
arXiv Detail & Related papers (2025-03-28T16:54:25Z)
- Retrieval Augmented Generation and Understanding in Vision: A Survey and New Outlook [85.43403500874889]
Retrieval-augmented generation (RAG) has emerged as a pivotal technique in artificial intelligence (AI).
Recent advancements in RAG for embodied AI, with a particular focus on applications in planning, task execution, multimodal perception, interaction, and specialized domains.
arXiv Detail & Related papers (2025-03-23T10:33:28Z)
- Privacy-Preserving Large Language Models: Mechanisms, Applications, and Future Directions [0.0]
This survey explores the landscape of privacy-preserving mechanisms tailored for large language models.
We examine their efficacy in addressing key privacy challenges, such as membership inference and model inversion attacks.
By synthesizing state-of-the-art approaches and future trends, this paper provides a foundation for developing robust, privacy-preserving large language models.
arXiv Detail & Related papers (2024-12-09T00:24:09Z)
- Is A Picture Worth A Thousand Words? Delving Into Spatial Reasoning for Vision Language Models [37.44286562901589]
We propose SpatialEval, a novel benchmark that covers diverse aspects of spatial reasoning.
We conduct a comprehensive evaluation of competitive language and vision-language models.
Our findings reveal several counter-intuitive insights that have been overlooked in the literature.
arXiv Detail & Related papers (2024-06-21T03:53:37Z)
- Unique Security and Privacy Threats of Large Language Models: A Comprehensive Survey [63.4581186135101]
Large language models (LLMs) have made remarkable advancements in natural language processing.
Privacy and security issues have been revealed throughout their life cycle.
This survey outlines and analyzes potential countermeasures.
arXiv Detail & Related papers (2024-06-12T07:55:32Z)
- Exploring the Privacy Protection Capabilities of Chinese Large Language Models [19.12726985060863]
We have devised a three-tiered progressive framework for evaluating privacy in language systems.
Our primary objective is to comprehensively evaluate the sensitivity of large language models to private information.
Our observations indicate that existing Chinese large language models universally show privacy protection shortcomings.
arXiv Detail & Related papers (2024-03-27T02:31:54Z)
- On the Challenges and Opportunities in Generative AI [155.030542942979]
We argue that current large-scale generative AI models exhibit several fundamental shortcomings that hinder their widespread adoption across domains.
We aim to provide researchers with insights for exploring fruitful research directions, thus fostering the development of more robust and accessible generative AI solutions.
arXiv Detail & Related papers (2024-02-28T15:19:33Z)
- Privacy in Foundation Models: A Conceptual Framework for System Design [3.438211531047665]
Foundation models present both significant challenges and incredible opportunities.
There is currently a lack of consensus regarding the comprehensive scope of both technical and non-technical issues that the privacy evaluation process should encompass.
This paper introduces a novel conceptual framework that integrates various responsible AI patterns from multiple perspectives.
arXiv Detail & Related papers (2023-11-13T00:44:06Z)
- Foundational Models Defining a New Era in Vision: A Survey and Outlook [151.49434496615427]
Vision systems to see and reason about the compositional nature of visual scenes are fundamental to understanding our world.
The models learned to bridge the gap between such modalities coupled with large-scale training data facilitate contextual reasoning, generalization, and prompt capabilities at test time.
The output of such models can be modified through human-provided prompts without retraining, e.g., segmenting a particular object by providing a bounding box, having interactive dialogues by asking questions about an image or video scene or manipulating the robot's behavior through language instructions.
arXiv Detail & Related papers (2023-07-25T17:59:18Z)
- OCRBench: On the Hidden Mystery of OCR in Large Multimodal Models [122.27878464009181]
We conducted a comprehensive evaluation of Large Multimodal Models, such as GPT4V and Gemini, in various text-related visual tasks.
OCRBench contains 29 datasets, making it the most comprehensive OCR evaluation benchmark available.
arXiv Detail & Related papers (2023-05-13T11:28:37Z)
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of its content (including all information) and is not responsible for any consequences.