VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models
- URL: http://arxiv.org/abs/2507.08982v1
- Date: Fri, 11 Jul 2025 19:34:01 GMT
- Title: VIP: Visual Information Protection through Adversarial Attacks on Vision-Language Models
- Authors: Hanene F. Z. Brachemi Meftah, Wassim Hamidouche, Sid Ahmed Fezza, Olivier Déforges,
- Abstract summary: We frame the preservation of privacy in Vision-Language Models as an adversarial attack problem. We propose a novel attack strategy that selectively conceals information within designated Regions of Interest in an image. Experimental results across three state-of-the-art VLMs demonstrate up to a 98% reduction in detection of the targeted ROIs.
- Score: 15.158545794377169
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Recent years have witnessed remarkable progress in developing Vision-Language Models (VLMs) capable of processing both textual and visual inputs. These models have demonstrated impressive performance, leading to their widespread adoption in various applications. However, this widespread adoption raises serious concerns regarding user privacy, particularly when models inadvertently process or expose private visual information. In this work, we frame the preservation of privacy in VLMs as an adversarial attack problem. We propose a novel attack strategy that selectively conceals information within designated Regions of Interest (ROIs) in an image, effectively preventing VLMs from accessing sensitive content while preserving the semantic integrity of the remaining image. Unlike conventional adversarial attacks that often disrupt the entire image, our method maintains high coherence in the unmasked areas. Experimental results across three state-of-the-art VLMs, namely LLaVA, Instruct-BLIP, and BLIP2-T5, demonstrate up to a 98% reduction in detection of the targeted ROIs, while keeping global image semantics intact, as confirmed by high similarity scores between clean and adversarial outputs. We believe that this work contributes to a more privacy-conscious use of multimodal models and offers a practical tool for further research, with the source code publicly available at: https://github.com/hbrachemi/Vlm_defense-attack.
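To make the attack strategy concrete, here is a minimal PGD-style sketch of an ROI-restricted adversarial perturbation. It illustrates the general idea rather than the authors' implementation: the `model_loss` callable (a scalar loss measuring how well the VLM still recovers the ROI content), the step size, and the budget are assumptions.

```python
import torch

def roi_masked_pgd(image, roi_mask, model_loss, steps=100, eps=8/255, alpha=1/255):
    """Illustrative ROI-restricted PGD (not the paper's exact method).

    image:      (C, H, W) tensor in [0, 1]
    roi_mask:   (1, H, W) binary tensor marking the sensitive region
    model_loss: callable taking an image and returning a scalar loss that the
                attack maximizes, e.g. the negative log-likelihood of the VLM
                describing the ROI content (an assumed interface).
    """
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = model_loss(image + delta * roi_mask)  # perturb only inside the ROI
        loss.backward()
        with torch.no_grad():
            delta += alpha * delta.grad.sign()               # gradient-ascent step
            delta.clamp_(-eps, eps)                          # keep the perturbation small
            delta.mul_(roi_mask)                             # zero it outside the ROI
            delta.copy_((image + delta).clamp(0, 1) - image) # stay in valid pixel range
        delta.grad.zero_()
    return (image + delta * roi_mask).detach()
```

The masking step is what distinguishes this from a conventional full-image attack: pixels outside the designated ROIs are never modified, which is how the surrounding image semantics are preserved.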
Related papers
- Privacy-Preserving in Connected and Autonomous Vehicles Through Vision to Text Transformation [0.9831489366502302]
This paper introduces a novel privacy-preserving framework that leverages feedback-based reinforcement learning (RL) and vision-language models (VLMs).
The main idea is to convert images into semantically equivalent textual descriptions, ensuring that scene-relevant information is retained while visual privacy is preserved.
Evaluation results demonstrate significant improvements in both privacy protection and textual quality.
arXiv Detail & Related papers (2025-06-18T20:02:24Z)
- Screen Hijack: Visual Poisoning of VLM Agents in Mobile Environments [61.808686396077036]
We present GHOST, the first clean-label backdoor attack specifically designed for mobile agents built upon vision-language models (VLMs).
Our method manipulates only the visual inputs of a portion of the training samples without altering their corresponding labels or instructions.
We evaluate our method across six real-world Android apps and three VLM architectures adapted for mobile use.
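As a rough illustration of the clean-label idea, a hedged sketch of blending a visual trigger into a small fraction of training images while leaving their labels and instructions untouched is shown below; the blending scheme, poison fraction, and trigger are assumptions, not GHOST's actual procedure.

```python
import torch

def clean_label_poison(images, labels, trigger, poison_frac=0.05, blend=0.1):
    """Illustrative clean-label poisoning (not GHOST's actual procedure):
    blend a visual trigger into a random subset of training images while the
    labels/instructions stay untouched, so the poisoned samples look correctly
    labelled to a human reviewer.

    images:  (N, C, H, W) training images in [0, 1]
    trigger: (C, H, W) trigger pattern, broadcast over the chosen subset
    """
    n = images.shape[0]
    idx = torch.randperm(n)[: max(1, int(poison_frac * n))]
    poisoned = images.clone()
    poisoned[idx] = ((1 - blend) * poisoned[idx] + blend * trigger).clamp(0, 1)
    return poisoned, labels  # labels are deliberately left unchanged (clean-label)
```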
arXiv Detail & Related papers (2025-06-16T08:09:32Z)
- Transferable Adversarial Attacks on Black-Box Vision-Language Models [63.22532779621001]
Adversarial attacks can transfer from open-source to proprietary black-box models in text-only and vision-only contexts.
We show that attackers can craft perturbations to induce specific attacker-chosen interpretations of visual information.
We discover that universal perturbations -- modifications applicable to a wide set of images -- can consistently induce these misinterpretations.
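A generic sketch of the universal-perturbation idea is given below: a single perturbation is optimized jointly over many images on an open-source surrogate model and then transferred to black-box targets. The `surrogate_loss` interface and the hyperparameters are assumptions, not the paper's setup.

```python
import torch

def universal_perturbation(images, surrogate_loss, steps=200, eps=8/255, alpha=1/255):
    """Illustrative universal perturbation: a single delta applied to every
    image, optimized to minimize an attacker-chosen objective on a surrogate
    open-source VLM (assumed interface), then transferred to black-box models.

    images:         (N, C, H, W) batch in [0, 1]
    surrogate_loss: callable mapping a perturbed batch to a scalar loss of the
                    attacker-chosen interpretation (lower = closer to target).
    """
    delta = torch.zeros_like(images[0], requires_grad=True)
    for _ in range(steps):
        loss = surrogate_loss((images + delta).clamp(0, 1))  # same delta for all images
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # descend toward the target interpretation
            delta.clamp_(-eps, eps)             # keep the perturbation bounded
        delta.grad.zero_()
    return delta.detach()
```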
arXiv Detail & Related papers (2025-05-02T06:51:11Z)
- AnyAttack: Towards Large-scale Self-supervised Adversarial Attacks on Vision-language Models [39.34959092321762]
Vision-Language Models (VLMs) are vulnerable to image-based adversarial attacks.
We present AnyAttack, a self-supervised framework that transcends the limitations of conventional attacks.
arXiv Detail & Related papers (2024-10-07T09:45:18Z)
- TrojVLM: Backdoor Attack Against Vision Language Models [50.87239635292717]
This study introduces TrojVLM, the first exploration of backdoor attacks aimed at Vision Language Models (VLMs).
TrojVLM inserts predetermined target text into output text when encountering poisoned images.
A novel semantic preserving loss is proposed to ensure the semantic integrity of the original image content.
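The interplay of the two objectives can be illustrated with a toy combined loss; the exact terms and the weighting below are assumptions rather than TrojVLM's actual formulation.

```python
import torch
import torch.nn.functional as F

def backdoor_objective(lm_loss_clean, lm_loss_target, feats_poisoned, feats_clean, lam=1.0):
    """Toy combination of a backdoor objective with a semantic-preserving term
    (assumed form, not TrojVLM's exact loss): train the model to emit the
    predetermined target text on poisoned images while keeping poisoned image
    features close to the clean ones so the original content is preserved.

    lm_loss_clean:  language-modeling loss on clean image/text pairs
    lm_loss_target: language-modeling loss of the target text on poisoned images
    feats_*:        image features from the model's vision encoder
    """
    semantic_loss = F.mse_loss(feats_poisoned, feats_clean)  # keep visual semantics intact
    return lm_loss_clean + lm_loss_target + lam * semantic_loss
```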
arXiv Detail & Related papers (2024-09-28T04:37:09Z)
- The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models [31.166994121531232]
Vision-Language Models (VLMs) combine visual and textual understanding, rendering them well-suited for diverse tasks.
These capabilities are built upon training on large amounts of uncurated data crawled from the web.
In this paper, we assess whether these vulnerabilities exist, focusing on identity leakage.
arXiv Detail & Related papers (2024-08-02T12:36:13Z)
- White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerability within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
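A high-level sketch of this alternating optimization follows; the loss interface, the `suffix_update` callable standing in for a discrete token-search step, and the schedule are all assumptions rather than the paper's algorithm.

```python
import torch

def jointly_optimize(adv_image, suffix_ids, loss_fn, suffix_update,
                     rounds=5, image_steps=10, alpha=1/255):
    """Illustrative alternating scheme (assumed, not the paper's exact algorithm):
    continuous gradient steps on an adversarial image prefix interleaved with a
    discrete update of an adversarial text suffix, both driving down the loss of
    the desired affirmative response.

    loss_fn(image, suffix_ids):       scalar loss of the target affirmative response
    suffix_update(suffix_ids, image): returns updated token ids (e.g. a greedy
                                      gradient-guided swap); supplied by the caller
    """
    adv_image = adv_image.detach().clone().requires_grad_(True)
    for _ in range(rounds):
        for _ in range(image_steps):                       # (1) update the image prefix
            loss = loss_fn(adv_image, suffix_ids)
            (grad,) = torch.autograd.grad(loss, adv_image)
            with torch.no_grad():
                adv_image -= alpha * grad.sign()
                adv_image.clamp_(0, 1)
        suffix_ids = suffix_update(suffix_ids, adv_image)  # (2) update the text suffix
    return adv_image.detach(), suffix_ids
```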
arXiv Detail & Related papers (2024-05-28T07:13:30Z)
- Safeguarding Vision-Language Models Against Patched Visual Prompt Injectors [31.383591942592467]
Vision-language models (VLMs) offer innovative ways to combine visual and textual data for enhanced understanding and interaction.
Patch-based adversarial attacks are considered the most realistic threat model in physical vision applications.
We introduce SmoothVLM, a defense mechanism rooted in smoothing techniques, to protect VLMs from the threat of patched visual prompt injectors.
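The smoothing idea can be sketched generically as majority voting over randomly masked copies of the input; the masking scheme, vote count, and the `vlm_answer` callable below are assumptions, not SmoothVLM's exact procedure.

```python
import torch
from collections import Counter

def smoothed_answer(image, vlm_answer, num_samples=20, keep_prob=0.7):
    """Generic smoothing-style defense sketch (assumed, not SmoothVLM's exact
    procedure): query the VLM on several randomly pixel-masked copies of the
    image and return the majority answer, so a localized adversarial patch only
    sways the copies in which it survives the masking.

    image:      (C, H, W) tensor in [0, 1]
    vlm_answer: callable returning the model's textual answer for an image
    """
    votes = []
    for _ in range(num_samples):
        mask = (torch.rand(1, *image.shape[1:]) < keep_prob).float()  # per-pixel keep mask
        votes.append(vlm_answer(image * mask))                        # answer on a masked copy
    return Counter(votes).most_common(1)[0][0]                        # majority vote
```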
arXiv Detail & Related papers (2024-05-17T04:19:19Z)
- Private Attribute Inference from Images with Vision-Language Models [2.9373912230684565]
Vision-language models (VLMs) are capable of understanding both images and text.
We evaluate 7 state-of-the-art VLMs, finding that they can infer various personal attributes at up to 77.6% accuracy.
We observe that accuracy scales with the general capabilities of the models, implying that future models can be misused as stronger inferential adversaries.
arXiv Detail & Related papers (2024-04-16T14:42:49Z)
- Adversarial Prompt Tuning for Vision-Language Models [86.5543597406173]
Adversarial Prompt Tuning (AdvPT) is a technique to enhance the adversarial robustness of image encoders in Vision-Language Models (VLMs).
We demonstrate that AdvPT improves resistance against white-box and black-box adversarial attacks and exhibits a synergistic effect when combined with existing image-processing-based defense techniques.
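In spirit, adversarial prompt tuning keeps the image encoder frozen and learns only a small set of textual context vectors against adversarial image embeddings. The sketch below follows that spirit; `text_encoder`, `class_token_embeds`, and the hyperparameters are assumed interfaces rather than the paper's code.

```python
import torch
import torch.nn.functional as F

def adversarial_prompt_tuning(adv_image_feats, labels, text_encoder, class_token_embeds,
                              n_ctx=16, epochs=10, lr=1e-3):
    """Hedged prompt-tuning sketch in the spirit of AdvPT (assumed interfaces,
    not the paper's code): learn a shared context of n_ctx embedding vectors,
    prepend it to each class-name embedding, and fit it so that the resulting
    text features classify *adversarial* image embeddings correctly while the
    image encoder stays frozen.

    adv_image_feats:    (N, D) precomputed adversarial image embeddings
    labels:             (N,) ground-truth class indices
    text_encoder:       callable mapping (num_classes, n_ctx + L, D) prompt
                        embeddings to (num_classes, D) text features (assumed)
    class_token_embeds: (num_classes, L, D) embeddings of the class names
    """
    dim = class_token_embeds.shape[-1]
    ctx = (0.02 * torch.randn(n_ctx, dim)).requires_grad_(True)  # learnable context tokens
    opt = torch.optim.Adam([ctx], lr=lr)
    for _ in range(epochs):
        prompts = torch.cat([ctx.expand(class_token_embeds.shape[0], -1, -1),
                             class_token_embeds], dim=1)
        text_feats = text_encoder(prompts)                  # (num_classes, D)
        logits = adv_image_feats @ text_feats.t()           # similarity logits
        loss = F.cross_entropy(logits, labels)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return ctx.detach()
```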
arXiv Detail & Related papers (2023-11-19T07:47:43Z)
- Diff-Privacy: Diffusion-based Face Privacy Protection [58.1021066224765]
In this paper, we propose a novel face privacy protection method based on diffusion models, dubbed Diff-Privacy.
Specifically, we train our proposed multi-scale image inversion module (MSI) to obtain a set of SDM format conditional embeddings of the original image.
Based on the conditional embeddings, we design corresponding embedding scheduling strategies and construct different energy functions during the denoising process to achieve anonymization and visual identity information hiding.
arXiv Detail & Related papers (2023-09-11T09:26:07Z)
This list is automatically generated from the titles and abstracts of the papers on this site.