Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
- URL: http://arxiv.org/abs/2402.06659v2
- Date: Mon, 14 Oct 2024 16:17:34 GMT
- Title: Shadowcast: Stealthy Data Poisoning Attacks Against Vision-Language Models
- Authors: Yuancheng Xu, Jiarui Yao, Manli Shu, Yanchao Sun, Zichu Wu, Ning Yu, Tom Goldstein, Furong Huang
- Abstract summary: This study takes the first step in exposing Vision-Language Models' susceptibility to data poisoning attacks.
We introduce Shadowcast, a stealthy data poisoning attack where poison samples are visually indistinguishable from benign images.
We show that Shadowcast effectively achieves the attacker's intentions using as few as 50 poison samples.
- Score: 73.37389786808174
- License:
- Abstract: Vision-Language Models (VLMs) excel in generating textual responses from visual inputs, but their versatility raises security concerns. This study takes the first step in exposing VLMs' susceptibility to data poisoning attacks that can manipulate responses to innocuous, everyday prompts. We introduce Shadowcast, a stealthy data poisoning attack where poison samples are visually indistinguishable from benign images with matching texts. Shadowcast demonstrates effectiveness in two attack types. The first is a traditional Label Attack, tricking VLMs into misidentifying class labels, such as confusing Donald Trump for Joe Biden. The second is a novel Persuasion Attack, leveraging VLMs' text generation capabilities to craft persuasive and seemingly rational narratives for misinformation, such as portraying junk food as healthy. We show that Shadowcast effectively achieves the attacker's intentions using as few as 50 poison samples. Crucially, the poisoned samples demonstrate transferability across different VLM architectures, posing a significant concern in black-box settings. Moreover, Shadowcast remains potent under realistic conditions involving various text prompts, training data augmentation, and image compression techniques. This work reveals how poisoned VLMs can disseminate convincing yet deceptive misinformation to everyday, benign users, emphasizing the importance of data integrity for responsible VLM deployments. Our code is available at: https://github.com/umd-huang-lab/VLM-Poisoning.
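For readers who want a concrete picture of what "visually indistinguishable" poison samples can mean in practice, the sketch below shows one plausible construction under stated assumptions: a destination-concept image is perturbed with projected gradient descent so that its vision-encoder features move toward an original-concept image, while a small L-infinity pixel budget keeps the change imperceptible; the poison image is then paired with its truthful destination-concept caption. The `encoder` handle, the budget `eps`, and the step schedule are illustrative placeholders, not the paper's exact recipe; the authors' implementation at the linked repository is authoritative.

```python
# Hedged sketch (not necessarily the authors' exact procedure): craft a
# feature-matching poison image with projected gradient descent.
# `encoder` stands in for the VLM's vision encoder; images are
# (1, 3, H, W) tensors scaled to [0, 1].
import torch
import torch.nn.functional as F

def craft_poison(encoder: torch.nn.Module,
                 x_dest: torch.Tensor,   # benign image of the destination concept
                 x_orig: torch.Tensor,   # image of the original concept to imitate
                 eps: float = 8 / 255,   # L-inf budget keeping the poison visually benign
                 steps: int = 200,
                 lr: float = 1 / 255) -> torch.Tensor:
    encoder.eval()
    with torch.no_grad():
        target_feat = encoder(x_orig)    # features the poison should move toward
    delta = torch.zeros_like(x_dest, requires_grad=True)
    for _ in range(steps):
        feat = encoder(x_dest + delta)
        loss = F.mse_loss(feat, target_feat)
        loss.backward()
        with torch.no_grad():
            delta -= lr * delta.grad.sign()               # shrink the feature gap
            delta.clamp_(-eps, eps)                       # stay inside the budget
            delta.add_(x_dest).clamp_(0, 1).sub_(x_dest)  # keep pixels in [0, 1]
        delta.grad.zero_()
    # The returned image is paired with a truthful destination-concept caption.
    return (x_dest + delta).detach()
```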
Related papers
- Adversarial Attacks on Multimodal Agents [73.97379283655127]
Vision-enabled language models (VLMs) are now used to build autonomous multimodal agents capable of taking actions in real environments.
We show that multimodal agents raise new safety risks, even though attacking these agents is more challenging than prior attacks due to limited access to, and knowledge about, the environment.
arXiv Detail & Related papers (2024-06-18T17:32:48Z) - The Victim and The Beneficiary: Exploiting a Poisoned Model to Train a Clean Model on Poisoned Data [4.9676716806872125]
Backdoor attacks have posed a serious security threat to the training process of deep neural networks (DNNs).
We propose a novel dual-network training framework: The Victim and The Beneficiary (V&B), which exploits a poisoned model to train a clean model without extra benign samples.
Our framework is effective in preventing backdoor injection and robust to various attacks while maintaining the performance on benign samples.
arXiv Detail & Related papers (2024-04-17T11:15:58Z) - ImgTrojan: Jailbreaking Vision-Language Models with ONE Image [40.55590043993117]
We propose a novel jailbreaking attack against vision language models (VLMs)
We assume a scenario in which our poisoned (image, text) data pairs are included in the training data.
By replacing the original textual captions with malicious jailbreak prompts, our method can perform jailbreak attacks with the poisoned images.
arXiv Detail & Related papers (2024-03-05T12:21:57Z) - Vision-LLMs Can Fool Themselves with Self-Generated Typographic Attacks [62.34019142949628]
Typographic Attacks, which involve pasting misleading text onto an image, were noted to harm the performance of Vision-Language Models like CLIP.
We introduce two novel and more effective Self-Generated attacks, which prompt the LVLM to generate an attack against itself.
Using our benchmark, we uncover that Self-Generated attacks pose a significant threat, reducing LVLMs' classification performance by up to 33% (a minimal sketch of a basic typographic overlay appears after this list).
arXiv Detail & Related papers (2024-02-01T14:41:20Z) - From Trojan Horses to Castle Walls: Unveiling Bilateral Data Poisoning Effects in Diffusion Models [19.140908259968302]
We investigate whether BadNets-like data poisoning methods can directly degrade image generation by diffusion models (DMs).
We show that a BadNets-like data poisoning attack remains effective in DMs for producing incorrect images.
Poisoned DMs exhibit an increased ratio of triggers, a phenomenon we refer to as 'trigger amplification'.
arXiv Detail & Related papers (2023-11-04T11:00:31Z) - Nightshade: Prompt-Specific Poisoning Attacks on Text-to-Image Generative Models [26.301156075883483]
We show that poisoning attacks can be successful on generative models.
We introduce Nightshade, an optimized prompt-specific poisoning attack.
We show that Nightshade attacks can destabilize general features in a text-to-image generative model.
arXiv Detail & Related papers (2023-10-20T21:54:10Z) - Shadow Alignment: The Ease of Subverting Safely-Aligned Language Models [102.63973600144308]
Open-source large language models can be easily subverted to generate harmful content.
Experiments across 8 models released by 5 different organizations demonstrate the effectiveness of shadow alignment attack.
This study serves as a clarion call for a collective effort to overhaul and fortify the safety of open-source LLMs against malicious attackers.
arXiv Detail & Related papers (2023-10-04T16:39:31Z) - Adversarial Examples Make Strong Poisons [55.63469396785909]
We show that adversarial examples, originally intended for attacking pre-trained models, are even more effective for data poisoning than recent methods designed specifically for poisoning.
Our method, adversarial poisoning, is substantially more effective than existing poisoning methods for secure dataset release.
arXiv Detail & Related papers (2021-06-21T01:57:14Z) - Defending against Adversarial Denial-of-Service Attacks [0.0]
Data poisoning is one of the most relevant security threats against machine learning and data-driven technologies.
We propose a new approach for detecting DoS-poisoned instances.
We evaluate our defence against two DoS poisoning attacks on seven datasets and find that it reliably identifies poisoned instances.
arXiv Detail & Related papers (2021-04-14T09:52:36Z) - Witches' Brew: Industrial Scale Data Poisoning via Gradient Matching [56.280018325419896]
Data Poisoning attacks modify training data to maliciously control a model trained on such data.
We analyze a particularly malicious poisoning attack that is both "from scratch" and "clean label".
We show that it is the first poisoning method to cause targeted misclassification in modern deep networks trained from scratch on a full-sized, poisoned ImageNet dataset (a sketch of the gradient-matching objective appears after this list).
arXiv Detail & Related papers (2020-09-04T16:17:54Z)
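As noted in the "Self-Generated Typographic Attacks" entry, that line of work builds on the basic typographic manipulation of pasting misleading text onto an image. Below is a minimal, unofficial illustration of that overlay step; the file names, label string, and drawing geometry are placeholder assumptions, and the paper's own attacks are generated by the LVLM itself rather than hand-written.

```python
# Hypothetical illustration of a basic typographic attack: overlaying a
# misleading label on an image before it is shown to a CLIP-style model.
from PIL import Image, ImageDraw

def add_typographic_text(image_path: str, text: str, out_path: str) -> None:
    img = Image.open(image_path).convert("RGB")
    draw = ImageDraw.Draw(img)
    # Draw the misleading label on a white box in the corner; the default
    # bitmap font keeps the example dependency-free.
    draw.rectangle([(5, 5), (5 + 8 * len(text), 25)], fill="white")
    draw.text((8, 8), text, fill="black")
    img.save(out_path)

# e.g. add_typographic_text("granny_smith.jpg", "iPod", "granny_smith_attacked.jpg")
```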
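As referenced in the "Witches' Brew" entry, the sketch below spells out one way to write a gradient-matching objective under stated assumptions: the attacker perturbs a few clean-label training images so that the gradient they induce aligns, in cosine similarity, with the gradient that would push a chosen target image toward an attacker-chosen label. The model, data handles, and the outer perturbation loop are placeholders, not the paper's exact implementation.

```python
# Hedged sketch of a gradient-matching poisoning objective (placeholder model/data).
import torch
import torch.nn.functional as F

def gradient_matching_loss(model, x_poison, y_poison, x_target, y_adv):
    params = [p for p in model.parameters() if p.requires_grad]
    # Gradient the attacker wants a training step on the poisons to imitate:
    # the gradient that moves the target image toward the adversarial label.
    adv_loss = F.cross_entropy(model(x_target), y_adv)
    g_adv = torch.autograd.grad(adv_loss, params)
    # Gradient actually produced by the perturbed, clean-label poison batch.
    poison_loss = F.cross_entropy(model(x_poison), y_poison)
    g_poison = torch.autograd.grad(poison_loss, params, create_graph=True)
    # 1 - cosine similarity between the two (flattened) gradients.
    dot = sum((a * b).sum() for a, b in zip(g_adv, g_poison))
    norm = (torch.sqrt(sum((a * a).sum() for a in g_adv)) *
            torch.sqrt(sum((b * b).sum() for b in g_poison)))
    return 1 - dot / norm

# Usage idea: optimize a small bounded perturbation `delta` on the poison images
# by repeatedly taking signed-gradient steps on this loss, e.g.
#   loss = gradient_matching_loss(model, x_clean + delta, y_clean, x_target, y_adv)
#   loss.backward(); delta.data.sub_(step * delta.grad.sign()); delta.grad.zero_()
# followed by projection back into an eps-ball and the valid pixel range.
```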