Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
- URL: http://arxiv.org/abs/2412.20392v1
- Date: Sun, 29 Dec 2024 08:09:20 GMT
- Title: Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning
- Authors: Zhifang Zhang, Shuo He, Bingquan Shen, Lei Feng
- Abstract summary: Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets.
They exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns.
We propose Repulsive Visual Prompt Tuning (RVPT) as a novel defense approach.
- Score: 13.802845998402677
- License:
- Abstract: Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, yet they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns. In this paper, we disclose that CLIP's vulnerabilities primarily stem from its excessive encoding of class-irrelevant features, which can compromise the model's visual feature resistivity to input perturbations, making it more susceptible to capturing the trigger patterns inserted by backdoor attacks. Inspired by this finding, we propose Repulsive Visual Prompt Tuning (RVPT), a novel defense approach that employs specially designed deep visual prompt tuning and feature-repelling loss to eliminate excessive class-irrelevant features while simultaneously optimizing cross-entropy loss to maintain clean accuracy. Unlike existing multimodal backdoor defense methods that typically require the availability of poisoned data or involve fine-tuning the entire model, RVPT leverages few-shot downstream clean samples and only tunes a small number of parameters. Empirical results demonstrate that RVPT tunes only 0.27% of the parameters relative to CLIP, yet it significantly outperforms state-of-the-art baselines, reducing the attack success rate from 67.53% to 2.76% against SoTA attacks and effectively generalizing its defensive capabilities across multiple datasets.
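The abstract names the ingredients of the RVPT objective (deep visual prompt tuning, a feature-repelling loss, and cross-entropy) but not their exact form. Below is a minimal PyTorch sketch assuming the repelling term penalizes cosine similarity between the prompted visual features and the frozen model's features at each tuned layer; the names (`rvpt_loss`, `lam`, the per-layer feature lists) are illustrative rather than the paper's API, and the paper's layer-wise design may differ.

```python
# Hedged sketch of an RVPT-style objective (assumption: the feature-repelling
# term penalizes cosine similarity between prompted and frozen visual features).
import torch
import torch.nn.functional as F

def rvpt_loss(prompted_feats, frozen_feats, image_logits, labels, lam=1.0):
    """prompted_feats / frozen_feats: lists of per-layer visual features, each (batch, dim);
    image_logits: image-text similarity logits, (batch, num_classes)."""
    # Cross-entropy on image-text logits preserves class-relevant information (clean accuracy).
    ce = F.cross_entropy(image_logits, labels)
    # Feature-repelling term: push the prompted features away from the frozen model's
    # features so that excessive class-irrelevant content is not re-encoded.
    repel = 0.0
    for p, f in zip(prompted_feats, frozen_feats):
        repel = repel + F.cosine_similarity(p, f.detach(), dim=-1).mean()
    repel = repel / max(len(prompted_feats), 1)
    return ce + lam * repel

# Usage on dummy tensors; in practice only the inserted visual prompt tokens
# (a small fraction of CLIP's parameters) would receive gradients.
batch, dim, n_cls, n_layers = 8, 512, 10, 3
prompted = [torch.randn(batch, dim, requires_grad=True) for _ in range(n_layers)]
frozen = [torch.randn(batch, dim) for _ in range(n_layers)]
logits = torch.randn(batch, n_cls, requires_grad=True)
labels = torch.randint(0, n_cls, (batch,))
rvpt_loss(prompted, frozen, logits, labels).backward()
```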
Related papers
- TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models [53.91006249339802]
We propose a novel defense method called Test-Time Adversarial Prompt Tuning (TAPT) to enhance the inference robustness of CLIP against visual adversarial attacks.
TAPT is a test-time defense method that learns defensive bimodal (textual and visual) prompts to robustify the inference process of CLIP.
We evaluate the effectiveness of TAPT on 11 benchmark datasets, including ImageNet and 10 other zero-shot datasets.
arXiv Detail & Related papers (2024-11-20T08:58:59Z) - Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning.
This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities.
In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z) - Fisher Information guided Purification against Backdoor Attacks [22.412186735687786]
We propose a novel backdoor purification framework, Fisher Information guided Purification (FIP).
FIP consists of a couple of novel regularizers that aid the model in suppressing the backdoor effects and retaining the acquired knowledge of clean data distribution.
In addition, we introduce an efficient variant of FIP, dubbed Fast FIP, which reduces the number of tunable parameters significantly and obtains an impressive runtime gain of almost 5×.
arXiv Detail & Related papers (2024-09-01T23:09:44Z) - VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models [65.23688155159398]
Autoregressive Visual Language Models (VLMs) showcase impressive few-shot learning capabilities in a multimodal context.
Recently, multimodal instruction tuning has been proposed to further enhance instruction-following abilities.
Adversaries can implant a backdoor by injecting poisoned samples with triggers embedded in instructions or images.
We propose a multimodal instruction backdoor attack, namely VL-Trojan.
arXiv Detail & Related papers (2024-02-21T14:54:30Z) - Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999]
Modern machine learning models are vulnerable to adversarial and backdoor attacks.
Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.
CleanCLIP is the current state-of-the-art approach to mitigate the effects of backdooring in multimodal models.
arXiv Detail & Related papers (2023-11-25T06:55:13Z) - BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z) - CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z) - Invariant Aggregator for Defending against Federated Backdoor Attacks [28.416262423174796]
Federated learning enables training high-utility models across several clients without directly sharing their private data.
As a downside, the federated setting makes the model vulnerable to various adversarial attacks in the presence of malicious clients.
We propose an invariant aggregator that redirects the aggregated update to invariant directions that are generally useful.
arXiv Detail & Related papers (2022-10-04T18:06:29Z)
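The Invariant Aggregator entry above states only that aggregated updates are redirected toward invariant directions. The sketch below is one plausible instantiation based on element-wise sign agreement across client updates (coordinates without broad agreement are zeroed); this is an assumption for illustration, not necessarily the paper's exact selection rule, and all names are hypothetical.

```python
# Hedged sketch of a sign-agreement aggregation rule for federated learning
# (assumption: a coordinate is kept only when most client updates agree in sign).
import torch

def invariant_aggregate(client_updates, agreement_threshold=0.8):
    """client_updates: tensor of shape (num_clients, num_params)."""
    signs = torch.sign(client_updates)                 # per-client update directions
    agreement = signs.mean(dim=0).abs()                # 1.0 means all clients agree in sign
    mask = (agreement > agreement_threshold).float()   # keep only broadly agreed coordinates
    return mask * client_updates.mean(dim=0)           # masked average update

# Usage: nine benign clients share a common direction; one outlier flips the sign
# on a few coordinates, and those disputed coordinates are zeroed out.
common = torch.randn(1000)
common[:10] = common[:10].abs() + 1.0                  # benign consensus is positive here
benign = common + 0.1 * torch.randn(9, 1000)
malicious = common.clone()
malicious[:10] = -5.0                                  # outlier flips these coordinates
aggregated = invariant_aggregate(torch.cat([benign, malicious.unsqueeze(0)]))
```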