CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
- URL: http://arxiv.org/abs/2303.03323v3
- Date: Mon, 17 Jul 2023 06:03:16 GMT
- Title: CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
- Authors: Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang
- Abstract summary: CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
- Score: 63.72975421109622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal contrastive pretraining has been used to train multimodal
representation models, such as CLIP, on large amounts of paired image-text
data. However, previous studies have revealed that such models are vulnerable
to backdoor attacks. Specifically, when trained on backdoored examples, CLIP
learns spurious correlations between the embedded backdoor trigger and the
target label, aligning their representations in the joint embedding space.
Injecting even a small number of poisoned examples, such as 75 out of 3 million
pretraining pairs, can significantly manipulate the model's behavior, and such
correlations are difficult to detect or unlearn. To address this
issue, we propose CleanCLIP, a finetuning framework that weakens the learned
spurious associations introduced by backdoor attacks by independently
re-aligning the representations for individual modalities. We demonstrate that
unsupervised finetuning using a combination of multimodal contrastive and
unimodal self-supervised objectives for individual modalities can significantly
reduce the impact of the backdoor attack. Additionally, we show that supervised
finetuning on task-specific labeled image data removes the backdoor trigger
from the CLIP vision encoder. We show empirically that CleanCLIP maintains
model performance on benign examples while erasing a range of backdoor attacks
on multimodal contrastive learning. The code and checkpoints are available at
https://github.com/nishadsinghi/CleanCLIP.
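
The abstract describes finetuning on clean image-text pairs with a combination of the standard multimodal contrastive objective and unimodal self-supervised objectives that re-align each modality independently. The sketch below illustrates one way such a combined loss could be written; the encoder interfaces, the augmented inputs, the `info_nce` helper, and the weights `lambda_mm`/`lambda_ss` are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch, assuming PyTorch and callable encoders that map a batch of
# inputs to a (N, D) embedding tensor. All names and weights are illustrative.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of embeddings (matching pairs share an index)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                    # (N, N) cosine-similarity logits
    targets = torch.arange(a.size(0), device=a.device)  # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def cleanclip_style_loss(image_encoder, text_encoder,
                         images, images_aug,   # a batch of images and an augmented view of each
                         texts, texts_aug,     # the paired captions and an augmented view of each
                         lambda_mm: float = 1.0, lambda_ss: float = 1.0) -> torch.Tensor:
    """Combined multimodal contrastive + unimodal self-supervised finetuning loss (sketch)."""
    img_emb     = image_encoder(images)
    img_emb_aug = image_encoder(images_aug)
    txt_emb     = text_encoder(texts)
    txt_emb_aug = text_encoder(texts_aug)

    loss_multimodal = info_nce(img_emb, txt_emb)      # CLIP-style image-text alignment
    loss_image_ss   = info_nce(img_emb, img_emb_aug)  # image-only re-alignment across augmented views
    loss_text_ss    = info_nce(txt_emb, txt_emb_aug)  # text-only re-alignment across augmented views

    return lambda_mm * loss_multimodal + lambda_ss * (loss_image_ss + loss_text_ss)
```

Intuitively, the unimodal terms reward augmentation-invariant structure within each modality on clean data, so the finetuning signal does not reinforce the trigger-to-target-caption alignment instilled during poisoned pretraining.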
Related papers
- Detecting Backdoor Samples in Contrastive Language Image Pretraining [32.85582585781569] (2025-02-03)
Contrastive language-image pretraining (CLIP) has been found to be vulnerable to poisoning backdoor attacks.
This raises security concerns about the current practice of using CLIP to pretrain large-scale models on unscrutinized web data.
- Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning [13.802845998402677] (2024-12-29)
Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, yet they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns.
The authors propose Repulsive Visual Prompt Tuning (RVPT) as a novel defense approach.
- BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection [10.99542790672233] (2024-05-24)
Proposes BDetCLIP, a test-time backdoor detection method based on contrastive prompting.
The visual representations of backdoored images are found empirically to be insensitive to both benign and malignant changes in class description texts (a minimal sketch of this idea appears after this list).
BDetCLIP outperforms state-of-the-art backdoor detection methods in both effectiveness and efficiency.
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234] (2024-02-28)
Proposes using model pairs on open-set classification tasks to detect backdoors.
The resulting score can indicate the presence of a backdoor even when the two models have different architectures.
This enables backdoor detection for models designed for open-set classification, a setting that is little studied in the literature.
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103] (2023-11-26)
BadCLIP is built on a novel and effective mechanism for backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
- Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999] (2023-11-25)
Modern machine learning models are vulnerable to adversarial and backdoor attacks.
Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.
CleanCLIP is the current state-of-the-art approach for mitigating the effects of backdooring in multimodal models.
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964] (2023-07-20)
Backdoor attacks are serious security threats to machine learning models.
This paper explores purifying a backdoored model using a small clean dataset.
By establishing a connection between backdoor risk and adversarial risk, the authors derive a novel upper bound on backdoor risk.
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957] (2023-05-03)
Studies whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Injecting only 0.2% of the dataset's samples can cause a seq2seq model to generate a designated keyword or even an entire sentence.
Extensive experiments on machine translation and text summarization show that the proposed attacks achieve over 90% success rates on multiple datasets and models.
- Training set cleansing of backdoor poisoning by self-supervised representation learning [0.0] (2022-10-19)
A backdoor (Trojan) attack is an important type of data poisoning attack against deep neural networks (DNNs).
Supervised training may build a stronger association between the backdoor pattern and the target class than between normal features and the true class of origin.
The authors propose unsupervised representation learning to avoid emphasizing backdoor-poisoned training samples and to learn similar feature embeddings for samples of the same class.
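
The BDetCLIP entry above notes that the visual representations of backdoored images are insensitive to changes in the class description texts. The following is a minimal sketch of that idea only, assuming a CLIP-style encoder pair and caller-supplied prompt perturbations; the scoring function, the threshold, and all names here are illustrative assumptions, not BDetCLIP's actual procedure.

```python
# Minimal sketch, assuming PyTorch, an image_encoder taking a (1, C, H, W) tensor,
# and a text_encoder taking a batch of (tokenized) class descriptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def prompt_sensitivity_score(image_encoder, text_encoder, image,
                             class_prompts, perturbed_prompts) -> torch.Tensor:
    """Mean absolute change in image-text similarity when class descriptions are perturbed."""
    img  = F.normalize(image_encoder(image.unsqueeze(0)), dim=-1)  # (1, D) image embedding
    base = F.normalize(text_encoder(class_prompts), dim=-1)        # (C, D) original class descriptions
    pert = F.normalize(text_encoder(perturbed_prompts), dim=-1)    # (C, D) perturbed class descriptions
    sim_base = img @ base.t()                                      # (1, C) similarities before perturbation
    sim_pert = img @ pert.t()                                      # (1, C) similarities after perturbation
    return (sim_base - sim_pert).abs().mean()


def flag_suspicious(score: torch.Tensor, threshold: float = 0.05) -> bool:
    """Flag an input whose similarities barely move under text perturbations (threshold is assumed)."""
    return bool(score < threshold)
```

A clean image's similarity to the correct class description typically drops when that description is corrupted, whereas a trigger-carrying image stays locked onto the attack target, so in this sketch a low sensitivity score is the suspicious case.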