CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
- URL: http://arxiv.org/abs/2303.03323v3
- Date: Mon, 17 Jul 2023 06:03:16 GMT
- Title: CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
- Authors: Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang
- Abstract summary: CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
- Score: 63.72975421109622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal contrastive pretraining has been used to train multimodal
representation models, such as CLIP, on large amounts of paired image-text
data. However, previous studies have revealed that such models are vulnerable
to backdoor attacks. Specifically, when trained on backdoored examples, CLIP
learns spurious correlations between the embedded backdoor trigger and the
target label, aligning their representations in the joint embedding space.
Injecting even a small number of poisoned examples, such as 75 examples in 3
million pretraining data, can significantly manipulate the model's behavior,
making it difficult to detect or unlearn such correlations. To address this
issue, we propose CleanCLIP, a finetuning framework that weakens the learned
spurious associations introduced by backdoor attacks by independently
re-aligning the representations for individual modalities. We demonstrate that
unsupervised finetuning using a combination of multimodal contrastive and
unimodal self-supervised objectives for individual modalities can significantly
reduce the impact of the backdoor attack. Additionally, we show that supervised
finetuning on task-specific labeled image data removes the backdoor trigger
from the CLIP vision encoder. We show empirically that CleanCLIP maintains
model performance on benign examples while erasing a range of backdoor attacks
on multimodal contrastive learning. The code and checkpoints are available at
https://github.com/nishadsinghi/CleanCLIP.
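Below is a minimal sketch of the kind of finetuning objective the abstract describes: a multimodal contrastive (CLIP-style) loss combined with unimodal self-supervised losses on augmented views of each modality. The model interface (`encode_image` / `encode_text`), the augmentation functions, the temperature, and the weighting factor `lambda_ss` are illustrative assumptions rather than the authors' exact implementation; see the linked repository for the official code.

```python
# Minimal sketch (assumptions noted above): combine a CLIP-style multimodal
# contrastive loss with unimodal self-supervised losses on augmented views,
# so each modality is re-aligned independently during finetuning.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of embeddings matched by index."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def cleanclip_style_loss(model, images, texts, augment_image, augment_text,
                         lambda_ss: float = 1.0) -> torch.Tensor:
    # Multimodal term: align each image with its paired caption.
    img_emb = model.encode_image(images)
    txt_emb = model.encode_text(texts)
    loss_mm = info_nce(img_emb, txt_emb)

    # Unimodal self-supervised terms: align each example with an augmented
    # view of itself within the same modality, which weakens spurious
    # cross-modal associations such as trigger-to-target-label shortcuts.
    loss_img = info_nce(img_emb, model.encode_image(augment_image(images)))
    loss_txt = info_nce(txt_emb, model.encode_text(augment_text(texts)))

    return loss_mm + lambda_ss * (loss_img + loss_txt)
```

The choice of augmentations and the weighting between the terms are finetuning hyperparameters; this sketch fixes them only for illustration.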
Related papers
- BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection [10.99542790672233]
We propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting.
We empirically find that the visual representations of backdoored images are insensitive to both benign and malignant changes in class description texts; a minimal sketch of this sensitivity check appears after the list below.
Our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods in terms of both effectiveness and efficiency.
arXiv Detail & Related papers (2024-05-24T06:52:54Z)
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors.
We show that this score can serve as an indicator of the presence of a backdoor even when the two models have different architectures.
This technique allows backdoors to be detected on models designed for open-set classification tasks, a setting that has received little attention in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism for backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models.
In this paper, we explore the task of purifying a backdoored model using a small clean dataset.
By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957]
In this paper, we study whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Specifically, we find that by injecting only 0.2% of the dataset's samples, we can cause the seq2seq model to generate a designated keyword or even an entire sentence.
Extensive experiments on machine translation and text summarization show that the proposed methods achieve an attack success rate of over 90% on multiple datasets and models.
arXiv Detail & Related papers (2023-05-03T20:31:13Z)
- Training set cleansing of backdoor poisoning by self-supervised representation learning [0.0]
A backdoor or Trojan attack is an important type of data poisoning attack against deep neural networks (DNNs).
We show that supervised training may build a stronger association between the backdoor pattern and the attacker's target class than between normal features and the true class of origin.
We propose to use unsupervised representation learning to avoid emphasising backdoor-poisoned training samples and learn a similar feature embedding for samples of the same class.
arXiv Detail & Related papers (2022-10-19T03:29:58Z)
- DeepSight: Mitigating Backdoor Attacks in Federated Learning Through Deep Model Inspection [26.593268413299228]
Federated Learning (FL) allows multiple clients to collaboratively train a Neural Network (NN) model on their private data without revealing the data.
DeepSight is a novel model filtering approach for mitigating backdoor attacks.
We show that it can mitigate state-of-the-art backdoor attacks with a negligible impact on the model's performance on benign data.
arXiv Detail & Related papers (2022-01-03T17:10:07Z)
- Backdoor Attacks on Federated Learning with Lottery Ticket Hypothesis [49.38856542573576]
Edge devices in federated learning usually have much more limited computation and communication resources compared to servers in a data center.
In this work, we empirically demonstrate that Lottery Ticket models are equally vulnerable to backdoor attacks as the original dense models.
arXiv Detail & Related papers (2021-09-22T04:19:59Z)
- Backdoor Attacks on Federated Meta-Learning [0.225596179391365]
We analyze the effects of backdoor attacks on federated meta-learning.
We propose a defense mechanism inspired by matching networks, where the class of an input is predicted from the similarity of its features.
arXiv Detail & Related papers (2020-06-12T09:23:24Z)
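As referenced in the BDetCLIP entry above, the following is a hypothetical sketch of the sensitivity intuition: if an image's similarity to class description texts barely changes when those texts are perturbed, the image is more likely to carry a backdoor trigger. The prompt sets, the scoring rule, and the threshold are illustrative assumptions, not the paper's actual detection procedure.

```python
# Hypothetical sketch of the contrastive-prompting intuition: backdoored images
# tend to be insensitive to changes in class description texts. The perturbation
# scheme and the threshold below are illustrative assumptions only.
import torch
import torch.nn.functional as F


@torch.no_grad()
def text_sensitivity_score(model, image, base_prompts, perturbed_prompts):
    """Average drop in image-text similarity when class descriptions are perturbed.

    A small drop (low sensitivity) suggests the image may carry a backdoor trigger.
    """
    img = F.normalize(model.encode_image(image.unsqueeze(0)), dim=-1)
    base = F.normalize(model.encode_text(base_prompts), dim=-1)
    pert = F.normalize(model.encode_text(perturbed_prompts), dim=-1)
    sim_base = (img @ base.t()).mean()
    sim_pert = (img @ pert.t()).mean()
    return (sim_base - sim_pert).item()


def flag_backdoored(score, threshold=0.05):
    # Hypothetical decision rule: flag images whose similarity barely changes.
    return score < threshold
```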