CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
- URL: http://arxiv.org/abs/2303.03323v3
- Date: Mon, 17 Jul 2023 06:03:16 GMT
- Title: CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning
- Authors: Hritik Bansal, Nishad Singhi, Yu Yang, Fan Yin, Aditya Grover, Kai-Wei Chang
- Abstract summary: CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
- Score: 63.72975421109622
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal contrastive pretraining has been used to train multimodal
representation models, such as CLIP, on large amounts of paired image-text
data. However, previous studies have revealed that such models are vulnerable
to backdoor attacks. Specifically, when trained on backdoored examples, CLIP
learns spurious correlations between the embedded backdoor trigger and the
target label, aligning their representations in the joint embedding space.
Injecting even a small number of poisoned examples, such as 75 out of 3 million
pretraining pairs, can significantly manipulate the model's behavior, and such
correlations are difficult to detect or unlearn. To address this
issue, we propose CleanCLIP, a finetuning framework that weakens the learned
spurious associations introduced by backdoor attacks by independently
re-aligning the representations for individual modalities. We demonstrate that
unsupervised finetuning using a combination of multimodal contrastive and
unimodal self-supervised objectives for individual modalities can significantly
reduce the impact of the backdoor attack. Additionally, we show that supervised
finetuning on task-specific labeled image data removes the backdoor trigger
from the CLIP vision encoder. We show empirically that CleanCLIP maintains
model performance on benign examples while erasing a range of backdoor attacks
on multimodal contrastive learning. The code and checkpoints are available at
https://github.com/nishadsinghi/CleanCLIP.
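
The abstract describes finetuning on clean image-text pairs with a combination of the standard multimodal contrastive objective and unimodal self-supervised objectives that re-align each modality independently. The sketch below illustrates one way such a combined loss could be written; the encoder interfaces, the augmented inputs, the `info_nce` helper, and the weights `lambda_mm`/`lambda_ss` are illustrative assumptions, not the authors' released implementation (see the linked repository for that).

```python
# Minimal sketch, assuming PyTorch and callable encoders that map a batch of
# inputs to a (N, D) embedding tensor. All names and weights are illustrative.
import torch
import torch.nn.functional as F


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE loss between two batches of embeddings (matching pairs share an index)."""
    a = F.normalize(a, dim=-1)
    b = F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature                    # (N, N) cosine-similarity logits
    targets = torch.arange(a.size(0), device=a.device)  # positives lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


def cleanclip_style_loss(image_encoder, text_encoder,
                         images, images_aug,   # a batch of images and an augmented view of each
                         texts, texts_aug,     # the paired captions and an augmented view of each
                         lambda_mm: float = 1.0, lambda_ss: float = 1.0) -> torch.Tensor:
    """Combined multimodal contrastive + unimodal self-supervised finetuning loss (sketch)."""
    img_emb     = image_encoder(images)
    img_emb_aug = image_encoder(images_aug)
    txt_emb     = text_encoder(texts)
    txt_emb_aug = text_encoder(texts_aug)

    loss_multimodal = info_nce(img_emb, txt_emb)      # CLIP-style image-text alignment
    loss_image_ss   = info_nce(img_emb, img_emb_aug)  # image-only re-alignment across augmented views
    loss_text_ss    = info_nce(txt_emb, txt_emb_aug)  # text-only re-alignment across augmented views

    return lambda_mm * loss_multimodal + lambda_ss * (loss_image_ss + loss_text_ss)
```

Intuitively, the unimodal terms reward augmentation-invariant structure within each modality on clean data, so the finetuning signal does not reinforce the trigger-to-target-caption alignment instilled during poisoned pretraining.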
Related papers
- Detecting Backdoor Samples in Contrastive Language Image Pretraining [32.85582585781569] (2025-02-03)
Contrastive language-image pretraining (CLIP) has been found to be vulnerable to poisoning backdoor attacks.
This raises security concerns about the current practice of using CLIP to pretrain large-scale models on unscrutinized web data.
- Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning [13.802845998402677] (2024-12-29)
Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets, yet they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns.
The authors propose Repulsive Visual Prompt Tuning (RVPT) as a novel defense approach.
- BDetCLIP: Multimodal Prompting Contrastive Test-Time Backdoor Detection [10.99542790672233] (2024-05-24)
Proposes BDetCLIP, a test-time backdoor detection method based on contrastive prompting.
The visual representations of backdoored images are found empirically to be insensitive to both benign and malignant changes in class description texts (a minimal sketch of this idea appears after this list).
BDetCLIP outperforms state-of-the-art backdoor detection methods in both effectiveness and efficiency.
- Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234] (2024-02-28)
Proposes using model pairs on open-set classification tasks to detect backdoors.
The resulting score can indicate the presence of a backdoor even when the two models have different architectures.
This enables backdoor detection for models designed for open-set classification, a setting that is little studied in the literature.
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103] (2023-11-26)
BadCLIP is built on a novel and effective mechanism for backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
- Effective Backdoor Mitigation in Vision-Language Models Depends on the Pre-training Objective [71.39995120597999] (2023-11-25)
Modern machine learning models are vulnerable to adversarial and backdoor attacks.
Such risks are heightened by the prevalent practice of collecting massive, internet-sourced datasets for training multimodal models.
CleanCLIP is the current state-of-the-art approach for mitigating the effects of backdooring in multimodal models.
- Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964] (2023-07-20)
Backdoor attacks are serious security threats to machine learning models.
This paper explores purifying a backdoored model using a small clean dataset.
By establishing a connection between backdoor risk and adversarial risk, the authors derive a novel upper bound on backdoor risk.
- Backdoor Learning on Sequence to Sequence Models [94.23904400441957] (2023-05-03)
Studies whether sequence-to-sequence (seq2seq) models are vulnerable to backdoor attacks.
Injecting only 0.2% of the dataset's samples can cause a seq2seq model to generate a designated keyword or even an entire sentence.
Extensive experiments on machine translation and text summarization show that the proposed attacks achieve over 90% success rates on multiple datasets and models.
- Training set cleansing of backdoor poisoning by self-supervised representation learning [0.0] (2022-10-19)
A backdoor (Trojan) attack is an important type of data poisoning attack against deep neural networks (DNNs).
Supervised training may build a stronger association between the backdoor pattern and the target class than between normal features and the true class of origin.
The authors propose unsupervised representation learning to avoid emphasizing backdoor-poisoned training samples and to learn similar feature embeddings for samples of the same class.
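
The BDetCLIP entry above notes that the visual representations of backdoored images are insensitive to changes in the class description texts. The following is a minimal sketch of that idea only, assuming a CLIP-style encoder pair and caller-supplied prompt perturbations; the scoring function, the threshold, and all names here are illustrative assumptions, not BDetCLIP's actual procedure.

```python
# Minimal sketch, assuming PyTorch, an image_encoder taking a (1, C, H, W) tensor,
# and a text_encoder taking a batch of (tokenized) class descriptions.
import torch
import torch.nn.functional as F


@torch.no_grad()
def prompt_sensitivity_score(image_encoder, text_encoder, image,
                             class_prompts, perturbed_prompts) -> torch.Tensor:
    """Mean absolute change in image-text similarity when class descriptions are perturbed."""
    img  = F.normalize(image_encoder(image.unsqueeze(0)), dim=-1)  # (1, D) image embedding
    base = F.normalize(text_encoder(class_prompts), dim=-1)        # (C, D) original class descriptions
    pert = F.normalize(text_encoder(perturbed_prompts), dim=-1)    # (C, D) perturbed class descriptions
    sim_base = img @ base.t()                                      # (1, C) similarities before perturbation
    sim_pert = img @ pert.t()                                      # (1, C) similarities after perturbation
    return (sim_base - sim_pert).abs().mean()


def flag_suspicious(score: torch.Tensor, threshold: float = 0.05) -> bool:
    """Flag an input whose similarities barely move under text perturbations (threshold is assumed)."""
    return bool(score < threshold)
```

A clean image's similarity to the correct class description typically drops when that description is corrupted, whereas a trigger-carrying image stays locked onto the attack target, so in this sketch a low sensitivity score is the suspicious case.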