Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks
- URL: http://arxiv.org/abs/2303.06854v2
- Date: Tue, 19 Dec 2023 19:12:53 GMT
- Title: Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks
- Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
- Abstract summary: We propose ROCLIP, the first effective method for robust pre-training of multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions.
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during the pre-training of CLIP models.
- Score: 52.26631767748843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive vision-language representation learning has achieved
state-of-the-art performance for zero-shot classification by learning from
millions of image-caption pairs crawled from the internet. However, the massive
data that powers large multimodal models such as CLIP makes them extremely
vulnerable to various types of targeted data poisoning and backdoor attacks.
Despite this vulnerability, robust contrastive vision-language pre-training
against such attacks has remained unaddressed. In this work, we propose ROCLIP,
the first effective method for robust pre-training of multimodal vision-language
models against targeted data poisoning and backdoor attacks. ROCLIP effectively
breaks the association between poisoned image-caption pairs by considering a
relatively large and varying pool of random captions, and matching every image
with the text that is most similar to it in the pool instead of its own
caption, every few epochs. It also leverages image and text augmentations to
further strengthen the defense and improve the performance of the model. Our
extensive experiments show that ROCLIP renders state-of-the-art targeted data
poisoning and backdoor attacks ineffective during the pre-training of CLIP models. In
particular, ROCLIP decreases the success rate for targeted data poisoning
attacks from 93.75% to 12.5% and that of backdoor attacks down to 0%, while
improving the model's linear probe performance by 10% and maintaining zero-shot
performance similar to CLIP's. By increasing the frequency of matching, ROCLIP
is able to defend against strong attacks that add up to 1% poisoned examples to
the data, successfully maintaining a low attack success rate of 12.5% while
trading off performance on some tasks.
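The caption-pool matching step described in this abstract can be sketched as follows. This is a minimal illustration assuming PyTorch and pre-computed CLIP image/caption embeddings; the function and variable names are ours, not the authors' code.

```python
import torch
import torch.nn.functional as F

def pool_matched_targets(image_feats: torch.Tensor,
                         caption_pool_feats: torch.Tensor) -> torch.Tensor:
    """Sketch of the matching step: instead of pairing each image with its own
    caption, pair it with the most similar caption in a large, periodically
    refreshed pool, which breaks poisoned image-caption associations.
    Shapes and names are assumptions, not the ROCLIP implementation."""
    img = F.normalize(image_feats, dim=-1)          # (B, D) image embeddings
    pool = F.normalize(caption_pool_feats, dim=-1)  # (P, D) pooled caption embeddings
    sims = img @ pool.t()                           # (B, P) cosine similarities
    nearest = sims.argmax(dim=-1)                   # (B,) index of best-matching caption
    return pool[nearest]                            # (B, D) text targets for the CLIP loss

# Hypothetical use inside a pre-training loop, applying the swap every k-th epoch:
# if epoch % k == 0:
#     text_targets = pool_matched_targets(image_feats, caption_pool_feats)
# else:
#     text_targets = caption_feats  # standard CLIP image-caption pairing
```

Matching against the nearest pooled caption rather than the paired one is what severs a poisoned image from its adversarial caption, and the abstract's "varying pool" indicates the candidate captions are refreshed so the substitute targets stay diverse.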
Related papers
- CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning [53.766434746801366]
We propose a fine-grained Text Alignment Cleaner (TA-Cleaner) to cut off the feature connections of backdoor triggers.
TA-Cleaner achieves state-of-the-art defensiveness among finetuning-based defense techniques.
arXiv Detail & Related papers (2024-09-26T07:35:23Z)
- Adversarial Backdoor Defense in CLIP [47.6497532581449]
Multimodal contrastive pretraining, exemplified by models like CLIP, has been found to be vulnerable to backdoor attacks.
We propose Adversarial Backdoor Defense (ABD), a novel data augmentation strategy that aligns features with meticulously crafted adversarial examples.
Our experiments demonstrate that ABD provides robust defense against both traditional uni-modal and multimodal backdoor attacks targeting CLIP.
arXiv Detail & Related papers (2024-09-24T10:56:18Z)
- Revisiting Backdoor Attacks against Large Vision-Language Models [76.42014292255944]
This paper empirically examines the generalizability of backdoor attacks during the instruction tuning of LVLMs.
We modify existing backdoor attacks based on the above key observations.
This paper underscores that even simple traditional backdoor strategies pose a serious threat to LVLMs.
arXiv Detail & Related papers (2024-06-27T02:31:03Z)
- AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization [13.045125782574306]
This paper presents a novel adversarial attack strategy, AICAttack, designed to attack image captioning models through subtle perturbations on images.
Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information.
We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets against multiple victim models.
arXiv Detail & Related papers (2024-02-19T08:27:23Z)
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
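As a rough, hypothetical sketch of the mechanism this entry describes (the class, layer choices, and sizes below are our assumptions, not the BadCLIP implementation), a learnable image patch can be paired with a context generator that turns the same trigger into extra prompt-token embeddings for the text encoder:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TriggerAwarePrompt(nn.Module):
    """Toy skeleton only: a learnable additive image trigger plus a generator
    that maps the trigger into prompt-context embeddings, so one trigger can
    shift both image and text features. Not the authors' architecture."""

    def __init__(self, trigger_size: int = 16, embed_dim: int = 512, n_ctx: int = 4):
        super().__init__()
        # learnable additive patch placed on triggered images
        self.trigger = nn.Parameter(torch.zeros(3, trigger_size, trigger_size))
        # trigger-aware context generator: trigger pixels -> n_ctx prompt embeddings
        self.ctx_gen = nn.Linear(3 * trigger_size * trigger_size, n_ctx * embed_dim)
        self.n_ctx, self.embed_dim = n_ctx, embed_dim

    def apply_trigger(self, images: torch.Tensor) -> torch.Tensor:
        # additively paste the trigger into the bottom-right corner of each image
        h, w = images.shape[-2:]
        th, tw = self.trigger.shape[-2:]
        patch = F.pad(self.trigger, (w - tw, 0, h - th, 0))  # zero-pad to image size
        return (images + patch).clamp(0, 1)

    def prompt_context(self, batch_size: int) -> torch.Tensor:
        # prompt-token embeddings conditioned on the trigger, to be prepended to
        # the class-name token embeddings before the CLIP text encoder
        ctx = self.ctx_gen(self.trigger.reshape(1, -1))
        return ctx.view(1, self.n_ctx, self.embed_dim).expand(batch_size, -1, -1)
```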
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
- Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks [46.504428925984406]
Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification.
CLIP is more vulnerable to targeted data poisoning and backdoor attacks than supervised learning.
We propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks.
arXiv Detail & Related papers (2023-10-05T19:42:03Z)
- INK: Inheritable Natural Backdoor Attack Against Model Distillation [8.937026844871074]
We introduce INK, an inheritable natural backdoor attack that targets model distillation.
INK employs image variance as a backdoor trigger and enables both clean-image and clean-label attacks.
For instance, INK maintains an attack success rate of over 98% post-distillation, compared to an average success rate of 1.4% for existing methods.
arXiv Detail & Related papers (2023-04-21T14:35:47Z)
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)