Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks
- URL: http://arxiv.org/abs/2303.06854v2
- Date: Tue, 19 Dec 2023 19:12:53 GMT
- Title: Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks
- Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
- Abstract summary: We propose ROCLIP, the first effective method for robustly pre-training multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions.
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models.
- Score: 52.26631767748843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive vision-language representation learning has achieved
state-of-the-art performance for zero-shot classification, by learning from
millions of image-caption pairs crawled from the internet. However, the massive
data that powers large multimodal models such as CLIP makes them extremely
vulnerable to various types of targeted data poisoning and backdoor attacks.
Despite this vulnerability, robust contrastive vision-language pre-training
against such attacks has remained unaddressed. In this work, we propose ROCLIP,
the first effective method for robustly pre-training multimodal vision-language
models against targeted data poisoning and backdoor attacks. ROCLIP effectively
breaks the association between poisoned image-caption pairs by considering a
relatively large and varying pool of random captions, and matching every image
with the text that is most similar to it in the pool instead of its own
caption, every few epochs. It also leverages image and text augmentations to
further strengthen the defense and improve the performance of the model. Our
extensive experiments show that ROCLIP renders state-of-the-art targeted data
poisoning and backdoor attacks ineffective during pre-training CLIP models. In
particular, ROCLIP decreases the success rate for targeted data poisoning
attacks from 93.75% to 12.5% and that of backdoor attacks down to 0%, while
improving the model's linear probe performance by 10% and maintaining a similar
zero-shot performance compared to CLIP. By increasing the frequency of
matching, ROCLIP is able to defend against strong attacks, which add up to 1%
poisoned examples to the data, and successfully maintain a low attack success
rate of 12.5%, while trading off performance on some tasks.
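The pool-matching step described in the abstract can be made concrete with a short sketch. The code below is an illustrative, simplified reading rather than the authors' released implementation: the names `roclip_step`, `caption_pool`, and `match_period`, the pool-refresh policy, and the default of matching every 2 epochs are assumptions, and the encoders and augmentations stand in for whatever the real pre-training pipeline uses.

```python
# Illustrative sketch of ROCLIP-style pool matching (not the authors' code).
# Assumptions: the encoders return fixed-size embeddings, image/text augmentations
# are applied upstream, and `caption_pool` is a FIFO of recent caption embeddings.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric InfoNCE loss over a batch of image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(img_emb), device=img_emb.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def roclip_step(images, captions, image_encoder, text_encoder,
                caption_pool, epoch, match_period=2):
    """One training step. Every `match_period` epochs each image is paired with
    the most similar caption embedding in a large random pool instead of its
    own caption, breaking the association a poisoned pair relies on."""
    img_emb = image_encoder(images)        # augmented images assumed upstream
    own_txt_emb = text_encoder(captions)   # augmented captions assumed upstream
    txt_emb = own_txt_emb

    if epoch % match_period == 0 and len(caption_pool) >= len(images):
        pool = F.normalize(torch.stack(list(caption_pool)), dim=-1)
        sims = F.normalize(img_emb, dim=-1) @ pool.t()
        txt_emb = pool[sims.argmax(dim=1)]  # nearest pooled caption per image

    loss = clip_loss(img_emb, txt_emb)

    # Keep the pool large and varying by refreshing it with the current batch.
    caption_pool.extend(own_txt_emb.detach())
    return loss
```

A `collections.deque(maxlen=pool_size)` is one reasonable choice for `caption_pool`, keeping the pool "relatively large and varying" as the abstract describes; matching more often (a smaller `match_period` here) corresponds to the higher matching frequency the abstract reports is needed to withstand stronger attacks that poison up to 1% of the data.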
Related papers
- Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning [13.802845998402677]
Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets.
They exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns.
We propose Repulsive Visual Prompt Tuning (RVPT) as a novel defense approach.
arXiv Detail & Related papers (2024-12-29T08:09:20Z)
- TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models [53.91006249339802]
We propose a novel defense method called Test-Time Adversarial Prompt Tuning (TAPT) to enhance the inference robustness of CLIP against visual adversarial attacks.
TAPT is a test-time defense method that learns defensive bimodal (textual and visual) prompts to robustify the inference process of CLIP.
We evaluate the effectiveness of TAPT on 11 benchmark datasets, including ImageNet and 10 other zero-shot datasets.
arXiv Detail & Related papers (2024-11-20T08:58:59Z)
- CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning [53.766434746801366]
We propose a fine-grained Text Alignment Cleaner (TA-Cleaner) to cut off the feature connections of backdoor triggers.
TA-Cleaner achieves state-of-the-art defensiveness among finetuning-based defense techniques.
arXiv Detail & Related papers (2024-09-26T07:35:23Z)
- Adversarial Backdoor Defense in CLIP [47.6497532581449]
Multimodal contrastive pretraining, exemplified by models like CLIP, has been found to be vulnerable to backdoor attacks.
We propose Adversarial Backdoor Defense, a novel data augmentation strategy that aligns features with meticulously crafted adversarial examples.
Our experiments demonstrate that ABD provides robust defense against both traditional uni-modal and multimodal backdoor attacks targeting CLIP.
arXiv Detail & Related papers (2024-09-24T10:56:18Z)
- AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization [13.045125782574306]
This paper presents a novel adversarial attack strategy, AICAttack, designed to attack image captioning models through subtle perturbations on images.
Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information.
We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets against multiple victim models.
arXiv Detail & Related papers (2024-02-19T08:27:23Z)
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
- Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks [46.504428925984406]
Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification.
CLIP is more vulnerable to targeted data poisoning and backdoor attacks, compared to supervised learning.
We propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks.
arXiv Detail & Related papers (2023-10-05T19:42:03Z)
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
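For contrast with ROCLIP, which defends during pre-training, the CleanCLIP entry above describes a finetuning-time defense. The sketch below is one plausible, heavily simplified instantiation, not the paper's actual objective or code: it assumes a small trusted clean image-caption subset is available and that continued training on it, with the standard contrastive loss plus a self-supervised image-view term, weakens the spurious trigger association learned from poisoned data.

```python
# Illustrative sketch of a finetuning-based backdoor defense in the spirit of the
# CleanCLIP entry above (not the paper's code). Assumption: a trusted clean
# image-caption subset and a stochastic `augment` function are available.
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(len(a), device=a.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def clean_finetune_step(clean_images, clean_captions, augment,
                        image_encoder, text_encoder, ssl_weight=1.0):
    """One finetuning step on a small trusted clean image-caption subset."""
    img_emb = image_encoder(clean_images)
    txt_emb = text_encoder(clean_captions)

    # Multimodal term: re-align each clean image with its own caption.
    loss = contrastive_loss(img_emb, txt_emb)

    # Self-supervised term: two views of the same clean image should agree,
    # further diluting trigger-specific features picked up during pre-training.
    view_emb = image_encoder(augment(clean_images))
    return loss + ssl_weight * contrastive_loss(img_emb, view_emb)
```

Here `augment` is a placeholder for any stochastic image augmentation, and `ssl_weight` is an assumed hyperparameter balancing the two terms.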