Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks
- URL: http://arxiv.org/abs/2303.06854v2
- Date: Tue, 19 Dec 2023 19:12:53 GMT
- Title: Robust Contrastive Language-Image Pre-training against Data Poisoning
and Backdoor Attacks
- Authors: Wenhan Yang, Jingdong Gao, Baharan Mirzasoleiman
- Abstract summary: We propose ROCLIP, the first effective method for robustly pre-training multimodal vision-language models against targeted data poisoning and backdoor attacks.
ROCLIP effectively breaks the association between poisoned image-caption pairs by considering a relatively large and varying pool of random captions.
Our experiments show that ROCLIP renders state-of-the-art targeted data poisoning and backdoor attacks ineffective during pre-training CLIP models.
- Score: 52.26631767748843
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Contrastive vision-language representation learning has achieved
state-of-the-art performance for zero-shot classification, by learning from
millions of image-caption pairs crawled from the internet. However, the massive
data that powers large multimodal models such as CLIP makes them extremely
vulnerable to various types of targeted data poisoning and backdoor attacks.
Despite this vulnerability, robust contrastive vision-language pre-training
against such attacks has remained unaddressed. In this work, we propose ROCLIP,
the first effective method for robustly pre-training multimodal vision-language
models against targeted data poisoning and backdoor attacks. ROCLIP effectively
breaks the association between poisoned image-caption pairs by considering a
relatively large and varying pool of random captions, and matching every image
with the text that is most similar to it in the pool instead of its own
caption, every few epochs. It also leverages image and text augmentations to
further strengthen the defense and improve the performance of the model. Our
extensive experiments show that ROCLIP renders state-of-the-art targeted data
poisoning and backdoor attacks ineffective during pre-training CLIP models. In
particular, ROCLIP decreases the success rate for targeted data poisoning
attacks from 93.75% to 12.5% and that of backdoor attacks down to 0%, while
improving the model's linear probe performance by 10% and maintaining a similar
zero-shot performance compared to CLIP. By increasing the frequency of
matching, ROCLIP is able to defend against strong attacks, which add up to 1%
poisoned examples to the data, and successfully maintain a low attack success
rate of 12.5%, while trading off performance on some tasks.
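The pool-matching step described in the abstract can be made concrete with a short sketch. The code below is an illustrative, simplified reading rather than the authors' released implementation: the names `roclip_step`, `caption_pool`, and `match_period`, the pool-refresh policy, and the default of matching every 2 epochs are assumptions, and the encoders and augmentations stand in for whatever the real pre-training pipeline uses.

```python
# Illustrative sketch of ROCLIP-style pool matching (not the authors' code).
# Assumptions: the encoders return fixed-size embeddings, image/text augmentations
# are applied upstream, and `caption_pool` is a FIFO of recent caption embeddings.
import torch
import torch.nn.functional as F

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Standard symmetric InfoNCE loss over a batch of image/text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.t() / temperature
    targets = torch.arange(len(img_emb), device=img_emb.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def roclip_step(images, captions, image_encoder, text_encoder,
                caption_pool, epoch, match_period=2):
    """One training step. Every `match_period` epochs each image is paired with
    the most similar caption embedding in a large random pool instead of its
    own caption, breaking the association a poisoned pair relies on."""
    img_emb = image_encoder(images)        # augmented images assumed upstream
    own_txt_emb = text_encoder(captions)   # augmented captions assumed upstream
    txt_emb = own_txt_emb

    if epoch % match_period == 0 and len(caption_pool) >= len(images):
        pool = F.normalize(torch.stack(list(caption_pool)), dim=-1)
        sims = F.normalize(img_emb, dim=-1) @ pool.t()
        txt_emb = pool[sims.argmax(dim=1)]  # nearest pooled caption per image

    loss = clip_loss(img_emb, txt_emb)

    # Keep the pool large and varying by refreshing it with the current batch.
    caption_pool.extend(own_txt_emb.detach())
    return loss
```

A `collections.deque(maxlen=pool_size)` is one reasonable choice for `caption_pool`, keeping the pool "relatively large and varying" as the abstract describes; matching more often (a smaller `match_period` here) corresponds to the higher matching frequency the abstract reports is needed to withstand stronger attacks that poison up to 1% of the data.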
Related papers
- Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning [13.802845998402677]
Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets.
They exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns.
We propose Repulsive Visual Prompt Tuning (RVPT) as a novel defense approach.
arXiv Detail & Related papers (2024-12-29T08:09:20Z)
- TAPT: Test-Time Adversarial Prompt Tuning for Robust Inference in Vision-Language Models [53.91006249339802]
We propose a novel defense method called Test-Time Adversarial Prompt Tuning (TAPT) to enhance the inference robustness of CLIP against visual adversarial attacks.
TAPT is a test-time defense method that learns defensive bimodal (textual and visual) prompts to robustify the inference process of CLIP.
We evaluate the effectiveness of TAPT on 11 benchmark datasets, including ImageNet and 10 other zero-shot datasets.
arXiv Detail & Related papers (2024-11-20T08:58:59Z)
- CleanerCLIP: Fine-grained Counterfactual Semantic Augmentation for Backdoor Defense in Contrastive Learning [53.766434746801366]
We propose a fine-grained Text Alignment Cleaner (TA-Cleaner) to cut off the feature connections of backdoor triggers.
TA-Cleaner achieves state-of-the-art defensiveness among finetuning-based defense techniques.
arXiv Detail & Related papers (2024-09-26T07:35:23Z)
- Adversarial Backdoor Defense in CLIP [47.6497532581449]
Multimodal contrastive pretraining, exemplified by models like CLIP, has been found to be vulnerable to backdoor attacks.
We propose Adversarial Backdoor Defense, a novel data augmentation strategy that aligns features with meticulously crafted adversarial examples.
Our experiments demonstrate that ABD provides robust defense against both traditional uni-modal and multimodal backdoor attacks targeting CLIP.
arXiv Detail & Related papers (2024-09-24T10:56:18Z)
- AICAttack: Adversarial Image Captioning Attack with Attention-Based Optimization [13.045125782574306]
This paper presents a novel adversarial attack strategy, AICAttack, designed to attack image captioning models through subtle perturbations on images.
Operating within a black-box attack scenario, our algorithm requires no access to the target model's architecture, parameters, or gradient information.
We demonstrate AICAttack's effectiveness through extensive experiments on benchmark datasets against multiple victim models.
arXiv Detail & Related papers (2024-02-19T08:27:23Z)
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism in backdoor attacks on CLIP.
It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
- Better Safe than Sorry: Pre-training CLIP against Targeted Data Poisoning and Backdoor Attacks [46.504428925984406]
Contrastive Language-Image Pre-training (CLIP) on large image-caption datasets has achieved remarkable success in zero-shot classification.
CLIP is more vulnerable to targeted data poisoning and backdoor attacks, compared to supervised learning.
We propose a strong defense, SAFECLIP, to safely pre-train CLIP against targeted data poisoning and backdoor attacks.
arXiv Detail & Related papers (2023-10-05T19:42:03Z)
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
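For contrast with ROCLIP, which defends during pre-training, the CleanCLIP entry above describes a finetuning-time defense. The sketch below is one plausible, heavily simplified instantiation, not the paper's actual objective or code: it assumes a small trusted clean image-caption subset is available and that continued training on it, with the standard contrastive loss plus a self-supervised image-view term, weakens the spurious trigger association learned from poisoned data.

```python
# Illustrative sketch of a finetuning-based backdoor defense in the spirit of the
# CleanCLIP entry above (not the paper's code). Assumption: a trusted clean
# image-caption subset and a stochastic `augment` function are available.
import torch
import torch.nn.functional as F

def contrastive_loss(a, b, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings."""
    a, b = F.normalize(a, dim=-1), F.normalize(b, dim=-1)
    logits = a @ b.t() / temperature
    targets = torch.arange(len(a), device=a.device)
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.t(), targets)) / 2

def clean_finetune_step(clean_images, clean_captions, augment,
                        image_encoder, text_encoder, ssl_weight=1.0):
    """One finetuning step on a small trusted clean image-caption subset."""
    img_emb = image_encoder(clean_images)
    txt_emb = text_encoder(clean_captions)

    # Multimodal term: re-align each clean image with its own caption.
    loss = contrastive_loss(img_emb, txt_emb)

    # Self-supervised term: two views of the same clean image should agree,
    # further diluting trigger-specific features picked up during pre-training.
    view_emb = image_encoder(augment(clean_images))
    return loss + ssl_weight * contrastive_loss(img_emb, view_emb)
```

Here `augment` is a placeholder for any stochastic image augmentation, and `ssl_weight` is an assumed hyperparameter balancing the two terms.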