AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal
Contrastive Learning
- URL: http://arxiv.org/abs/2308.07026v1
- Date: Mon, 14 Aug 2023 09:29:22 GMT
- Title: AdvCLIP: Downstream-agnostic Adversarial Examples in Multimodal
Contrastive Learning
- Authors: Ziqi Zhou, Shengshan Hu, Minghui Li, Hangtao Zhang, Yechao Zhang, Hai
Jin
- Abstract summary: Multimodal contrastive learning aims to train a general-purpose feature extractor, such as CLIP, on vast amounts of raw, unlabeled paired image-text data.
Despite its promising prospects, the security issues of cross-modal pre-trained encoders have not been fully explored yet.
We propose AdvCLIP, the first attack framework for generating downstream-agnostic adversarial examples.
- Score: 24.266096176932813
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Multimodal contrastive learning aims to train a general-purpose feature
extractor, such as CLIP, on vast amounts of raw, unlabeled paired image-text
data. This can greatly benefit various complex downstream tasks, including
cross-modal image-text retrieval and image classification. Despite its
promising prospects, the security issues of cross-modal pre-trained encoders have
not been fully explored yet, especially when the pre-trained encoder is
publicly available for commercial use.
In this work, we propose AdvCLIP, the first attack framework for generating
downstream-agnostic adversarial examples based on cross-modal pre-trained
encoders. AdvCLIP aims to construct a universal adversarial patch for a set of
natural images that can fool all the downstream tasks inheriting the victim
cross-modal pre-trained encoder. To address the challenges of heterogeneity
between different modalities and unknown downstream tasks, we first build a
topological graph structure to capture the relevant positions between target
samples and their neighbors. Then, we design a topology-deviation based
generative adversarial network to generate a universal adversarial patch. By
adding the patch to images, we minimize the similarity between their embeddings
and those of the other modality, and perturb the sample distribution in the
feature space, achieving universal non-targeted attacks. Our results demonstrate the excellent
attack performance of AdvCLIP on two types of downstream tasks across eight
datasets. We also tailor three popular defenses to mitigate AdvCLIP,
highlighting the need for new defense mechanisms to protect cross-modal
pre-trained encoders.
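For intuition, here is a much-simplified sketch of the non-targeted objective described in the abstract: a single patch shared by all images is optimized so that patched image embeddings move away from their paired text embeddings, plus a crude stand-in for the topology-deviation idea. This is not the authors' implementation; it omits AdvCLIP's generative patch generator and graph construction, and the CLIP checkpoint, patch size, and optimizer settings are illustrative assumptions.

```python
# Minimal sketch of a universal adversarial patch against a CLIP-style encoder.
# Assumptions: openai/clip-vit-base-patch32 checkpoint, 32x32 patch, Adam at lr=1e-2.
import torch
import torch.nn.functional as F
from transformers import CLIPModel, CLIPProcessor

device = "cuda" if torch.cuda.is_available() else "cpu"
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device).eval()
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
for p in model.parameters():
    p.requires_grad_(False)  # only the patch is optimized

# One patch shared by all images ("universal"); size is an assumed choice.
patch = torch.zeros(3, 32, 32, device=device, requires_grad=True)
optimizer = torch.optim.Adam([patch], lr=1e-2)

def apply_patch(pixel_values, patch):
    """Paste the patch into the top-left corner of every image.

    For simplicity the patch is written directly into the normalized
    pixel_values; a faithful attack would paste it in raw pixel space
    before CLIP's preprocessing.
    """
    patched = pixel_values.clone()
    patched[:, :, :patch.shape[1], :patch.shape[2]] = torch.tanh(patch)
    return patched

def attack_step(images, captions):
    """One optimization step over a batch of (PIL image, caption) pairs."""
    inputs = processor(text=list(captions), images=list(images),
                       return_tensors="pt", padding=True).to(device)
    patched = apply_patch(inputs["pixel_values"], patch)

    img_emb = F.normalize(model.get_image_features(pixel_values=patched), dim=-1)
    txt_emb = F.normalize(model.get_text_features(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"]), dim=-1)

    # Non-targeted cross-modal objective: drive patched image embeddings
    # away from their paired text embeddings.
    sim_loss = (img_emb * txt_emb).sum(dim=-1).mean()

    # Crude stand-in for the paper's topology-deviation term: push the
    # similarity graph among patched embeddings away from the graph among
    # clean embeddings.
    with torch.no_grad():
        clean_emb = F.normalize(
            model.get_image_features(pixel_values=inputs["pixel_values"]), dim=-1)
    graph_loss = -F.mse_loss(img_emb @ img_emb.T, clean_emb @ clean_emb.T)

    loss = sim_loss + graph_loss
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

In the downstream-agnostic setting the paper targets, the optimized patch would then be pasted onto test images and evaluated against classifiers or retrieval heads built on the frozen encoder, none of which are seen during patch optimization.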
Related papers
- BadCM: Invisible Backdoor Attack Against Cross-Modal Learning [110.37205323355695]
We introduce a novel bilateral backdoor to fill in the missing pieces of the puzzle in cross-modal backdoor attacks.
BadCM is the first invisible backdoor method deliberately designed for diverse cross-modal attacks within one unified framework.
arXiv Detail & Related papers (2024-10-03T03:51:53Z) - White-box Multimodal Jailbreaks Against Large Vision-Language Models [61.97578116584653]
We propose a more comprehensive strategy that jointly attacks both text and image modalities to exploit a broader spectrum of vulnerabilities within Large Vision-Language Models.
Our attack method begins by optimizing an adversarial image prefix from random noise to generate diverse harmful responses in the absence of text input.
An adversarial text suffix is integrated and co-optimized with the adversarial image prefix to maximize the probability of eliciting affirmative responses to various harmful instructions.
arXiv Detail & Related papers (2024-05-28T07:13:30Z) - VLATTACK: Multimodal Adversarial Attacks on Vision-Language Tasks via
Pre-trained Models [46.14455492739906]
Vision-Language (VL) pre-trained models have shown their superiority on many multimodal tasks.
Existing approaches mainly focus on exploring the adversarial robustness under the white-box setting.
We propose VLATTACK to generate adversarial samples by fusing perturbations of images and texts from both single-modal and multimodal levels.
arXiv Detail & Related papers (2023-10-07T02:18:52Z) - Downstream-agnostic Adversarial Examples [66.8606539786026]
AdvEncoder is the first framework for generating downstream-agnostic universal adversarial examples based on a pre-trained encoder.
Unlike traditional adversarial example works, the pre-trained encoder only outputs feature vectors rather than classification labels.
Our results show that an attacker can successfully attack downstream tasks without knowing either the pre-training dataset or the downstream dataset.
arXiv Detail & Related papers (2023-07-23T10:16:47Z) - CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive
Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks.
CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z) - Language-Driven Anchors for Zero-Shot Adversarial Robustness [25.160195547250655]
We propose a Language-driven, Anchor-based Adversarial Training (LAAT) strategy.
By leveraging the semantic consistency of text encoders, LAAT aims to enhance the adversarial robustness of the image model.
We show that LAAT significantly improves zero-shot adversarial robustness over state-of-the-art methods.
arXiv Detail & Related papers (2023-01-30T17:34:43Z) - Beyond the Prototype: Divide-and-conquer Proxies for Few-shot
Segmentation [63.910211095033596]
Few-shot segmentation aims to segment unseen-class objects given only a handful of densely labeled samples.
We propose a simple yet versatile framework in the spirit of divide-and-conquer.
Our proposed approach, named divide-and-conquer proxies (DCP), allows for the development of appropriate and reliable information.
arXiv Detail & Related papers (2022-04-21T06:21:14Z) - Defending Adversarial Examples via DNN Bottleneck Reinforcement [20.08619981108837]
This paper presents a reinforcement scheme to alleviate the vulnerability of Deep Neural Networks (DNNs) against adversarial attacks.
By reinforcing the former while maintaining the latter, any redundant information, be it adversarial or not, should be removed from the latent representation.
In order to reinforce the information bottleneck, we introduce the multi-scale low-pass objective and multi-scale high-frequency communication for better frequency steering in the network.
arXiv Detail & Related papers (2020-08-12T11:02:01Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNN) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against unseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)