Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks
- URL: http://arxiv.org/abs/2511.13545v1
- Date: Mon, 17 Nov 2025 16:16:50 GMT
- Title: Robust Defense Strategies for Multimodal Contrastive Learning: Efficient Fine-tuning Against Backdoor Attacks
- Authors: Md. Iqbal Hossain, Afia Sajeeda, Neeresh Kumar Perla, Ming Shao
- Abstract summary: Multimodal deep learning models, such as CLIP, are not robust against adversarial attacks. In this study, we introduce an innovative strategy to enhance the robustness of multimodal contrastive learning models against such attacks.
- Score: 5.333108060878682
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: The advent of multimodal deep learning models, such as CLIP, has unlocked new frontiers in a wide range of applications, from image-text understanding to classification tasks. However, these models are not robust against adversarial attacks, particularly backdoor attacks, which can subtly manipulate model behavior. Moreover, existing defense methods typically involve training from scratch or fine-tuning on a large dataset without pinpointing the specific labels that are affected. In this study, we introduce an innovative strategy to enhance the robustness of multimodal contrastive learning models against such attacks. In particular, given a poisoned CLIP model, our approach can identify the backdoor trigger and pinpoint the victim samples and labels in an efficient manner. To that end, an image segmentation "oracle" is introduced as the supervisor for the output of the poisoned CLIP. We develop two algorithms to rectify the poisoned model: (1) differentiating between CLIP's and the Oracle's knowledge to identify potential triggers; (2) pinpointing the affected labels and victim samples, and curating a compact fine-tuning dataset. With this knowledge, we can rectify the poisoned CLIP model to negate the backdoor effects. Extensive experiments on visual recognition benchmarks demonstrate that our strategy is effective for CLIP-based backdoor defense.
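As a rough illustration of the defense's first step, the sketch below flags test images on which a poisoned CLIP-style classifier and a segmentation oracle disagree; the disagreeing samples, and the label they are pulled toward, approximate the victim samples and the backdoor target label. Both models are random stubs and the flagging rule is an assumption, not the authors' exact algorithm.

```python
import torch

# Hypothetical stand-ins: `clip_predict` would be a poisoned CLIP zero-shot
# classifier and `oracle_predict` a segmentation-based "oracle" labeler.
# Both return one class index per image; here they are random stubs.
def clip_predict(images):
    return torch.randint(0, 10, (images.shape[0],))

def oracle_predict(images):
    return torch.randint(0, 10, (images.shape[0],))

def flag_victims(images):
    """Flag samples where CLIP and the oracle disagree; the CLIP-side label
    of a disagreeing sample is a candidate backdoor target label."""
    clip_labels = clip_predict(images)
    oracle_labels = oracle_predict(images)
    mask = clip_labels != oracle_labels
    victims = images[mask]
    target_counts = torch.bincount(clip_labels[mask], minlength=10)
    suspected_target = int(target_counts.argmax())  # most-hit label
    return victims, suspected_target

images = torch.randn(64, 3, 224, 224)
victims, target = flag_victims(images)
print(f"{victims.shape[0]} suspected victim samples, target label {target}")
```

The flagged subset would then seed the compact fine-tuning dataset the abstract describes.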
Related papers
- Bridging the Task Gap: Multi-Task Adversarial Transferability in CLIP and Its Derivatives [61.58574200236532]
Adversarial examples generated from fine-grained tasks often exhibit stronger transfer potential than those from coarse-grained tasks. We propose a novel framework, Multi-Task Adversarial CLIP (MT-AdvCLIP), which introduces a task-aware feature aggregation loss and generates perturbations with enhanced cross-task generalization capability.
arXiv Detail & Related papers (2025-09-28T14:46:52Z)
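A minimal sketch of the task-aware aggregation idea: sum feature-space distortions across several task heads and run PGD-style ascent on the total, so the perturbation transfers across tasks. The encoder, heads, and step sizes are hypothetical stand-ins, not MT-AdvCLIP's actual architecture or loss.

```python
import torch
import torch.nn.functional as F

# Stand-ins for a shared encoder and three task heads.
encoder = torch.nn.Linear(128, 64)
heads = [torch.nn.Linear(64, 32) for _ in range(3)]

def aggregated_feature_loss(x, x_adv):
    """Sum of per-task feature distortions; maximizing it pushes the
    adversarial example away from clean features across all tasks."""
    f_clean, f_adv = encoder(x), encoder(x_adv)
    return sum(F.mse_loss(h(f_adv), h(f_clean).detach()) for h in heads)

x = torch.randn(8, 128)
delta = torch.zeros_like(x, requires_grad=True)
for _ in range(10):  # PGD-style ascent on the aggregated loss
    loss = aggregated_feature_loss(x, x + delta)
    loss.backward()
    with torch.no_grad():
        delta += 0.01 * delta.grad.sign()
        delta.clamp_(-0.03, 0.03)       # L-inf budget
        delta.grad.zero_()
print("final aggregated loss:", float(loss))
```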
- InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning [36.56302680556252]
We introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions. InverTune effectively identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks. Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against state-of-the-art (SOTA) attacks.
arXiv Detail & Related papers (2025-06-14T09:08:34Z)
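The sketch below shows generic trigger inversion in the spirit of InverTune's first stage: optimize a mask and pattern that drive arbitrary images toward a suspected target class, with a regularizer favoring small masks. This is a Neural-Cleanse-style approximation on a stub model, not the paper's exact procedure.

```python
import torch

# Stub classifier standing in for a poisoned model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
images = torch.randn(16, 3, 32, 32)
target = 0                                       # suspected backdoor target

mask = torch.zeros(1, 1, 32, 32, requires_grad=True)
pattern = torch.zeros(1, 3, 32, 32, requires_grad=True)
opt = torch.optim.Adam([mask, pattern], lr=0.1)

for _ in range(100):
    m = torch.sigmoid(mask)
    stamped = (1 - m) * images + m * torch.tanh(pattern)
    ce = torch.nn.functional.cross_entropy(
        model(stamped), torch.full((16,), target, dtype=torch.long))
    loss = ce + 0.01 * m.abs().sum()             # prefer a small trigger mask
    opt.zero_grad()
    loss.backward()
    opt.step()

# An unusually small recovered mask suggests a genuine trigger for this class.
print("recovered mask size:", float(torch.sigmoid(mask).sum()))
```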
- Test-Time Multimodal Backdoor Detection by Contrastive Prompting [15.878513862121602]
Multimodal contrastive learning methods (e.g., CLIP) are vulnerable to backdoor attacks. We propose BDetCLIP, a novel test-time backdoor detection method based on contrastive prompting. Our proposed BDetCLIP is superior to state-of-the-art backdoor detection methods in terms of both effectiveness and efficiency.
arXiv Detail & Related papers (2024-05-24T06:52:54Z)
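A hedged sketch of test-time detection by prompt sensitivity: a backdoored image tends to stay locked onto its target class however the class prompts are phrased, so unusually low sensitivity to prompt changes is suspicious. The encoders below are random stubs for CLIP, and the threshold is a hypothetical assumption.

```python
import torch
import torch.nn.functional as F

# Random stubs standing in for CLIP's image and text encoders; each call to
# `encode_text` simulates a different phrasing of the same 10 class prompts.
def encode_image(x):
    return F.normalize(torch.randn(x.shape[0], 512), dim=-1)

def encode_text(n_classes):
    return F.normalize(torch.randn(n_classes, 512), dim=-1)

images = torch.randn(4, 3, 224, 224)
img_emb = encode_image(images)

# Similarities under 5 hypothetical prompt phrasings: shape (5, 4, 10).
scores = torch.stack([img_emb @ encode_text(10).T for _ in range(5)])
top_sim = scores.max(dim=-1).values     # best-class similarity per phrasing
sensitivity = top_sim.std(dim=0)        # variation across phrasings
flagged = sensitivity < 0.05            # hypothetical detection threshold
print("suspicious samples:", flagged.nonzero().flatten().tolist())
```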
- Unlearning Backdoor Threats: Enhancing Backdoor Defense in Multimodal Contrastive Learning via Local Token Unlearning [49.242828934501986]
Multimodal contrastive learning has emerged as a powerful paradigm for building high-quality features. However, backdoor attacks subtly embed malicious behaviors within the model during training. We introduce an innovative token-based localized forgetting training regime.
arXiv Detail & Related papers (2024-03-24T18:33:15Z)
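A minimal sketch of localized token-level forgetting, assuming the trigger tokens have already been identified: fine-tune a projection so that the suspected tokens' similarity to the backdoor target embedding is driven down, leaving all other tokens untouched. The names and the objective are illustrative, not the paper's exact regime.

```python
import torch
import torch.nn.functional as F

# Stand-ins: a tunable projection, patch tokens from a batch of two images,
# the backdoor target's text embedding, and suspected trigger-token indices.
proj = torch.nn.Linear(64, 64)
tokens = torch.randn(2, 49, 64)                    # patch tokens
target_text = F.normalize(torch.randn(64), dim=0)  # target text embedding
trigger_idx = torch.tensor([5, 6, 12])             # suspected trigger tokens

opt = torch.optim.SGD(proj.parameters(), lr=1e-2)
for _ in range(20):
    opt.zero_grad()
    emb = F.normalize(proj(tokens), dim=-1)
    # Similarity of suspected trigger tokens to the target embedding;
    # minimizing it locally weakens the trigger-to-target association.
    sim = (emb[:, trigger_idx] * target_text).sum(-1).mean()
    sim.backward()
    opt.step()
print("residual similarity:", float(sim))
```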
- Model X-ray: Detecting Backdoored Models via Decision Boundary [62.675297418960355]
Backdoor attacks pose a significant security vulnerability for deep neural networks (DNNs). We propose Model X-ray, a novel backdoor detection approach based on the analysis of two-dimensional (2D) decision boundary visualizations. Our approach includes two strategies focused on the decision areas dominated by clean samples and the concentration of the label distribution.
arXiv Detail & Related papers (2024-02-27T12:42:07Z)
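The sketch below renders a decision-boundary slice in the spirit of Model X-ray: grid-sample a random 2D plane around an anchor input, record the predicted labels, and measure how concentrated the label distribution is. The single statistic shown is a simplification of the paper's two strategies, and the model is a stub.

```python
import torch

# Stub classifier standing in for a possibly backdoored model.
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
anchor = torch.randn(3, 32, 32)
d1, d2 = torch.randn_like(anchor), torch.randn_like(anchor)  # plane directions

# Sample a 25x25 grid on the 2D plane through the anchor.
steps = torch.linspace(-3, 3, 25)
grid = torch.stack([anchor + a * d1 + b * d2 for a in steps for b in steps])
labels = model(grid).argmax(dim=-1)          # predicted label per grid point

counts = torch.bincount(labels, minlength=10).float()
concentration = counts.max() / counts.sum()  # dominance of the top label
print(f"top-label share of the plane: {concentration:.2f}")
```

A plane heavily dominated by one label, regardless of direction, can hint at a backdoored target class.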
- BadCLIP: Trigger-Aware Prompt Learning for Backdoor Attacks on CLIP [55.33331463515103]
BadCLIP is built on a novel and effective mechanism for backdoor attacks on CLIP. It consists of a learnable trigger applied to images and a trigger-aware context generator, such that the trigger can change text features via trigger-aware prompts.
arXiv Detail & Related papers (2023-11-26T14:24:13Z)
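A toy sketch of the trigger-aware mechanism: a learnable image trigger plus a generator that shifts the text-side context vector when the trigger is present, trained so triggered image features and trigger-aware text features align. Dimensions, encoders, and the loss are illustrative assumptions, not the paper's design.

```python
import torch
import torch.nn.functional as F

# Stand-ins: a learnable 8x8 pixel trigger, a context generator conditioned
# on the trigger, and a toy image encoder (all illustrative dimensions).
trigger = torch.zeros(3, 8, 8, requires_grad=True)
ctx_gen = torch.nn.Linear(3 * 8 * 8, 512)          # trigger-aware context
img_enc = torch.nn.Linear(3 * 32 * 32, 512)
base_ctx = torch.randn(512)                        # fixed base prompt context

opt = torch.optim.Adam([trigger, *ctx_gen.parameters()], lr=1e-3)
images = torch.randn(8, 3, 32, 32)
for _ in range(10):
    patched = images.clone()
    patched[:, :, :8, :8] = torch.tanh(trigger)    # stamp the trigger
    img_f = F.normalize(img_enc(patched.flatten(1)), dim=-1)
    # The text feature shifts whenever the trigger is present.
    txt_f = F.normalize(base_ctx + ctx_gen(torch.tanh(trigger).flatten()), dim=0)
    loss = -(img_f @ txt_f).mean()                 # pull the two together
    opt.zero_grad()
    loss.backward()
    opt.step()
print("image-text alignment:", float(-loss))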
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threat that, in this practical scenario, backdoor attacks can remain effective even after defenses. We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
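A hedged sketch of a dual-embedding style objective: tune a trigger so that triggered images both align with the target text embedding and stay close to clean image features, which is what makes such poisoning hard to detect or finetune away. The encoder is a random stand-in, and the exact constraints differ in the paper.

```python
import torch
import torch.nn.functional as F

# Random stand-in for CLIP's image encoder and the target label's embedding.
img_enc = torch.nn.Linear(3 * 32 * 32, 256)
target_txt = F.normalize(torch.randn(256), dim=0)
images = torch.randn(16, 3, 32, 32)
trigger = torch.zeros(3, 32, 32, requires_grad=True)

opt = torch.optim.Adam([trigger], lr=1e-2)
clean_f = F.normalize(img_enc(images.flatten(1)), dim=-1).detach()
for _ in range(50):
    triggered = (images + 0.05 * torch.tanh(trigger)).flatten(1)
    f = F.normalize(img_enc(triggered), dim=-1)
    align = -(f @ target_txt).mean()               # (i) hit the target text
    stealth = (1 - (f * clean_f).sum(-1)).mean()   # (ii) stay near clean features
    loss = align + stealth
    opt.zero_grad()
    loss.backward()
    opt.step()
print("target alignment:", float((f @ target_txt).mean()))
```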
- Influencer Backdoor Attack on Semantic Segmentation [39.57965442338681]
Influencer Backdoor Attack (IBA) is a backdoor attack on semantic segmentation models. IBA is expected to maintain the classification accuracy of non-victim pixels and mislead the classification of all victim pixels in every single inference. We introduce an innovative Pixel Random Labeling strategy which maintains optimal performance even when the trigger is placed far from the victim pixels.
arXiv Detail & Related papers (2023-03-21T17:45:38Z)
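A minimal sketch of poisoning a segmentation mask in the IBA spirit, including a Pixel Random Labeling step: victim-class pixels are flipped to the attacker's target class, and a small random fraction of other pixels receive random labels, which encourages the model to rely on a trigger placed anywhere in the image. Class indices and fractions are hypothetical.

```python
import torch

def poison_mask(mask, victim=3, target=7, num_classes=10, rand_frac=0.05):
    """Return a poisoned copy of a segmentation label mask."""
    out = mask.clone()
    out[mask == victim] = target                   # mislead victim pixels
    # Pixel Random Labeling: random labels on a small non-victim fraction.
    noise = torch.rand_like(mask, dtype=torch.float) < rand_frac
    noise &= mask != victim
    out[noise] = torch.randint(0, num_classes, (int(noise.sum()),))
    return out

mask = torch.randint(0, 10, (64, 64))              # toy 64x64 label mask
poisoned = poison_mask(mask)
print("pixels changed:", int((poisoned != mask).sum()))
```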
- CleanCLIP: Mitigating Data Poisoning Attacks in Multimodal Contrastive Learning [63.72975421109622]
CleanCLIP is a finetuning framework that weakens the learned spurious associations introduced by backdoor attacks. CleanCLIP maintains model performance on benign examples while erasing a range of backdoor attacks on multimodal contrastive learning.
arXiv Detail & Related papers (2023-03-06T17:48:32Z)
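A sketch of a CleanCLIP-style finetuning objective on clean pairs: the standard image-text contrastive loss plus a within-modality self-supervised term (here, agreement between two augmented views). The encoders, augmentation, and temperature are stand-ins, not the paper's configuration.

```python
import torch
import torch.nn.functional as F

# Toy encoders and augmentation standing in for CLIP and its training recipe.
img_enc, txt_enc = torch.nn.Linear(512, 256), torch.nn.Linear(512, 256)
def augment(x):
    return x + 0.1 * torch.randn_like(x)

imgs, txts = torch.randn(32, 512), torch.randn(32, 512)
fi = F.normalize(img_enc(imgs), dim=-1)
ft = F.normalize(txt_enc(txts), dim=-1)
labels = torch.arange(32)                      # matched pairs on the diagonal

logits = fi @ ft.T / 0.07                      # temperature-scaled similarities
clip_loss = (F.cross_entropy(logits, labels) +
             F.cross_entropy(logits.T, labels)) / 2

fi2 = F.normalize(img_enc(augment(imgs)), dim=-1)   # second image view
ssl_loss = F.cross_entropy(fi @ fi2.T / 0.07, labels)

total = clip_loss + ssl_loss                   # joint finetuning objective
print(float(total))
```

The intuition is that the self-supervised term re-anchors each modality to its own clean structure, so trigger-label shortcuts learned during poisoning lose their support.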
- Adversarial Attacks on Knowledge Graph Embeddings via Instance Attribution Methods [8.793721044482613]
We study data poisoning attacks against Knowledge Graph Embedding (KGE) models for link prediction. These attacks craft adversarial additions or deletions at training time to cause model failure at test time. We propose a method to replace one of the two entities in each influential triple to generate adversarial additions.
arXiv Detail & Related papers (2021-11-04T19:38:48Z)
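The sketch below crafts an adversarial addition for a KGE model under stated assumptions: score training triples with a simple gradient-similarity attribution proxy, pick the most influential one, and replace one of its entities. The TransE scorer and the proxy are illustrative, not the paper's exact attribution methods.

```python
import torch

num_ent, num_rel, dim = 100, 10, 16
E = torch.nn.Embedding(num_ent, dim)           # entity embeddings
R = torch.nn.Embedding(num_rel, dim)           # relation embeddings

def score(h, r, t):
    """TransE-style plausibility: -||h + r - t||."""
    return -(E(h) + R(r) - E(t)).norm(dim=-1)

def grad_vec(h, r, t):
    """Flattened parameter gradient of a triple's score."""
    E.zero_grad(); R.zero_grad()
    score(h, r, t).sum().backward()
    return torch.cat([E.weight.grad.flatten(), R.weight.grad.flatten()]).clone()

target = (torch.tensor([0]), torch.tensor([1]), torch.tensor([2]))
train = [(torch.tensor([i]), torch.tensor([i % 10]), torch.tensor([i + 1]))
         for i in range(20)]

# Attribution proxy: gradient alignment with the target triple.
g_target = grad_vec(*target)
influence = [float(grad_vec(*tr) @ g_target) for tr in train]
h, r, t = train[int(torch.tensor(influence).argmax())]

# Adversarial addition: keep relation and tail, swap in a random head entity.
adversarial = (torch.randint(0, num_ent, (1,)), r, t)
print("adversarial addition:", [int(x) for x in adversarial])
```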
This list is automatically generated from the titles and abstracts of the papers on this site.
This site does not guarantee the quality of the listed content (including all information) and is not responsible for any consequences of its use.