BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning
- URL: http://arxiv.org/abs/2602.17168v1
- Date: Thu, 19 Feb 2026 08:31:16 GMT
- Title: BadCLIP++: Stealthy and Persistent Backdoors in Multimodal Contrastive Learning
- Authors: Siyuan Liang, Yongcheng Jing, Yingjie Wang, Jiaxing Huang, Ee-chien Chang, Dacheng Tao,
- Abstract summary: Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment.
- Score: 73.46118996284888
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Research on backdoor attacks against multimodal contrastive learning models faces two key challenges: stealthiness and persistence. Existing methods often fail under strong detection or continuous fine-tuning, largely due to (1) cross-modal inconsistency that exposes trigger patterns and (2) gradient dilution at low poisoning rates that accelerates backdoor forgetting. These coupled causes remain insufficiently modeled and addressed. We propose BadCLIP++, a unified framework that tackles both challenges. For stealthiness, we introduce a semantic-fusion QR micro-trigger that embeds imperceptible patterns near task-relevant regions, preserving clean-data statistics while producing compact trigger distributions. We further apply target-aligned subset selection to strengthen signals at low injection rates. For persistence, we stabilize trigger embeddings via radius shrinkage and centroid alignment, and stabilize model parameters through curvature control and elastic weight consolidation, maintaining solutions within a low-curvature wide basin resistant to fine-tuning. We also provide the first theoretical analysis showing that, within a trust region, gradients from clean fine-tuning and backdoor objectives are co-directional, yielding a non-increasing upper bound on attack success degradation. Experiments demonstrate that with only 0.3% poisoning, BadCLIP++ achieves 99.99% attack success rate (ASR) in digital settings, surpassing baselines by 11.4 points. Across nineteen defenses, ASR remains above 99.90% with less than 0.8% drop in clean accuracy. The method further attains 65.03% success in physical attacks and shows robustness against watermark removal defenses.
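The abstract names three persistence mechanisms (radius shrinkage, centroid alignment, and elastic weight consolidation) without giving formulas. The NumPy sketch below shows one plausible reading of these three terms; the function name, argument shapes, and loss forms are assumptions for illustration, not the paper's actual implementation.

```python
import numpy as np

def persistence_losses(trigger_emb, target_emb, params, params_ref, fisher, lam=1.0):
    """Hypothetical sketch of the three persistence terms named in the abstract.

    trigger_emb: (N, D) array of poisoned-image embeddings
    target_emb:  (D,) embedding of the attack target
    params, params_ref, fisher: flat parameter vectors for the EWC penalty
    """
    centroid = trigger_emb.mean(axis=0)
    # Radius shrinkage: pull each trigger embedding toward the centroid,
    # yielding a compact trigger distribution.
    radius_loss = float(np.mean(np.linalg.norm(trigger_emb - centroid, axis=1)))
    # Centroid alignment: move the trigger centroid toward the target embedding.
    align_loss = float(np.linalg.norm(centroid - target_emb))
    # Elastic weight consolidation: penalize parameter drift from a reference
    # checkpoint, weighted by a diagonal Fisher estimate, so the solution stays
    # in the same low-curvature basin under clean fine-tuning.
    ewc_penalty = float(lam * np.sum(fisher * (params - params_ref) ** 2))
    return radius_loss, align_loss, ewc_penalty
```

In this reading, the first two terms shape the trigger distribution in embedding space while the third stabilizes the parameters themselves, matching the abstract's split between "stabilize trigger embeddings" and "stabilize model parameters".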
Related papers
- CS-GBA: A Critical Sample-based Gradient-guided Backdoor Attack for Offline Reinforcement Learning [7.5200963577855875]
Offline Reinforcement Learning (RL) enables policy optimization from static datasets but is inherently vulnerable to backdoor attacks. We propose CS-GBA (Critical Sample-based Gradient-guided Backdoor Attack), a novel framework designed to achieve high stealthiness and destructiveness under a strict budget.
arXiv Detail & Related papers (2026-01-15T13:57:52Z) - The Eminence in Shadow: Exploiting Feature Boundary Ambiguity for Robust Backdoor Attacks [51.468144272905135]
Deep neural networks (DNNs) underpin critical applications yet remain vulnerable to backdoor attacks. We provide a theoretical analysis targeting backdoor attacks, focusing on how sparse decision boundaries enable disproportionate model manipulation. We propose Eminence, an explainable and robust black-box backdoor framework with provable theoretical guarantees and inherent stealth properties.
arXiv Detail & Related papers (2025-12-11T08:09:07Z) - Towards Stealthy and Effective Backdoor Attacks on Lane Detection: A Naturalistic Data Poisoning Approach [21.709351855331594]
Deep learning-based lane detection (LD) plays a critical role in autonomous driving and driver assistance systems. Existing backdoor attack methods on LD often exhibit limited practical utility due to the artificial and conspicuous nature of their triggers. We introduce DBALD, a novel diffusion-based data poisoning framework for generating naturalistic backdoor triggers.
arXiv Detail & Related papers (2025-08-04T07:13:18Z) - InverTune: Removing Backdoors from Multimodal Contrastive Learning Models via Trigger Inversion and Activation Tuning [36.56302680556252]
We introduce InverTune, the first backdoor defense framework for multimodal models under minimal attacker assumptions. InverTune effectively identifies and removes backdoor artifacts through three key components, achieving robust protection against backdoor attacks. Experimental results show that InverTune reduces the average attack success rate (ASR) by 97.87% against the state-of-the-art (SOTA) attacks.
arXiv Detail & Related papers (2025-06-14T09:08:34Z) - Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP [51.04452017089568]
Class-wise Backdoor Prompt Tuning (CBPT) is an efficient and effective defense mechanism that operates on text prompts to indirectly purify CLIP. CBPT significantly mitigates backdoor threats while preserving model utility.
arXiv Detail & Related papers (2025-02-26T16:25:15Z) - Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - DiffAttack: Evasion Attacks Against Diffusion-Based Adversarial Purification [63.65630243675792]
Diffusion-based purification defenses leverage diffusion models to remove crafted perturbations of adversarial examples.
Recent studies show that even advanced attacks cannot break such defenses effectively.
We propose a unified framework DiffAttack to perform effective and efficient attacks against diffusion-based purification defenses.
arXiv Detail & Related papers (2023-10-27T15:17:50Z) - Imbalanced Gradients: A Subtle Cause of Overestimated Adversarial Robustness [75.30116479840619]
In this paper, we identify a more subtle situation called Imbalanced Gradients that can also cause overestimated adversarial robustness.
The phenomenon of imbalanced gradients occurs when the gradient of one term of the margin loss dominates and pushes the attack towards a suboptimal direction.
We propose a Margin Decomposition (MD) attack that decomposes a margin loss into individual terms and then explores the attackability of these terms separately.
arXiv Detail & Related papers (2020-06-24T13:41:37Z)
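The margin-loss decomposition summarized in the Imbalanced Gradients entry above can be sketched as follows. The term split (true-class logit versus best competing logit) follows the abstract's description of the margin loss; the function name and array shapes are illustrative assumptions.

```python
import numpy as np

def margin_loss_terms(logits, y):
    """Illustrative decomposition of a margin loss into its two terms.

    The margin loss here is z_y - max_{i != y} z_i. The MD attack idea is to
    examine and attack each term separately, since the gradient of one term
    can dominate and steer the attack in a suboptimal direction.
    """
    z_true = float(logits[y])                    # term 1: true-class logit
    z_rival = float(np.delete(logits, y).max())  # term 2: best competing logit
    return z_true, z_rival, z_true - z_rival     # margin = term 1 - term 2
```

Attacking each returned term on its own, rather than only their difference, is the decomposition step that the MD attack uses to avoid the imbalanced-gradient trap.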
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of the listed papers (including all information) and is not responsible for any consequences.