Related papers: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation

URL: http://arxiv.org/abs/2507.05113v2
Date: Fri, 25 Jul 2025 06:33:41 GMT
Title: CLIP-Guided Backdoor Defense through Entropy-Based Poisoned Dataset Separation
Authors: Binyan Xu, Fan Yang, Xilin Dai, Di Tang, Kehuan Zhang
Abstract summary: Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. We introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks.
Score: 10.162187097557576
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Deep Neural Networks (DNNs) are susceptible to backdoor attacks, where adversaries poison training data to implant a backdoor into the victim model. Current backdoor defenses on poisoned data often suffer from high computational costs or low effectiveness against advanced attacks like clean-label and clean-image backdoors. To address these limitations, we introduce CLIP-Guided backdoor Defense (CGD), an efficient and effective method that mitigates various backdoor attacks. CGD utilizes a publicly accessible CLIP model to identify inputs that are likely to be clean or poisoned. It then retrains the model with these inputs, using CLIP's logits as guidance to effectively neutralize the backdoor. Experiments on 4 datasets and 11 attack types demonstrate that CGD reduces attack success rates (ASRs) to below 1% while maintaining clean accuracy (CA) with a maximum drop of only 0.3%, outperforming existing defenses. Additionally, we show that clean-data-based defenses can be adapted to poisoned data using CGD. CGD also exhibits strong robustness, maintaining low ASRs even when employing a weaker CLIP model or when CLIP itself is compromised by a backdoor. These findings underscore CGD's exceptional efficiency, effectiveness, and applicability for real-world backdoor defense scenarios. Code: https://github.com/binyxu/CGD.
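
The separation step mentioned in the abstract (using CLIP output to flag inputs as likely clean or likely poisoned) can be illustrated with a minimal sketch. This is not the authors' implementation; see the linked repository for that. The scoring rule used here (cross-entropy of the dataset label under the softmax of zero-shot logits), the function names, and the `low_frac` / `high_frac` parameters are all assumptions made for illustration, and the logits are hand-written placeholders rather than real CLIP output.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def label_cross_entropy(logits, label):
    """Cross-entropy of the dataset label under the softmax of the logits.

    Assumed scoring rule: a high value means the zero-shot model
    disagrees with the label attached to the sample.
    """
    return -math.log(softmax(logits)[label] + 1e-12)


def split_by_entropy(all_logits, labels, low_frac=0.5, high_frac=0.1):
    """Rank samples by score and return index lists for both ends.

    low_frac and high_frac are hypothetical knobs: the fraction of
    lowest-scoring samples treated as likely clean, and the fraction of
    highest-scoring samples treated as likely poisoned.
    """
    scores = [label_cross_entropy(l, y) for l, y in zip(all_logits, labels)]
    order = sorted(range(len(scores)), key=scores.__getitem__)
    n = len(order)
    n_clean = int(n * low_frac)
    n_suspect = int(n * high_frac)
    likely_clean = order[:n_clean]
    likely_poisoned = order[n - n_suspect:] if n_suspect else []
    return likely_clean, likely_poisoned, scores


# Placeholder logits for four samples and three classes. The last sample is
# labeled as class 1 although its logits favor class 0.
example_logits = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0], [5.0, 0.0, 0.0]]
example_labels = [0, 1, 2, 1]
clean_idx, poisoned_idx, example_scores = split_by_entropy(
    example_logits, example_labels, low_frac=0.5, high_frac=0.25
)
```

In a real pipeline the placeholder logits would come from a zero-shot classifier, and the two index lists would feed the retraining stage that the abstract describes. How the thresholds are actually chosen is not stated in the abstract.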

Related papers

BeDKD: Backdoor Defense based on Dynamic Knowledge Distillation and Directional Mapping Modulator [9.581510737256389]
We propose a novel Backdoor defense method based on Directional mapping module and adversarial Knowledge Distillation (BeDKD). BeDKD surpasses the state-of-the-art defenses and reduces the ASR by 98% without significantly reducing the CACC.
arXiv Detail & Related papers (2025-08-03T05:28:01Z)
Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in CLIP [51.04452017089568]
Class-wise Backdoor Prompt Tuning (CBPT) is an efficient and effective defense mechanism that operates on text prompts to indirectly purify CLIP. CBPT significantly mitigates backdoor threats while preserving model utility.
arXiv Detail & Related papers (2025-02-26T16:25:15Z)
Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that does not change the model architecture. We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses. Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
DMGNN: Detecting and Mitigating Backdoor Attacks in Graph Neural Networks [30.766013737094532]
We propose DMGNN against out-of-distribution (OOD) and in-distribution (ID) graph backdoor attacks. DMGNN can easily identify the hidden ID and OOD triggers via predicting label transitions based on counterfactual explanation. DMGNN far outperforms the state-of-the-art (SOTA) defense methods, reducing the attack success rate to 5% with almost negligible degradation in model performance.
arXiv Detail & Related papers (2024-10-18T01:08:03Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be poisoned. We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
Beating Backdoor Attack at Its Own Game [10.106197319676294]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing defense methods have greatly reduced the attack success rate. We propose a highly effective framework which injects non-adversarial backdoors targeting poisoned samples.
arXiv Detail & Related papers (2023-07-28T13:07:42Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. The backdoor attack is an emerging yet serious training-phase threat. We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification. CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
CorruptEncoder: Data Poisoning based Backdoor Attacks to Contrastive Learning [71.25518220297639]
Contrastive learning pre-trains general-purpose encoders using an unlabeled pre-training dataset. Data poisoning based backdoor attacks (DPBAs) inject poisoned inputs into the pre-training dataset so the encoder is backdoored. CorruptEncoder introduces a new attack strategy to create poisoned inputs and uses a theory-guided method to maximize attack effectiveness. We also evaluate a defense: our results show that it can reduce the effectiveness of DPBAs, but it sacrifices the utility of the encoder, highlighting the need for new defenses.
arXiv Detail & Related papers (2022-11-15T15:48:28Z)
Model-Contrastive Learning for Backdoor Defense [13.781375023320981]
We propose a novel backdoor defense method named MCL based on model-contrastive learning. MCL is more effective at reducing backdoor threats while maintaining higher accuracy on benign data.
arXiv Detail & Related papers (2022-05-09T16:36:46Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)

This list is automatically generated from the titles and abstracts of the papers on this site.