Backdoor Mitigation via Invertible Pruning Masks
- URL: http://arxiv.org/abs/2509.15497v2
- Date: Tue, 14 Oct 2025 23:39:40 GMT
- Title: Backdoor Mitigation via Invertible Pruning Masks
- Authors: Kealan Dunnett, Reza Arablouei, Dimity Miller, Volkan Dedeoglu, Raja Jurdak
- Abstract summary: We propose a novel pruning approach featuring a learned selection mechanism to identify parameters critical to both main and backdoor tasks. We formulate this as a bi-level optimization problem that jointly learns selection variables, a sparse invertible mask, and sample-specific backdoor perturbations. Our approach outperforms existing pruning-based backdoor mitigation approaches, maintains strong performance under limited data conditions, and achieves competitive results compared to state-of-the-art fine-tuning approaches.
- Score: 10.393154496941527
- License: http://creativecommons.org/licenses/by/4.0/
- Abstract: Model pruning has gained traction as a promising defense strategy against backdoor attacks in deep learning. However, existing pruning-based approaches often fall short in accurately identifying and removing the specific parameters responsible for inducing backdoor behaviors. Despite the dominance of fine-tuning-based defenses in recent literature, largely due to their superior performance, pruning remains a compelling alternative, offering greater interpretability and improved robustness in low-data regimes. In this paper, we propose a novel pruning approach featuring a learned selection mechanism to identify parameters critical to both main and backdoor tasks, along with an invertible pruning mask designed to simultaneously achieve two complementary goals: eliminating the backdoor task while preserving it through the inverse mask. We formulate this as a bi-level optimization problem that jointly learns selection variables, a sparse invertible mask, and sample-specific backdoor perturbations derived from clean data. The inner problem synthesizes candidate triggers using the inverse mask, while the outer problem refines the mask to suppress backdoor behavior without impairing clean-task accuracy. Extensive experiments demonstrate that our approach outperforms existing pruning-based backdoor mitigation approaches, maintains strong performance under limited data conditions, and achieves competitive results compared to state-of-the-art fine-tuning approaches. Notably, the proposed approach is particularly effective in restoring correct predictions for compromised samples after successful backdoor mitigation.
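The abstract describes a bi-level loop: the inner problem synthesizes candidate triggers against the inverse-masked model (backdoor preserved), while the outer problem tunes the mask so that the pruned model stays clean-accurate but no longer responds to those triggers. Below is a minimal PyTorch sketch of one way to read that loop; `model_fwd` (a functional forward pass over masked weights), the loss choices, and the perturbation bound are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def apply_mask(weights, select, mask):
    # Selection variables gate which parameters are maskable; parameters
    # not selected as backdoor-critical pass through unchanged.
    return weights * (1 - select) + weights * select * mask

def inner_step(model_fwd, weights, select, mask, x_clean, target, delta, lr=0.1):
    # Inner problem: refine per-sample perturbations `delta` so that the
    # INVERSE-masked model (backdoor preserved via 1 - mask) predicts the
    # suspected target class, i.e. synthesize a candidate trigger.
    delta = delta.detach().requires_grad_(True)
    logits = model_fwd(apply_mask(weights, select, 1 - mask), x_clean + delta)
    loss = F.cross_entropy(logits, target)
    grad, = torch.autograd.grad(loss, delta)
    return (delta - lr * grad).clamp(-8 / 255, 8 / 255)  # keep the trigger small

def outer_loss(model_fwd, weights, select, mask, x_clean, y_clean, delta,
               target, lam=1e-3):
    # Outer problem: the masked model should stay accurate on clean data
    # while no longer mapping triggered inputs to the attacker's target.
    masked = apply_mask(weights, select, mask)
    clean_loss = F.cross_entropy(model_fwd(masked, x_clean), y_clean)
    backdoor_loss = -F.cross_entropy(model_fwd(masked, x_clean + delta), target)
    sparsity = lam * (1 - mask).abs().mean()  # prune as few parameters as possible
    return clean_loss + backdoor_loss + sparsity
```

In the paper, the selection variables and the sparse invertible mask are learned jointly; the sign-flipped cross-entropy above is just one concrete way to express "suppress the backdoor under the mask while the inverse mask preserves it."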
Related papers
- Backdoor Unlearning by Linear Task Decomposition [69.91984435094157]
Foundation models are highly susceptible to adversarial perturbations and targeted backdoor attacks. Existing backdoor removal approaches rely on costly fine-tuning to override the harmful behavior. This raises the question of whether backdoors can be removed without compromising the general capabilities of the models.
arXiv Detail & Related papers (2025-10-16T16:18:07Z)
- Injection, Attack and Erasure: Revocable Backdoor Attacks via Machine Unlearning [2.1896295740048894]
We introduce the first paradigm of revocable backdoor attacks, where the backdoor can be proactively and thoroughly removed after the attack objective is achieved. This work opens a new direction for backdoor attack research and presents new challenges for the security of machine learning systems.
arXiv Detail & Related papers (2025-10-15T09:09:43Z)
- Neural Antidote: Class-Wise Prompt Tuning for Purifying Backdoors in Pre-trained Vision-Language Models [42.81731204702258]
Class-wise Backdoor Prompt Tuning (CBPT) is an efficient and effective method that operates on text prompts to indirectly purify poisoned Vision-Language Models (VLMs). CBPT significantly mitigates backdoor threats while preserving model utility, e.g., an average Clean Accuracy (CA) of 58.86% and an Attack Success Rate (ASR) of 0.39% across seven mainstream backdoor attacks.
arXiv Detail & Related papers (2025-02-26T16:25:15Z)
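Clean Accuracy (CA) and Attack Success Rate (ASR), quoted in the CBPT entry above, are the two standard backdoor-defense metrics. For reference, here is a minimal sketch of how they are conventionally computed; the model, loader, and `add_trigger` names are placeholders, not from any of the papers listed here.

```python
import torch

@torch.no_grad()
def clean_accuracy(model, loader, device="cpu"):
    # CA: fraction of clean samples classified correctly.
    correct = total = 0
    for x, y in loader:
        pred = model(x.to(device)).argmax(dim=1)
        correct += (pred == y.to(device)).sum().item()
        total += y.numel()
    return correct / total

@torch.no_grad()
def attack_success_rate(model, loader, add_trigger, target_class, device="cpu"):
    # ASR: fraction of triggered samples classified as the attacker's target.
    # Samples already belonging to the target class are conventionally skipped.
    hits = total = 0
    for x, y in loader:
        keep = y != target_class
        if keep.sum() == 0:
            continue
        pred = model(add_trigger(x[keep]).to(device)).argmax(dim=1)
        hits += (pred == target_class).sum().item()
        total += keep.sum().item()
    return hits / total
```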
- REFINE: Inversion-Free Backdoor Defense via Model Reprogramming [60.554146386198376]
Backdoor attacks on deep neural networks (DNNs) have emerged as a significant security threat. We propose REFINE, an inversion-free backdoor defense method based on model reprogramming.
arXiv Detail & Related papers (2025-02-22T07:29:12Z)
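REFINE is described as inversion-free reprogramming: rather than reconstructing the trigger, it places a learned input transformation in front of the frozen (possibly backdoored) model and remaps its outputs. A skeleton of that idea follows; the transform architecture and the permutation-based output mapping are illustrative assumptions, not REFINE's exact design.

```python
import torch
import torch.nn as nn

class ReprogrammedModel(nn.Module):
    # A trainable input transform feeds the frozen model, and a fixed output
    # permutation remaps its labels, so any embedded trigger pattern no
    # longer reaches the backdoored weights intact.
    def __init__(self, frozen_model, num_classes, perm=None):
        super().__init__()
        self.transform = nn.Sequential(  # lightweight learned input transform
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 3, 3, padding=1), nn.Tanh(),
        )
        self.model = frozen_model.eval()
        for p in self.model.parameters():
            p.requires_grad_(False)
        self.register_buffer(
            "perm", perm if perm is not None else torch.randperm(num_classes)
        )

    def forward(self, x):
        logits = self.model(self.transform(x))
        return logits[:, self.perm]  # hard output remapping
```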
- Defending Multimodal Backdoored Models by Repulsive Visual Prompt Tuning [19.638259197558625]
Multimodal contrastive learning models (e.g., CLIP) can learn high-quality representations from large-scale image-text datasets. However, they exhibit significant vulnerabilities to backdoor attacks, raising serious safety concerns. We propose Repulsive Visual Prompt Tuning (RVPT), a novel defense approach that employs deep visual prompt tuning with a specially designed feature-repelling loss.
arXiv Detail & Related papers (2024-12-29T08:09:20Z)
- An Effective and Resilient Backdoor Attack Framework against Deep Neural Networks and Vision Transformers [22.77836113915616]
We propose a novel attention-based mask generation methodology that searches for the optimal trigger shape and location. We also introduce a Quality-of-Experience term into the loss function and carefully adjust the transparency value of the trigger. Our proposed backdoor attack framework also showcases robustness against state-of-the-art backdoor defenses.
arXiv Detail & Related papers (2024-12-09T02:03:27Z)
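The entry above tunes the trigger's transparency; a common way to realize this (assumed here, not necessarily this paper's exact scheme) is alpha-blending the trigger into the image through a spatial mask:

```python
import torch

def blend_trigger(x, trigger, mask, alpha):
    # Stamp `trigger` onto image batch `x` through a spatial `mask`, with
    # `alpha` in [0, 1] controlling transparency: higher alpha makes the
    # trigger more visible (typically more effective, but less stealthy).
    # Shapes: x (B, C, H, W); trigger (C, H, W); mask (1, H, W) in {0, 1}.
    blend = alpha * mask  # per-pixel blending weight
    return (1 - blend) * x + blend * trigger
```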
- ProP: Efficient Backdoor Detection via Propagation Perturbation for Overparametrized Models [2.808880709778591]
Backdoor attacks pose significant challenges to the security of machine learning models.
We propose ProP, a novel and scalable backdoor detection method.
ProP operates with minimal assumptions, requiring no prior knowledge of triggers or malicious samples.
arXiv Detail & Related papers (2024-11-11T14:43:44Z)
- Uncovering, Explaining, and Mitigating the Superficial Safety of Backdoor Defense [27.471096446155933]
We investigate the Post-Purification Robustness of current backdoor purification methods.
We find that current safety purification methods are vulnerable to the rapid re-learning of backdoor behavior.
We propose a tuning defense, Path-Aware Minimization (PAM), which promotes deviation along backdoor-connected paths with extra model updates.
arXiv Detail & Related papers (2024-10-13T13:37:36Z)
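Reading the PAM description above as a sharpness-aware-style two-step update, one "extra model update" per tuning step might look like the sketch below. This reading is an interpretation, not the paper's algorithm; the actual path-aware formulation is in the paper.

```python
import torch

def pam_like_step(model, loss_fn, x, y, rho=0.05, lr=1e-3):
    # Ascent: perturb the weights along the loss gradient (the extra update),
    # deviating the model from its current point.
    loss = loss_fn(model(x), y)
    grads = torch.autograd.grad(loss, list(model.parameters()))
    norm = torch.norm(torch.stack([g.norm() for g in grads]))
    eps = []
    for p, g in zip(model.parameters(), grads):
        e = rho * g / (norm + 1e-12)
        p.data.add_(e)
        eps.append(e)
    # Descent: compute the gradient at the perturbed weights, then undo the
    # perturbation and apply the update to the original weights.
    loss2 = loss_fn(model(x), y)
    grads2 = torch.autograd.grad(loss2, list(model.parameters()))
    for p, e, g2 in zip(model.parameters(), eps, grads2):
        p.data.sub_(e)          # restore original weights
        p.data.sub_(lr * g2)    # SGD update using the perturbed-point gradient
    return loss2.item()
```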
- STBA: Towards Evaluating the Robustness of DNNs for Query-Limited Black-box Scenario [50.37501379058119]
We propose the Spatial Transform Black-box Attack (STBA) to craft formidable adversarial examples in the query-limited scenario.
We show that STBA could effectively improve the imperceptibility of the adversarial examples and remarkably boost the attack success rate under query-limited settings.
arXiv Detail & Related papers (2024-03-30T13:28:53Z)
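STBA replaces additive noise with spatial transforms. The usual building block for such attacks, assumed here rather than taken from the paper, is a small per-pixel flow field applied with `grid_sample`; a query-limited attacker would tune `flow` with black-box optimization (e.g., evolution strategies) to flip the prediction while keeping the warp imperceptible.

```python
import torch
import torch.nn.functional as F

def spatial_warp(x, flow):
    # Warp image batch `x` (B, C, H, W) by a small flow field `flow`
    # (B, H, W, 2): instead of adding noise, each output pixel samples a
    # slightly displaced input location.
    B, C, H, W = x.shape
    # Identity sampling grid over [-1, 1] x [-1, 1].
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H), torch.linspace(-1, 1, W), indexing="ij"
    )
    grid = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, H, W, 2)
    return F.grid_sample(x, grid + flow, align_corners=True)
```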
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Setting the Trap: Capturing and Defeating Backdoors in Pretrained Language Models through Honeypots [68.84056762301329]
Recent research has exposed the susceptibility of pretrained language models (PLMs) to backdoor attacks.
We propose and integrate a honeypot module into the original PLM to absorb backdoor information exclusively.
Our design is motivated by the observation that lower-layer representations in PLMs carry sufficient backdoor features.
arXiv Detail & Related papers (2023-10-28T08:21:16Z)
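The honeypot idea above attaches a small auxiliary classifier to a lower PLM layer so that it, rather than the main head, absorbs the backdoor mapping. A minimal sketch follows; the layer index, pooling, and head size are illustrative choices, not the paper's configuration.

```python
import torch
import torch.nn as nn

class HoneypotHead(nn.Module):
    # Small auxiliary classifier over a lower PLM layer. Per the entry above,
    # lower-layer representations already carry sufficient backdoor features,
    # so a shallow head trained on them tends to soak up the backdoor mapping.
    def __init__(self, hidden_size, num_labels, layer_index=3):
        super().__init__()
        self.layer_index = layer_index
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, hidden_states):
        # hidden_states: tuple of (B, T, H) tensors, one per PLM layer
        # (e.g., from a HuggingFace model with output_hidden_states=True).
        pooled = hidden_states[self.layer_index].mean(dim=1)  # mean-pool tokens
        return self.classifier(pooled)
```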