Related papers: Defending Deep Neural Networks against Backdoor Attacks via Module Switching

Defending Deep Neural Networks against Backdoor Attacks via Module Switching

URL: http://arxiv.org/abs/2504.05902v1
Date: Tue, 08 Apr 2025 11:01:07 GMT
Title: Defending Deep Neural Networks against Backdoor Attacks via Module Switching
Authors: Weijun Li, Ansh Arora, Xuanli He, Mark Dras, Qiongkai Xu,
Abstract summary: An exponential increase in the parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training.<n>Open-source models are more vulnerable to malicious threats, such as backdoor attacks.<n>We propose a novel module-switching strategy to break such spurious correlations within the model's propagation path.
Score: 15.979018992591032
License: http://creativecommons.org/licenses/by/4.0/
Abstract: The exponential increase in the parameters of Deep Neural Networks (DNNs) has significantly raised the cost of independent training, particularly for resource-constrained entities. As a result, there is a growing reliance on open-source models. However, the opacity of training processes exacerbates security risks, making these models more vulnerable to malicious threats, such as backdoor attacks, while simultaneously complicating defense mechanisms. Merging homogeneous models has gained attention as a cost-effective post-training defense. However, we notice that existing strategies, such as weight averaging, only partially mitigate the influence of poisoned parameters and remain ineffective in disrupting the pervasive spurious correlations embedded across model parameters. We propose a novel module-switching strategy to break such spurious correlations within the model's propagation path. By leveraging evolutionary algorithms to optimize fusion strategies, we validate our approach against backdoor attacks targeting text and vision domains. Our method achieves effective backdoor mitigation even when incorporating a couple of compromised models, e.g., reducing the average attack success rate (ASR) to 22% compared to 31.9% with the best-performing baseline on SST-2.

Related papers

Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
Towards Unified Robustness Against Both Backdoor and Adversarial Attacks [31.846262387360767]
Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. This paper reveals that there is an intriguing connection between backdoor and adversarial attacks. A novel Progressive Unified Defense algorithm is proposed to defend against backdoor and adversarial attacks simultaneously.
arXiv Detail & Related papers (2024-05-28T07:50:00Z)
Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning [20.69655306650485]
Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks. We propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger.
arXiv Detail & Related papers (2024-05-10T02:44:25Z)
Here's a Free Lunch: Sanitizing Backdoored Models with Model Merge [17.3048898399324]
democratization of pre-trained language models through open-source initiatives has rapidly advanced innovation and expanded access to cutting-edge technologies. backdoor attacks, where hidden malicious behaviors are triggered by specific inputs, compromising natural language processing (NLP) system integrity and reliability. This paper suggests that merging a backdoored model with other homogeneous models can significantly remediate backdoor vulnerabilities.
arXiv Detail & Related papers (2024-02-29T16:37:08Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threats in this practical scenario that backdoor attacks can remain effective even after defenses. We introduce the emphtoolns attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Shared Adversarial Unlearning: Backdoor Mitigation by Unlearning Shared Adversarial Examples [67.66153875643964]
Backdoor attacks are serious security threats to machine learning models. In this paper, we explore the task of purifying a backdoored model using a small clean dataset. By establishing the connection between backdoor risk and adversarial risk, we derive a novel upper bound for backdoor risk.
arXiv Detail & Related papers (2023-07-20T03:56:04Z)
Avoid Adversarial Adaption in Federated Learning by Multi-Metric Investigations [55.2480439325792]
Federated Learning (FL) facilitates decentralized machine learning model training, preserving data privacy, lowering communication costs, and boosting model performance through diversified data sources. FL faces vulnerabilities such as poisoning attacks, undermining model integrity with both untargeted performance degradation and targeted backdoor attacks. We define a new notion of strong adaptive adversaries, capable of adapting to multiple objectives simultaneously. MESAS is the first defense robust against strong adaptive adversaries, effective in real-world data scenarios, with an average overhead of just 24.37 seconds.
arXiv Detail & Related papers (2023-06-06T11:44:42Z)
Adversarial Robustness Assessment of NeuroEvolution Approaches [1.237556184089774]
We evaluate the robustness of models found by two NeuroEvolution approaches on the CIFAR-10 image classification task. Our results show that when the evolved models are attacked with iterative methods, their accuracy usually drops to, or close to, zero. Some of these techniques can exacerbate the perturbations added to the original inputs, potentially harming robustness.
arXiv Detail & Related papers (2022-07-12T10:40:19Z)
DeepSight: Mitigating Backdoor Attacks in Federated Learning Through Deep Model Inspection [26.593268413299228]
Federated Learning (FL) allows multiple clients to collaboratively train a Neural Network (NN) model on their private data without revealing the data. DeepSight is a novel model filtering approach for mitigating backdoor attacks. We show that it can mitigate state-of-the-art backdoor attacks with a negligible impact on the model's performance on benign data.
arXiv Detail & Related papers (2022-01-03T17:10:07Z)
Model-Agnostic Meta-Attack: Towards Reliable Evaluation of Adversarial Robustness [53.094682754683255]
We propose a Model-Agnostic Meta-Attack (MAMA) approach to discover stronger attack algorithms automatically. Our method learns the in adversarial attacks parameterized by a recurrent neural network. We develop a model-agnostic training algorithm to improve the ability of the learned when attacking unseen defenses.
arXiv Detail & Related papers (2021-10-13T13:54:24Z)
A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems. This paper proposes a self-supervised adversarial training mechanism in the input space. It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)

This list is automatically generated from the titles and abstracts of the papers in this site.