Related papers: Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses

URL: http://arxiv.org/abs/2510.08016v1
Date: Thu, 09 Oct 2025 09:54:05 GMT
Title: Backdoor Vectors: a Task Arithmetic View on Backdoor Attacks and Defenses
Authors: Stanisław Pawlak, Jan Dubiński, Daniel Marczak, Bartłomiej Twardowski,
Abstract summary: We propose a simple framework for understanding backdoor attacks by treating the attack itself as a task vector. We propose a novel method, dubbed $Sparse\ Backdoor\ Vector\ (SBV)$, that enhances backdoor resilience through merging by combining multiple attacks into a single one. Our results show that SBVs surpass prior attacks and are the first method to leverage merging to improve backdoor effectiveness.
Score: 2.82789527790216
License: http://creativecommons.org/licenses/by/4.0/
Abstract: Model merging (MM) has recently emerged as an effective method for combining large deep learning models. However, it poses significant security risks. Recent research shows that it is highly susceptible to backdoor attacks, which introduce a hidden trigger into a single fine-tuned model instance that allows the adversary to control the output of the final merged model at inference time. In this work, we propose a simple framework for understanding backdoor attacks by treating the attack itself as a task vector. A $Backdoor\ Vector\ (BV)$ is calculated as the difference between the weights of a fine-tuned backdoored model and a fine-tuned clean model. BVs yield new insights into how attacks work and provide a more effective framework for measuring their similarity and transferability. Furthermore, we propose a novel method, dubbed $Sparse\ Backdoor\ Vector\ (SBV)$, that enhances backdoor resilience through merging by combining multiple attacks into a single one. We identify the core vulnerability behind backdoor threats in MM: $inherent\ triggers$ that exploit adversarial weaknesses in the base model. To counter this, we propose $Injection\ BV\ Subtraction\ (IBVS)$, an assumption-free defense against backdoors in MM. Our results show that SBVs surpass prior attacks and are the first method to leverage merging to improve backdoor effectiveness. At the same time, IBVS provides a lightweight, general defense that remains effective even when the backdoor threat is entirely unknown.
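
The BV definition in the abstract (weights of the backdoored fine-tuned model minus weights of the clean fine-tuned model) can be illustrated with a minimal sketch. This is not the authors' code: the toy weight dictionaries stand in for real model state dicts, the function names are hypothetical, and cosine similarity is only an assumed choice of similarity measure, since the abstract does not say which one the paper uses.

```python
import numpy as np


def backdoor_vector(backdoored, clean):
    """Per-parameter difference: backdoored fine-tuned weights minus clean fine-tuned weights."""
    return {name: backdoored[name] - clean[name] for name in clean}


def flatten(vector):
    """Concatenate all parameters into one 1-D array, in a fixed (sorted) name order."""
    return np.concatenate([vector[name].ravel() for name in sorted(vector)])


def cosine_similarity(bv_a, bv_b):
    """Assumed similarity measure between two backdoor vectors (not confirmed by the abstract)."""
    a, b = flatten(bv_a), flatten(bv_b)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Toy weights standing in for real state dicts (hypothetical values).
clean = {
    "layer.weight": np.array([[1.0, 2.0], [3.0, 4.0]]),
    "layer.bias": np.array([0.0, 0.0]),
}
backdoored_a = {
    "layer.weight": np.array([[1.5, 2.0], [3.0, 4.5]]),
    "layer.bias": np.array([0.1, 0.0]),
}
backdoored_b = {
    "layer.weight": np.array([[1.0, 2.5], [3.5, 4.0]]),
    "layer.bias": np.array([0.0, 0.0]),
}

bv_a = backdoor_vector(backdoored_a, clean)
bv_b = backdoor_vector(backdoored_b, clean)
similarity_ab = cosine_similarity(bv_a, bv_b)
```

In this toy setup the two attacks modify disjoint parameters, so their BVs are orthogonal; attacks that move the same weights in the same direction would score closer to 1.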

Related papers

Data Free Backdoor Attacks [83.10379074100453]
DFBA is a retraining-free and data-free backdoor attack that does not change the model architecture. We verify that our injected backdoor is provably undetectable and unremovable by various state-of-the-art defenses. Our evaluation on multiple datasets demonstrates that our injected backdoor: 1) incurs negligible classification loss, 2) achieves 100% attack success rates, and 3) bypasses six existing state-of-the-art defenses.
arXiv Detail & Related papers (2024-12-09T05:30:25Z)
Efficient Backdoor Defense in Multimodal Contrastive Learning: A Token-Level Unlearning Method for Mitigating Threats [52.94388672185062]
We propose an efficient defense mechanism against backdoor threats using a concept known as machine unlearning. This entails strategically creating a small set of poisoned samples to aid the model's rapid unlearning of backdoor vulnerabilities. In the backdoor unlearning process, we present a novel token-based portion unlearning training regime.
arXiv Detail & Related papers (2024-09-29T02:55:38Z)
A Spatiotemporal Stealthy Backdoor Attack against Cooperative Multi-Agent Deep Reinforcement Learning [12.535344011523897]
Cooperative multi-agent deep reinforcement learning (c-MADRL) is under the threat of backdoor attacks. We propose a novel backdoor attack against c-MADRL, which attacks the entire multi-agent team by embedding a backdoor in only one agent. Our backdoor attacks are able to reach a high attack success rate (91.6%) while maintaining a low clean performance variance rate (3.7%).
arXiv Detail & Related papers (2024-09-12T06:17:37Z)
Towards Unified Robustness Against Both Backdoor and Adversarial Attacks [31.846262387360767]
Deep Neural Networks (DNNs) are known to be vulnerable to both backdoor and adversarial attacks. This paper reveals that there is an intriguing connection between backdoor and adversarial attacks. A novel Progressive Unified Defense algorithm is proposed to defend against backdoor and adversarial attacks simultaneously.
arXiv Detail & Related papers (2024-05-28T07:50:00Z)
TrojFM: Resource-efficient Backdoor Attacks against Very Large Foundation Models [69.37990698561299]
TrojFM is a novel backdoor attack tailored for very large foundation models. Our approach injects backdoors by fine-tuning only a very small proportion of model parameters. We demonstrate that TrojFM can launch effective backdoor attacks against widely used large GPT-style models.
arXiv Detail & Related papers (2024-05-27T03:10:57Z)
Breaking the False Sense of Security in Backdoor Defense through Re-Activation Attack [32.74007523929888]
We re-investigate the characteristics of backdoored models after defense. We find that the original backdoors still exist in defense models derived from existing post-training defense strategies. We empirically show that these dormant backdoors can be easily re-activated during inference.
arXiv Detail & Related papers (2024-05-25T08:57:30Z)
Mitigating Backdoor Attack by Injecting Proactive Defensive Backdoor [63.84477483795964]
Data-poisoning backdoor attacks are serious security threats to machine learning models. In this paper, we focus on in-training backdoor defense, aiming to train a clean model even when the dataset may be poisoned. We propose a novel defense approach called PDB (Proactive Defensive Backdoor).
arXiv Detail & Related papers (2024-05-25T07:52:26Z)
BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals the threat, in this practical scenario, that backdoor attacks can remain effective even after defenses are applied. We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
Backdoor Attack with Sparse and Invisible Trigger [57.41876708712008]
Deep neural networks (DNNs) are vulnerable to backdoor attacks. The backdoor attack is an emerging yet serious training-phase threat. We propose a sparse and invisible backdoor attack (SIBA).
arXiv Detail & Related papers (2023-05-11T10:05:57Z)
Backdoors Stuck At The Frontdoor: Multi-Agent Backdoor Attacks That Backfire [8.782809316491948]
We investigate a multi-agent backdoor attack scenario, where multiple attackers attempt to backdoor a victim model simultaneously. A consistent backfiring phenomenon is observed across a wide range of games, where agents suffer from a low collective attack success rate. The results motivate the re-evaluation of backdoor defense research for practical environments.
arXiv Detail & Related papers (2022-01-28T16:11:40Z)
Rethink Stealthy Backdoor Attacks in Natural Language Processing [35.6803390044542]
The capacity of stealthy backdoor attacks is overestimated when categorized as backdoor attacks. We propose a new metric called attack successful rate difference (ASRD), which measures the ASR difference between clean state and poison state models. Our method achieves significantly better performance than state-of-the-art defense methods against stealthy backdoor attacks.
arXiv Detail & Related papers (2022-01-09T12:34:12Z)

This list is automatically generated from the titles and abstracts of the papers on this site.