Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning
- URL: http://arxiv.org/abs/2509.08089v1
- Date: Tue, 09 Sep 2025 18:54:31 GMT
- Title: Hammer and Anvil: A Principled Defense Against Backdoors in Federated Learning
- Authors: Lucas Fenaux, Zheng Wang, Jacob Yan, Nathan Chung, Florian Kerschbaum
- Abstract summary: Federated Learning is a distributed learning technique in which multiple clients cooperate to train a machine learning model. In this work, we first devise a new adaptive adversary that surpasses existing adversaries in capabilities. Then, we present Hammer and Anvil, a principled defense approach that combines two defenses orthogonal in their underlying principle.
- Score: 19.849567299082306
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Learning is a distributed learning technique in which multiple clients cooperate to train a machine learning model. Distributed settings facilitate backdoor attacks by malicious clients, who can embed malicious behaviors into the model during their participation in the training process. These malicious behaviors are activated during inference by a specific trigger. No defense against backdoor attacks has stood the test of time, especially against adaptive attackers, a powerful but not fully explored category of attackers. In this work, we first devise a new adaptive adversary that surpasses existing adversaries in capabilities, yielding attacks that only require one or two malicious clients out of 20 to break existing state-of-the-art defenses. Then, we present Hammer and Anvil, a principled defense approach that combines two defenses orthogonal in their underlying principle to produce a combined defense that, given the right set of parameters, must succeed against any attack. We show that our best combined defense, Krum+, is successful against our new adaptive adversary and state-of-the-art attacks.
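The abstract names the best combined defense Krum+ but does not describe its construction. As background only, the following is a minimal sketch of the classical Krum selection rule (Blanchard et al., 2017), on which the name presumably builds. It is not the paper's Krum+ and not the authors' implementation. The function names `squared_distance` and `krum_select`, the parameter `num_malicious`, and the toy updates are assumptions made for illustration.

```python
from typing import Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two flattened client updates."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum_select(updates: Sequence[Sequence[float]], num_malicious: int) -> int:
    """Return the index of the update chosen by classical Krum.

    Each update is scored by the sum of squared distances to its
    n - f - 2 nearest neighbours (n clients, f assumed malicious).
    The update with the lowest score is selected.
    """
    n = len(updates)
    # Standard Krum assumption: n > 2f + 2.
    if n <= 2 * num_malicious + 2:
        raise ValueError("Krum requires n > 2 * num_malicious + 2 updates")
    closest = n - num_malicious - 2
    scores = []
    for i in range(n):
        dists = sorted(
            squared_distance(updates[i], updates[j]) for j in range(n) if j != i
        )
        scores.append(sum(dists[:closest]))
    return min(range(n), key=scores.__getitem__)


# Hypothetical toy round: four similar updates and one outlier.
toy_updates = [
    [1.0, 1.0],
    [1.1, 0.9],
    [0.9, 1.1],
    [1.0, 1.05],
    [10.0, -10.0],  # stands in for a malicious client's update
]
chosen = krum_select(toy_updates, num_malicious=1)
print("selected client index:", chosen)
```

In this toy round the outlier is far from every other update, so its score is large and it is never selected. An adaptive adversary of the kind the abstract describes would instead craft updates that stay close to the benign ones, which is why distance-based selection alone is not presented as sufficient.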
Related papers
- The Attacker Moves Second: Stronger Adaptive Attacks Bypass Defenses Against LLM Jailbreaks and Prompt Injections [74.60337113759313]
Current defenses against jailbreaks and prompt injections are typically evaluated against a static set of harmful attack strings. We argue that this evaluation process is flawed. Instead, we should evaluate defenses against adaptive attackers who explicitly modify their attack strategy to counter a defense's design.
arXiv Detail & Related papers (2025-10-10T05:51:04Z)
- Chasing Moving Targets with Online Self-Play Reinforcement Learning for Safer Language Models [55.28518567702213]
Conventional language model (LM) safety alignment relies on a reactive, disjoint procedure: attackers exploit a static model, followed by defensive fine-tuning to patch exposed vulnerabilities. This sequential approach creates a mismatch: attackers overfit to obsolete defenses, while defenders perpetually lag behind emerging threats. We propose Self-RedTeam, an online self-play reinforcement learning algorithm where an attacker and defender agent co-evolve through continuous interaction.
arXiv Detail & Related papers (2025-06-09T06:35:12Z)
- Client-Side Patching against Backdoor Attacks in Federated Learning [0.0]
Federated learning is vulnerable to backdoor attacks launched by malicious participants. We propose a novel defense mechanism for federated learning systems designed to mitigate backdoor attacks on the client side. Our approach leverages adversarial learning techniques and model patching to neutralize the impact of backdoor attacks.
arXiv Detail & Related papers (2024-12-13T23:17:10Z)
- On the Difficulty of Defending Contrastive Learning against Backdoor Attacks [58.824074124014224]
We show how contrastive backdoor attacks operate through distinctive mechanisms.
Our findings highlight the need for defenses tailored to the specificities of contrastive backdoor attacks.
arXiv Detail & Related papers (2023-12-14T15:54:52Z)
- BadCLIP: Dual-Embedding Guided Backdoor Attack on Multimodal Contrastive Learning [85.2564206440109]
This paper reveals a threat in this practical scenario: backdoor attacks can remain effective even after defenses are applied.
We introduce the BadCLIP attack, which is resistant to backdoor detection and model fine-tuning defenses.
arXiv Detail & Related papers (2023-11-20T02:21:49Z)
- Learning to Backdoor Federated Learning [9.046972927978997]
In a federated learning (FL) system, malicious participants can easily embed backdoors into the aggregated model.
We propose a general reinforcement learning-based backdoor attack framework.
Our framework is both adaptive and flexible and achieves strong attack performance and durability even under state-of-the-art defenses.
arXiv Detail & Related papers (2023-03-06T17:47:04Z)
- On the Effectiveness of Adversarial Training against Backdoor Attacks [111.8963365326168]
A backdoored model always predicts a target class in the presence of a predefined trigger pattern.
In general, adversarial training is believed to defend against backdoor attacks.
We propose a hybrid strategy which provides satisfactory robustness across different backdoor attacks.
arXiv Detail & Related papers (2022-02-22T02:24:46Z)
- Certified Federated Adversarial Training [3.474871319204387]
We tackle the scenario of securing FL systems conducting adversarial training when a quorum of workers could be completely malicious.
We model an attacker who poisons the model to insert a weakness into the adversarial training such that the model displays apparent adversarial robustness.
We show that this defence can preserve adversarial robustness even against an adaptive attacker.
arXiv Detail & Related papers (2021-12-20T13:40:20Z)
- Widen The Backdoor To Let More Attackers In [24.540853975732922]
We investigate the scenario of a multi-agent backdoor attack, where multiple non-colluding attackers craft and insert triggered samples in a shared dataset.
We discover a clear backfiring phenomenon: increasing the number of attackers shrinks each attacker's attack success rate.
We then exploit this phenomenon to minimize the collective ASR of attackers and maximize defender's robustness accuracy.
arXiv Detail & Related papers (2021-10-09T13:53:57Z)
- What Doesn't Kill You Makes You Robust(er): Adversarial Training against Poisons and Backdoors [57.040948169155925]
We extend the adversarial training framework to defend against (training-time) poisoning and backdoor attacks.
Our method desensitizes networks to the effects of poisoning by creating poisons during training and injecting them into training batches.
We show that this defense withstands adaptive attacks, generalizes to diverse threat models, and incurs a better performance trade-off than previous defenses.
arXiv Detail & Related papers (2021-02-26T17:54:36Z)