ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data
- URL: http://arxiv.org/abs/2603.00516v1
- Date: Sat, 28 Feb 2026 07:25:32 GMT
- Title: ProtegoFed: Backdoor-Free Federated Instruction Tuning with Interspersed Poisoned Data
- Authors: Haodong Zhao, Jinming Hu, Zhaomin Wu, Zongru Wu, Wei Du, Junyi Hou, Caibei Zhao, Zhuosheng Zhang, Bingsheng He, Gongshen Liu
- Abstract summary: Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients, even if the clients are benign. This paper introduces ProtegoFed, the first backdoor-free FIT framework that accurately detects, removes, and even purifies interspersed poisoned data across clients during training.
- Score: 50.142067708131826
- License: http://arxiv.org/licenses/nonexclusive-distrib/1.0/
- Abstract: Federated Instruction Tuning (FIT) enables collaborative instruction tuning of large language models across multiple organizations (clients) in a cross-silo setting without requiring the sharing of private instructions. Recent findings on natural backdoors and current training-data collection practices suggest that poisoned samples may be pervasive and inadvertently embedded in real-world datasets, potentially distributed across all clients, even if the clients are benign. This work systematically examines this threat in FIT, demonstrating that existing defenses are ineffective when poisoned data is interspersed among all clients. Addressing this challenge entails two major difficulties: identifying the distinctive characteristics of poisoned samples at each client, and enabling collaborative defense when some clients are heavily dominated by poisoned samples. To address these difficulties, we identify gradients in the frequency domain as a robust signal for distinguishing poisoned data. We further propose a global secondary clustering mechanism that facilitates collaborative identification of poisoned samples across clients. In summary, this paper introduces ProtegoFed, the first backdoor-free FIT framework that accurately detects, removes, and even purifies interspersed poisoned data across clients during training. Experimental results on four FL datasets show that ProtegoFed identifies $92.00\% \sim 100.00\%$ of poisoned samples, reduces the attack success rate to almost zero, and maintains utility on the main task. Code is available at https://github.com/dongdongzhaoUP/ProtegoFed.
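The abstract names two concrete mechanisms: frequency-domain features of per-sample gradients as a poison signal, and a global secondary clustering over clients' local clusters. The Python sketch below is a rough illustration of how such a pipeline might be wired together; the DCT feature choice, every function name, and the heuristic that the smaller global cluster is the poisoned one are assumptions of this sketch, not details taken from the paper or its released code.

```python
# Hypothetical sketch of frequency-domain gradient clustering with a global
# secondary clustering step, loosely following the abstract's description.
# All names and design choices here are illustrative assumptions.
import numpy as np
from scipy.fft import dct
from sklearn.cluster import KMeans


def frequency_features(per_sample_grads: np.ndarray, n_coeffs: int = 64) -> np.ndarray:
    """Map each flattened per-sample gradient to its leading DCT coefficients.

    per_sample_grads: (n_samples, grad_dim) array.
    Returns an (n_samples, n_coeffs) array of low-frequency features.
    """
    spectra = dct(per_sample_grads, axis=1, norm="ortho")
    return spectra[:, :n_coeffs]


def local_clustering(per_sample_grads: np.ndarray):
    """Client side: split samples into two clusters in the frequency domain.

    Returns the two cluster centroids and per-sample assignments. Which
    cluster is poisoned cannot be decided locally, since a client's data
    may be dominated by poisoned samples.
    """
    feats = frequency_features(per_sample_grads)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(feats)
    return km.cluster_centers_, km.labels_


def global_secondary_clustering(all_centroids: list[np.ndarray]):
    """Server side: cluster the clients' local cluster centroids.

    Pooling centroids across clients lets the server separate a global
    'clean' group from a 'poisoned' group even for clients whose local
    majority is poisoned. Flagging the smaller global group as poisoned
    is a simplifying assumption of this sketch.
    """
    stacked = np.vstack(all_centroids)  # shape: (2 * n_clients, n_coeffs)
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(stacked)
    sizes = np.bincount(km.labels_, minlength=2)
    poisoned_group = int(np.argmin(sizes))
    # Reshape to (n_clients, 2): the global group label of each client's
    # two local clusters, in the order the centroids were submitted.
    return km.labels_.reshape(-1, 2), poisoned_group
```

Under these assumptions, each client would then drop (or attempt to purify) the samples whose local cluster the server mapped to the poisoned group before computing its next model update.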
Related papers
- FedUP: Efficient Pruning-based Federated Unlearning for Model Poisoning Attacks [4.146693948527548]
Federated Unlearning (FU) is emerging as a solution to address such vulnerabilities. This work presents FedUP, a lightweight FU algorithm designed to efficiently mitigate malicious clients' influence. In all scenarios, FedUP reduces malicious influence, lowering accuracy on malicious data to match that of a model retrained from scratch.
arXiv Detail & Related papers (2025-08-19T14:14:58Z)
- A Client-level Assessment of Collaborative Backdoor Poisoning in Non-IID Federated Learning [14.728868104566363]
Federated learning (FL) enables collaborative model training using decentralized private data from multiple clients. Our research reveals new vulnerabilities stemming from non-independent and identically distributed (non-IID) data among clients. We develop a novel collaborative backdoor poisoning attack called CollaPois.
arXiv Detail & Related papers (2025-04-17T12:03:02Z)
- FedNIA: Noise-Induced Activation Analysis for Mitigating Data Poisoning in FL [6.144680854063938]
Federated Noise-Induced Activation Analysis (FedNIA) is a novel defense framework to identify and exclude adversarial clients. FedNIA injects random noise inputs to analyze the layerwise activation patterns in client models. It can defend against diverse attack types, including sample poisoning, label flipping, and backdoors.
arXiv Detail & Related papers (2025-02-23T01:16:01Z)
- Protecting Model Adaptation from Trojans in the Unlabeled Data [120.42853706967188]
This paper explores potential trojan attacks on model adaptation launched by well-designed poisoned target data. We propose a plug-and-play method named DiffAdapt, which can be seamlessly integrated with existing adaptation algorithms.
arXiv Detail & Related papers (2024-01-11T16:42:10Z)
- Client-side Gradient Inversion Against Federated Learning from Poisoning [59.74484221875662]
Federated Learning (FL) enables distributed participants to train a global model without sharing data directly with a central server.
Recent studies have revealed that FL is vulnerable to gradient inversion attacks (GIA), which aim to reconstruct the original training samples.
We propose Client-side poisoning Gradient Inversion (CGI), a novel attack method that can be launched from clients.
arXiv Detail & Related papers (2023-09-14T03:48:27Z)
- Mitigating Cross-client GANs-based Attack in Federated Learning [78.06700142712353]
Multiple distributed multimedia clients can resort to federated learning (FL) to jointly learn a global shared model.
FL suffers from the cross-client generative adversarial network (GAN)-based (C-GANs) attack.
We propose the Fed-EDKD technique to improve current popular FL schemes to resist the C-GANs attack.
arXiv Detail & Related papers (2023-07-25T08:15:55Z)
- G$^2$uardFL: Safeguarding Federated Learning Against Backdoor Attacks through Attributed Client Graph Clustering [116.4277292854053]
Federated Learning (FL) offers collaborative model training without data sharing.
FL is vulnerable to backdoor attacks, where poisoned model weights lead to compromised system integrity.
We present G$^2$uardFL, a protective framework that reinterprets the identification of malicious clients as an attributed graph clustering problem.
arXiv Detail & Related papers (2023-06-08T07:15:04Z)
- FLIP: A Provable Defense Framework for Backdoor Mitigation in Federated Learning [66.56240101249803]
We study how hardening benign clients can affect the global model (and the malicious clients).
We propose a trigger reverse-engineering based defense and show that our method achieves improvement with guaranteed robustness.
Our results against eight competing SOTA defense methods show the empirical superiority of our method on both single-shot and continuous FL backdoor attacks.
arXiv Detail & Related papers (2022-10-23T22:24:03Z)
- CrowdGuard: Federated Backdoor Detection in Federated Learning [39.58317527488534]
This paper presents a novel defense mechanism, CrowdGuard, that effectively mitigates backdoor attacks in Federated Learning.
CrowdGuard employs a server-located stacked clustering scheme to enhance its resilience to rogue client feedback.
The evaluation results demonstrate that CrowdGuard achieves a 100% true-positive rate and true-negative rate across various scenarios.
arXiv Detail & Related papers (2022-10-14T11:27:49Z)
- FLCert: Provably Secure Federated Learning against Poisoning Attacks [67.8846134295194]
We propose FLCert, an ensemble federated learning framework that is provably secure against poisoning attacks.
Our experiments show that the label predicted by our FLCert for a test input is provably unaffected by a bounded number of malicious clients.
arXiv Detail & Related papers (2022-10-02T17:50:04Z)