Related papers: BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning

BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning

URL: http://arxiv.org/abs/2405.20862v1
Date: Fri, 31 May 2024 14:44:57 GMT
Title: BackdoorIndicator: Leveraging OOD Data for Proactive Backdoor Detection in Federated Learning
Authors: Songze Li, Yanbo Dai,
Abstract summary: In a federated learning (FL) system, decentralized data owners (clients) could upload their locally trained models to a central server, to jointly train a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious updates are made statistically close to the benign ones. We propose a novel proactive backdoor detection mechanism for FL named BackdoorIndicator, which has the server inject indicator tasks into the global model leveraging out-
Score: 7.528642177161784
License: http://creativecommons.org/licenses/by-nc-nd/4.0/
Abstract: In a federated learning (FL) system, decentralized data owners (clients) could upload their locally trained models to a central server, to jointly train a global model. Malicious clients may plant backdoors into the global model through uploading poisoned local models, causing misclassification to a target class when encountering attacker-defined triggers. Existing backdoor defenses show inconsistent performance under different system and adversarial settings, especially when the malicious updates are made statistically close to the benign ones. In this paper, we first reveal the fact that planting subsequent backdoors with the same target label could significantly help to maintain the accuracy of previously planted backdoors, and then propose a novel proactive backdoor detection mechanism for FL named BackdoorIndicator, which has the server inject indicator tasks into the global model leveraging out-of-distribution (OOD) data, and then utilizing the fact that any backdoor samples are OOD samples with respect to benign samples, the server, who is completely agnostic of the potential backdoor types and target labels, can accurately detect the presence of backdoors in uploaded models, via evaluating the indicator tasks. We perform systematic and extensive empirical studies to demonstrate the consistently superior performance and practicality of BackdoorIndicator over baseline defenses, across a wide range of system and adversarial settings.

Related papers

TrojanDam: Detection-Free Backdoor Defense in Federated Learning through Proactive Model Robustification utilizing OOD Data [9.078056680217465]
decentralization allows adversaries to craft carefully designed backdoor updates to misclassify a global model. Existing defense mechanisms mainly rely on post-training detection after receiving updates. We propose a backdoor defense paradigm, which focuses on proactive robustification on the global model.
arXiv Detail & Related papers (2025-04-22T07:56:51Z)
Lie Detector: Unified Backdoor Detection via Cross-Examination Framework [68.45399098884364]
We propose a unified backdoor detection framework in the semi-honest setting. Our method achieves superior detection performance, improving accuracy by 5.4%, 1.6%, and 11.9% over SoTA baselines. Notably, it is the first to effectively detect backdoors in multimodal large language models.
arXiv Detail & Related papers (2025-03-21T06:12:06Z)
Rethinking Backdoor Detection Evaluation for Language Models [45.34806299803778]
Backdoor attacks pose a major security risk for practitioners who depend on publicly released language models. Backdoor detection methods aim to detect whether a released model contains a backdoor, so that practitioners can avoid such vulnerabilities. While existing backdoor detection methods have high accuracy in detecting backdoored models on standard benchmarks, it is unclear whether they can robustly identify backdoors in the wild.
arXiv Detail & Related papers (2024-08-31T09:19:39Z)
Concealing Backdoor Model Updates in Federated Learning by Trigger-Optimized Data Poisoning [20.69655306650485]
Federated Learning (FL) is a decentralized machine learning method that enables participants to collaboratively train a model without sharing their private data. Despite its privacy and scalability benefits, FL is susceptible to backdoor attacks. We propose DPOT, a backdoor attack strategy in FL that dynamically constructs backdoor objectives by optimizing a backdoor trigger.
arXiv Detail & Related papers (2024-05-10T02:44:25Z)
Model Pairing Using Embedding Translation for Backdoor Attack Detection on Open-Set Classification Tasks [63.269788236474234]
We propose to use model pairs on open-set classification tasks for detecting backdoors. We show that this score, can be an indicator for the presence of a backdoor despite models being of different architectures. This technique allows for the detection of backdoors on models designed for open-set classification tasks, which is little studied in the literature.
arXiv Detail & Related papers (2024-02-28T21:29:16Z)
Backdoor Defense via Deconfounded Representation Learning [17.28760299048368]
We propose a Causality-inspired Backdoor Defense (CBD) to learn deconfounded representations for reliable classification. CBD is effective in reducing backdoor threats while maintaining high accuracy in predicting benign samples.
arXiv Detail & Related papers (2023-03-13T02:25:59Z)
FreeEagle: Detecting Complex Neural Trojans in Data-Free Cases [50.065022493142116]
Trojan attack on deep neural networks, also known as backdoor attack, is a typical threat to artificial intelligence. FreeEagle is the first data-free backdoor detection method that can effectively detect complex backdoor attacks.
arXiv Detail & Related papers (2023-02-28T11:31:29Z)
BayBFed: Bayesian Backdoor Defense for Federated Learning [17.433543798151746]
Federated learning (FL) allows participants to jointly train a machine learning model without sharing their private data with others. BayBFed proposes to utilize probability distributions over client updates to detect malicious updates in FL.
arXiv Detail & Related papers (2023-01-23T16:01:30Z)
Untargeted Backdoor Attack against Object Detection [69.63097724439886]
We design a poison-only backdoor attack in an untargeted manner, based on task characteristics. We show that, once the backdoor is embedded into the target model by our attack, it can trick the model to lose detection of any object stamped with our trigger patterns.
arXiv Detail & Related papers (2022-11-02T17:05:45Z)
An anomaly detection approach for backdoored neural networks: face recognition as a case study [77.92020418343022]
We propose a novel backdoored network detection method based on the principle of anomaly detection. We test our method on a novel dataset of backdoored networks and report detectability results with perfect scores.
arXiv Detail & Related papers (2022-08-22T12:14:13Z)
Backdoor Defense in Federated Learning Using Differential Testing and Outlier Detection [24.562359531692504]
We propose DifFense, an automated defense framework to protect an FL system from backdoor attacks. Our detection method reduces the average backdoor accuracy of the global model to below 4% and achieves a false negative rate of zero.
arXiv Detail & Related papers (2022-02-21T17:13:03Z)
CRFL: Certifiably Robust Federated Learning against Backdoor Attacks [59.61565692464579]
This paper provides the first general framework, Certifiably Robust Federated Learning (CRFL), to train certifiably robust FL models against backdoors. Our method exploits clipping and smoothing on model parameters to control the global model smoothness, which yields a sample-wise robustness certification on backdoors with limited magnitude.
arXiv Detail & Related papers (2021-06-15T16:50:54Z)
Black-box Detection of Backdoor Attacks with Limited Information and Data [56.0735480850555]
We propose a black-box backdoor detection (B3D) method to identify backdoor attacks with only query access to the model. In addition to backdoor detection, we also propose a simple strategy for reliable predictions using the identified backdoored models.
arXiv Detail & Related papers (2021-03-24T12:06:40Z)

This list is automatically generated from the titles and abstracts of the papers in this site.