FLARE: Towards Universal Dataset Purification against Backdoor Attacks
- URL: http://arxiv.org/abs/2411.19479v1
- Date: Fri, 29 Nov 2024 05:34:21 GMT
- Title: FLARE: Towards Universal Dataset Purification against Backdoor Attacks
- Authors: Linshan Hou, Wei Luo, Zhongyun Hua, Songhua Chen, Leo Yu Zhang, Yiming Li,
- Abstract summary: Deep neural networks (DNNs) are susceptible to backdoor attacks.
adversaries poison datasets with adversary-specified triggers to implant hidden backdoors.
We propose FLARE, a universal purification method to counter various backdoor attacks.
- Score: 16.97677097266535
- License:
- Abstract: Deep neural networks (DNNs) are susceptible to backdoor attacks, where adversaries poison datasets with adversary-specified triggers to implant hidden backdoors, enabling malicious manipulation of model predictions. Dataset purification serves as a proactive defense by removing malicious training samples to prevent backdoor injection at its source. We first reveal that the current advanced purification methods rely on a latent assumption that the backdoor connections between triggers and target labels in backdoor attacks are simpler to learn than the benign features. We demonstrate that this assumption, however, does not always hold, especially in all-to-all (A2A) and untargeted (UT) attacks. As a result, purification methods that analyze the separation between the poisoned and benign samples in the input-output space or the final hidden layer space are less effective. We observe that this separability is not confined to a single layer but varies across different hidden layers. Motivated by this understanding, we propose FLARE, a universal purification method to counter various backdoor attacks. FLARE aggregates abnormal activations from all hidden layers to construct representations for clustering. To enhance separation, FLARE develops an adaptive subspace selection algorithm to isolate the optimal space for dividing an entire dataset into two clusters. FLARE assesses the stability of each cluster and identifies the cluster with higher stability as poisoned. Extensive evaluations on benchmark datasets demonstrate the effectiveness of FLARE against 22 representative backdoor attacks, including all-to-one (A2O), all-to-all (A2A), and untargeted (UT) attacks, and its robustness to adaptive attacks.
Related papers
- Poisoning with A Pill: Circumventing Detection in Federated Learning [33.915489514978084]
This paper proposes a generic and attack-agnostic augmentation approach designed to enhance the effectiveness and stealthiness of existing FL poisoning attacks against detection in FL.
Specifically, we employ a three-stage methodology that strategically constructs, generates, and injects poison into a pill during the FL training, named as pill construction, pill poisoning, and pill injection accordingly.
arXiv Detail & Related papers (2024-07-22T05:34:47Z) - Revisiting Backdoor Attacks against Large Vision-Language Models from Domain Shift [104.76588209308666]
This paper explores backdoor attacks in LVLM instruction tuning across mismatched training and testing domains.
We introduce a new evaluation dimension, backdoor domain generalization, to assess attack robustness.
We propose a multimodal attribution backdoor attack (MABA) that injects domain-agnostic triggers into critical areas.
arXiv Detail & Related papers (2024-06-27T02:31:03Z) - Generalization Bound and New Algorithm for Clean-Label Backdoor Attack [14.80556378962582]
backdoor attack has the special property that the poisoned triggers are contained in both the training set and the test set.
In this paper, we fill this gap by deriving algorithm-independent generalization bounds in the clean-label backdoor attack scenario.
We propose a new clean-label backdoor attack that computes the poisoning trigger by combining adversarial noise and indiscriminate poison.
arXiv Detail & Related papers (2024-06-02T01:46:58Z) - Lazy Layers to Make Fine-Tuned Diffusion Models More Traceable [70.77600345240867]
A novel arbitrary-in-arbitrary-out (AIAO) strategy makes watermarks resilient to fine-tuning-based removal.
Unlike the existing methods of designing a backdoor for the input/output space of diffusion models, in our method, we propose to embed the backdoor into the feature space of sampled subpaths.
Our empirical studies on the MS-COCO, AFHQ, LSUN, CUB-200, and DreamBooth datasets confirm the robustness of AIAO.
arXiv Detail & Related papers (2024-05-01T12:03:39Z) - FreqFed: A Frequency Analysis-Based Approach for Mitigating Poisoning
Attacks in Federated Learning [98.43475653490219]
Federated learning (FL) is susceptible to poisoning attacks.
FreqFed is a novel aggregation mechanism that transforms the model updates into the frequency domain.
We demonstrate that FreqFed can mitigate poisoning attacks effectively with a negligible impact on the utility of the aggregated model.
arXiv Detail & Related papers (2023-12-07T16:56:24Z) - Universal Detection of Backdoor Attacks via Density-based Clustering and
Centroids Analysis [24.953032059932525]
We propose a Universal Defence against backdoor attacks based on Clustering and Centroids Analysis (CCA-UD)
The goal of the defence is to reveal whether a Deep Neural Network model is subject to a backdoor attack by inspecting the training dataset.
arXiv Detail & Related papers (2023-01-11T16:31:38Z) - FedCC: Robust Federated Learning against Model Poisoning Attacks [0.0]
Federated learning is a distributed framework designed to address privacy concerns.
It introduces new attack surfaces, which are especially prone when data is non-Independently and Identically Distributed.
We present FedCC, a simple yet effective novel defense algorithm against model poisoning attacks.
arXiv Detail & Related papers (2022-12-05T01:52:32Z) - Backdoor Defense in Federated Learning Using Differential Testing and
Outlier Detection [24.562359531692504]
We propose DifFense, an automated defense framework to protect an FL system from backdoor attacks.
Our detection method reduces the average backdoor accuracy of the global model to below 4% and achieves a false negative rate of zero.
arXiv Detail & Related papers (2022-02-21T17:13:03Z) - DAAIN: Detection of Anomalous and Adversarial Input using Normalizing
Flows [52.31831255787147]
We introduce a novel technique, DAAIN, to detect out-of-distribution (OOD) inputs and adversarial attacks (AA)
Our approach monitors the inner workings of a neural network and learns a density estimator of the activation distribution.
Our model can be trained on a single GPU making it compute efficient and deployable without requiring specialized accelerators.
arXiv Detail & Related papers (2021-05-30T22:07:13Z) - Hidden Backdoor Attack against Semantic Segmentation Models [60.0327238844584]
The emphbackdoor attack intends to embed hidden backdoors in deep neural networks (DNNs) by poisoning training data.
We propose a novel attack paradigm, the emphfine-grained attack, where we treat the target label from the object-level instead of the image-level.
Experiments show that the proposed methods can successfully attack semantic segmentation models by poisoning only a small proportion of training data.
arXiv Detail & Related papers (2021-03-06T05:50:29Z) - A Self-supervised Approach for Adversarial Robustness [105.88250594033053]
Adversarial examples can cause catastrophic mistakes in Deep Neural Network (DNNs) based vision systems.
This paper proposes a self-supervised adversarial training mechanism in the input space.
It provides significant robustness against the textbfunseen adversarial attacks.
arXiv Detail & Related papers (2020-06-08T20:42:39Z)
This list is automatically generated from the titles and abstracts of the papers in this site.
This site does not guarantee the quality of this site (including all information) and is not responsible for any consequences.